Dry matter intake (DMI) is an important performance indicator in livestock production and can be used to determine feed efficiency and to optimise feed utilisation. Knowledge of DMI is also crucial to avoid overfeeding, which can lead to undesirable consequences such as metabolic disease in dairy cows, feed inefficiency and associated effects on production economics, as well as calving problems in overweight animals. However, it is often not possible to measure feed intake at the individual level, since dairy cows are often group-fed and the required equipment is usually not available on farms. In conventional dairy production, milk yield (MY), body weight (BW) and days in milk (DIM) are sometimes used to estimate DMI. In addition, many modern dairy farms now participate in milk recording schemes that use mid-infrared spectroscopy (MIRS) analysis of milk samples to measure milk composition (fat, protein and lactose content). This is possible because MIRS provides information about the chemical bonds present in the milk, and thus indicates the types of molecules present in the samples.
As milk is collected individually, and milk composition as well as yield is related to DMI, milk sampling may provide information about the feed intake of individual cows. Prediction of DMI from MIRS could be used as a strategy for selecting cows that utilise feed nutrients efficiently in relation to milk production, for breeding purposes. Therefore, it is important to find a robust strategy, incorporating suitable parameters and methods, for predicting DMI. Equations using milk MIRS to predict feed intake-related parameters have been developed by several researchers using different methods (Shetty et al., 2017; Wallén et al., 2018; Lahart et al., 2019; Grelet et al., 2020). Partial least squares (PLS) regression is commonly used to develop prediction models, as this method is suitable for multivariate data (Soyeurt et al., 2006; Eskildsen et al., 2014; Parrini et al., 2019). Other studies have attempted to use various types of machine learning (ML) algorithms (Ghasemi and Tavakoli, 2013; Contla Hernández et al., 2021; Meza Ramirez et al., 2021).
Among these, support vector machine (SVM) and random forest (RF) are widely used supervised learning methods that can model non-linear relationships between predictors and either discrete or continuous outputs. While the PLS method is the most widely used in development of prediction models, it is of interest to compare the performance of different approaches and tools. Therefore, the present study compared three different approaches (PLS, SVM and RF regression) in terms of their ability to predict DMI in Swedish dairy cattle using milk MIRS data.
Materials and methods
Data collection and pre-processing
Data on milk MIRS and on DMI (forage DMI + concentrate DMI) in the years 2017–2021 were collected from cows in the research herd, which has approximately 240 places for lactating cows, at the Swedish Livestock Research Centre, Lövsta, Sweden. All cows were of either the Swedish Red or the Swedish Holstein breed. All cows were connected to an ID system (DelPro™) that logged feed intake and was also linked to the automatic milking system (DeLaval International AB, Tumba, Sweden), which recorded other milking-related parameters (date, time, time since the last milking, yield). The details of the feed intake and milk data records can be found in online Supplementary Text S1.
The DMI data were pre-processed before use in developing and validating models for predicting DMI. Records with total DMI exceeding 40 kg/d were removed from the dataset. The common management practice in the research barn is for cows in mid- to late lactation to be moved to a different part of the barn where forage DMI is not recorded. Therefore, DMI data were available only between 0 and 180 DIM. All cows were also kept partly on pasture during the summer (May–August), and total DMI was not measured during this period. Daily DMI was averaged over the 3 d immediately preceding the date of the MIRS sample.
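The pre-processing rules described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in R), and the data layout, the function name mean_dmi_before and the example values are hypothetical. The sketch also assumes that implausible records are dropped before averaging and that the mean is taken over whichever of the three preceding days remain, neither of which is stated in the text.

```python
from datetime import date, timedelta

MAX_DMI_KG = 40.0   # records above this total DMI (kg/d) are discarded
MAX_DIM = 180       # DMI data are only available up to this day in milk
WINDOW_DAYS = 3     # days immediately preceding the MIRS sample


def mean_dmi_before(mirs_date, daily_dmi, dim_on_mirs_date):
    """Return mean DMI over the 3 d before mirs_date, or None if unusable.

    daily_dmi maps a date to total DMI (kg/d) for one cow; this layout is
    a hypothetical choice made for the example.
    """
    if not 0 <= dim_on_mirs_date <= MAX_DIM:
        return None
    values = []
    for offset in range(1, WINDOW_DAYS + 1):
        day = mirs_date - timedelta(days=offset)
        dmi = daily_dmi.get(day)
        # Assumption: missing days and records > 40 kg/d are skipped,
        # and the mean is taken over the remaining days.
        if dmi is not None and dmi <= MAX_DMI_KG:
            values.append(dmi)
    if not values:
        return None
    return sum(values) / len(values)


# Invented intake records for one cow.
intake = {
    date(2020, 3, 1): 21.0,
    date(2020, 3, 2): 23.0,
    date(2020, 3, 3): 45.0,  # implausible, removed by the 40 kg/d filter
}
example = mean_dmi_before(date(2020, 3, 4), intake, dim_on_mirs_date=60)
print(example)  # 22.0
```

Returning None for samples without usable intake data lets the caller drop those MIRS records, which also covers the summer months, when total DMI was not measured.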
Information about DIM was also included in the dataset, so the cows were categorised according to stage of lactation. Predictive model development was performed with all data (3–180 DIM), and with early (3–30 DIM) and mid- (30–180 DIM) lactation data separately.
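As an illustration of this grouping, the following Python sketch assigns records to the three modelling subsets. The record layout and the function name lactation_stage are hypothetical. Because day 30 appears in both ranges given above, the sketch assigns it to early lactation, which is an assumption rather than something stated in the text.

```python
def lactation_stage(dim):
    """Map days in milk (DIM) to a lactation stage label.

    The text gives 3-30 DIM (early) and 30-180 DIM (mid). Assigning day 30
    to "early" is an assumption made here to keep the groups disjoint.
    """
    if 3 <= dim <= 30:
        return "early"
    if 30 < dim <= 180:
        return "mid"
    return None  # outside the modelled range


# Invented example records.
records = [
    {"cow": "A", "dim": 5},
    {"cow": "B", "dim": 30},
    {"cow": "C", "dim": 120},
    {"cow": "D", "dim": 200},
]

# One subset per model: all data, early lactation and mid-lactation.
subsets = {"all": [], "early": [], "mid": []}
for rec in records:
    stage = lactation_stage(rec["dim"])
    if stage is None:
        continue
    subsets["all"].append(rec)
    subsets[stage].append(rec)

print({name: len(rows) for name, rows in subsets.items()})
# {'all': 3, 'early': 2, 'mid': 1}
```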
Data analyses
Models were developed with three different tools, PLS regression, SVM regression and RF regression, as explained in online Supplementary Text S1. All analyses were performed using R software version 4.2.0 (R Core Team, 2022).
The prediction models were developed with data from 2017 to 2020, which comprised 1323 datalines for the full data. All models were validated using data from the most recent year (2021, 471 datalines). The coefficient of determination (R²), root mean square error of prediction (RMSEP) and mean absolute error (MAE) were used to evaluate and compare the performance of the prediction models.
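The validation scheme and the three metrics can be sketched in Python as follows. This is an illustrative sketch, not the authors' R code: the function names and example values are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption because the text does not state which definition was used.

```python
import math


def split_by_year(rows, test_year=2021):
    """Hold out the most recent year as the external validation set."""
    train = [r for r in rows if r["year"] < test_year]
    test = [r for r in rows if r["year"] == test_year]
    return train, test


def evaluate(observed, predicted):
    """Return R2, RMSEP and MAE for paired observed and predicted values.

    Assumes the observed values are not all identical (SS_tot > 0).
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,   # assumed definition of R2
        "RMSEP": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in residuals) / n,
    }


# Invented records, used only to show the time-based split.
rows = [
    {"year": 2017, "dmi": 20.0},
    {"year": 2019, "dmi": 22.0},
    {"year": 2020, "dmi": 24.0},
    {"year": 2021, "dmi": 21.0},
    {"year": 2021, "dmi": 25.0},
]
train, test = split_by_year(rows)

# Invented observed and predicted DMI values (kg/d).
observed = [18.0, 20.0, 22.0, 24.0]
predicted = [19.0, 20.0, 21.0, 25.0]
metrics = evaluate(observed, predicted)

# Share of all datalines held out for validation, from the counts above.
test_share = 471 / (1323 + 471)
print(round(test_share, 2))  # 0.26
```

Splitting on year rather than at random keeps all validation records later in time than the training records.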
Results
Descriptive statistics on the data used in the analysis are presented in online Supplementary Text S2, Table S1 and Figure S1.
Table 1 shows the performance of the PLS, SVM and RF regression models in predicting DMI with different sets of predictors, using data from 2021 as the external validation dataset (corresponding to 26% of all available data).
PLS, Partial Least-Squares regression; SVM, Support Vector Machine regression; RF, Random Forest regression; MIRS, full milk mid-infrared spectra (935 wavenumbers); MY, average daily milk yield; DIM, days in milk; Conc, concentrate DMI; Lact stage, lactation stage; Par, parity; RMSEP, root mean square error of prediction; MAE, mean absolute error.
As shown in Table 1, all three approaches achieved their best predictions with early-lactation data (3–30 DIM). The highest coefficient of determination (R²) was observed for PLS regression (0.65), followed by RF regression (0.62) and SVM regression (0.55).
Generally, using the full milk MIRS data alone to predict DMI gave low to moderate prediction accuracy (R² = 0.07–0.40, MAE = 2.65–3.22). When more variables, e.g. MY and concentrate DMI, were included together with the milk MIRS data, the R² of the model improved and the prediction errors (RMSEP and MAE) were reduced.
Discussion
In the present study, we used the classical PLS method to predict DMI from milk MIRS and compared its prediction accuracy with that of two non-linear ML methods (SVM and RF) that can also be used for regression. Prediction accuracy (R²) is an important measure when evaluating and applying a prediction equation for a trait or parameter. For example, equations for estimating BW in cattle from body size measurements are well established and widely used by farmers and researchers (Heinrichs et al., 1992; Bozkurt, 2006). Bozkurt (2006) reported an R² of 0.69 for prediction of BW from heart girth measurements, a predictive ability close to the highest achieved in the present study (0.65, using the PLS method with MY and concentrate intake as well as MIRS as predictors).
In the present study, PLS regression provided the best prediction accuracy of the three approaches. The R² values were also quite good for the RF and SVM regression approaches. This agrees with findings by Ghasemi and Tavakoli (2013), who concluded that the RF regression tool has potential and gives good prediction accuracy for non-linear multivariate data. However, their plot of RF-predicted vs. measured values showed a pattern similar to that seen in the present study, where the range of predicted values tended to be quite narrow compared with that obtained using the PLS regression method. RF regression is much easier to perform than SVM regression, because the SVM approach requires more fine-tuning of the model hyperparameters (cost and gamma) to achieve good prediction accuracy. Both these non-linear ML approaches have been used successfully for classification, e.g. predicting pregnancy status (Brand et al., 2021) and metabolic status (Grelet et al., 2019). Non-linear ML algorithms can be used for both classification and regression, depending on the nature of the study. However, according to Meza Ramirez et al. (2021), SVM and RF are more commonly used for classification than for regression.
Although the SVM and RF regression models provided good prediction accuracy when the milk MIRS data were used together with additional variables, the more conventional PLS regression method still provided the best outcome. However, there are several other options or packages available for ML approaches that can be tested to better explore the possibility of using such approaches on multivariate data to predict DMI. SVM and RF regression were selected for comparison in this study, since they are both user-friendly tools that can easily be employed by users with different backgrounds.
With any approach, validation is important to ensure a reliable output. In this study, the data from the last year (2021) were used as the external test data to validate the models. This choice of test set may have resulted in lower prediction accuracy than a test set randomly selected from the full dataset, since changes over time may introduce bias into the data. However, using the most recently collected data reflects the situation in which a predictive model is applied to data generated after the model was built.
Many modern farms keep good records in order to measure the performance of the farm and to ensure optimum profit in parallel with sustainable production (Soyeurt et al., 2019). To our knowledge, the highest prediction accuracy to date was obtained by Shetty et al. (2017) (R² = 0.81), who included MY and BW in a model containing the full MIRS data as a predictor of DMI. However, information on BW is not easily available on every farm. Therefore, different types of models with different parameters included would provide choices and enable users to predict DMI with good accuracy. Basic records on dairy cows, e.g. date of birth, parity number and milking records, are usually available. In development of prediction models for DMI, parameters such as these, which are also easy to retrieve, could be included to improve the prediction accuracy and reduce the prediction error. We included MY, lactation stage, parity and concentrate DMI in the models and found that the predictive ability of the PLS, SVM and RF models improved when more variables were included together with the milk MIRS data. Most Swedish dairy farms can also adjust all or most of the concentrate allowance for their animals based on MY and stage of lactation, and the amount of concentrate consumed by or delivered to each individual is often available for use as input data. Concentrate intake makes up part of DMI, and thus a relationship with total DMI can be expected. Therefore, it could be useful to include this information in models for predicting DMI.
There was an obvious pattern in the lactation curve, separating early and mid-lactation, so the data for these lactation stages were categorised and analysed separately. Using data for 3–30 DIM gave the highest prediction accuracy in all approaches, possibly due to the shorter range of days and the strong linear relationship within this period (Rachah et al., 2020). The first month of lactation is critical because the probability of negative energy balance is high: DMI is likely to be low relative to the amount of milk produced, and body reserves are used to compensate for this. Later in lactation, there are probably also other mechanisms behind the relationship between milk MIRS and DMI.
In general, MIRS in combination with other easily available data provides good accuracy in predicting DMI. However, with current advances in precision livestock farming, it would be interesting to combine the MIRS data with data from recently developed sensor technology used to estimate feed intake in cows, for example 3-dimensional cameras and triaxial accelerometers, to enhance the precision of feed intake prediction. As milk MIRS measures something completely different from these sensors, there is a good possibility that they may complement each other.
In conclusion, all tools tested were able to predict DMI with moderate performance. Overall, PLS regression analysis gave better results than the two machine learning tools, although the differences between the tools were small. The RF regression approach gave similar accuracy to the more complicated SVM regression. Early lactation DMI gave better prediction results than mid-lactation DMI. Inclusion of additional variables, especially MY and concentrate DMI, improved the predictions for both lactation stages (early and mid-) examined in the present study.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0022029923000171.