Culture- or region-specific FFQ are developed to assess dietary intake because foods vary by culture and region(Reference Feskanich, Rimm, Giovannucci, Colditz, Stampfer, Litin and Willett1–Reference Jain3). The FFQ consists of a list of foods eaten commonly in a particular region or by a particular population, each food’s commonly eaten portion size and the reported intake frequency. The FFQ food list typically explains 80–90 % of the variability in the nutrients of interest. The ideal method to derive such a food list is to run several backward stepwise regression models with the nutrients of interest as the outcome and a long list of candidate foods as predictors. However, such an approach requires dietary data from about 1000 to 2000 persons in the study population assessed by some alternative method, such as 24 h dietary recall or food records(Reference Shai, Shahar, Vardi and Fraser4). Because this approach is expensive, it is infrequently used. The more common method to derive a food list for the FFQ is based on smaller dietary surveys, existing instruments and expert opinion(Reference Kelemen, Anand, Vuksan, Yi, Teo, Devanesen and Yusuf2). This approach is pragmatic but may be prone to error. Nutrient intake is underestimated if the foods predicting a particular nutrient of interest are under-represented on that FFQ, or overestimated owing to inflated total frequency of intake if a large list of foods representing one food group or nutrient has been included (double counting). For example, if the FFQ contains questions on intakes of rice and biryani (a rice dish that includes meat or vegetables) the respondent may report eating the same food for both questions. In the present paper we describe methods to refine an FFQ that overestimated nutrient intake due to double counting in an ongoing epidemiological study, and we also compare nutrient intakes assessed by the refined FFQ (RFFQ) with nutrient intakes assessed with two 24 h recalls.
Materials and methods
Study population
The Prospective Urban Rural Epidemiological Study (PURE) is a large, ongoing, prospective cohort study being conducted worldwide to investigate societal and individual determinants, including diet, of chronic conditions such as obesity and CVD. Briefly, data are being collected in fourteen countries (total of ≈140 000 adults) in urban and rural areas. In India, there are five data collection sites with a target to enrol approximately 30 000 participants. Trivandrum is one of the five Indian sites where data are being gathered in urban and rural areas on 4000 participants. Ethics review boards at McMaster University, Canada as well as appropriate institutional ethics committees in India have approved the study.
There are two parts in the present analysis: first, refinement of the FFQ and, second, its validation. Data were available for 2527 participants in the refinement part of the study (1351 urban and 1169 rural, seven missing data). Data were collected using the original FFQ (OFFQ) from September 2004 to January 2005. After excluding participants with implausible total energy intake values (<2·72 MJ/d (<650 kcal/d), >15·69 MJ/d (>3750 kcal/d)), the data set comprised 1867 participants (OFFQe). We then conducted a pilot study, consisting of a convenience sample of the PURE study participants (n 100), to validate the RFFQ between September 2005 and January 2006.
Original FFQ
In PURE, diet is assessed with FFQ that are specific for each population(Reference Dehghan, Al Hamad, Yusufali, Nusrath, Yusuf and Merchant5). The FFQ at the Trivandrum site overestimated nutrient intake. This was a 132-item quantitative FFQ that was developed from 24 h recalls in India. Participants reported the usual portion size of each food in the questionnaire and how often on average they consumed it in the previous year. Participants were shown different sized cups, plates, bowls and other utensils to help them estimate portion size. Intake frequency consisted of four categories: never, monthly, weekly and daily. As fruit and vegetable availability is seasonal, this was considered when estimating their intake. Fruit and vegetable season duration was determined by interviewing vendors; the median reported value in months was considered as the duration of the season.
To estimate nutrient intake we multiplied the reported intake frequency of each food on the FFQ by the reported portion size and its respective nutrient composition, summing over all foods. The composition of raw food items was determined from the Indian food composition table(Reference Gopalan, Sastari and Balasubramanian6). In certain cases where this information was not available in the Indian food composition table, McCance and Widdowson’s The Composition of Foods (Reference Krebs7) and the US Department of Agriculture’s National Nutrient Database for Standard Reference release 19 (USDA, Washington, DC, USA) were consulted. For prepared foods, we collected recipes and verified them by preparing the foods in a metabolic kitchen or in the participants’ homes. We used the reference food composition table to estimate nutrient content, accounting for preparation method.
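The calculation described above (frequency multiplied by portion size and nutrient composition, summed over all foods) can be sketched as follows. This is a minimal illustration only: the food names, portion sizes, composition values and the conversion of the frequency categories to times per day are all assumptions made for the example and are not taken from the study data.

```python
# Illustrative sketch: daily nutrient intake from FFQ responses.
# All food names, portion sizes and composition values are invented.

# Assumed conversion of the frequency categories to a per-day multiplier,
# applied to the reported number of times within that period.
PER_DAY = {"never": 0.0, "monthly": 1 / 30, "weekly": 1 / 7, "daily": 1.0}


def daily_intake(responses, composition):
    """Sum frequency x portion x nutrient density over all foods.

    responses:   {food: (category, times_in_period, portion_g)}
    composition: {food: {nutrient: amount per 100 g}}
    """
    totals = {}
    for food, (category, times, portion_g) in responses.items():
        servings_per_day = times * PER_DAY[category]
        grams_per_day = servings_per_day * portion_g
        for nutrient, per_100g in composition[food].items():
            amount = grams_per_day * per_100g / 100
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    return totals


responses = {
    "rice": ("daily", 2, 150),          # two 150 g portions every day
    "fish curry": ("weekly", 7, 100),   # seven 100 g portions per week
}
composition = {
    "rice": {"energy_kcal": 130, "protein_g": 2.5},
    "fish curry": {"energy_kcal": 120, "protein_g": 12.0},
}
intake = daily_intake(responses, composition)
```

With these invented inputs, the rice contributes 300 g/d and the fish curry 100 g/d, so any food listed twice under different names would add its nutrients twice to the totals.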
As this FFQ provided implausible values of nutrient intake, we explored the possible reasons. We excluded participants with implausible nutrient estimates, i.e. daily energy intake of <2·72 MJ (<650 kcal) or >15·69 MJ (>3750 kcal). We named the FFQ data after excluding over- and under-reporters for energy intake as ‘OFFQe’. We also systematically explored other potential sources of error, such as errors in data entry, the food composition table, recipe analysis and interviewing techniques, and identified the likely cause to be double counting of foods. For instance, an individual may have reported eating biryani (a rice dish) and then counted it again when reporting rice intake. To reduce the likelihood of this error we shortened this FFQ.
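The energy-based exclusion rule can be expressed as a simple filter. The sketch below uses the cut-offs stated in the text; the record layout and the example energy values are hypothetical, and the standard conversion of 1 kcal = 4·184 kJ is used to show that the kcal and MJ cut-offs agree.

```python
# Illustrative sketch of the exclusion of energy over- and under-reporters.
KCAL_TO_MJ = 4.184 / 1000  # 1 kcal = 4.184 kJ

LOWER_KCAL, UPPER_KCAL = 650, 3750  # cut-offs stated in the text


def plausible_energy(energy_kcal):
    """True when daily energy intake lies within the cut-offs (inclusive)."""
    return LOWER_KCAL <= energy_kcal <= UPPER_KCAL


def exclude_misreporters(records):
    """Keep only records with plausible daily energy intake."""
    return [r for r in records if plausible_energy(r["energy_kcal"])]


# Hypothetical records; 'id' and 'energy_kcal' are assumed field names.
records = [
    {"id": 1, "energy_kcal": 540},
    {"id": 2, "energy_kcal": 1985},
    {"id": 3, "energy_kcal": 3201},
    {"id": 4, "energy_kcal": 4100},
]
kept = exclude_misreporters(records)
lower_mj = round(LOWER_KCAL * KCAL_TO_MJ, 2)
upper_mj = round(UPPER_KCAL * KCAL_TO_MJ, 2)
```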
Shortened FFQ
To shorten the FFQ we used stepwise regression analyses with all items on the OFFQe (132 items) as independent variables and the nutrients of interest as dependent variables as suggested by Willett(Reference Willett8) and described below under ‘Statistical analyses’. We also considered how often the food item was eaten and knowledge of local food items while refining the food list.
Design for validation of the refined FFQ
A subset of participants from the main Trivandrum study population was invited to participate in the FFQ validation study. These participants completed two FFQ and two 24 h recall forms over 4 months. The first refined FFQ (RFFQ1) and 24 h recall were administered in September 2005, and the second refined FFQ (RFFQ2) and 24 h recall in November 2005. Trained field staff interviewed the participants at their homes for dietary 24 h recalls. It took 30 minutes to conduct a 24 h recall. Various aids were used during the interview to assist in portion size estimation for the 24 h recalls.
Statistical analyses
Shortening and refinement of the FFQ
To shorten the FFQ we ran a series of stepwise regression analyses with all items on the OFFQ (132 items) as independent variables. The dependent variables included energy, protein, carbohydrate, fat, SFA, fibre, vitamin A, vitamin C, Ca, folate and Zn. The P value for a variable to enter into the model was 0·10, and that for it to remain was 0·05. We included all the foods that predicted any of the nutrients in these models. The SAS statistical software package version 9·1 (SAS Institute, Cary, NC, USA) was used for all analyses.
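The idea of ranking foods by how much of the between-person variation in a nutrient they explain can be sketched in plain Python. Note that this is a deliberate simplification and not the procedure run in SAS: it performs greedy forward selection on the increase in R² and stops at a cumulative R² target, instead of using P values for entry and retention. The food names, the data and the 0·90 default are assumptions for illustration.

```python
# Illustrative sketch: greedy forward selection of foods by cumulative R^2.
# This is NOT the P-value-based stepwise procedure described in the text.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def select_foods(foods, y, target=0.90):
    """Add the food giving the largest R^2 until the target is reached."""
    chosen, r2 = [], 0.0
    while r2 < target and len(chosen) < len(foods):
        best = max(
            (f for f in foods if f not in chosen),
            key=lambda f: r_squared([foods[g] for g in chosen + [f]], y),
        )
        chosen.append(best)
        r2 = r_squared([foods[g] for g in chosen], y)
    return chosen, r2


# Invented servings/day for six respondents; the nutrient is dominated by carrot.
foods = {
    "carrot": [0, 1, 2, 3, 4, 5],
    "rice": [2, 1, 2, 1, 2, 1],
    "tea": [1, 3, 2, 5, 4, 0],
}
vitamin_a = [100 * c + 5 * r + t for c, r, t in zip(*foods.values())]
chosen, r2 = select_foods(foods, vitamin_a)
```

In this toy data a single food already exceeds the target, which mirrors the situation in which nutrients concentrated in few foods need short food lists. The sketch assumes the candidate foods are not perfectly collinear; a production analysis would use an established statistics package.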
Validation and reliability analysis
Mean nutrient intakes with their standard deviations were computed for the OFFQ, OFFQe, the two RFFQ and the mean of the two 24 h recalls. Nutrient estimates were log-transformed as they tended to be skewed positively. Pearson product-moment correlations between intakes estimated by the FFQ and those calculated from the recalls were computed. We corrected for errors in nutrient comparisons arising from within-person variation as described by others(Reference Willett8–Reference Rimm, Giovannucci, Stampfer, Colditz, Litin and Willett10). We assessed the crude as well as the de-attenuated correlations for nutrient estimates. We also assessed the reliability of the RFFQ by calculating intra-class correlation coefficients between energy-adjusted nutrient estimates for RFFQ1 and RFFQ2. To estimate the degree of bias in nutrient estimates obtained from the RFFQ, we regressed energy-adjusted nutrient intakes estimated from the mean of the two 24 h recalls (the outcome) on those from the RFFQ2 (the predictor).
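The correction for within-person variation and the reliability coefficient can be sketched from one-way ANOVA variance components. The text does not state the exact formulas used; the sketch assumes the commonly used de-attenuation r_true = r_observed × √(1 + λ/k), where λ is the ratio of within- to between-person variance in the recalls and k is the number of recalls per person, and a one-way random-effects intra-class correlation. The numbers are invented.

```python
# Illustrative sketch: de-attenuated correlation and intra-class correlation.
from math import sqrt


def variance_components(replicates):
    """One-way ANOVA mean squares for n people with k repeated measures each."""
    n, k = len(replicates), len(replicates[0])
    person_means = [sum(row) / k for row in replicates]
    grand = sum(person_means) / n
    ms_between = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(replicates, person_means) for x in row
    ) / (n * (k - 1))
    return ms_between, ms_within, k


def deattenuate(r_observed, recalls):
    """Correct an FFQ v. recall correlation for day-to-day recall variation.

    Assumes the between-person variance estimate is positive.
    """
    ms_between, ms_within, k = variance_components(recalls)
    between_var = (ms_between - ms_within) / k
    ratio = ms_within / between_var  # within- to between-person variance
    return r_observed * sqrt(1 + ratio / k)


def icc(replicates):
    """One-way random-effects intra-class correlation for single measures."""
    ms_between, ms_within, k = variance_components(replicates)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented (already transformed) intakes: four people, two recalls each.
recalls = [[10, 12], [14, 12], [18, 20], [22, 20]]
corrected = deattenuate(0.40, recalls)
reliability = icc(recalls)
```

Because the correction factor is always at least 1, the de-attenuated coefficient is never smaller than the crude one, and the adjustment grows as the recalls become noisier or fewer.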
Results
The mean age of the participants (after excluding participants who over- and under-reported total energy intake, n 662) included in the main study was 50·7 (sd 10·1) years, with 73·9 % being women. The mean BMI of this population was 24·1 (sd 4·1) kg/m²; the population was overweight on average according to WHO guidelines for assessing overweight in South Asians (≥23 kg/m²). Only 10·8 % had received university education and 53·7 % of the participants resided in urban areas while the rest lived in rural areas (Table 1).
The numbers of food items explaining 90 % and 99 % of the variance in nutrient intake are presented in Fig. 1. Between five and twelve food items largely explained intakes of some nutrients such as vitamin A, vitamin C and Ca (cumulative R² = 90 %), while for other nutrients such as energy, Zn and SFA between nineteen and twenty-two food items were needed. Fifty-seven food items explained 90 % of the variation in eleven nutrients. Likewise, fewer food items were needed to obtain a cumulative R² of 99 % for vitamin A, vitamin C and Ca, while more food items were needed to explain a similar level of variation for total energy intake and other nutrients. We shortened our food items list based on the 90 % variance models for total energy, protein, fat, SFA, fibre, carbohydrate, vitamin A, vitamin C, Ca, folate and Zn. In addition to this, thirteen food items that were consumed at least twice monthly were also retained. We then expanded this list based on input from experts in the field and obtained a shortened FFQ (a list of food items in the shortened FFQ appears in the Appendix). Intake responses of this questionnaire were modified so that they were one of nine categories ranging from never or <1 time/month to ≥6 times/d. The portion sizes were fixed based on the median reported serving size in this population as opposed to having an open response category.
Intakes of energy and macronutrients were similar using the RFFQ1, RFFQ2 and 24 h recalls, but higher for the OFFQ and OFFQe (Table 2). Mean usual daily energy intake estimated from the OFFQ was 13·39 (sd 5·46) MJ (3201 (sd 1305) kcal), daily protein intake was 96·1 (sd 44·1) g and fat 120·8 (sd 57·7) g. Despite excluding the over- and under-reporters, the nutrient estimates were still higher from the OFFQe relative to the RFFQ and 24 h recalls. Mean usual daily energy intake estimated from OFFQe was 10·96 (sd 2·65) MJ (2619 (sd 634) kcal), protein was 77·9 (sd 22·2) g and fat 96·0 (sd 33·5) g. In contrast to the OFFQ, mean usual daily intakes estimated from the RFFQ1 were 8·31 (sd 2·20) MJ (1985 (sd 527) kcal) for energy, 58·8 (sd 17·9) g for protein and 64·8 (sd 24·5) g for fat; while the corresponding values estimated from RFFQ2 were 7·94 (sd 2·05) MJ (1897 (sd 489) kcal), 54·5 (sd 16·0) g and 62·0 (sd 23·8) g.
OFFQ, original FFQ; OFFQe, original FFQ after excluding over- and under-reporters; RFFQ1, refined FFQ first administration; RFFQ2, refined FFQ second administration; RE, retinol equivalents.
Comparing RFFQ1 and 24 h recalls, the correlation coefficients ranged from 0·11 for vitamin A to 0·44 for protein intake, while the correlations for RFFQ2 ranged from 0·09 for vitamin A and SFA to 0·35 for protein intake. The de-attenuated correlations between RFFQ1 and 24 h recalls ranged from 0·25 for vitamin A to 0·82 for total fat intake. The de-attenuated correlations between RFFQ2 and 24 h recalls ranged from 0·12 for fibre to 0·49 for protein intake (Table 3).
RFFQ1, refined FFQ first administration; RFFQ2, refined FFQ second administration.
*Log-transformed nutrients.
In the analyses in which the mean of the two 24 h recall intakes was the outcome and nutrient intake from RFFQ2 was the predictor, most nutrients, such as carbohydrate, Zn and the vitamins, had very small coefficients indicating little bias. The RFFQ underestimated total energy, protein and Ca intakes but overestimated intakes of fat and folate (Table 3). The intra-class correlations for the nutrients computed from RFFQ1 and RFFQ2, assessing the reliability of the RFFQ, ranged from 0·26 for vitamin C to 0·51 for Ca intake.
Discussion
We refined an FFQ that overestimated nutrient intake in an ongoing study in a developing country. We systematically addressed, to the extent possible, potential sources of error in the estimation of nutrient intakes, which ranged from verifying that the food composition table, recipes and data entry were accurate to eliminating double counting by systematically shortening the FFQ by regression analyses. We also reformatted the older, quantitative FFQ into a semi-quantitative instrument. Reasonable estimates were obtained when the RFFQ was validated against multiple 24 h recalls. The RFFQ on average took less time to administer (10 minutes) than the OFFQ (18 minutes). While these techniques are described in the literature on FFQ development(Reference Merchant, Dehghan, Chifamba, Terera and Yusuf11), we did not find any example of them being applied to improving estimation of existing instruments. This application may be particularly important in epidemiological studies, as often data collection and FFQ validation occur in parallel(Reference Shu, Yang, Jin, Liu, Kushi, Wen, Gao and Zheng12).
The results from the RFFQ were plausible and consistent. For example, fewer food items predicted 90 % of the vitamin A and C intake (concentrated in a few food items), as opposed to a larger number of food items that predicted total energy intake (Fig. 1). Our findings are similar to those observed by others in which few food items were needed to explain some nutrients due to their limited occurrence in foods, while the more ubiquitous nutrients were explained by more food items(Reference Shai, Shahar, Vardi and Fraser4, Reference Stryker, Salvini, Stampfer, Sampson, Colditz and Willett13). The mean nutrient intakes estimated by the RFFQ were similar to those obtained from the 24 h recall, and within the range reported by others in South Asia(Reference Chadha, Gopinath, Katyal and Shekhawat14, Reference Shobana, Snehalatha, Latha, Vijay and Ramachandran15). In an Indian investigation, mean usual daily energy intake was observed to be 7·32 MJ (1749 kcal) and 7·99 MJ (1910 kcal) in the urban and rural population, respectively(Reference Chadha, Gopinath, Katyal and Shekhawat14). Likewise, in a study conducted in south India, the mean daily energy intake was 8·64 (sd 1·83) MJ (2066 (sd 437) kcal) for men and 7·30 (sd 1·44) MJ (1745 (sd 343) kcal) for women(Reference Shobana, Snehalatha, Latha, Vijay and Ramachandran15). The de-attenuated correlation coefficients we observed between RFFQ1 and the 24 h recalls (ranging from 0·25 to 0·82) in the present study were similar to those reported in other validation studies (range of 0·32–0·61 for a study conducted in Kerala(Reference Hebert, Gupta, Bhonsle, Murti, Mehta, Verghese, Aghi, Krishnaswamy and Mehta16) and 0·55–1·00 for a study conducted in Gujarat(Reference Hebert, Gupta, Bhonsle, Sinor, Mehta and Mehta17)). In agreement with other reports(Reference Hebert, Gupta, Bhonsle, Sinor, Mehta and Mehta17), correlations observed in the present study for vitamin A and C were lower compared with other nutrients. 
As vitamins A and C are concentrated in a few foods they tend to have high within-person variability and lower correlation coefficients in validation studies, as reported elsewhere(Reference Hebert, Gupta, Bhonsle, Sinor, Mehta and Mehta17). Consistent with other studies, energy intake estimates from the FFQ were lower than those obtained by the 24 h recall(Reference Hebert, Gupta, Bhonsle, Sinor, Mehta and Mehta17).
Fewer food items explained 90 % of variation in our analysis compared with the 126 food items in an Israeli population using a similar approach(Reference Shahar, Shai, Vardi, Brener-Azrad and Fraser18). This may have been because the participants in the Israeli study had varied ethnic and cultural backgrounds, but our population was homogeneous. Moreover, in poorer communities there are fewer food choices and consequently there is a smaller range of dietary variation(Reference Shai, Shahar, Vardi and Fraser4). The number of food items (132 on the OFFQ) might not seem excessive by Western standards, but was probably large for an FFQ in this ethnically homogeneous, lower-income Trivandrum community. We also changed the FFQ format so that the question asked the participants about average portion sizes of foods consumed (obtained from the 24 h recall). By doing so we eliminated the need to show participants visual aids to estimate portion size. We adopted this strategy because it has been reported that frequency of intake alone explains 84 % of the variance in nutrient intake, and addition of open-ended questions on portion size may increase respondent burden and the chances of incomplete data(Reference Noethlings, Hoffmann, Bergmann and Boeing19). Similar findings have also been reported before(Reference Tjonneland, Haraldsdottir, Overvad, Stripp, Ewertz and Jensen20). We also changed the response categories into a 9-point ordinal scale adapted from other validated FFQ(Reference Martin-Moreno, Boyle, Gorgojo, Maisonneuve, Fernandez-Rodriguez, Salvini and Willett21, Reference Willett, Sampson, Stampfer, Rosner, Bain, Witschi, Hennekens and Speizer22). The RFFQ was therefore semi-quantitative. This change required the interviewer to check a category rather than write a number, and asked the participant to estimate a range of usual intake rather than provide an exact number.
Some limitations of our work merit consideration. The correlations we observed in the present study were in general lower than those reported by others who compared FFQ data to several weeks of diet records(Reference Fornes, Stringhini and Elias9, Reference Shobana, Snehalatha, Latha, Vijay and Ramachandran15, 23), but similar to estimates comparing FFQ data to multiple 24 h recalls(Reference Shatenstein, Nadon, Godin and Ferland24) generally and to studies done in the subcontinent in particular(Reference Chen, Ahsan, Parvez and Howe25, Reference Pandey, Bhatia, Boddula, Singh and Bhatia26). A possible reason why our correlations are lower than those reported for FFQ validated against diet records may be that we had data from only two 24 h recalls as a reference method, as opposed to estimates from several days considered by others(Reference Martin-Moreno, Boyle, Gorgojo, Maisonneuve, Fernandez-Rodriguez, Salvini and Willett21, Reference Pandey, Bhatia, Boddula, Singh and Bhatia26, Reference Sevak, Mangtani, McCormack, Bhakta, Kassam-Khamis and Silva27). Moreover, we observed that the correlations for RFFQ2 with the mean of the two 24 h recalls were lower than those from RFFQ1. Generally, the correlation coefficients improve for the second FFQ owing to a learning effect or similar reference period for the two dietary methods. The lower correlations observed in the present study may be because of participant fatigue, as the second 24 h recall and RFFQ2 were administered simultaneously.
Although it has been suggested(Reference Willett8) that stepwise regression analysis can be performed to develop an FFQ food item list, this may not be the optimal strategy if the sample size is less than 1000 to 2000, as some food items may not enter the statistical model due to inadequate power. The strength of our work is that we had a large sample size, which reduces the chances of type II (beta) error.
Lengthy FFQ have the potential to overestimate nutrient intake and also are not feasible to administer from a logistics point of view. Shorter FFQ take less time to administer and also provide valid and reliable nutrient estimates. We have shared a strategy for refining lengthy questionnaires and arriving at reasonable estimates of nutrient intake in India. Other researchers in the field may be able to adapt this approach to obtain more valid nutrient estimates from existing FFQ.
Acknowledgements
The authors wish to thank Dr Salim Yusuf, Director of the Population Health Research Institute, McMaster University and Principal Investigator for the PURE study, for providing us the opportunity to analyse and report these data. We also thank Joseph Michael for diligently entering the data from the FFQ as well as the 24 h recalls.
Author contributions: Data collection – K.A., C.R.S., A.V.B.; study design – A.T.M., R.I.; analyses – X.Z., R.I., S.I.; writing – R.I., A.T.M.; critical revisions – R.I., K.A., A.V.B., X.Z., S.I., C.R.S., A.T.M.
None of the authors reported any conflict of interest.
Sources of funding: Funding for this study was provided by the Population Health Research Institute, Hamilton, Canada.