Variance Components Models for Analysis of Big Family Data of Health Outcomes in the Lifelines Cohort Study

Nino Demetrashvili; Nynke Smidt; Harold Snieder; Edwin R. van den Heuvel; Ernst C. Wit

doi:10.1017/thg.2019.1

Published online by Cambridge University Press: 04 April 2019

Nino Demetrashvili: Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands; Department of Medical Statistics, National Center for Disease Control and Public Health, Tbilisi, Georgia
Nynke Smidt: Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
Harold Snieder: Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
Edwin R. van den Heuvel: Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands; Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
Ernst C. Wit*: Bernoulli Institute, University of Groningen, Groningen, The Netherlands; Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland
*Author for correspondence: Ernst C. Wit, Email: e.c.wit@rug.nl

Abstract

Large multigenerational cohort studies offer powerful ways to study the hereditary effects on various health outcomes. However, accounting for complex kinship relations in big data structures can be methodologically challenging. The traditional kinship model is computationally infeasible when considering thousands of individuals. In this article, we propose a computationally efficient alternative that employs fractional relatedness of family members through a series of founding members. The primary goal of this study is to investigate whether the effect of determinants on health outcome variables differs with and without accounting for family structure. We compare a fixed-effects model without familial effects with several variance components models that account for heritability and shared environment structure. Our secondary goal is to apply the fractional relatedness model in a realistic setting. Lifelines is a three-generation cohort study investigating the biological, behavioral, and environmental determinants of healthy aging. We analyzed a sample of 89,353 participants from 32,452 reconstructed families. Our primary conclusion is that the effect of determinants on health outcome variables does not differ with and without accounting for family structure. However, accounting for family structure through fractional relatedness allows for estimating heritability in a computationally efficient way, showing some interesting differences between physical and mental quality of life heritability. We have shown through simulations that the proposed fractional relatedness model performs better than the standard kinship model, not only in terms of computational time and convenience of fitting using standard functions in R, but also in terms of bias of heritability estimates and coverage.

Keywords

Determinants and health outcome; founders and non-founders; fractional relatedness of family members; genetic and environmental factors; heritability and its confidence interval; kinship model; mental and physical health scores; mixed effects models

Type: Articles
Information: Twin Research and Human Genetics, Volume 22, Issue 1, February 2019, pp. 4–13

DOI: https://doi.org/10.1017/thg.2019.1
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © The Author(s) 2019

Excessive weight, especially obesity, is a major public health concern in the western world. Around two-thirds of the adult population in the USA and at least half of the population of many European countries are currently overweight or obese (Berghöfer et al., 2008; Flegal et al., 2010; Wang et al., 2008). In the Netherlands, the prevalence of overweight is 48.3% (Volksgezondheid, Nationaal Kompas, 2012), and 12.7% of the Dutch population is classified as obese. Obesity, defined as a BMI of 30 kg/m² or more, is a major risk factor for many chronic diseases, such as hypertension, stroke, coronary heart disease, diabetes and arthritis, and for overall mortality (Flegal et al., 2013; Prospective Studies Collaboration et al., 2009). Furthermore, increased BMI has been shown to be associated with reduced physical health-related quality of life (HRQoL) (Ul-Haq et al., 2012, 2013). However, evidence on the relationship between BMI and mental HRQoL is inconclusive. Some studies have found that BMI is associated with poor mental health (Baumeister & Härter, 2007; Ohayon, 2007; Petry et al., 2008), whereas others did not find such a relationship (Crisp & McGuiness, 1976; Goldney et al., 2009; Palinkas et al., 1996; Petry et al., 2008). Ul-Haq et al. (2014) also found that the association between BMI and mental health among the Scottish adult population (N = 37,272) was moderated by age and sex. In contrast to the above-mentioned studies, Ul-Haq et al. (2014) used the full spectrum of BMI, adjusted for potential confounders, and found that only young obese women (<45 years of age with BMI >29.9 kg/m²) had significantly reduced mental health. Furthermore, being underweight was associated with diminished mental health among women of all ages, but not among men.

Background on Lifelines

Lifelines is a large, population-based cohort study and biobank investigating the biological, behavioral and environmental determinants of healthy aging among 167,729 inhabitants of the northern part of the Netherlands. The cohort profile of the Lifelines study has been described extensively by Scholtens et al. (2015). In summary, baseline visits took place between December 2006 and December 2013. All general practitioners in the three northern provinces of the Netherlands were asked to invite their registered patients aged 25–49 years. All persons who consented to participate were asked to provide contact details so that their family members (i.e., partner, parents and children) could be invited, resulting in a three-generation study. In addition, participants could register via the Lifelines website. Lifelines adopted a multigenerational design to disentangle the genetic, lifestyle and environmental contributions to the development of chronic diseases, to study between-generation similarities and to identify preclinical stages of aging at an early age (Stolk et al., 2008). Baseline data were collected from 167,729 participants aged from 6 months to 93 years. Follow-up is planned for at least 30 years, with questionnaires administered every 1.5 years and a physical examination scheduled every 5 years. The physical examinations, including anthropometry, lung function, blood pressure, electrocardiogram (ECG) and cognition tests, are conducted at one of the Lifelines research sites. In addition, fasting blood and 24-h urine samples are collected from all participants.
A comprehensive questionnaire on history of (chronic) diseases, HRQoL, lifestyle (physical activity, alcohol use, diet and smoking status), individual socioeconomic status (income and education level), psychosocial stress, work (profession, working hours), psychosocial characteristics and medication use is completed at home.

Lifelines is a facility that is open to all researchers. Information on the application and data access procedures is available at www.lifelines.net. An overview of the available data is presented in the online Lifelines Data Catalogue.

Motivation and Goal

Although the explicit familial structure of the data is an advantage in studying the genetic and environmental components of various health-related outcomes, it is in fact a complicating factor in studying the effect of traditional epidemiological covariates. Given that participants are related (e.g., grandparent–grandchild, parent–child, and sibling–sibling relations), genetic, shared environmental and health behavioral factors may confound the effects of BMI on physical and mental HRQoL.

The standard way to control for familial effects is the kinship model (Almasy & Blangero, 1998), a variance components model in which genetic relatedness is modeled through a kinship covariance matrix. Despite its genetic plausibility, the model is computationally prohibitive for thousands of individuals and hundreds of families, so it cannot be applied to the Lifelines study.

Our first objective is to study the effect of BMI on the mental and physical components of HRQoL with and without accounting for relatedness within families. Our second objective is to develop a computationally efficient model that incorporates genetic and shared environmental effects to assess the contribution of epidemiological determinants to health outcomes. Given its large sample size and inclusion of extended families, Lifelines allows the effect of relatedness to be identified with higher precision than other studies, which are often (much) smaller and limited to particular types of family relationships (e.g., Lichtenstein et al., 2009; Noh et al., 2006; Pawitan et al., 2004; Rabe-Hesketh et al., 2008; Yip et al., 2008). The current Lifelines study includes extended families with up to 19 members.

Of interest are not only the variance components attributable to specific factors, but also their relative contributions to the total variance of the outcome. Related to this is the concept of the intraclass correlation coefficient (ICC). When applied to additive genetic models, the ICC represents narrow-sense heritability. The concept of heritability originates from Fisher (1918) and Wright (1920) and was formalized by Lush (1940). An extensive review of the concept and misconceptions of heritability is given by Visscher et al. (2008). In the Methods section, we define the shared environmental, unique environmental and hereditary ICCs in the context of our models. We also briefly introduce the beta-approach (Demetrashvili et al., 2016) to construct confidence intervals for the ICC. The beta-approach has been successfully applied to construct confidence intervals for ratios of sums of variance components in linear and nonlinear mixed-effects models (Demetrashvili & Van den Heuvel, 2015). It matches the first and second moments of the ICC estimate to a beta distribution to obtain accurate confidence intervals.

In the next section, we describe the reconstruction of families and the outcome measures. We then describe the models, motivate their use and give the criteria for their selection. Next, we analyze familial confounding of the relationship between BMI and physical and mental HRQoL in the Lifelines study. We then compare our fractional relatedness model with more traditional kinship models through simulations. We conclude the article with an extensive discussion.

Methods

In this section, we explain how we reconstructed extended families from the available local kinship relationships. The outcomes of interest are mental and physical health, derived from the RAND-36 questionnaire. We then fitted mixed-effects models alongside a fixed-effects model to study the effect of BMI on mental and physical health.

Family Reconstruction

For a large number of Lifelines participants, their parents, partner and children are also included in the study. Such information is relevant for disentangling genetic, behavioral and shared environmental variance. In biometric genetics, the coefficient of relatedness (or genetic correlation) of two individuals is defined as the expected proportion of their genes that are identical by descent (Sham, 1998, p. 208). A related concept used in this study is that of a founder: individuals without ancestors in the study are called founders, whereas the others are called non-founders (Almgren et al., 2003, p. 10). Founders in our study population are assumed to be unrelated.

Considering the information provided by Lifelines participants, we define a family as a group of related individuals sharing environmental and/or genetic factors. For example, the health responses of a mother and child may be similar because of both genetic similarity and shared behavior and environment, whereas the health responses of partners are related only through the latter. Within the Lifelines study, we define an extended family as a connected graph of individuals linked via parent–child or partner–partner relationships. An example of a family is given in Figure 1. Note that the sibling relationships in this graph are inferred from common parents; sibling information itself is not recorded in Lifelines. We do not have complete information for all reconstructed families in the Lifelines study, as some family members might not participate in the study.

Fig. 1. Family example consisting of 13 members.

Information on children from previous marriages is, in principle, also available. However, not all familial information in Lifelines is complete, since people may be unwilling, for example, to identify ex-partners. Some of the missing information can be reconstructed from the partial information provided by participants. For example, if a child declares both parents and the parents declare this child, but the parents do not declare each other as partners, we link these parents as a couple. For family reconstruction, we identify a set of individuals connected through parent–child and/or partner–partner relationships and call this set a family. Once an individual is assigned to a particular family, no further reassignments take place. The model we propose requires the construction of a relatedness matrix, in which the relatedness between founders and non-founders is fractional, as explained in the Statistical Models section. Table 1 outlines the algorithm used to construct the fractional relatedness matrix.
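Because a family is defined as a connected graph, the grouping step can be sketched as a connected-components search. The Python sketch below uses a union–find structure over declared partner, parent and child links between participants and records the inferred-couple rule described above. The record layout (`id`, `partner`, `father`, `mother`, `children`) and the functions `reconstruct_families` and `rec` are hypothetical; they are not the Lifelines variables or the authors' code, and any further consistency checks are omitted.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge linked participants into families."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)          # register x on first sight
        while self.parent[x] != x:            # path halving
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def reconstruct_families(records):
    """Return (families, couples) from hypothetical participant records.

    Each record is a dict with keys 'id', 'partner', 'father', 'mother'
    (an id or None) and 'children' (list of ids). Links to people who are
    not participants are ignored, since they cannot be family members here.
    """
    by_id = {r["id"]: r for r in records}
    uf = UnionFind()
    couples = set()
    for r in records:
        pid = r["id"]
        uf.find(pid)                          # singletons form their own family
        linked = [r["partner"], r["father"], r["mother"], *r["children"]]
        for other in linked:
            if other in by_id:
                uf.union(pid, other)
        if r["partner"] in by_id:
            couples.add(frozenset((pid, r["partner"])))
        f, m = r["father"], r["mother"]
        # Inferred couple: the child declares both parents and both declare the child.
        if (f in by_id and m in by_id
                and pid in by_id[f]["children"] and pid in by_id[m]["children"]):
            couples.add(frozenset((f, m)))
    groups = defaultdict(list)
    for pid in by_id:
        groups[uf.find(pid)].append(pid)
    families = sorted(sorted(members) for members in groups.values())
    return families, couples


def rec(pid, partner=None, father=None, mother=None, children=()):
    """Build one hypothetical participant record."""
    return {"id": pid, "partner": partner, "father": father,
            "mother": mother, "children": list(children)}


example = [
    rec(1, children=[3]), rec(2, children=[3]),  # parents who did not declare each other
    rec(3, father=1, mother=2),                   # child declaring both parents
    rec(4, partner=5), rec(5, partner=4),         # couple without children
    rec(6),                                       # singleton
]
families, couples = reconstruct_families(example)
print(families)  # [[1, 2, 3], [4, 5], [6]]
```

Here members 1 and 2 end up as an inferred couple even though neither declared the other, and participant 6 forms a one-person family.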

Table 1. Algorithm for Construction of a Fractional Relatedness Matrix
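The contents of Table 1 are not reproduced in this text. As a hedged illustration only, the Python sketch below builds a fractional relatedness matrix using the rule implied by the worked example for Figure 1 in the Statistical Models section: a founder carries its own hereditary effect with weight 1, and every other member receives half of each parent's row. The function name `fractional_relatedness`, the placeholder founder used when only one parent is recorded, and the exact Figure 1 pedigree (which of members 3–6 are parents of members 7–11) are assumptions, not the authors' implementation.

```python
from fractions import Fraction


def fractional_relatedness(members, parents):
    """Build the fractional relatedness matrix F for one family.

    members: participants in the order the rows of F should appear.
    parents: dict member -> tuple of recorded parents (a missing key or an
             empty tuple means the member is a founder).
    Returns (founders, F), where F[j][k] is the share of founder k's random
    hereditary effect carried by members[j]; each row sums to 1.
    """
    rows, founders = {}, []

    def row_of(j):
        if j in rows:
            return rows[j]
        ps = tuple(parents.get(j, ()))
        if not ps:                       # founder: carries its own effect
            founders.append(j)
            rows[j] = {j: Fraction(1)}
            return rows[j]
        if len(ps) == 1:
            # Hypothetical choice: represent the unrecorded parent by a
            # placeholder founder so that the row still sums to 1.
            ghost = ("unobserved parent of", j)
            founders.append(ghost)
            rows[ghost] = {ghost: Fraction(1)}
            ps = (ps[0], ghost)
        row = {}
        for p in ps:                     # each parent passes on half
            for k, w in row_of(p).items():
                row[k] = row.get(k, Fraction(0)) + w / 2
        rows[j] = row
        return row

    for j in members:
        row_of(j)
    F = [[rows[j].get(k, Fraction(0)) for k in founders] for j in members]
    return founders, F


# Hypothetical pedigree consistent with the Figure 1 matrix: members 3-6 are
# children of founders 1 and 2; 7-9 of members 3 and 12; 10-11 of 4 and 13.
fig1_parents = {3: (1, 2), 4: (1, 2), 5: (1, 2), 6: (1, 2),
                7: (3, 12), 8: (3, 12), 9: (3, 12), 10: (4, 13), 11: (4, 13)}
founders, F = fractional_relatedness(list(range(1, 14)), fig1_parents)
print(founders)                 # [1, 2, 12, 13]
print([str(x) for x in F[6]])   # member 7: ['1/4', '1/4', '1/2', '0']
```

Exact fractions are used so that the rows can be compared directly with the example matrix; converting to floating point before model fitting is straightforward.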

Outcome Measures

A sample of 91,759 participants from the baseline data release of the Lifelines cohort study was available for analysis. HRQoL was measured using the Dutch version of the RAND-36 questionnaire (Hays & Morales, 2001; Van der Zee & Sanderman, 1993; Van der Zee et al., 1996). HRQoL refers to how health affects an individual's ability to function and his or her perceived well-being in the physical, mental and social domains of life. The RAND-36 consists of 36 items measuring eight health concepts: physical functioning, role limitations caused by physical health problems, bodily pain, general health perceptions, role limitations caused by emotional problems, social functioning, emotional well-being, and energy/fatigue. The first four reflect an individual's physical health and the last four reflect mental health. The scales of these eight concepts are combined into two summary measures of HRQoL, the physical component score (PCS) and the mental component score (MCS), using the scoring algorithm of Ware et al. (1994). PCS and MCS range between 0% and 100%, and higher scores correspond to better quality of life.

Statistical Models

We compare four models to examine whether the effect of BMI on HRQoL differs with and without accounting for family structure: M₀, a multiple regression model; M₁, a mixed-effects model with a random intercept for the family, capturing the environmental familial effect; M₂, a mixed-effects model with random slopes for the founders within a family, capturing the genetic familial effect; and M₃, a combination of M₁ and M₂ with a random intercept for the family and random slopes for the founders, capturing both environmental and genetic familial effects.

Assume that a total of n individuals and I families are included in the study. Suppose for the ith family n_i members have been observed (i = 1, 2, …, I), such that $n = \sum\limits_{i = 1}^I {{n_i}} $. The multiple regression model M₀ of the HRQoL response y_ij for the jth member of the ith family can be written as:

(1)

$${y_{ij}} = {\bf{x}}_{ij}^T\beta + {\varepsilon _{ij}},$$

where x_ij is the p × 1 vector of covariates of the jth individual in the ith family, including BMI and possible confounders, β is a p × 1 vector of coefficients, and the residual errors ε_ij are assumed to be independently and identically distributed (iid) normal with mean zero and variance $\sigma_R^2$, ${\varepsilon_{ij}}\mathop\sim\limits^{iid} N(0,\sigma _R^2)$. The covariates in x_ij can be quantitative or dummy variables representing categorical variables, so the number of distinct covariates may be smaller than p.

Note that observations y_ij within the same family are likely to be correlated, but the multiple regression model M₀ does not account for this. Model M₁ accounts for this correlation by introducing a random intercept u_i for every family:

(2)

$${y_{ij}} = {\bf{x}}_{ij}^T\beta + {u_i} + {\varepsilon _{ij}},$$

where u_i is normally distributed with mean zero and variance $\sigma_u^2$, ${u_i}\mathop\sim\limits^{iid} N(0,\sigma _u^2)$, independently of the residual errors; the definitions of x_ij, β and ε_ij are the same as in (1).

Model M₁ does not disentangle genetic and shared environmental variation. Since one of our goals is to estimate the contributions of various factors to the variance in MCS and PCS, model M₂ assumes that the genetic correlation between family members is due to additive genetic effects of alleles (i.e., no dominance or epistatic effects). By assuming that all genetic information resides in the founders, and consequently imposing fractional relatedness between the founders and the other members of the family, model M₂ introduces random slopes υ_i for the m_i founders in family i and can be formulated as:

(3)

$${y_{ij}} = {\bf{x}}_{ij}^T\beta + {\bf{F}}_{ij}^T{\upsilon _i} + {\varepsilon _{ij}}$$

where the definitions of x_ij, β and ε_ij are the same as in (1); F_ij is the m_i × 1 vector of fractional relatedness of individual j in family i to the founders of that family, whose element F_ijk is the fractional relatedness of individual j to founder k, with $\sum\limits_{k = 1}^{{m_i}} {{F_{ijk}}} = 1$; and $\upsilon_i = (\upsilon_{i1}, \ldots, \upsilon_{im_i})^T$ is the m_i × 1 vector of random slopes for the founders in family i, with ${\upsilon _{ik}}\mathop\sim\limits^{iid} N(0,\sigma _f^2)$. We assume that all random terms are mutually independent.

An important computational advantage of model M₂ over the traditional kinship model is its substantially smaller design matrix F_ij. In M₂, the design matrix across all families is of size m by n, where m is the maximum number of founders among all families and n is the total number of participants in all families, whereas in the classical kinship model the variance component matrix is of size n by n. This dimensionality reduction is particularly important when analyzing a large number of participants, as in Lifelines.
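A back-of-the-envelope storage comparison for the Lifelines sample makes the reduction concrete. The sketch assumes dense double-precision storage (8 bytes per entry) and ignores any block-diagonal or sparse structure an implementation might exploit; n = 89,353 and m = 9 are taken from the Lifelines Analysis Results section.

```python
n = 89_353          # participants analysed (Lifelines Analysis Results)
m = 9               # maximum number of founders in a family
BYTES = 8           # double precision, assuming dense storage

kinship_entries = n * n      # dense n x n covariance matrix
founder_entries = m * n      # m x n founder design matrix

print(f"dense n x n matrix: {kinship_entries:,} entries, "
      f"{kinship_entries * BYTES / 1e9:.1f} GB")
print(f"m x n founder design: {founder_entries:,} entries, "
      f"{founder_entries * BYTES / 1e6:.1f} MB")
```

This prints roughly 64 GB for the dense n × n matrix against about 6.4 MB for the founder design, a ratio of n/m ≈ 10,000.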

Figure 1 shows an example of a 13-member family in the Lifelines study: ovals denote females and squares denote males. The associated fractional relatedness matrix F_i is shown below. The M₂ model assumes that the health outcome has a hereditary component. For example, founders F1 and F2 each share half of their random hereditary effect with their child, member 3, and one-quarter with their grandchild, member 7.

$$F_i = \begin{array}{c|cccc}
 & F1 & F2 & F12 & F13 \\ \hline
1 & 1 & 0 & 0 & 0 \\
2 & 0 & 1 & 0 & 0 \\
3 & 1/2 & 1/2 & 0 & 0 \\
4 & 1/2 & 1/2 & 0 & 0 \\
5 & 1/2 & 1/2 & 0 & 0 \\
6 & 1/2 & 1/2 & 0 & 0 \\
7 & 1/4 & 1/4 & 1/2 & 0 \\
8 & 1/4 & 1/4 & 1/2 & 0 \\
9 & 1/4 & 1/4 & 1/2 & 0 \\
10 & 1/4 & 1/4 & 0 & 1/2 \\
11 & 1/4 & 1/4 & 0 & 1/2 \\
12 & 0 & 0 & 1 & 0 \\
13 & 0 & 0 & 0 & 1
\end{array}$$
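Under model (3), with independent founder slopes of variance σ_f², the hereditary parts of two members j and j′ of family i have covariance σ_f² F_ijᵀF_ij′. The Python sketch below evaluates this inner product for a few rows of the example matrix (rows 5, 6, 9, 11 and 13 are omitted because they duplicate or mirror rows shown). It only illustrates what the matrix implies and is not part of the authors' fitting code; the helper name `genetic_cov` is hypothetical.

```python
from fractions import Fraction as Fr

# Rows of the Figure 1 example matrix (columns F1, F2, F12, F13).
h, q = Fr(1, 2), Fr(1, 4)
F = {1: (1, 0, 0, 0), 2: (0, 1, 0, 0), 3: (h, h, 0, 0), 4: (h, h, 0, 0),
     7: (q, q, h, 0), 8: (q, q, h, 0), 10: (q, q, 0, h), 12: (0, 0, 1, 0)}


def genetic_cov(a, b):
    """Coefficient of sigma_f^2 in Cov(F_a' v, F_b' v) under model M2."""
    return sum(Fr(x) * Fr(y) for x, y in zip(F[a], F[b]))


for a, b in [(1, 3), (1, 7), (3, 4), (3, 3), (7, 8), (7, 10), (1, 12)]:
    print(a, b, genetic_cov(a, b))
```

The parent–child pair (1, 3) gives 1/2 and the grandparent–grandchild pair (1, 7) gives 1/4, matching the text above, while the unrelated founders 1 and 12 give 0. The diagonal entry (3, 3) equals 1/2, so under this parameterization a non-founder's hereditary variance is a fraction of σ_f².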

Model M₃ is a combination of models M₁ and M₂, and can be written:

(4)

$${y_{ij}} = {\bf{x}}_{ij}^T\beta + {u_i} + {\bf{F}}_{ij}^T{\upsilon _i} + {\varepsilon _{ij}},$$

where the definitions and assumptions of models M₁ and M₂ still apply. The variance components $\sigma_u^2$, $\sigma_f^2$ and $\sigma_R^2$ are attributed to shared environmental, additive genetic and unique factors, respectively.
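Because u_i, the founder slopes υ_ik and the residuals ε_ij are mutually independent, with $\operatorname{Var}(\upsilon_i) = \sigma_f^2 I_{m_i}$, model (4) implies the covariance structure below. This short derivation is added for clarity and follows from the stated assumptions alone:

$$\operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma_u^2 + \sigma_f^2\,{\bf F}_{ij}^T{\bf F}_{ij'} + \sigma_R^2\,\delta_{jj'}, \qquad \operatorname{Cov}(y_{ij}, y_{i'j'}) = 0 \ \ (i \ne i'),$$

where $\delta_{jj'}$ equals 1 if j = j′ and 0 otherwise. For a founder, ${\bf F}_{ij}^T{\bf F}_{ij} = 1$, so the variance of a founder's outcome is $\sigma_u^2 + \sigma_f^2 + \sigma_R^2$.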

Inference and Model Selection

The overall significance of each covariate is tested using a conditional F test in which, as in the usual F test for covariates in regression models, the conditional estimate of the residual error variance is used. More details on this F test are given in Pinheiro and Bates (2009, §2.4.2). Confidence intervals for the coefficients β_l are constructed from conditional t tests: each fixed-effect coefficient is tested marginally in the presence of the other fixed effects in the model (Pinheiro & Bates, 2009, pp. 92–96). The approximate 100(1−α)% confidence limits for β_l are computed as:

(5)

$$\hat\beta_l \pm t_{df_l}(1 - \alpha/2)\sqrt{\widehat{\rm var}(\hat\beta)_{ll}},$$

where $\hat\beta_l$ is the estimate of the lth fixed effect, $t_{df_l}(q)$ denotes the qth quantile of a t distribution with $df_l$ degrees of freedom, and $\widehat{\rm var}(\hat\beta)_{ll}$ is the estimated variance of $\hat\beta_l$, that is, the lth diagonal element of the estimated variance–covariance matrix $\widehat{\rm var}(\hat\beta)$ of the vector $\hat\beta$ of fixed-effects estimates. The degrees of freedom used for our models are described in the section 'Estimation of Fixed Effects' below.
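Equation (5) is a standard t-based interval. The sketch below assumes SciPy is available and uses `scipy.stats.t.ppf` for the quantile; the coefficient and standard error are hypothetical, while the degrees of freedom follow the calculation reported in the section 'Estimation of Fixed Effects'. The function name `wald_t_interval` is our own.

```python
from scipy.stats import t


def wald_t_interval(beta_hat, se, df, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence limits of equation (5)."""
    q = t.ppf(1 - alpha / 2, df)   # (1 - alpha/2) quantile of t with df d.f.
    return beta_hat - q * se, beta_hat + q * se


df = 89_353 - 32_452 - 7       # observations - families - fixed effects = 56,894
lcl, ucl = wald_t_interval(beta_hat=-1.0, se=0.1, df=df)  # hypothetical inputs
print(df, round(lcl, 3), round(ucl, 3))
```

With about 57,000 degrees of freedom the t quantile is essentially the normal quantile 1.96, so the interval is close to ±1.96 standard errors.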

To select the best model, we used the Bayesian information criterion (BIC; Schwarz, 1978), defined for these models as:

(6)

$${\rm{BIC}}(M) = - 2{l_M}(\theta |y) + df(M)\ln (n),$$

where ${l_M}( \cdot )$ is the log-likelihood function of the fitted model M, θ is the vector of all model parameters, df(M) is the total number of parameters in the model (the regression and variance components parameters), and n is the total number of observations used to fit the model. The model with the smallest BIC is preferred.
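Equation (6) is simple to evaluate once each model has been fitted by maximum likelihood. In the sketch below the log-likelihoods are hypothetical placeholders, not values from Table 3, and the parameter counts assume an intercept, five BMI dummies, age and sex plus the variance components of each model.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC of equation (6): -2 * log-likelihood + df(M) * ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical maximised log-likelihoods, for illustration only.
n = 89_353
candidates = {
    "M0": bic(-300_000.0, 9, n),    # 8 fixed effects + sigma_R^2
    "M1": bic(-299_500.0, 10, n),   # + sigma_u^2
}
best = min(candidates, key=candidates.get)
print(best)  # M1
```

The extra variance parameter in M₁ costs ln(n) ≈ 11.4 BIC units, far less than the hypothetical gain of 1,000 in −2 log-likelihood.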

Intraclass Correlation Coefficient

The models defined above allow us to define three types of ICC: (1) the behavioral and shared environmental ICC (c²), the proportion of total variance due to shared environmental components; (2) the hereditary ICC (h²), the proportion of total variance due to the additive genetic component; and (3) the unique environmental ICC (e²), the proportion of total variance due to unique environmental components. For model M₃ we define these ICCs as follows:

(7)

$$c^2 = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_f^2 + \sigma_R^2}, \qquad h^2 = \frac{\sigma_f^2}{\sigma_u^2 + \sigma_f^2 + \sigma_R^2}, \qquad e^2 = \frac{\sigma_R^2}{\sigma_u^2 + \sigma_f^2 + \sigma_R^2}.$$

For models M₁ and M₂, c², h² and e² follow from formula (7) by setting to zero the variance components that are not present in the model.
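A minimal Python sketch of equation (7); the function name `iccs` and the variance components in the usage line are hypothetical, and a component absent from a model is handled by leaving it at zero, as the text describes.

```python
def iccs(sigma_u2=0.0, sigma_f2=0.0, sigma_r2=1.0):
    """Return (c2, h2, e2) of equation (7).

    Components absent from a model (sigma_f2 for M1, sigma_u2 for M2)
    are left at zero.
    """
    total = sigma_u2 + sigma_f2 + sigma_r2
    return sigma_u2 / total, sigma_f2 / total, sigma_r2 / total


# Hypothetical variance components for an M3 fit.
c2, h2, e2 = iccs(sigma_u2=1.0, sigma_f2=4.0, sigma_r2=25.0)
print(round(c2, 3), round(h2, 3), round(e2, 3))  # 0.033 0.133 0.833
```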

We outline the beta-approach for obtaining a confidence interval for h²; the approach applies similarly to c² and e². The distribution of the estimator $\hat h^2$ is approximated by a beta distribution, $\hat h^2 \sim {\rm Beta}(a,b)$, with parameters a > 0 and b > 0. If $\hat h^2$ is used as the estimate of the mean and $\hat\tau_{\hat h^2}^2$ is an estimate of the variance of $\hat h^2$, the method-of-moments estimates of a and b are:

(8)

$$\hat a = \frac{\hat h^2\left[\hat h^2(1 - \hat h^2) - \hat\tau_{\hat h^2}^2\right]}{\hat\tau_{\hat h^2}^2}, \qquad \hat b = \frac{(1 - \hat h^2)\left[\hat h^2(1 - \hat h^2) - \hat\tau_{\hat h^2}^2\right]}{\hat\tau_{\hat h^2}^2}.$$

A first-order Taylor expansion is used to approximate $\hat\tau_{\hat h^2}^2$, as shown in Demetrashvili et al. (2016). The approximate 100(1−α)% confidence interval for h² in (7) is then given by the lower and upper confidence limits:

(9)

$$\matrix{{\rm LCL}_{\hat h^2} = B_{\hat a,\hat b}^{-1}(\alpha/2), \cr {\rm UCL}_{\hat h^2} = B_{\hat a,\hat b}^{-1}(1 - \alpha/2), \cr}$$

where $B_{a,b}^{-1}(q)$ is the qth quantile of the Beta(a, b) distribution. A detailed description of the beta-approach is given in Demetrashvili et al. (2016).
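The beta-approach can be sketched in a few lines once $\hat h^2$ and $\hat\tau^2_{\hat h^2}$ are available. The sketch below assumes SciPy (`scipy.stats.beta.ppf`) and takes the variance estimate as an input; it does not implement the first-order Taylor approximation of Demetrashvili et al. (2016). The function name `beta_ci` and the input values are hypothetical. The α/2 quantile is returned as the lower limit and the 1 − α/2 quantile as the upper limit.

```python
from scipy.stats import beta


def beta_ci(h2_hat, tau2, alpha=0.05):
    """Beta-approach interval for an ICC, following equations (8) and (9).

    h2_hat: point estimate of the ICC, used as the beta mean.
    tau2:   estimated variance of h2_hat (e.g. from a delta-method
            approximation, which is not computed here).
    """
    k = h2_hat * (1 - h2_hat) - tau2
    if k <= 0:
        raise ValueError("tau2 too large: moment estimates of a, b not positive")
    a = h2_hat * k / tau2          # equation (8)
    b = (1 - h2_hat) * k / tau2
    return beta.ppf(alpha / 2, a, b), beta.ppf(1 - alpha / 2, a, b)


lcl, ucl = beta_ci(h2_hat=0.13, tau2=0.0001)   # hypothetical inputs
print(round(lcl, 3), round(ucl, 3))
```

Unlike a symmetric Wald interval, the beta quantiles always stay inside (0, 1) and become asymmetric when the estimate is close to either bound.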

Lifelines Analysis Results

Analysis of the Lifelines data was conducted in R (R Core Team, 2017), version 3.4.0. Mixed-effects models were fitted by maximum likelihood using the lme function of the nlme package. Unlike the traditional kinship model, all our models could be fitted to thousands of subjects and families with this standard R function. All results below are presented with two-sided 95% confidence intervals.

A sample of the baseline Lifelines cohort was used consisting of 91,759 participants, from which we constructed 32,531 families. The distribution of family sizes is summarized in Table 2. The largest family has 19 members. There are 253 singletons, 18,585 families have two members, and so on. About 99% of all reconstructed families consist of at least two members. Among all participants, 44% were recruited via their general practitioner, 13% via self-registration and the remaining 43% were recruited as family members of the first two groups. The number of declared partners is 56,560, meaning that 62% of all individuals in Lifelines currently have a partner. The number of declared fathers is 18,342 (20%), whereas the number of declared mothers is 26,627 (29%); 34% of all individuals in Lifelines have at least one child. The maximum number of declared children is 7, which occurred once.

Table 2. Counts of Family Sizes for the Original Set of 91,759 Participants and the Remaining 89,353 Participants after Removing Incomplete Records

We omitted 2406 (2.6%) observations from both the MCS and PCS analyses because they were incomplete with respect to PCS scores, MCS scores or BMI. There were no missing values for sex or age. In total, 89,353 observations were included in the analysis. The distributions of both outcomes, MCS and PCS, are slightly left-skewed. The median (25th, 75th percentile) MCS and PCS were 53.4 (48.9, 56.4) and 54.6 (50.5, 56.8), respectively. The distribution of family sizes for the 89,353 participants is shown in Table 2. About 97% of the 32,452 reconstructed families with complete data consist of at least two members. The maximum number of founders in a family is 9.

Descriptive Statistics of Covariates

BMI was calculated by dividing a person's weight (in kilograms) by the square of their height (in meters) and subsequently categorized into six categories: underweight (<18.5 kg/m²), normal weight (18.5–24.9 kg/m²), overweight (25.0–29.9 kg/m²), class I obese (30.0–34.9 kg/m²), class II obese (35.0–39.9 kg/m²) and class III obese (≥40.0 kg/m²) (World Health Organization, 1995). Of the 89,353 participants, 686 (0.8%) were underweight, 39,277 (44.0%) had normal weight, 35,824 (40.1%) were overweight, 10,494 (11.7%) were class I obese, 2306 (2.6%) were class II obese and 766 (0.9%) were class III obese.
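The BMI calculation and categorization can be expressed as follows. The text reports the WHO cut-offs to one decimal; the sketch assumes half-open intervals on a continuous BMI (e.g., 24.95 kg/m² counts as normal weight), which is our assumption about boundary handling. The function names `bmi` and `who_category` are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def who_category(b):
    """WHO (1995) BMI class, assuming half-open intervals [lower, upper)."""
    if b < 18.5:
        return "underweight"
    if b < 25.0:
        return "normal"
    if b < 30.0:
        return "overweight"
    if b < 35.0:
        return "obese I"
    if b < 40.0:
        return "obese II"
    return "obese III"


print(who_category(bmi(80.0, 1.80)))  # 24.69 kg/m^2 -> normal
```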

All participants were 18 years or older, with a mean age of 45 years. Of the 89,353 participants, 38,841 (43.5%) were men. The same proportions of men and women were found when all 91,759 participants were summarized.

Model Selection, Variance Components and Heritability

In the analysis of all models, we used 89,353 observations with 32,452 constructed families. BMI is treated as a categorical variable with six levels. Age and sex are included in all models. Age is treated as a continuous variable. Sex is a categorical variable with male being a reference category. Model M₀ is a multiple regression model with BMI, age and sex. Models M₁, M₂ and M₃ are the variance components models. M₁ models the environmental random component of the family. M₂ models the genetic random components of the founders. M₃ models both the random components of the family and founders.

We used BIC for model selection. BIC consistently selects the true model as the sample size grows, provided the true model is among the candidates (Claeskens & Hjort, 2008), and it tends to choose parsimonious models (i.e., models with few parameters). According to BIC, M₁ provides the best fit for MCS and M₃ for PCS. Since the best model for MCS includes only shared environmental factors, the estimated ICC shown in Table 3 implies that approximately 12–14% of the variation in MCS is attributable to shared environmental variation. For PCS, the best model M₃ includes both shared environmental and genetic contributions: approximately 12–14% of the variation in PCS is attributable to genetic variation and 3–4% to shared environmental variation.

Table 3. Estimates of variance components, ICC and its confidence interval for outcomes MCS and PCS

Note: LCL and UCL stand for lower and upper confidence limits, respectively; the best model for outcomes MCS and PCS is in bold; $\hat c^2$ is the proportion of behavioral and shared environmental variance in total phenotypic variance and $\hat h^2$ is the proportion of additive genetic variance in total phenotypic variance.

Estimation of Fixed Effects

Besides accounting for the familial correlation structure, the aim of the study was to estimate the effect of BMI, age and sex on the HRQoL scores, PCS and MCS. The results of the conditional F tests are presented in Table 4. The degrees of freedom for the tests of significance of slopes are 56,894, which were calculated by subtracting the number of families and the number of parameters of fixed effects from the number of observations (i.e., 89,353−32,452−7). These degrees of freedom are also used in the t test. The results for individual effects are presented in Figures 2 and 3 and in Table 5.

Fig. 2. BMI effects for MCS surrounded by confidence intervals.

Fig. 3. BMI effects for PCS surrounded by confidence intervals.

Table 4. Conditional F Tests for BMI, Age, and Sex of Selected Models

Figures 2 and 3 show the estimated BMI effects (middle line) with their confidence intervals (outer lines). Solid lines are used for the selected models of MCS and PCS, and dashed lines for the other three models. The effects of BMI on MCS and PCS match very closely across the four models (the lines overlap), and this holds for all BMI categories; the confidence intervals also match very closely. This implies that the BMI effects do not change when relatedness within families is accounted for.

MCS shows an inverted parabolic (inverted U) shape across increasing BMI categories, whereas PCS shows a somewhat different shape. Interestingly, MCS is slightly higher for overweight individuals than for the normal category, and it drops sharply for underweight individuals. PCS is lower in all BMI categories than in the normal category, and the decline becomes steeper with increasing BMI. For PCS, the confidence intervals of all BMI categories lie outside the confidence limits of the normal category, indicating that PCS in every other BMI category differs significantly from PCS in the normal category.
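
As a rough numerical check of the PCS pattern, the sketch below uses the category-specific BMI coefficients listed as simulation parameters in the simulation study, which were taken from the fitted M₃ model for PCS (rounded to one decimal). It computes each category's difference from the normal category and the drop between successive categories above normal weight.

```python
# BMI-category coefficients for PCS as listed in the simulation study
# (taken there from model M3 for PCS; rounded to one decimal)
pcs_coef = {
    "underweight": 56.5, "normal": 57.5, "overweight": 56.7,
    "obese1": 54.9, "obese2": 53.0, "obese3": 50.2,
}

# Difference of each category from the normal-weight reference
contrasts = {k: round(v - pcs_coef["normal"], 1) for k, v in pcs_coef.items()}

# Change between consecutive categories from normal upwards
order = ["normal", "overweight", "obese1", "obese2", "obese3"]
steps = [round(pcs_coef[b] - pcs_coef[a], 1) for a, b in zip(order, order[1:])]

print(contrasts)
print(steps)  # [-0.8, -1.8, -1.9, -2.8]
```

Every non-normal category is below the reference, and each successive step is larger in magnitude, which is the steepening decline described above.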

Table 5 presents the fixed-effect coefficients with measures of uncertainty, namely standard errors and lower and upper confidence limits. Each additional year of age is associated, on average, with a 0.078-unit increase in MCS (0.1% of the maximum observed value) and a decrease of about the same size in PCS, holding BMI and sex constant. Women have lower MCS, by approximately 1.891 units (2.6% of the maximum observed value), and lower PCS, by approximately 1.011 units (1.4%), on average, holding BMI and age constant.

Table 5. Estimates of Coefficients for BMI, Age and Sex of Selected Models

Simulation Study: Design and Results

We conducted a simulation study to compare the commonly used variance component model with a kinship matrix (Almasy & Blangero, 1998) with our proposed model, which contains a reduced matrix of founders. Because the kinship model is computationally demanding, we simulated relatively small samples. We generated data from model (10):

$$
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{T}\beta + u_i + \sigma_f k_{ij} + \varepsilon_{ij}, \\
\varepsilon_{ij} &\overset{iid}{\sim} N(0, \sigma_R^2),
\end{aligned}
\tag{10}
$$

where $\mathbf{x}_{ij}$, $\beta$, $u_i$, $\sigma_u^2$, $\sigma_R^2$ and $\varepsilon_{ij}$ are defined as in models (2), (3) and (4), $\sigma_f^2$ is the variance due to genetics and $k_{ij}$ is the kinship effect for the $j$th member of the $i$th family. The kinship effects are generated from a multivariate normal distribution with mean zero and variance–covariance matrix Ω. The entries of Ω are the coefficients of relationship between pairs of individuals: 1/2 for first-degree relatives (parent–child and siblings), 1/4 for second-degree relatives (grandparent–grandchild, half-siblings and avuncular pairs), and similarly for more distant degrees, as shown in Table 1 of Almasy and Blangero (2010). Ω is therefore block-diagonal: the diagonal blocks contain the coefficients of relationship among members of the same family, and the off-diagonal blocks are zero because members of different families are treated as unrelated.
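
A minimal NumPy sketch of the block-diagonal structure of Ω for two hypothetical three-member families, using the coefficients 1/2 and 1/4 described above and ones on the diagonal (our assumption: each individual's coefficient with itself is 1), together with a draw of kinship effects from N(0, Ω) via a Cholesky factor. The helper name and family compositions are illustrative, not the study's code.

```python
import numpy as np

def block_diagonal(blocks):
    """Place square per-family relationship matrices on the diagonal;
    entries between families stay zero (families treated as unrelated)."""
    n = sum(b.shape[0] for b in blocks)
    omega = np.zeros((n, n))
    start = 0
    for b in blocks:
        size = b.shape[0]
        omega[start:start + size, start:start + size] = b
        start += size
    return omega

# Hypothetical family A: two partners and their child
# (partners unrelated: 0; parent-child: 1/2; self: 1)
family_a = np.array([[1.0, 0.0, 0.5],
                     [0.0, 1.0, 0.5],
                     [0.5, 0.5, 1.0]])

# Hypothetical family B: grandparent, parent, grandchild
# (first degree: 1/2; grandparent-grandchild, second degree: 1/4)
family_b = np.array([[1.0, 0.5, 0.25],
                     [0.5, 1.0, 0.5],
                     [0.25, 0.5, 1.0]])

omega = block_diagonal([family_a, family_b])

# Draw k ~ N(0, Omega): k = L z with L L^T = Omega and z ~ N(0, I)
rng = np.random.default_rng(2019)
L = np.linalg.cholesky(omega)
k = L @ rng.standard_normal(omega.shape[0])
```

Because Ω is block-diagonal, the Cholesky factor could equally be computed family by family, which keeps the cost proportional to the sum of cubed family sizes rather than to the cube of the total sample size.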

Simulation parameters were taken from the best-fitting model for the Lifelines PCS data (see the parameters for PCS: M₃ in Table 5): β_underweight = 56.5, β_normal = 57.5, β_overweight = 56.7, β_obese1 = 54.9, β_obese2 = 53.0, β_obese3 = 50.2, β_age = −0.08, β_sex = −1.0, $\sigma_u^2 = 1.5$, $\sigma_f^2 = 6.0$ and $\sigma_R^2 = 40.0$. Age was generated from a normal distribution with mean 45 and standard deviation 14. Sex was generated from a Bernoulli distribution with probability .44 of being male. The numbers of observations in the BMI categories ‘underweight’, ‘normal’, ‘overweight’, ‘obese 1’, ‘obese 2’ and ‘obese 3’ were generated from a multinomial distribution with probabilities .007, .44, .401, .117, .026 and .009, respectively; these probabilities were calculated from the Lifelines data. BMI values were then generated, for the number of observations drawn in each category, from normal distributions with the following means and standard deviations: 17.7 and 0.77 for ‘underweight’; 23.0 and 1.5 for ‘normal’; 27.0 and 1.4 for ‘overweight’; 32.0 and 1.4 for ‘obese 1’; 37.0 and 1.4 for ‘obese 2’; and 43.0 and 3.0 for ‘obese 3’. We used family sizes of 2, 3, 4, 5 and 6 members, with 2, 4, 10, 20 and 38 families of each size, respectively. Although family sizes were repeated, all families differed in composition (e.g., for a family size of 2, we constructed parent–child and partner–partner families; for a size of 3, we constructed parent–parent–child, parent–child–grandchild, parent–child–child and parent–child–partner-of-child families). In total, 74 families containing 384 observations were constructed. The PCS outcomes were generated using model (10). We then fitted both models (kinship and M₃) and compared the bias of the heritability estimates and the coverage probabilities of the 95% confidence intervals for heritability.
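
A NumPy sketch of the covariate-generating step, assuming independent draws for every individual. The seed, variable names and the final shuffle are our own choices; building the 74 family-specific relationship matrices and generating PCS from model (10) are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Family sizes and the number of families constructed for each size
family_sizes = np.array([2, 3, 4, 5, 6])
families_per_size = np.array([2, 4, 10, 20, 38])
n_families = int(families_per_size.sum())
n_obs = int((family_sizes * families_per_size).sum())
print(n_families, n_obs)  # 74 384

# Age ~ N(45, 14^2); sex ~ Bernoulli(0.44), coded 1 for men
age = rng.normal(45.0, 14.0, size=n_obs)
male = rng.binomial(1, 0.44, size=n_obs)

# Number of individuals per BMI category, then BMI values within each category
bmi_probs = [0.007, 0.44, 0.401, 0.117, 0.026, 0.009]
bmi_params = [(17.7, 0.77), (23.0, 1.5), (27.0, 1.4),
              (32.0, 1.4), (37.0, 1.4), (43.0, 3.0)]  # (mean, SD)
bmi_counts = rng.multinomial(n_obs, bmi_probs)
bmi_category = np.repeat(np.arange(len(bmi_probs)), bmi_counts)
bmi = np.concatenate([rng.normal(mu, sd, size=c)
                      for (mu, sd), c in zip(bmi_params, bmi_counts)])

# Shuffle so that BMI is not ordered by category across individuals
order = rng.permutation(n_obs)
bmi_category, bmi = bmi_category[order], bmi[order]
```

The printed totals reproduce the 74 families and 384 observations stated above.
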
The heritability in model (10) is equivalent to the hereditary ICC ($h^2$) and is computed as in (7). Confidence intervals for heritability were calculated using the beta-approach described in Section 2.5. We ran 100 simulation replicates. The kinship model was fitted using the lmekin function of the coxme package (Therneau, 2018) and model M₃ using the lme function of the nlme package (Pinheiro et al., 2017) in R. The results of the simulation study are summarized in Table 6.
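
Formula (7) is not reproduced in this section. Assuming it is the usual ratio of additive genetic variance to total phenotypic variance (the definition of $\hat{h}^2$ given in the note to Table 3), the true values implied by the simulation parameters can be computed as below; the function name is hypothetical.

```python
def variance_proportions(sigma_u2: float, sigma_f2: float, sigma_r2: float):
    """Shared-environment (c^2), heritability (h^2) and unique-environment (e^2)
    proportions of the total variance sigma_u2 + sigma_f2 + sigma_r2."""
    total = sigma_u2 + sigma_f2 + sigma_r2
    return sigma_u2 / total, sigma_f2 / total, sigma_r2 / total

c2, h2, e2 = variance_proportions(1.5, 6.0, 40.0)
print(round(h2, 2), round(c2, 2))  # 0.13 0.03
```

The implied true heritability of about 0.13 agrees with the true value marked in Fig. 4.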

Table 6. Comparison of Heritability Parameters and Computational Time between Kinship (lmekin Function) and M₃ (lme Function) Models

ᵃ Heritability is equivalent to the hereditary ICC ($h^2$) and is computed using formula (7).

Results for setting 3 are shown in Figure 4. The kinship model has a larger bias (0.18) than the M₃ model (0.09). The M₃ estimates, however, vary more than those of the kinship model, as Figure 4 shows. As a result, the kinship model shows substantial undercoverage (0.86), whereas the M₃ model shows some overcoverage (0.98), for a two-sided 95% confidence interval.
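
The two performance measures can be computed from the replicate fits as sketched below (NumPy; the function name is hypothetical). Bias is taken here as the mean estimate minus the true heritability; whether the reported values are absolute or relative bias is not stated in this section, so this definition is an assumption.

```python
import numpy as np

def bias_and_coverage(estimates, lower, upper, true_value):
    """Bias of the estimates and empirical coverage of the intervals
    [lower, upper] for a known true parameter value."""
    estimates = np.asarray(estimates, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    bias = estimates.mean() - true_value
    coverage = np.mean((lower <= true_value) & (true_value <= upper))
    return bias, coverage

# Usage: true heritability implied by the simulation parameters,
# assuming h^2 = sigma_f^2 / (sigma_u^2 + sigma_f^2 + sigma_R^2)
true_h2 = 6.0 / (1.5 + 6.0 + 40.0)
# With est, lcl, ucl holding the replicate estimates and 95% limits:
# bias, coverage = bias_and_coverage(est, lcl, ucl, true_h2)
```

A coverage well below 0.95 (undercoverage) typically reflects bias combined with intervals that are too narrow, whereas coverage above 0.95 (overcoverage) points to intervals that are wider than necessary.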

Fig. 4. Overlapping histograms for comparison of estimated heritabilities across M₃ and kinship models in setting 3: vertical bar (0.13) shows true heritability, light grey histogram (left) and smoothed plain line show distribution of estimated heritabilities in M₃ model and dark grey histogram (right) and smoothed dashed line show distribution of estimated heritabilities in kinship model.

We also compared the computational time needed to fit the kinship and M₃ models, varying the number of families and, correspondingly, the total number of observations. We examined three settings, each with 100 simulation replicates; the settings and results are shown in Table 6. In setting 1, M₃ is 8 times faster than the kinship model. When the number of families quadruples (from setting 1 to setting 3), the computational time increases 2.5-fold for the M₃ model (lme function) and 15.2-fold for the kinship model (lmekin function). Thus, the computational time of the kinship model grows much faster than the number of observations (the 15.2-fold increase is close to the 16-fold increase expected under quadratic scaling), whereas that of the M₃ model grows more slowly than linearly over this range (a 2.5-fold increase for a fourfold increase in families).
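
Assuming computational time follows a power law, time ∝ (number of families)^b, between settings 1 and 3, the reported ratios imply the exponents below. This is a two-point approximation, not a fitted scaling law; combining the ratios also gives the implied gap between the two models in setting 3.

```python
import math

def scaling_exponent(time_ratio: float, size_ratio: float) -> float:
    """Exponent b in time ∝ size**b implied by two timings."""
    return math.log(time_ratio) / math.log(size_ratio)

print(round(scaling_exponent(15.2, 4), 2))  # kinship (lmekin): 1.96
print(round(scaling_exponent(2.5, 4), 2))   # M3 (lme): 0.66

# Implied speed gap in setting 3, starting from an 8-fold gap in setting 1
print(round(8 * 15.2 / 2.5, 1))  # 48.6
```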

Discussion

In this study, the primary goal was to determine whether relatedness within families must be accounted for when estimating the effects of risk factors of interest in large family-based cohort studies. The answer is of particular importance for researchers analyzing the Lifelines data and data from other large cohort studies of families. Our results show that the estimated effects of BMI on the MCS and PCS scores of HRQoL do not change when family structure is accounted for. This conclusion is consistent with the theoretical considerations for longitudinal data analysis given by Diggle et al. (2002, chapter 1): when the focus of a study is on modeling the dependence of the response on explanatory variables, the nature of the correlation among responses is unimportant if the number of independent units (here, families) is large relative to the number of individuals per unit. The Lifelines data clearly satisfy this criterion.
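
The criterion can be checked with simple arithmetic on the Lifelines sample sizes: there are tens of thousands of families, each with fewer than three participants on average.

$$
\bar{m} = \frac{89\,353}{32\,452} \approx 2.75 \ \text{participants per family}, \qquad M = 32\,452 \gg \bar{m}.
$$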

McArdle et al. (2007) conducted a simulation study comparing the performance of association analyses in family-based designs that account for or ignore family structure when assessing phenotype–genotype association. They concluded that effect size estimates and power are not significantly affected by ignoring family structure, although type I error rates increase when family structure is ignored, and the magnitude of the increase depends on trait heritability and pedigree configuration. Inflated type I error results from underestimated standard errors (and overly narrow confidence intervals), which lead to liberal inference about regression coefficients, that is, to falsely claiming significance when there is none. In our analysis of both PCS and MCS, by contrast, ignoring the correlation (i.e., the family structure) led to slightly larger standard errors of the regression coefficients, and hence to conservative inference about the covariate effects. The standard errors of regression parameters can therefore be either larger or smaller when family structure is ignored, leading either to liberal inference with inflated type I errors or to conservative inference with deflated type I errors. Whether the standard errors increase or decrease depends on (1) the relationship between the family structure and the covariates of interest and (2) the size of the family-structure effect on the outcome.
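
Point (1) can be illustrated with a simplified textbook-style sketch, not the analysis performed in this study. It assumes a single exchangeable family random effect with known variances (values chosen for illustration, loosely following the simulation parameters), balanced three-member families and a slope without intercept, and compares the model-based slope variance from OLS that ignores families with the GLS variance under the correct family covariance. The function name is hypothetical.

```python
import numpy as np

def slope_variances(x_family, sigma_u2, sigma_e2, n_families):
    """Model-based variance of a no-intercept slope when every family
    has the covariate pattern x_family.

    Returns (naive OLS variance ignoring families, GLS variance using
    the exchangeable family covariance)."""
    x = np.asarray(x_family, dtype=float)
    m = x.size
    # Within-family covariance: shared family effect plus independent residual
    V = sigma_e2 * np.eye(m) + sigma_u2 * np.ones((m, m))
    var_gls = 1.0 / (n_families * (x @ np.linalg.solve(V, x)))
    # OLS ignoring families assumes independent errors with variance sigma_u2 + sigma_e2
    var_naive = (sigma_u2 + sigma_e2) / (n_families * (x @ x))
    return var_naive, var_gls

# Illustrative variance components and 100 three-member families
sigma_u2, sigma_e2, n_fam = 1.5, 40.0, 100

# Covariate constant within families (varies only between families)
naive_b, gls_b = slope_variances([1.0, 1.0, 1.0], sigma_u2, sigma_e2, n_fam)
# Covariate varying only within families (family mean zero)
naive_w, gls_w = slope_variances([-1.0, 0.0, 1.0], sigma_u2, sigma_e2, n_fam)

print(naive_b < gls_b)  # True: ignoring families understates the SE (liberal)
print(naive_w > gls_w)  # True: ignoring families overstates the SE (conservative)
```

In these two symmetric designs the GLS variance also equals the true sampling variance of the OLS slope, so the comparison shows the direction of the error in the naive standard error: a between-family covariate yields liberal inference, whereas a within-family covariate yields conservative inference.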

We compared the Lifelines results on the association between BMI and HRQoL with similar results in the literature. Increased BMI has been shown to be associated with reduced physical HRQoL (Ul-Haq et al., 2012, 2013); however, evidence on the relationship between BMI and mental HRQoL has been conflicting. In the Lifelines study, mental HRQoL is lower in all other BMI categories than in the overweight category. This matches the conclusion of the meta-analysis by Ul-Haq et al. (2013). Furthermore, an inverted U-shape of mental HRQoL across increasing BMI categories is seen both in our Lifelines study (Figure 2) and in the Scottish study by Ul-Haq et al. (2012). Similarly to Ul-Haq et al. (2013), we find that increasing BMI is associated with impaired HRQoL in Lifelines. In addition, our study characterizes the shape of the association between BMI and physical HRQoL: PCS shows an inverted J-shaped, negative association across increasing BMI categories (Figure 3).

In this work, we analyzed real data and did not assume any a priori family inheritance structure. A main strength of our study is the use of Lifelines data, which makes it possible to estimate relatedness in more complex families than other studies can; consequently, our results are of greater practical relevance. We found that, with or without accounting for family structure, the estimated effects of determinants on health outcomes do not change materially. Nevertheless, incorporating family structure into the model in a computationally efficient way makes it possible to disentangle genetic, shared behavioral and environmental, and unique environmental variances. In particular, the proposed model allows the hereditary, behavioral and shared environmental, and unique environmental ICCs of both HRQoL outcomes to be estimated efficiently through the fractional relatedness of founders and non-founders.

The kinship model as implemented in the SOLAR software (Almasy & Blangero, 1998) could not handle the analysis of the Lifelines data with over 89,000 individuals. We did not, however, exhaustively study all alternatives, such as generalized estimating equations (Liang & Zeger, 1986), that could have been used to implement the kinship model. We overcame the computational infeasibility of the kinship model by introducing a variance component model with fractional relatedness that can be fitted using standard functions in R.

We assumed that founders are unrelated, but Lifelines may not have complete information on family relationships. For example, sibling relationships are not recorded as such in Lifelines, so two siblings might be modeled as two founders even though they share a common ancestor and are related. It would be interesting to examine whether modeling only a subset of founders affects the variance component estimates.

Furthermore, MCS and PCS may be genetically correlated, meaning that there is genetic overlap between the two traits (i.e., the same set of genes may influence both). If the aim is to separate genetic and environmental contributions to MCS and PCS simultaneously, bivariate models could be used, similar to the multivariate generalized linear mixed models that others have used for schizophrenia and bipolar disorder (Lichtenstein et al., 2009; Yip et al., 2008).

In summary, the proposed model offers a solution to the computational problems involved in modeling family structure when thousands of families are analyzed, and it can be fitted accurately using standard functions in R.

Acknowledgments

The authors wish to acknowledge the services of the Lifelines cohort study, the contributing research centers delivering data to Lifelines and all the study participants. The authors declare no conflicts of interest.

Financial support

During the first six months, this work was supported by the University of Groningen, University Medical Center Groningen, as part of a PhD thesis; later, it was supported by an NWO STAR travel grant from the University of Groningen. For the remaining work, this research received no specific grant from any funding agency in the commercial or not-for-profit sectors.

References

Almasy, L., & Blangero, J. (1998). Multipoint quantitative-trait linkage analysis in general pedigrees. The American Journal of Human Genetics, 62, 1198–1211.

Almasy, L., & Blangero, J. (2010). Variance component methods for analysis of complex phenotypes. Cold Spring Harbor Protocols, 2010, pdb-top77.

Almgren, P., Bendahl, P., Bengtsson, H., Hössjer, O., & Perfekt, R. (2003). Statistics in genetics (Lecture notes). Lund, Sweden: Lund University.

Baumeister, H., & Härter, M. (2007). Prevalence of mental disorders based on general population surveys. Social Psychiatry and Psychiatric Epidemiology, 42, 537–546.

Berghöfer, A., Pischon, T., Reinhold, T., Apovian, C. M., Sharma, A. M., & Willich, S. N. (2008). Obesity prevalence from a European perspective: A systematic review. BMC Public Health, 8, 200.

Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging. Cambridge, UK: Cambridge University Press.

Crisp, A. H., & McGuiness, B. (1976). Jolly fat: Relation between obesity and psychoneurosis in general population. BMJ, 1, 7–9.

Demetrashvili, N., & Van den Heuvel, E. R. (2015). Confidence intervals for intraclass correlation coefficients in a nonlinear dose-response meta-analysis. Biometrics, 71, 548–555.

Demetrashvili, N., Wit, E. C., & Van den Heuvel, E. R. (2016). Confidence intervals for intraclass correlation coefficients in variance components models. Statistical Methods in Medical Research, 25, 2359–2376.

Diggle, P. J., Heagerty, P. J., Liang, K.-Y., & Zeger, S. L. (2002). Analysis of longitudinal data. New York, NY: Oxford University Press.

Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52, 399–433.

Flegal, K. M., Carroll, M. D., Ogden, C. L., & Curtin, L. R. (2010). Prevalence and trends in obesity among US adults, 1999–2008. JAMA, 303, 235–241.

Flegal, K. M., Kit, B. K., Orpana, H., & Graubard, B. I. (2013). Association of all-cause mortality with overweight and obesity using standard body mass index categories: A systematic review and meta-analysis. JAMA, 309, 71–82.

Goldney, R. D., Dunn, K. I., Air, T. M., Dal Grande, E., & Taylor, A. W. (2009). Relationships between body mass index, mental health, and suicidal ideation: Population perspective using two methods. Australian and New Zealand Journal of Psychiatry, 43, 652–658.

Hays, R. D., & Morales, L. S. (2001). The RAND-36 measure of health-related quality of life. Annals of Medicine, 33, 350–357.

Liang, K., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.

Lichtenstein, P., Yip, B. H., Björk, C., Pawitan, Y., Cannon, T. D., Sullivan, P. F., & Hultman, C. M. (2009). Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: A population-based study. The Lancet, 373, 234–239.

Lush, J. L. (1940). Intra-sire correlations or regressions of offspring on dam as a method of estimating heritability of characteristics. Journal of Animal Science, 33, 293–301.

McArdle, P. F., O’Connell, J. R., Pollin, T. I., Baumgarten, M., Shuldiner, A. R., Peyser, P. A., & Mitchell, B. D. (2007). Accounting for relatedness in family based genetic association studies. Human Heredity, 64, 234–242.

Noh, M., Yip, B., Lee, Y., & Pawitan, Y. (2006). Multicomponent variance estimation for binary traits in family-based studies. Genetic Epidemiology, 30, 37–47.

Ohayon, M. M. (2007). Epidemiology of depression and its treatment in the general population. Journal of Psychiatric Research, 41, 207–213.

Palinkas, L. A., Wingard, D. L., & Barrett-Connor, E. (1996). Depressive symptoms in overweight and obese older adults: A test of the ‘jolly fat’ hypothesis. Journal of Psychosomatic Research, 40, 59–66.

Pawitan, Y., Reilly, M., Nilsson, E., Cnattingius, S., & Lichtenstein, P. (2004). Estimation of genetic and environmental factors for binary traits using family data. Statistics in Medicine, 23, 449–465.

Petry, N. M., Barry, D., Pietrzak, R. H., & Wagner, J. A. (2008). Overweight and obesity are associated with psychiatric disorders: Results from the national epidemiologic survey on alcohol and related conditions. Psychosomatic Medicine, 70, 288–297.

Pinheiro, J. C., & Bates, D. M. (2009). Mixed-effects models in S and S-PLUS. New York, NY: Springer Science & Business Media.

Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., & R Core Team. (2017). nlme: Linear and nonlinear mixed effects models. Retrieved from https://CRAN.R-project.org/package=nlme

Prospective Studies Collaboration; Whitlock, G., Lewington, S., Sherliker, P., Clarke, R., Emberson, J., … Peto, R. (2009). Body-mass index and cause-specific mortality in 900 000 adults: Collaborative analyses of 57 prospective studies. The Lancet, 373, 1083–1096.

Rabe-Hesketh, S., Skrondal, A., & Gjessing, H. K. (2008). Biometrical modeling of twin and family data using standard mixed model software. Biometrics, 64, 280–288.

R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Scholtens, S., Smidt, N., Swertz, M. A., Bakker, S. J. L., Dotinga, A., Vonk, J. M., … Stolk, R. P. (2015). Cohort profile: LifeLines, a three-generation cohort study and biobank. International Journal of Epidemiology, 44, 1172–1180.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.

Sham, P. (1998). Statistics in human genetics. West Sussex, UK: John Wiley & Sons.

Stolk, R. P., Rosmalen, J. G. M., Postma, D. S., de Boer, R. A., Navis, G., Slaets, J. P. J., … Wolffenbuttel, B. H. R. (2008). Universal risk factors for multifactorial diseases. European Journal of Epidemiology, 23, 67–74.

Therneau, T. M. (2018). coxme: Mixed effects Cox models. R package version 2.2-10. Retrieved from https://CRAN.R-project.org/package=coxme

Ul-Haq, Z., Mackay, D. F., Fenwick, E., & Pell, J. P. (2012). Impact of metabolic comorbidity on the association between body mass index and health-related quality of life: A Scotland-wide cross-sectional study of 5,608 participants. BMC Public Health, 12, 143.

Ul-Haq, Z., Mackay, D. F., Fenwick, E., & Pell, J. P. (2013). Meta-analysis of the association between body mass index and health-related quality of life among adults, assessed by the SF-36. Obesity, 21, E322–E327.

Ul-Haq, Z., Mackay, D. F., Fenwick, E., & Pell, J. P. (2014). Association between body mass index and mental health among Scottish adult population: A cross-sectional study of 37,272 participants. Psychological Medicine, 44, 2231–2240.

Van der Zee, K., & Sanderman, R. (1993). RAND-36. Groningen, the Netherlands: Northern Centre for Health Care Research, University of Groningen.

Van der Zee, K. I., Sanderman, R., Heyink, J. W., & de Haes, H. (1996). Psychometric qualities of the RAND 36-item health survey 1.0: A multidimensional measure of general health status. International Journal of Behavioral Medicine, 3, 104–122.

Visscher, P. M., Hill, W. G., & Wray, N. R. (2008). Heritability in the genomics era: Concepts and misconceptions. Nature Reviews Genetics, 9, 255–266.

Volksgezondheid, Nationaal Kompas. (2012). Gezondheidsmonitor GGD’en, CBS en RIVM. The Hague, NL: Centraal Bureau voor de Statistiek.

Wang, Y., Beydoun, M. A., Liang, L., Caballero, B., & Kumanyika, S. K. (2008). Will all Americans become overweight or obese? Estimating the progression and cost of the US obesity epidemic. Obesity, 16, 2323–2330.

Ware, J. E., Kosinski, M., & Keller, S. D. (1994). SF-36 physical and mental summary scales: A user’s manual. Boston, MA: The Health Institute.

World Health Organization (WHO). (1995). Physical status: The use of and interpretation of anthropometry: Report of a WHO expert committee. Technical Report Series, 854, 1–452. Geneva, CH: World Health Organization.

Wright, S. (1920). The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences of the United States of America, 6, 320–332.

Yip, B. H., Björk, C., Lichtenstein, P., Hultman, C. M., & Pawitan, Y. (2008). Covariance component models for multivariate binary traits in family data analysis. Statistics in Medicine, 27, 1086–1105.
