
A gene based approach to test genetic association based on an optimally weighted combination of multiple traits

  • Jianjun Zhang,

    Roles Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, University of North Texas, Denton, TX, United States of America

  • Qiuying Sha,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America

  • Guanfu Liu,

    Roles Investigation, Writing – review & editing

    Affiliation School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China

  • Xuexia Wang

    Roles Conceptualization, Methodology, Project administration, Supervision, Writing – review & editing

    Xuexia.Wang@unt.edu

    Affiliation Department of Mathematics, University of North Texas, Denton, TX, United States of America

Abstract

There is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases for which multiple correlated traits are often measured. Joint analysis of multiple traits could increase statistical power by aggregating multiple weak effects. Existing methods for multiple trait association tests usually study each of the multiple traits separately and then combine the univariate test statistics or combine p-values of the univariate tests for identifying disease associated genetic variants. However, ignoring correlation between phenotypes may cause power loss. Additionally, the genetic variants in one gene (including common and rare variants) are often viewed as a whole that affects the underlying disease since the basic functional unit of inheritance is a gene rather than a genetic variant. Thus, results from gene level association tests can be more readily integrated with downstream functional and pathogenic investigation, whereas many existing methods for multiple trait association tests only focus on testing a single common variant rather than a gene. In this article, we propose a statistical method by Testing an Optimally Weighted Combination of Multiple traits (TOW-CM) to test the association between multiple traits and multiple variants in a genomic region (a gene or pathway). We investigate the performance of the proposed method through extensive simulation studies. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. Additionally, we illustrate the usefulness of TOW-CM based on a COPDGene study.

Introduction

Complex diseases are often characterized by many correlated phenotypes which can better reflect their underlying mechanism. For example, hypertension can be characterized by systolic and diastolic blood pressure [1]; metabolic syndrome is evaluated by four component traits: high-density lipoprotein (HDL) cholesterol, plasma glucose and Type 2 diabetes, abdominal obesity, and diastolic blood pressure [2]; and a person’s cognitive ability is usually measured by tests in domains including memory, intelligence, language, executive function, and visual-spatial function [3]. Also, more and more large cohort studies have collected or are collecting a broad array of correlated phenotypes to reveal the genetic components of many complex human diseases. Therefore, by jointly analyzing these correlated traits, we can not only gain more power by aggregating multiple weak effects, but also understand the genetic architecture of the disease of interest [4].

Even though genome-wide association studies (GWASs) have been remarkably successful in identifying genetic variants associated with complex traits and diseases, the identified variants explain only a small fraction of total heritability [5]. Further, a gene is the basic functional unit of inheritance, whereas GWASs focus primarily on the single-common-variant paradigm. Moreover, most published GWASs analyzed each phenotype separately, although results on related phenotypes may be reported together. Large-scale GWASs of complex traits have consistently demonstrated that, with few exceptions, common variants have moderate-to-small effects. Therefore, it is important to develop methods that fully utilize the information in multivariate phenotypes to detect novel genes in genetic association studies.

In GWASs, several methods have been developed for multivariate phenotype association analysis [3] to test the association between multivariate continuous phenotypes and a single common variant. To our knowledge, current multivariate phenotype association methods can be roughly classified into two categories: univariate analysis and multivariate analysis. Univariate analysis methods perform an association test for each trait individually and then combine either the univariate test statistics or the p-values of the univariate tests [6–9]. Although such methods are computationally efficient, they neglect the omnipresent correlation between phenotypes and may have less power than multivariate analysis. Multivariate analysis methods jointly analyze more than one phenotype in a unified framework and test for the association between multiple phenotypes and genetic variants. They include multivariate analysis of variance (MANOVA) [10], linear mixed effect models (LMM) [11], and generalized estimating equations (GEE) [12]. Another approach is to reduce the dimension of the multivariate phenotypes. A common dimension reduction method is principal component analysis (PCA) [13], which finds linear combinations of the phenotypes that are mutually uncorrelated. A limitation of this approach is that it cannot properly account for the variation of phenotypes or genotypes. It is also hard to interpret the principal components of multivariate phenotypes in practice.

Recent studies show that complex diseases are caused by both common and rare variants [14–20]. Gene-based analysis requires statistical methods that are fundamentally different from the association statistics used for testing single common variants. It is therefore essential to develop a novel statistical method to test the association between multiple traits and multiple variants (common and/or rare). In this article, we develop a statistical method to test the association between multiple traits and genetic variants (rare and/or common) in a genomic region by Testing the association between an Optimally Weighted combination of Multiple traits (TOW-CM) and the genomic region. TOW-CM is based on the score test under a linear model, in which the weighted combination of phenotypes is obtained by maximizing the score test statistic over the weights. The weights at which the score test statistic reaches its maximum are called the optimal weights. We use extensive simulation studies to compare the performance of TOW-CM with MANOVA [10], the multi-trait sequence kernel association test (MSKAT) [21], and the minimum p-value (minP) [22]. The simulation studies demonstrate that, in all simulation scenarios, TOW-CM is either the most powerful test or comparable to the most powerful test among the four tests. We also illustrate the usefulness of TOW-CM by analyzing data from the COPDGene study.

Methods

We consider a sample of n unrelated individuals. Each individual has K (potentially correlated) traits and has been genotyped at M variants in a considered region (a gene or a pathway). Denote by $y_{ik}$ the kth trait value of the ith individual and by $x_{im}$ the additively coded genotype score of the ith individual at the mth variant. Let $Y = (Y_1, \cdots, Y_K)$ denote the $n \times K$ matrix of the K traits and $X = (X_1, \cdots, X_M)$ the $n \times M$ matrix of genotype scores at the M variants for these n individuals, where $Y_k = (y_{1k}, \cdots, y_{nk})^T$ and $X_m = (x_{1m}, \cdots, x_{nm})^T$. Consider a linear combination of the traits, $Y\mathbf{w} = \sum_{k=1}^{K} w_k Y_k$, where $\mathbf{w} = (w_1, \cdots, w_K)^T$.

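We model the relationship between the combination of multiple continuous traits and the M genetic variants in the considered region using the linear model

$$Y\mathbf{w} = \beta_0 \mathbf{1}_n + X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad (1)$$

where $\beta_0$ is the intercept, $\boldsymbol{\beta} = (\beta_1, \cdots, \beta_M)^T$ is the vector of coefficients, and $\boldsymbol{\varepsilon}$ is the error vector. Testing the association between the combination of the multiple traits and the M genetic variants is equivalent to testing the null hypothesis $H_0: \boldsymbol{\beta} = \mathbf{0}$ under Eq (1), and we use the score test for this purpose. Let $P = I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T$ be the centering matrix. Then the score test statistic is

$$S = U^T V^{-1} U, \qquad (2)$$

where $U = (PX)^T PY\mathbf{w}$ and $V = \hat{\sigma}^2 (PX)^T PX$ with $\hat{\sigma}^2 = \frac{1}{n}(PY\mathbf{w})^T PY\mathbf{w}$. Because $P = P^T$ and $PP^T = P$, the score statistic can be rewritten as a function of $\mathbf{w}$:

$$S(\mathbf{w}) = n\,\frac{\mathbf{w}^T Y^T P X (X^T P X)^{-1} X^T P Y \mathbf{w}}{\mathbf{w}^T Y^T P Y \mathbf{w}}. \qquad (3)$$

We propose to maximize $S(\mathbf{w})$ to obtain the optimal weights $\mathbf{w}_o$ and then use $S(\mathbf{w}_o)$ to evaluate the association between the optimally weighted combination of the target traits and the tested genetic variants.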
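As a numerical sanity check on the algebra that takes Eq (2) to Eq (3), the following sketch evaluates both forms on simulated data. It assumes that P is the centering matrix $I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ and that $\hat{\sigma}^2$ uses divisor n. The function names and toy data are hypothetical and are not taken from the authors' R program.

```python
import numpy as np

def centering_matrix(n):
    # P = I_n - (1/n) 1 1^T is symmetric and idempotent (P = P^T, PP = P)
    return np.eye(n) - np.ones((n, n)) / n

def score_stat_form2(Y, X, w):
    """Eq (2): S = U^T V^{-1} U with U = (PX)^T P Y w and V = sigma2 (PX)^T PX."""
    n = Y.shape[0]
    P = centering_matrix(n)
    yw = P @ Y @ w                       # centred combined trait P Y w
    U = (P @ X).T @ yw
    sigma2 = yw @ yw / n                 # variance estimate under H0 (divisor n assumed)
    V = sigma2 * (P @ X).T @ (P @ X)
    return float(U @ np.linalg.solve(V, U))

def score_stat_form3(Y, X, w):
    """Eq (3): S(w) = n w^T Y^T PX (X^T PX)^{-1} X^T PY w / (w^T Y^T PY w)."""
    n = Y.shape[0]
    P = centering_matrix(n)
    A = Y.T @ P @ X @ np.linalg.solve(X.T @ P @ X, X.T @ P @ Y)
    D = Y.T @ P @ Y
    return float(n * (w @ A @ w) / (w @ D @ w))

rng = np.random.default_rng(0)
n, K, M = 200, 4, 6
Y = rng.normal(size=(n, K))                              # toy traits
X = rng.binomial(2, 0.3, size=(n, M)).astype(float)      # toy additive genotypes
w = rng.normal(size=K)                                   # arbitrary weights
print(score_stat_form2(Y, X, w), score_stat_form3(Y, X, w))
```

Because $S(\mathbf{w})$ is a ratio of two quadratic forms in $\mathbf{w}$, rescaling $\mathbf{w}$ leaves it unchanged. Only the direction of the weight vector matters.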

When $D = Y^T P Y$ is positive definite, maximizing $S(\mathbf{w})$ is equivalent to maximizing

$$\frac{(L^T\mathbf{w})^T C\,(L^T\mathbf{w})}{(L^T\mathbf{w})^T (L^T\mathbf{w})}, \qquad (4)$$

where L is the lower triangular matrix obtained from the Cholesky decomposition $D = LL^T$ and $C = L^{-1} Y^T P X (X^T P X)^{-1} X^T P Y (L^T)^{-1}$. However, D may not be of full rank when the traits are correlated. If D is only positive semi-definite, we introduce a ridge parameter $\lambda_0$, for which we suggest a choice that depends on n, the number of individuals in the testing data. We then replace D by $D = Y^T P Y + \lambda_0 I$ to avoid numerical instability. Let $\mathbf{c}$ be the eigenvector corresponding to the largest eigenvalue of C. Then Eq (4), and hence $S(\mathbf{w})$, is maximized when $L^T\mathbf{w} = \mathbf{c}$, that is, at $\mathbf{w}_o = (L^T)^{-1}\mathbf{c}$. In the special case where the traits are independent and M = 1, an analytical weight can be obtained as in [22]:

$$w_k = \frac{\sum_{i=1}^{n} (y_{ik} - \bar{y}_k)(x_i - \bar{x})}{\sum_{i=1}^{n} (y_{ik} - \bar{y}_k)^2} \qquad (5)$$

for the kth phenotype, k = 1, 2, …, K, where $x_i$ is the genotype score of the ith individual at the single variant. Eq (5) is equivalent to $w_k = \widehat{\mathrm{Cov}}(Y_k, X)/\widehat{\mathrm{Var}}(Y_k)$. The numerator is the sample covariance between the kth phenotype $Y_k$ and the genotypic variant X, and the denominator is the sample variance of $Y_k$. Hence $w_k$ has the same sign as the correlation between $Y_k$ and X. The kth trait receives a large weight when it is strongly associated with the variant and/or has low variance.
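The maximization can be sketched in NumPy as shown below. The sketch assumes column-centred inputs, which is equivalent to multiplying by P. It also takes a caller-supplied `ridge` value as a stand-in for $\lambda_0$, because the recommended value is not reproduced here. `tow_cm_weights` is a hypothetical helper, not the authors' implementation. For M = 1, the maximizer should be proportional to $D^{-1}Y^TPx$, which reduces to Eq (5) when D is diagonal. The usage lines check that proportionality.

```python
import numpy as np

def tow_cm_weights(Y, X, ridge=0.0):
    """Return (w_o, T): optimal trait weights and T = S(w_o) = n * largest eigenvalue of C."""
    n, K = Y.shape
    Yc = Y - Y.mean(axis=0)                 # P Y: column-centred traits
    Xc = X - X.mean(axis=0)                 # P X: column-centred genotype scores
    D = Yc.T @ Yc + ridge * np.eye(K)       # D = Y^T P Y (+ lambda_0 I)
    A = Yc.T @ Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)  # Y^T PX (X^T PX)^{-1} X^T PY
    L = np.linalg.cholesky(D)               # D = L L^T with L lower triangular
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T                   # C = L^{-1} A (L^T)^{-1}, symmetric
    evals, evecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    c = evecs[:, -1]                        # unit eigenvector of the largest eigenvalue
    w_o = np.linalg.solve(L.T, c)           # w_o = (L^T)^{-1} c
    return w_o, n * evals[-1]

rng = np.random.default_rng(1)
n, K = 500, 4
x = rng.binomial(2, 0.3, size=(n, 1)).astype(float)  # single variant (M = 1)
Y = rng.normal(size=(n, K))
Y[:, 0] += 0.5 * x[:, 0]                             # only trait 1 is associated
w_o, T = tow_cm_weights(Y, x)

Yc, xc = Y - Y.mean(axis=0), x - x.mean(axis=0)
w_closed = np.linalg.solve(Yc.T @ Yc, Yc.T @ xc).ravel()  # D^{-1} Y^T P x
cos = abs(w_o @ w_closed) / (np.linalg.norm(w_o) * np.linalg.norm(w_closed))
print(f"T = {T:.2f}, |cosine(w_o, D^-1 a)| = {cos:.6f}")
```

Solving through the Cholesky factor turns the generalized Rayleigh quotient in Eq (3) into an ordinary symmetric eigenproblem. This lets `eigh` be used directly.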

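We define the statistic for Testing an Optimally Weighted combination of Multiple traits (TOW-CM) as

$$T = \max_{\mathbf{w}} S(\mathbf{w}) = S(\mathbf{w}_o), \qquad (6)$$

which equals n times the largest eigenvalue of C.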

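We use permutation to evaluate the P-value of T. TOW-CM can also be extended to incorporate covariates. Suppose that there are p covariates, and let $z_{il}$ denote the lth covariate of the ith individual. We adjust both the trait values $y_{ik}$ and the genotype scores $x_{im}$ for the covariates by linear regression. That is,

$$y_{ik} = \alpha_{0k} + \sum_{l=1}^{p} \alpha_{lk} z_{il} + \epsilon_{ik}, \qquad x_{im} = \gamma_{0m} + \sum_{l=1}^{p} \gamma_{lm} z_{il} + \tau_{im}.$$

Let $\tilde{y}_{ik}$ and $\tilde{x}_{im}$ denote the residuals of $y_{ik}$ and $x_{im}$, respectively. We incorporate the covariate effects in TOW-CM by replacing $y_{ik}$ and $x_{im}$ in Eq (6) with $\tilde{y}_{ik}$ and $\tilde{x}_{im}$. With covariates, the TOW-CM statistic is defined as

$$\tilde{T} = \max_{\mathbf{w}} \tilde{S}(\mathbf{w}),$$

where $\tilde{S}(\mathbf{w})$ is Eq (3) computed with $\tilde{Y}$ and $\tilde{X}$, the matrices of residuals $\tilde{y}_{ik}$ and $\tilde{x}_{im}$.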
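The following compact sketch computes the covariate-adjusted statistic with a permutation P-value. Two details are assumptions made for illustration, because the text does not specify them. First, the permutation scheme shuffles the rows of the residual trait matrix, so each individual's trait vector moves as a unit, against fixed residual genotypes. Second, the P-value is estimated as (b + 1)/(B + 1). All function names are hypothetical.

```python
import numpy as np

def residualize(M, Z):
    """Residuals of each column of M after least-squares regression on [1, Z]."""
    Z1 = np.column_stack([np.ones(M.shape[0]), Z])
    coef, *_ = np.linalg.lstsq(Z1, M, rcond=None)
    return M - Z1 @ coef

def tow_cm_stat(Y, X):
    """T = max_w S(w): n times the largest eigenvalue of D^{-1} A."""
    n = Y.shape[0]
    Yc, Xc = Y - Y.mean(axis=0), X - X.mean(axis=0)
    D = Yc.T @ Yc
    A = Yc.T @ Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)
    # D^{-1} A is similar to a symmetric PSD matrix, so its eigenvalues are real
    return n * np.linalg.eigvals(np.linalg.solve(D, A)).real.max()

def tow_cm_perm_pvalue(Y, X, Z, n_perm=1000, seed=0):
    """Covariate-adjusted TOW-CM statistic and its permutation P-value."""
    Yr, Xr = residualize(Y, Z), residualize(X, Z)
    t_obs = tow_cm_stat(Yr, Xr)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(Yr.shape[0])  # permute individuals' trait vectors jointly
        exceed += int(tow_cm_stat(Yr[perm], Xr) >= t_obs)
    return t_obs, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(2)
n = 200
Z = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.5, size=n)])
X = rng.binomial(2, 0.3, size=(n, 3)).astype(float)
Y = rng.normal(size=(n, 4)) + Z[:, [0]]     # covariate z1 shifts every trait
Y[:, 0] += 0.8 * X[:, 0]                   # trait 1 depends on variant 1
t_obs, p = tow_cm_perm_pvalue(Y, X, Z, n_perm=200)
print(f"T = {t_obs:.2f}, permutation P-value = {p:.4f}")
```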

Comparison of tests

We compared the performance of our method (TOW-CM) with the following methods: 1) Multivariate Analysis of Variance (MANOVA) [10]; 2) Multi-trait Sequence Kernel Association Test (MSKAT) [21]; 3) Minimum p-value based on the p-values of the individual trait TOW [22] (denoted as minP).

Simulation

In the simulation studies, we use the empirical Mini-Exome genotype data from Genetic Analysis Workshop 17 (GAW17), which include the genotypes of 697 unrelated individuals on 3205 genes. Two types of variants (common variants: minor allele frequency (MAF) > 0.05; rare variants: MAF < 0.05) are chosen from a super gene (Sgene) consisting of four genes: ELAVL4 (gene1), MSH4 (gene2), PDE4B (gene3), and ADAMTS4 (gene4). The allele frequency distribution of the Sgene is similar to that of the 3205 genes [22]. We generate genotypes based on the genotypes of the 697 individuals in these four genes. These genotypes were extracted from the sequence alignment files provided by the 1,000 Genomes Project for its pilot3 study (http://www.1000genomes.org). To generate the genotype of an individual, we draw two haplotypes according to the haplotype frequencies.
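The haplotype-based genotype generation can be sketched as follows. The sketch assumes that the two haplotypes of an individual are drawn independently from the haplotype frequency distribution, which corresponds to random mating. The small haplotype table below is made up for illustration and is not the GAW17 data. `simulate_genotypes` is a hypothetical helper.

```python
import numpy as np

def simulate_genotypes(haplotypes, freqs, n_individuals, seed=0):
    """Additive genotype scores (0/1/2) from two independently drawn haplotypes.

    haplotypes: (H, M) array of 0/1 minor-allele indicators, one row per haplotype.
    freqs: length-H haplotype frequencies (normalized to sum to 1 here).
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=(n_individuals, 2), p=p)  # two haplotypes per person
    return haplotypes[idx[:, 0]] + haplotypes[idx[:, 1]]

# Hypothetical 3-variant region with four observed haplotypes
haps = np.array([[0, 0, 0],
                 [1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])
G = simulate_genotypes(haps, [0.85, 0.10, 0.04, 0.01], n_individuals=1000)
print(G.shape, G.mean(axis=0) / 2)   # estimated MAF of each variant
```

Sampling whole haplotypes, rather than sampling each variant independently, preserves the linkage disequilibrium present in the reference haplotypes.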

We consider K = 4 related traits with a compound-symmetry correlation structure and two covariates: a standard normal covariate z1 and a binary covariate z2 with P(z2 = 1) = 0.5. Each trait value is generated as the sum of three parts: the effects of the covariates z1 and z2, a genetic contribution ηk, and an error term ϵk. Here ϵ = (ϵ1, ϵ2, ϵ3, ϵ4) is zero-mean normal with unit variances and pairwise correlations of magnitude |ρ|. We set |ρ| to 0.2, 0.5, and 0.8, and the signs of the symmetric off-diagonal entries of the correlation matrix are randomly chosen from {−1, 1}. The genetic contributions η = (η1, η2, η3, η4) are simulated from a set of genotypic variants as follows.
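The error structure can be sketched as shown below. A correlation matrix with |ρ| = 0.8 and arbitrary random signs need not be positive definite. This sketch therefore redraws the signs until the matrix is positive definite. That rule is an assumption, because the text does not say how such draws are handled. The helper name is hypothetical, and the covariate and genetic terms are omitted.

```python
import numpy as np

def signed_cs_correlation(K, rho, rng, max_tries=10_000):
    """K x K correlation matrix with off-diagonal entries of magnitude rho and random signs."""
    for _ in range(max_tries):
        R = np.eye(K)
        for i in range(K):
            for j in range(i + 1, K):
                R[i, j] = R[j, i] = rho * rng.choice([-1.0, 1.0])
        try:
            np.linalg.cholesky(R)         # succeeds only if R is positive definite
            return R
        except np.linalg.LinAlgError:
            continue                       # redraw the signs
    raise RuntimeError("no positive-definite sign pattern found")

rng = np.random.default_rng(3)
K, n = 4, 1000
R = signed_cs_correlation(K, 0.8, rng)
eps = rng.multivariate_normal(np.zeros(K), R, size=n)  # zero mean, unit variances
print(np.round(R, 2))
print(np.round(np.corrcoef(eps, rowvar=False), 2))
```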

To evaluate type I error, phenotypes are generated under the null model, i.e., η = 0. To evaluate power, we randomly choose one common variant and nc rare variants (20% of the rare variants) as causal variants. We assume that all nc rare causal variants have the same heritability and that the heritability of the common causal variant is twice that of each rare causal variant. That is, we model the contribution of the genotypic variants to the kth trait as

$$\eta_k = \beta_{kc}\, x_c + \sum_{j=1}^{n_c} \beta_{kj}\, x_j,$$

where $x_c$ and $x_j$ denote the genotype scores of the common causal variant and the jth rare causal variant, respectively, and $\beta_{kc}$ and $\beta_{kj}$ are the corresponding effect sizes. Let h and hk denote the heritability of all causal variants for all K traits and for the kth trait, respectively. We generate K random numbers t1, ⋯, tK from a uniform distribution on (0, 1), and the heritability hk of the kth trait is determined from tk and h. For the kth trait, we then assign the effect size of the common causal variant (7) and the magnitude of the effect of the rare causal variants (8), where R denotes the ratio of the heritability of a rare causal variant to that of the common causal variant.

For power comparisons, we conducted simulations under four scenarios. In each scenario, only the first L traits are associated with the set of variants, L = 1, 2, 3, 4. Intuitively, in the first scenario (L = 1), when only the first trait is associated with the variant set, the minP method (which essentially amounts to testing the first trait alone) may perform well. However, we will show that by simultaneously testing correlated null traits, TOW-CM can improve detection power compared with testing the first trait alone. When multiple correlated traits are associated with the variant set, TOW-CM can offer substantially higher power than the minimum p-value approach. In each scenario, we also consider different proportions of risk (versus protective) variants among the rare causal variants.

Simulation results

Table 1 summarizes the estimated type I error rates of TOW-CM and the three other methods under different significance levels and different magnitudes of trait correlation |ρ|. The type I error rates are evaluated using 10,000 replicated samples, and the P-values of TOW-CM and minP are estimated using 10,000 permutations. For 10,000 replicated samples, the 95% confidence intervals (CIs) for the estimated type I error rates at nominal levels 0.05, 0.01, and 0.001 are (0.046, 0.054), (0.008, 0.012), and (0.0004, 0.0016), respectively. All estimated type I error rates are either within the 95% CIs or close to their bounds. This indicates that all methods control the type I error rate in all considered scenarios.
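The quoted intervals follow from the normal approximation to the binomial distribution, $\alpha \pm 1.96\sqrt{\alpha(1-\alpha)/10000}$. The short check below reproduces them after rounding. The helper name is illustrative.

```python
import math

def type1_ci(alpha, n_rep, z=1.96):
    """Normal-approximation 95% CI for an empirical rejection rate under H0."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_rep)
    return alpha - half_width, alpha + half_width

for alpha in (0.05, 0.01, 0.001):
    lo, hi = type1_ci(alpha, 10_000)
    print(f"nominal {alpha}: ({lo:.4f}, {hi:.4f})")
```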

Table 1. The estimated type I error rates for TOW-CM, minP, MANOVA and MSKAT.

https://doi.org/10.1371/journal.pone.0220914.t001

In power comparisons, the P-values of TOW-CM and minP are calculated using 1000 permutations, while the P-values of MANOVA and MSKAT are calculated from their asymptotic distributions. The powers of all four tests are evaluated using 1000 replicated samples at a nominal significance level of 0.05. Figs 1–6 present the power at significance level 0.05: Figs 1–3 for L = 4 and 3, and Figs 4–6 for L = 2 and 1.

Fig 1. Power comparison of four tests as a function of heritability for four continuous traits with the magnitude of correlation at 0.2, 0.5 and 0.8, respectively.

All four traits are associated with the gene for the left panel and only the first three traits are associated with the gene for the right panel. Sample size is 1,000 and 20% of rare variants are causal. All causal variants are risk variants. The powers are evaluated at a significance level of 0.05.

https://doi.org/10.1371/journal.pone.0220914.g001

Fig 2. Power comparison of four tests as a function of heritability for four continuous traits with the magnitude of correlation at 0.2, 0.5 and 0.8, respectively.

All four traits are associated with the gene for the left panel and only the first three traits are associated with the gene for the right panel. Sample size is 1,000 and 20% of rare variants are causal variants among which 90% of causal variants are risk variants and 10% of causal variants are protective variants. The powers are evaluated at a significance level of 0.05.

https://doi.org/10.1371/journal.pone.0220914.g002

Fig 3. Power comparison of four tests as a function of heritability for four continuous traits with the magnitude of correlation at 0.2, 0.5 and 0.8, respectively.

All four traits are associated with the gene for the left panel and only the first three traits are associated with the gene for the right panel. Sample size is 1,000 and 20% of rare variants are causal among which 80% of causal variants are risk variants and 20% of causal variants are protective variants. The powers are evaluated at a significance level of 0.05.

https://doi.org/10.1371/journal.pone.0220914.g003

Fig 4. Power comparison of four tests as a function of heritability for four continuous traits with the magnitude of correlation at 0.2, 0.5 and 0.8, respectively.

Only the first two traits are associated with the gene for the left panel, and only the first trait is associated with the gene for the right panel. Sample size is 1,000 and 20% of rare variants are causal variants. All causal variants are risk variants. The powers are evaluated at a significance level of 0.05.

https://doi.org/10.1371/journal.pone.0220914.g004

Fig 5. Power comparison of four tests as a function of heritability for four continuous traits with the magnitude of correlation at 0.2, 0.5 and 0.8, respectively.

Only the first two traits are associated with the gene for the left panel, and only the first trait is associated with the gene for the right panel. Sample size is 1,000 and 20% of rare variants are causal. Of the causal variants, 90% are risk variants and 10% are protective variants. The powers are evaluated at a significance level of 0.05.

https://doi.org/10.1371/journal.pone.0220914.g005

Fig 6. Power comparison of four tests as a function of heritability for four continuous traits with the magnitude of correlation at 0.2, 0.5 and 0.8, respectively.

Only the first two traits are associated with the gene for the left panel, and only the first trait is associated with the gene for the right panel. Sample size is 1,000 and 20% of rare variants are causal, among which 80% of causal variants are risk variants and 20% of causal variants are protective variants. The powers are evaluated at a significance level of 0.05.

https://doi.org/10.1371/journal.pone.0220914.g006

These figures compare the power of the four tests (TOW-CM, MANOVA, MSKAT, and minP) as a function of total heritability. For each scenario L, there are three cases: all causal variants are risk variants, 90% are risk variants, and 80% are risk variants. TOW-CM is the most powerful of the four tests in most scenarios, and MANOVA is the second most powerful when the genetic variants affect more than one trait. MSKAT is consistently less powerful than the other two multivariate tests (TOW-CM and MANOVA). This is most likely because only 8% of the variants in the Sgene, on which the simulations are based, have MAF in the range (0.01, 0.035). Similar to SKAT, MSKAT loses power when the MAFs of the causal variants are not in the range (0.01, 0.035) [23]. The minP method is consistently less powerful than TOW-CM and MANOVA because it ignores the dependence among traits by directly using the minimum of the P-values from testing the four traits individually. Overall, power tends to decrease as the correlations among traits increase.

An interesting scenario is the one in which only the first trait is associated with the variant set and all other traits are null (L = 1). Stephens [24] and Wu et al. [25] have reported that jointly testing a correlated null trait can improve the power for detecting the association of a common variant. When only the first trait is associated with the variant set, minP is either the most powerful test or has power similar to the most powerful test. This is especially true under weak trait correlation (|ρ| = 0.2) when the causal variants include both risk and protective variants. TOW-CM and MANOVA benefit from increased trait correlation and offer substantially improved power by incorporating strongly correlated null traits. Thus, our results are consistent with the conclusions of [24] and [25].

Overall, we can see that the proposed TOW-CM is an attractive approach that provides good power in most of the scenarios.

Application to the COPDGene

Chronic obstructive pulmonary disease (COPD) is one of the most common lung diseases. It is characterized by long-term poor airflow and is a major public health problem [26]. The COPDGene Study is a multi-center genetic and epidemiologic investigation dedicated to studying COPD [27]. Participants in the COPDGene Study gave consent for the use of data collected during the study in downstream analyses. This study is sufficiently large and appropriately designed for the analysis of COPD. In this study, we consider more than 5000 non-Hispanic White (NHW) participants who completed a detailed protocol. The protocol included questionnaires, pre- and post-bronchodilator spirometry, high-resolution CT scanning of the chest, exercise capacity (assessed by six-minute walk distance), and blood samples for genotyping. The participants were genotyped using the Illumina OmniExpress platform. The genotype data have gone through standard quality-control procedures for genome-wide association analysis detailed at http://www.copdgene.org/sites/default/files/GWAS_QC_Methodology_20121115.pdf.

Based on previous studies of COPD [28, 29], we selected 7 key quantitative COPD-related phenotypes: FEV1 (% predicted FEV1), Emphysema (Emph), Emphysema Distribution (EmphDist), Gas Trapping (GasTrap), Airway Wall Area (Pi10), Exacerbation frequency (ExacerFreq), and Six-minute walk distance (6MWD). We also selected 4 covariates: BMI, Age, Pack-Years (PackYear), and Sex. EmphDist is the ratio of emphysema at -950 HU in the upper 1/3 of the lung fields to that in the lower 1/3. Following [28], we log-transform EmphDist in the following analysis. Participants with missing data in any of these phenotypes were excluded.

To evaluate the performance of our proposed method on a real data set, we applied all four methods (TOW-CM, MANOVA, MSKAT, and minP) to genes that are associated with COPD or that contain significant single-nucleotide polymorphisms (SNPs) for COPD-related phenotypes in the NHW population [30]. We first removed missing data in the genotypic variants and then adjusted each of the 7 phenotypes for the 4 covariates using linear models. Participants with missing data in any of the 11 variables were excluded, which left a complete set of 5,430 individuals across 50 genes for the following analyses. To compare these methods, we used 10⁷ permutations for TOW-CM and minP. For this verification study, we use 0.05 as the significance level for MANOVA, MSKAT, and TOW-CM. For minP, we use the Bonferroni-corrected significance level 0.05/7 = 7.14 × 10⁻³, because minP performs a separate association test for each trait. The results are summarized in Table 2. TOW-CM identified 14 genes, minP identified 14 genes, MANOVA identified 12 genes, and MSKAT identified 4 genes. TOW-CM and minP thus identified the most genes among the four methods. All 14 genes identified by TOW-CM had previously been reported to be associated with COPD [7, 30]. Five of them (LOC105377462, CHRNA3, CHRNA5, HYKK, IREB2) remain significant under the more stringent cut-off 1.00 × 10⁻³, which accounts for testing 50 genes. The MAFs of most variants are not in the range (0.01, 0.035), which is the range that favors MSKAT. As a result, MSKAT performs worse than the other three methods [23]. TOW-CM and minP perform better than MANOVA, perhaps because only some of the phenotypes are associated with COPD. Compared with TOW-CM, minP missed some genes, possibly because it ignores the correlation among the seven phenotypes.
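Both significance thresholds above are simple Bonferroni corrections. With B permutations, the smallest attainable permutation P-value is of order 1/B, so the permutation resolution must be well below any cut-off being applied. The sketch below illustrates this arithmetic only and is not the analysis pipeline. Its P-values and gene labels are made up, and the (b + 1)/(B + 1) convention is an assumption.

```python
alpha = 0.05
minp_level = alpha / 7      # Bonferroni over the 7 phenotypes (minP)
gene_level = alpha / 50     # stricter cut-off for testing 50 genes
B = 10**7                   # number of permutations
resolution = 1 / (B + 1)    # smallest P-value under the (b + 1)/(B + 1) convention

# Hypothetical P-values for illustration only (not results from Table 2)
pvals = {"gene_A": 4e-4, "gene_B": 3e-3, "gene_C": 0.2}
print(f"minP level = {minp_level:.2e}, gene-level cut-off = {gene_level:.2e}")
print("significant at 0.05:", [g for g, p in pvals.items() if p < alpha])
print("significant at 0.05/50:", [g for g, p in pvals.items() if p < gene_level])
```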

Table 2. The p-values of significant genes in the genetic association analysis for COPD using these four different methods.

https://doi.org/10.1371/journal.pone.0220914.t002

Discussion

GWASs have identified many variants that each affect multiple phenotypes, which suggests that pleiotropic effects on complex human phenotypes may be widespread. Recent studies have also shown that complex diseases are caused by both common and rare variants [14, 16, 19]. Therefore, statistical methods that jointly analyze multiple phenotypes for common and/or rare variants have advantages over analyzing each phenotype individually or considering only common variants (GWAS). In this article, we propose TOW-CM to perform multivariate analysis of multiple phenotypes in association studies for the following reasons: (1) complex diseases are usually measured by multiple correlated phenotypes in genetic association studies; (2) there is increasing evidence that studying multiple correlated phenotypes jointly may increase power for detecting disease-associated genetic variants; and (3) there is a shortage of gene-based approaches for multiple traits. Simulation results show that TOW-CM has correct type I error rates and is consistently more powerful than, or comparable to, the other three tests. The real data analysis shows that TOW-CM performs well in identifying genes associated with a complex disease, such as COPD, that has multiple correlated phenotypes.

One disadvantage of TOW-CM is that its test statistic does not have an asymptotic distribution. A permutation procedure is therefore needed to estimate its P-value, which is time-consuming compared with methods whose test statistics have asymptotic distributions. To save time when applying TOW-CM in genetic association studies, we can use the “step-up” procedure [31] to determine the number of permutations. A small number of permutations (e.g., 1,000) is first used to screen for evidence of association, and a large number of permutations is then used only for the selected potentially significant genes. For the real data analysis, estimating the P-values of TOW-CM for all genes took about three days with our R program on 50 Dell PowerEdge C6320 servers. Each server has two 2.4 GHz fourteen-core Intel Xeon E5-2680 v4 processors and 600 MB of average memory. The R program is available on GitHub at https://github.com/Jianjun-CN/TOW-CM/blob/master/R%20Code. Furthermore, TOW-CM can be used not only for gene-based association studies but also, potentially, for transcriptome-wide association studies (TWAS), which needs further investigation.
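A simplified version of such an adaptive (step-up) permutation scheme is sketched below. The stage sizes and the stopping rule are assumptions for illustration and may differ from the procedure of [31]. Under the assumed rule, the scheme stops once at least `min_exceed` permuted statistics reach the observed one, because the P-value is then clearly not tiny and is estimated with reasonable relative precision. `null_stat` stands for any function that returns one statistic computed from permuted data. Here, a standard normal draw serves as a stand-in.

```python
import numpy as np

def step_up_pvalue(t_obs, null_stat, stages=(1_000, 10_000, 100_000), min_exceed=10, seed=0):
    """Adaptive permutation P-value: enlarge B only while the result still looks significant."""
    rng = np.random.default_rng(seed)
    for B in stages:
        exceed = sum(int(null_stat(rng) >= t_obs) for _ in range(B))
        p = (exceed + 1) / (B + 1)
        if exceed >= min_exceed:      # enough exceedances: stop early, p is not tiny
            break
    return p, B

def null_stat(rng):
    return rng.normal()               # stand-in for a statistic from permuted data

print(step_up_pvalue(0.0, null_stat))  # clearly null: stops after the first stage
print(step_up_pvalue(3.5, null_stat))  # small P-value: escalates beyond the first stage
```

Most genes in a scan are null, so most of them stop after the cheap first stage. The expensive stages are spent only on the few candidates that remain.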

Acknowledgments

The Genetic Analysis workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org).

This research used data generated by the COPDGene study, which was supported by NIH grants U01HL089856 and U01HL089897. The COPDGene project is also supported by the COPD Foundation through contributions made by an Industry Advisory Board composed of Pfizer, AstraZeneca, Boehringer Ingelheim, Novartis, and Sunovion.

The high-performance computing infrastructure at the University of North Texas was used in obtaining the results presented in this publication.

References

  1. Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, et al. Genome-wide association study identifies eight loci associated with blood pressure. Nature Genetics. 2009;41(6):666. pmid:19430483
  2. Zabaneh D, Balding DJ. A genome-wide association study of the metabolic syndrome in Indian Asian men. PloS One. 2010;5(8):e11961. pmid:20694148
  3. Yang Q, Wang Y. Methods for analyzing multivariate phenotypes in genetic association studies. Journal of Probability and Statistics. 2012. pmid:24748889
  4. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics. 2013;14(7):483. pmid:23752797
  5. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747. pmid:19812666
  6. Kim J, Bai Y, Pan W. An adaptive association test for multiple phenotypes with GWAS summary statistics. Genetic Epidemiology. 2015;39(8):651–663. pmid:26493956
  7. Liang X, Wang Z, Sha Q, Zhang S. An Adaptive Fisher’s Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies. Scientific Reports. 2016;6:34323. pmid:27694844
  8. O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984:1079–1087. pmid:6534410
  9. Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genetic Epidemiology. 2010;34(5):444–454. pmid:20583287
  10. Cole DA, Maxwell SE, Arvey R, Salas E. How the power of MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychological Bulletin. 1994;115(3):465.
  11. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982:963–974. pmid:7168798
  12. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
  13. Ott J, Rabinowitz D. A principal-components approach based on heritability for combining phenotype information. Human Heredity. 1999;49(2):106–111. pmid:10077732
  14. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nature Genetics. 2008;40(6):695. pmid:18509313
  15. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nature Genetics. 2010;42(4):348. pmid:20208533
  16. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? The American Journal of Human Genetics. 2001;69(1):124–137. pmid:11404818
  17. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease–common variant or not? Human Molecular Genetics. 2002;11(20):2417–2423. pmid:12351577
  18. Stratton MR, Rahman N. The emerging landscape of breast cancer susceptibility. Nature Genetics. 2008;40(1):17. pmid:18163131
  19. Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Human Molecular Genetics. 2010;19(R2):R145–R151. pmid:20705737
  20. Walsh T, King MC. Ten genes for inherited breast cancer. Cancer Cell. 2007;11(2):103–105. pmid:17292821
  21. Wu B, Pankow JS. Sequence kernel association test of multiple continuous phenotypes. Genetic Epidemiology. 2016;40(2):91–100. pmid:26782911
  22. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genetic Epidemiology. 2012;36(6):561–571. pmid:22714994
  23. Yang X, Wang S, Zhang S, Sha Q. Detecting association of rare and common variants based on cross-validation prediction error. Genetic Epidemiology. 2017;41(3):233–243. pmid:28176359
  24. Stephens M. A unified framework for association analysis with multiple related phenotypes. PloS One. 2013;8(7):e65245. pmid:23861737
  25. Wu B, Pankow JS. Statistical methods for association tests of multiple continuous traits in genome-wide association studies. Annals of Human Genetics. 2015;79(4):282–293. pmid:25857693
  26. Murphy TF, Sethi S. Chronic obstructive pulmonary disease. Aging. 2002;19(10):761–775.
  27. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD: Journal of Chronic Obstructive Pulmonary Disease. 2011;7(1):32–43.
  28. Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, Laird N, et al. Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Systems Biology. 2014;8(1):78. pmid:24964944
  29. Han MK, Kazerooni EA, Lynch DA, Liu LX, Murray S, Curtis JL, et al. Chronic obstructive pulmonary disease exacerbations in the COPDGene study: associated radiologic phenotypes. Radiology. 2011;26(1):274–282.
  30. Berndt A, Leme AS, Shapiro SD. Emerging genetics of COPD. EMBO Molecular Medicine. 2012;4(11):1144–1155. pmid:23090857
  31. Pan W, Kim J, Zhang Y, Shen X, Wei P. A powerful and adaptive association test for rare variants. Genetics. 2014;genetics–114.