Linkage analysis of LDL cholesterol in American Indian populations: the Strong Heart Family Study

Abstract
Previous studies have demonstrated that low density lipoprotein cholesterol (LDL-C) concentration is influenced by both genes and environment. Although rare genetic variants associated with Mendelian causes of increased LDL-C are known, only one common genetic variant has been identified, in the apolipoprotein E gene (APOE). In an attempt to localize quantitative trait loci (QTLs) influencing LDL-C, we conducted a genome-wide linkage scan of LDL-C in participants of the Strong Heart Family Study (SHFS). Nine hundred eighty men and women, aged 18 years or older, in 32 extended families at three centers (in Arizona, Oklahoma, and North and South Dakota) were phenotyped for LDL-C concentration and other risk factors. Using a variance component approach and the program SOLAR, and after accounting for the effects of covariates, we detected a QTL influencing LDL-C on chromosome 19, nearest marker D19S888 at 19q13.41 [logarithm of odds (LOD) = 4.3] in the sample from the Dakotas. This region on chromosome 19 includes many possible candidate genes, including the APOE/C1/C4/C2 gene cluster. In follow-up association analyses, no significant evidence for an association was detected with the APOE*ε2 and APOE*ε4 alleles (P = 0.76 and P = 0.53, respectively). Suggestive evidence of linkage to LDL-C was detected on chromosomes 3q, 4q, 7p, 9q, 10p, 14q, and 17q. These linkage signals overlap positive findings for lipid-related traits and harbor plausible candidate genes for LDL-C.

Supplementary key words: genetics • lipids • low density lipoprotein

Low density lipoprotein cholesterol (LDL-C) is a strong independent predictor of cardiovascular disease, because it is the most important contributor of cholesteryl ester to atherosclerotic plaque (1). It is well established that genetic factors contribute to the variability in LDL-C concentration. The structural gene for apolipoprotein B (APOB), the principal apolipoprotein of LDL, and the LDL receptor gene, a key regulator of plasma LDL-C concentrations, have been isolated and sequenced. Four monogenic diseases [familial hypercholesterolemia (2), familial ligand-defective apoB-100 (3), sitosterolemia (4), and autosomal recessive hypercholesterolemia (5)] have been identified that cause hyperlipidemia by impairing the activity of hepatic LDL receptors, which remove LDL from the plasma (6). In addition, various other candidate genes encoding proteins of the lipid metabolism pathway have been associated with variation in lipid levels, although the effect of each polymorphism is generally small in a given population (7).
Strong aggregate genetic effects on LDL-C variability have been recognized across multiple populations (8)(9)(10)(11), although the identification of specific variants has been lacking. One possible exception is the common polymorphism in the structural apolipoprotein E gene (APOE), which has been consistently associated with LDL-C variation across multiple populations but accounts for <10% of the total phenotypic variance (12,13). Interactions between APOE polymorphisms and environmental modulators, particularly nutrient intake, have also been reported (14). Other polymorphisms in the APOE gene promoter region (−219G/T and +113G/C) have also been evaluated, but the findings from these studies have been inconsistent (15)(16)(17)(18).
Genome-wide linkage scans have been performed to identify regions of the genome harboring genes that regulate cholesterol concentrations, but the identification of quantitative trait loci (QTLs) has been challenging. Bosse et al. (19) recently published a compendium of genome-wide scans of lipid-related phenotypes, which identified 11 genome scans of LDL-C. Two more genome scans of LDL-C have been published recently (20,21), and as is the case with most common complex diseases, several regions of the genome have been implicated, perhaps suggesting locus heterogeneity. To advance our knowledge of the genetic factors influencing LDL-C, more studies are needed because replication of existing QTLs in independent samples is essential. Thus, in an attempt to localize new QTLs influencing LDL-C and/or to provide new evidence in support of previously identified QTLs, we conducted a genome scan of LDL-C in participants of the Strong Heart Family Study (SHFS).

Study population
The Strong Heart Study (SHS) began in 1988 to investigate cardiovascular disease and its risk factors in a geographically diverse group of resident American Indian tribal members aged 45 to 74 years at three study centers in Arizona, Oklahoma, and North and South Dakota. The SHFS, a component of the SHS, was initiated in 1996 with the goal of localizing genes that influence risk factors for clinical and subclinical cardiovascular disease, diabetes, and obesity and their progression over time. In the first phase of the SHFS, completed in September 1999, at least 300 members of 9-12 extended families were recruited and examined at each of the three centers in Arizona, South Dakota, and Oklahoma. In the second phase of the family study, initiated in 2001, an additional 900 family members at each center were recruited and examined, for a total of >3,600 men and women in multiple extended families. This study focuses on participants from the first phase of the family study, for which phenotypic and genotypic data are complete. Detailed descriptions of the SHS and SHFS study design and laboratory protocols have been published (22). The SHS and SHFS protocols were approved by the Indian Health Service Institutional Review Board, by the institutional review boards of the participating institutions, and by the 13 American Indian tribes participating in these studies.

Phenotypic and genotypic data
Phenotypic data. Standard protocols were used for the collection of all data and are described in detail in previous publications (23). During a personal interview, information on the following demographic characteristics was obtained by questionnaire: income, residence, marital status, number of household members, tribal enrollment, degree of Indian heritage, education, and other cultural factors. Participants were asked lifestyle questions, with a focus on smoking, alcohol intake, and physical activity. Current and ever drinking were defined as having had at least one alcoholic beverage in the last year and in one's lifetime, respectively. Smoking was defined as having smoked at least 100 cigarettes. Physical activity was measured using metabolic equivalents (METs), where 1 MET is equal to a resting oxygen consumption of ~3.5 ml/kg/min. A reproductive history was taken that included questions concerning parity, gravidity, menopausal status, and estrogen use.
Fasting blood samples were obtained during the physical examination and assayed at MedStar Research Institute using standard laboratory methods as described previously (23). Triglycerides, total cholesterol, and HDL-C were measured using enzymatic reagents on the Hitachi 717. LDL-C was calculated using the Friedewald equation, except in subjects with triglyceride values >400 mg/dl, in whom it was measured directly (24). Glucose was measured by the hexokinase method (Glucose-HK; BMD, Inc.) on the Hitachi 717. Type 2 diabetes, impaired fasting glucose, and normal glucose tolerance status were determined according to American Diabetes Association criteria (25). As described previously (26), APOE phenotype was determined in whole plasma using a modification of the method of Kamboh, Ferrell, and Kottke (27), in which 2 h of isoelectric focusing are followed by immunoblotting to visualize the APOE bands.
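The LDL-C derivation above can be illustrated with a minimal sketch of the standard Friedewald formula, LDL-C = total cholesterol − HDL-C − triglycerides/5 (all in mg/dl), with a fallback to a directly measured value when triglycerides exceed 400 mg/dl. The function name and the `direct_ldl` argument are hypothetical; this is not the study laboratory's pipeline.

```python
from typing import Optional


def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float,
                   direct_ldl: Optional[float] = None) -> Optional[float]:
    """Estimate LDL-C (mg/dl) with the Friedewald formula.

    Hypothetical helper: above 400 mg/dl triglycerides the formula is not used,
    and the directly measured LDL-C (if supplied) is returned instead.
    """
    if triglycerides > 400:
        return direct_ldl
    # TG/5 approximates VLDL cholesterol when all values are in mg/dl
    return total_chol - hdl - triglycerides / 5.0
```

For example, total cholesterol 200, HDL-C 50, and triglycerides 150 mg/dl give an estimated LDL-C of 120 mg/dl.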
Genotypic data. The SHFS genotyping procedures have been documented (V. G. Diego, H. H. H. Göring, S. Cole, L. Almasy, and K. E. North, unpublished data). In brief, DNA was isolated from fasting blood samples using organic solvents and then amplified in separate PCR procedures with primers specific for short tandem repeat markers using the ABI PRISM Linkage Mapping Set-MD10 version 2.5 (Applied Biosystems, Foster City, CA). PCR products were loaded into an ABI PRISM 377 DNA sequencer for laser-based automated genotyping. Analyses and assignment of the marker alleles were done using computerized algorithms (Applied Biosystems).
We used sex-averaged chromosomal maps obtained from the Marshfield Center for Medical Genetics (http://research.marshfieldclinic.org/genetics). All genetic distances are reported in Haldane centimorgans (cM). Pedigree relationships were verified using the PREST (Pedigree Relationship Statistical Tests) package (28), which uses likelihood-based inference statistics for genome-wide identity-by-descent allele sharing. Mendelian inconsistencies and spurious double recombinants were detected using the SimWalk2 package (29). The overall blanking rate for both types of errors was <1% of the total number of genotypes for Arizona, North and South Dakota, and Oklahoma. We used the web resources of the University of California Santa Cruz (http://genome.ucsc.edu) and Online Mendelian Inheritance in Man (http://www3.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM) to determine the cytogenetic location of markers and to search for candidate genes.
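Because distances are reported in Haldane centimorgans, it can help to translate them into recombination fractions. A minimal sketch of Haldane's map function, r = ½(1 − e^(−2d)) with d in Morgans, and its inverse follows; the function names are illustrative.

```python
import math


def haldane_cm_to_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM under Haldane's map function."""
    d_morgans = d_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_r_to_cm(r: float) -> float:
    """Inverse of Haldane's map function: map distance in cM for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)
```

Under this map function, a 15-cM interval corresponds to a recombination fraction of roughly 0.13.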

Analytical methods
Quantitative analyses. SAS, version 8.0, was used to screen covariates for statistical significance using backward and forward stepwise linear regression by center. SOLAR, version 2.1.2, was used to perform multipoint variance component linkage analysis of LDL-C among 980 SHFS participants (South Dakota = 333, Oklahoma = 307, Arizona = 340) (30). Details of this model have been described previously (30,31).
The use of the variance component approach requires an estimate of the identity-by-descent matrix. We used the Loki package (32), which uses a Markov chain Monte Carlo stochastic procedure to compute the identity-by-descent allele sharing at points throughout the genome conditional on the genotype information available at neighboring points.
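To make the variance component approach concrete, here is a simplified sketch, not SOLAR's implementation. It assumes no covariates and a phenotypic covariance proportional to h_a·2Φ + h_q·Π̂ + (1 − h_a − h_q)·I, where 2Φ is twice the kinship matrix, Π̂ is the estimated identity-by-descent matrix at one test position (for example, as produced by Loki), and h_a and h_q are the polygenic and QTL shares of variance. The mean and total variance are profiled out analytically, and a coarse grid stands in for numerical maximization. The LOD is the log10 likelihood ratio of the model with the QTL component against the polygenic-only model. All function names and the toy sib-pair data are hypothetical.

```python
import numpy as np


def profile_loglik(y, V):
    """Log-likelihood of y ~ N(mu * 1, s2 * V), with mu and s2 set to their MLEs given V."""
    n = len(y)
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    r = y - mu
    s2 = (r @ np.linalg.solve(V, r)) / n  # MLE of the total variance scale
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * (np.log(2.0 * np.pi * s2) + 1.0) + logdet)


def max_loglik(y, two_phi, pi_hat, fit_qtl, step=0.05, max_genetic=0.95):
    """Coarse grid search over the polygenic (h_a) and QTL (h_q) variance shares.

    Capping h_a + h_q at max_genetic keeps a residual term, so V stays positive definite.
    """
    identity = np.eye(len(y))
    grid = np.arange(0.0, max_genetic + 1e-9, step)
    best = -np.inf
    for h_a in grid:
        for h_q in (grid if fit_qtl else [0.0]):
            if h_a + h_q > max_genetic + 1e-9:
                continue
            V = h_a * two_phi + h_q * pi_hat + (1.0 - h_a - h_q) * identity
            best = max(best, profile_loglik(y, V))
    return best


def lod_score(y, two_phi, pi_hat):
    """log10 likelihood ratio: QTL + polygenic model versus polygenic-only model."""
    y = np.asarray(y, dtype=float)
    ll_alt = max_loglik(y, two_phi, pi_hat, fit_qtl=True)
    ll_null = max_loglik(y, two_phi, pi_hat, fit_qtl=False)
    return (ll_alt - ll_null) / np.log(10.0)


if __name__ == "__main__":
    # Toy data (not SHFS data): 20 hypothetical sib pairs.
    rng = np.random.default_rng(1)
    n_pairs = 20
    two_phi = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
    pi_hat = np.eye(2 * n_pairs)
    for k in range(n_pairs):
        share = (0.0, 0.5, 1.0)[k % 3]  # assumed IBD proportion at the test position
        pi_hat[2 * k, 2 * k + 1] = pi_hat[2 * k + 1, 2 * k] = share
    y = rng.normal(size=2 * n_pairs)
    print(f"LOD = {lod_score(y, two_phi, pi_hat):.3f}")
```

Because the null grid is a subset of the alternative grid, the sketch never returns a negative LOD. A real analysis would also estimate covariate effects jointly and maximize over continuous parameters.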
Application to SHFS data. Because we recruited extended families, the sample of examined individuals included a large number of relative pair types, providing information on 10,380 relative pairs (South Dakota = 4,260, Oklahoma = 3,039, Arizona = 3,081; Table 1). All phenotypic outliers (defined here as any value >4 standard deviations from the mean) were removed before analysis (the number excluded varies by analysis; n < 10). Participants on lipid-lowering medications (13 individuals in total; North and South Dakota = 10, Oklahoma = 2, Arizona = 1) were excluded from the analyses.
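The outlier rule above can be sketched as follows. The text does not say whether the mean and standard deviation were computed per center or on adjusted values, so this sketch simply assumes one sample mean and sample SD over the values passed in; the function name and the illustrative values are hypothetical, and the medication exclusion is not shown.

```python
from statistics import mean, stdev


def drop_outliers(values, k=4.0):
    """Keep values no more than k sample SDs from the sample mean (assumed rule)."""
    m = mean(values)
    s = stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


# Illustrative values only: 200 plausible LDL-C readings plus one gross outlier.
ldl = [100.0 + (i % 5) for i in range(200)] + [1000.0]
kept = drop_outliers(ldl)
print(len(ldl) - len(kept), "value(s) removed")
```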
To maximize our power to detect genetic effects, we considered two different models of covariate adjustment in each population separately. In model 1, adjustments were made for age, sex, age squared, and age-sex interactions. For model 2, additional covariates of body mass index, smoking status, alcohol consumption, estrogen use, diabetes status, impaired fasting glucose, and physical activity were screened for statistical significance, and all covariates whose effects were significant at the P < 0.10 level were retained in subsequent analyses. We additionally confirmed the significance of model 2 covariates while accounting for family relationships in SOLAR. Residuals were generated for both models (model 1 and model 2) and used in all subsequent genetic analyses. Kurtosis values for LDL-C were <0.50 for all analyses (33).
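A minimal sketch of the model 1 residual step: ordinary least squares of LDL-C on age, sex, age squared, and an age × sex term, followed by the sample excess kurtosis of the residuals. This is an assumption-laden illustration. It ignores family relatedness, includes only one age-sex interaction term, assumes that the reported kurtosis is excess (Fisher) kurtosis, and uses hypothetical function names.

```python
import numpy as np


def model1_residuals(ldl, age, sex):
    """OLS residuals of LDL-C on intercept, age, sex (0/1), age^2, and age*sex (assumed design)."""
    ldl, age, sex = (np.asarray(x, dtype=float) for x in (ldl, age, sex))
    X = np.column_stack([np.ones_like(age), age, sex, age ** 2, age * sex])
    beta, *_ = np.linalg.lstsq(X, ldl, rcond=None)
    return ldl - X @ beta


def excess_kurtosis(x):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0
```

With simulated normal noise, the residuals have mean ~0 and an excess kurtosis near 0. Markedly positive values would suggest heavy tails, to which variance component LOD scores are sensitive.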
To verify the significance of our linkage signals, we used the simulation methods incorporated into SOLAR (30) to generate LOD scores for 10,000 replicates under the assumption of multivariate normality and the hypothesis of no linkage. The empirical distribution of the simulated LOD scores is used to assign a percentile to each replicate and to calculate an expected test statistic on the basis of that percentile; comparing simulated with expected values produces a correction constant, which is used to determine an adjusted LOD score and a corresponding P value (34).
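The correction described above can be sketched under stated assumptions. Under no linkage, 2·ln(10)·LOD is asymptotically a 50:50 mixture of a point mass at 0 and a χ² distribution with 1 df, so each percentile p maps to an expected LOD. The sketch assumes percentiles (i + 0.5)/n for the sorted replicates and takes the correction constant as the least-squares slope through the origin of simulated on expected LODs. SOLAR's own procedure may differ in these details, and all names are hypothetical. The χ²₁ quantile is computed from the standard normal via the standard library.

```python
import math
from statistics import NormalDist


def expected_null_lod(p: float) -> float:
    """Expected LOD at percentile p of the 50:50 mixture of 0 and chi-square(1), divided by 2 ln 10."""
    if p <= 0.5:
        return 0.0
    q = 2.0 * p - 1.0  # percentile within the chi-square(1) half of the mixture
    chi2_1 = NormalDist().inv_cdf((1.0 + q) / 2.0) ** 2  # chi2(1) quantile = squared normal quantile
    return chi2_1 / (2.0 * math.log(10.0))


def correction_constant(simulated_lods):
    """Slope through the origin of sorted simulated LODs on their expected null values (assumed fit)."""
    sims = sorted(simulated_lods)
    n = len(sims)
    expected = [expected_null_lod((i + 0.5) / n) for i in range(n)]
    num = sum(o * e for o, e in zip(sims, expected))
    den = sum(e * e for e in expected)
    return num / den


def adjusted_lod(observed_lod: float, constant: float) -> float:
    """Scale an observed LOD by the empirical correction constant."""
    return observed_lod / constant
```

A constant above 1 indicates that the simulated null LODs run higher than theory predicts (for example, because of non-normality), so observed LODs are deflated accordingly.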

Association analysis of APOE polymorphism
An association study of the APOE polymorphism and LDL-C concentrations was performed using both the model 1 and model 2 covariate adjustment strategies in the North and South Dakota population only, as linkage to 19q was specific to the South Dakota center. A total of 307 participants had complete data on the APOE polymorphism. We assessed deviations from Hardy-Weinberg equilibrium using the standard chi-square test. A measured genotype approach, incorporating a variance component to account for linkage, was used to test for association.
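A minimal sketch of a chi-square goodness-of-fit test for Hardy-Weinberg equilibrium with any number of alleles (three for APOE): expected counts are n·p² for homozygotes and 2n·p·q for heterozygotes, and the degrees of freedom are k(k − 1)/2 for k alleles. The input format (unordered allele pairs mapped to counts) and the function name are assumptions. A P value could be obtained with, for example, `scipy.stats.chi2.sf(stat, df)`. With a rare allele such as *ε2 (frequency 0.02), some expected counts are small and the χ² approximation is rough; exact tests are an alternative.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Chi-square statistic and df for HWE, given {(allele1, allele2): count}."""
    counts = Counter()
    for (a1, a2), c in genotype_counts.items():
        counts[tuple(sorted((a1, a2)))] += c  # treat genotypes as unordered
    n = sum(counts.values())
    allele_counts = Counter()
    for (a1, a2), c in counts.items():
        allele_counts[a1] += c
        allele_counts[a2] += c
    alleles = sorted(allele_counts)
    freq = {a: allele_counts[a] / (2 * n) for a in alleles}
    stat = 0.0
    # Include every possible genotype class, even unobserved ones (e.g., no e2/e2 homozygotes)
    for a1, a2 in combinations_with_replacement(alleles, 2):
        if a1 == a2:
            expected = n * freq[a1] ** 2
        else:
            expected = 2 * n * freq[a1] * freq[a2]
        observed = counts.get((a1, a2), 0)
        stat += (observed - expected) ** 2 / expected
    k = len(alleles)
    return stat, k * (k - 1) // 2
```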

RESULTS
The descriptive characteristics of the SHFS participants are summarized in Table 2. The average age ± SD of participants was ~41 ± 16, 44 ± 15, and 41 ± 16 years in South Dakota, Oklahoma, and Arizona, respectively. The SHFS participants were more often female, and a high prevalence of cigarette smoking and alcohol consumption was observed. As expected, a high prevalence of diabetes was noted, especially in the Arizona center. LDL-C concentrations varied by center and were highest in the North and South Dakota center (122.06 ± 31.98 mg/dl).
In model 1, adjustments were made for age, sex, and their interactions. In model 2, additional adjustments were made for significant covariates (P < 0.10): METs and current and ever smoking status in the North and South Dakota center; alcohol consumption and estrogen use in the Oklahoma center; and METs, ever smoking status, and diabetes status in the Arizona center. The statistical significance of all model 2 covariates was confirmed after accounting for the relatedness of family members in SOLAR (P < 0.05). A QTL for LDL-C (adjusted LOD = 4.3) was detected in the South Dakota population on chromosome 19 at 93 cM, nearest marker D19S888 (19q13.41) (Fig. 1). The 1-LOD-unit support interval spanned 15 cM, from 83 to 98 cM (52.565–58.985 Mb from the p terminus). Table 3 presents the multipoint genome-wide adjusted LOD scores and their locations from the variance component linkage analyses for all peaks >1.8 [suggestive evidence of linkage (35)]. Although no other locus reached genome-wide significance, suggestive evidence of linkage was found on chromosomes 3q, 4q, 7p, 9q, 10p, 14q, and 17q.

We had data on an interesting candidate gene for the 19q linkage signal, namely the genotypes at the APOE polymorphism, which lies <2 Mb from the peak linkage signal. We tested for association between the APOE polymorphism and LDL-C concentration using the model 1 and model 2 adjustment strategies. Because a previous study of this population noted an APOE-by-sex interaction on circulating lipid levels (36), all analyses incorporated this interaction. The frequencies of the *ε2, *ε3, and *ε4 alleles of the APOE gene were 0.02, 0.90, and 0.08, respectively. No deviation from Hardy-Weinberg equilibrium was detected (P = 0.29). No individuals were homozygous for the *ε2 allele. For genetic analysis, an additive model was used, with two dummy variables encoding the number of rare *ε2 and *ε4 alleles.
No significant association between the APOE polymorphism and LDL-C was found (P = 0.76 and P = 0.53 for the *ε2 and *ε4 alleles, respectively), and when forced into the model, the APOE polymorphism accounted for <1% of the residual phenotypic variation of LDL-C.
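The additive coding described above can be sketched as two allele counts per person, with *ε3 as the reference allele. In a measured genotype analysis, these two variables would enter the variance component model as fixed effects alongside the covariates and the APOE-by-sex interaction (not shown). The genotype string format such as "e3/e4" and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

APOE_ALLELES = ("e2", "e3", "e4")


def apoe_dummies(genotype: str) -> Tuple[int, int]:
    """Additive coding: (number of e2 alleles, number of e4 alleles); e3 is the reference."""
    alleles = genotype.split("/")
    if len(alleles) != 2 or any(a not in APOE_ALLELES for a in alleles):
        raise ValueError(f"unrecognized APOE genotype: {genotype!r}")
    return alleles.count("e2"), alleles.count("e4")


def apoe_allele_frequencies(genotypes: Iterable[str]) -> Dict[str, float]:
    """Allele frequencies from genotype strings such as 'e3/e4'."""
    counts = Counter(a for g in genotypes for a in g.split("/"))
    total = sum(counts.values())
    return {a: counts[a] / total for a in APOE_ALLELES}
```

For example, an e2/e4 heterozygote is coded (1, 1) and an e4/e4 homozygote (0, 2).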

DISCUSSION
In our previous work, we demonstrated a substantial genetic component of variation in LDL-C (8). The present study was undertaken to identify regions of the genome harboring genes that influence LDL-C. Here, we present evidence for a QTL for LDL-C (adjusted LOD = 4.3) in the North and South Dakota population on chromosome 19, nearest marker D19S888. Three large families in the South Dakota population (263 phenotyped individuals comprising 3,884 relative pairs) contribute substantially to the overall LOD score and include the majority of participants from the South Dakota center. No substantial LOD scores were obtained in this region (19q) in the samples from Arizona and Oklahoma, potentially indicating genetic heterogeneity among the different American Indian samples at this locus.

Fig. 1. Multipoint adjusted LOD scores on chromosome 19 for LDL cholesterol. Model 1 was adjusted for age, sex, age squared, and age-sex interaction in all centers. Model 2 in the Dakotas (DA) was additionally adjusted for metabolic equivalents (METs), ever smoking, and current smoking. Model 2 in Arizona (AZ) was additionally adjusted for METs, diabetes status, and ever smoking. Model 2 in Oklahoma (OK) was additionally adjusted for current alcohol consumption and current estrogen use. cM, centimorgan.

Chromosome 19q, where we observe our strongest linkage evidence, has previously been implicated in six genome scans of lipid-related traits (37)(38)(39)(40)(41)(42). Genome-wide significant LOD scores were reported in this region for LDL-C among participants from the Quebec Family Study (LOD = 3.6) (40), for LDL-C among adolescent Dutch, Swedish, and Australian twins (LOD = 5.7) (39), for triglyceride concentrations in large extended families from Utah (LOD = 3.2) (42), and for apoE in participants from the Rochester Heart Study (LOD = 4.2) (41).
Suggestive linkage has also been noted for LDL-C in Hutterite family members (P = 0.0001) (39) and for LDL particle size in Mexican-American participants of the San Antonio Family Heart Study (1.9 < LOD < 2.3) (38).
When LDL-C was additionally adjusted for current smoking, ever smoking, and METs, the LOD score on chromosome 19q decreased by 1.6 units but still provided suggestive evidence for linkage (adjusted LOD = 2.7). This reduction is partly due to the exclusion of individuals with missing covariate data in model 2 (n = 25 exclusions). For example, when model 1 is restricted to the same individuals included in model 2, its adjusted LOD score decreases from 4.3 to 3.6. Thus, when the exact same sample is analyzed with the two adjustment strategies, the LOD score of model 2 (2.7) is still 0.9 LOD units lower than that of model 1 (3.6). It is possible that the measures of physical activity and smoking status share genetic effects with LDL-C. Previous studies have supported a significant genetic influence on smoking and physical activity (43,44), and we have observed significant [h² (heritability) for current smoking = 0.16 ± 0.11 (P = 0.023); h² for ever smoking = 0.21 ± 0.11 (P = 0.01)] and suggestive [h² for METs = 0.10 ± 0.09 (P = 0.06)] heritabilities in this population as well. Moreover, a significant genetic correlation between METs and LDL-C was detected [r_G (genetic correlation) = 0.87 ± 0.48; P = 0.048], which indicates shared genetic effects between these two traits. Thus, to the extent that genes have pleiotropic effects on METs and LDL-C, the evidence for linkage will be reduced when adjusting for these covariates. Nonetheless, although the LOD score decreases, the evidence for linkage on chromosome 19 remains suggestive, which supports the presence of a lipid-related QTL on chromosome 19 and suggests that this signal is robust. Moreover, our results show a high degree of overlap with other studies and may indeed provide probable locations for candidate gene follow-up studies.
Approximately 142 known genes lie within the 1-LOD-unit support interval (15 cM, 7.5 Mb) of the 19q signal. The most logical candidates, the APOE/C1/C4/C2 gene cluster, are located ~2 Mb from our highest LOD score. ApoE is a major constituent of chylomicrons and VLDL and serves as a ligand for LDL receptors (45). The APOE gene has three common alleles, APOE*ε2, *ε3, and *ε4, which code for three major isoforms that differ in their receptor-binding affinity (13). Thus, carriers of the *ε2 allele, whose isoform has dramatically reduced binding affinity, are less efficient at clearing VLDL and chylomicrons from the blood into the liver and are slower to take up postprandial lipoprotein particles than *ε3 and *ε4 carriers. These APOE variants have been associated with LDL variation across multiple populations (12,13) as well as with other lipid-related traits, such as triglyceride concentration (46). In addition, interactions between APOE polymorphisms and environmental modulators, such as nutrient intake (14,47), sex (48), age (49), smoking behavior (49), alcohol consumption (50), and type 2 diabetes (49), have all been reported. Moreover, other polymorphisms in the APOE gene promoter region (−219G/T and +113G/C) have also been evaluated, although the findings from these studies have been inconsistent (15)(16)(17)(18).
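The 1-LOD-unit support interval used for the 19q signal can be read off a multipoint LOD profile: starting at the peak, extend in both directions while the LOD stays within one unit of the maximum. The sketch below works at the resolution of the evaluated positions without interpolation. The function name and the toy profile are hypothetical and are not the study's data.

```python
from typing import Sequence, Tuple


def lod_support_interval(positions: Sequence[float], lods: Sequence[float],
                         drop: float = 1.0) -> Tuple[float, float]:
    """Contiguous positions around the peak whose LOD is >= (peak LOD - drop)."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    lo = peak
    while lo > 0 and lods[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < len(lods) - 1 and lods[hi + 1] >= threshold:
        hi += 1
    return positions[lo], positions[hi]


# Toy profile (cM, LOD); illustrative values only.
print(lod_support_interval([0, 5, 10, 15, 20, 25, 30],
                           [0.2, 1.1, 2.4, 3.0, 2.6, 1.7, 0.4]))
```

Here the peak of 3.0 at 15 cM gives a threshold of 2.0, so the interval runs from 10 to 20 cM.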
In this study, no association of the APOE polymorphism with LDL-C was found. This result contrasts with previous work in the SHS, which reported an association of the APOE polymorphism with LDL-C. However, the previous study by Kataoka et al. (37) examined the original, mostly unrelated SHS participants across all three subpopulations (n = 4,541), of whom only a small fraction are included in the SHFS sample (n = 64). Moreover, our sample is much smaller (n = 307 with APOE data) because the linkage signal was specific to the South Dakota center. Regardless of these factors, the APOE polymorphism cannot account for the observed linkage of LDL-C levels to 19q in this sample. This suggests that other important variants in the APOE gene or in nearby genes influence variation in LDL-C. As mentioned above, APOE lies within a cluster of genes (APOE/C1/C4/C2) that are likely relevant to circulating lipid levels.
One such likely candidate is the apoC-I gene. ApoC-I is a constituent of VLDL and HDL and inhibits the binding of VLDL to the LDL receptor-related protein (51) as well as the APOE-mediated binding of VLDL to the LDL receptor (52), leading to reduced levels of LDL-C (53). Rare APOC1 variants and the common HpaI restriction fragment length polymorphism in the promoter have been associated with triglyceride concentrations (54,55), although the findings have been inconsistent (56). ApoC-II activates lipoprotein lipase and leads to increased levels of LDL-C (57). APOC2 variants have been associated with total cholesterol and LDL-C levels in Nigerians (58) but not in African-Americans (59). The exact function of apoC-IV is unknown, although it appears to play a role in lipid metabolism (60). The Leu36Pro and Leu96Arg APOC4 variants have been associated with triglyceride concentrations in white women (61). Overall, the association of lipid-related traits with variants in the APOE/C1/C4/C2 gene cluster has been fairly inconsistent, with the exception of the APOE gene variants. Such inconsistency may be related to the close linkage of these genes on chromosome 19q. Strong linkage disequilibrium between APOC1 and APOE variants has been reported (54,56); thus, findings for single polymorphisms are extremely difficult to interpret, given the disparate effects of these polymorphisms in the lipid pathway. Future research should involve resequencing the genes in this cluster to catalog the full extent of genetic variation present in this population and to identify the functional polymorphisms responsible for the observed linkage signal.
We also observed several loci with suggestive linkage to LDL-C, some novel and others supporting positive findings from other studies. Table 4 summarizes the observed linkage signals from this study that are supported by evidence of linkage to lipid-related traits in other populations on 3q, 4q, 7p, 9q, 10p, 14q, and 17q. Although these signals do not meet the genome-wide significance threshold, they still offer valuable information in that they may identify regions worthy of further study and help to distinguish between true and false positives. For example, although we detect only suggestive linkage to LDL-C on chromosome 17q in the Oklahoma center, this signal is robust to adjustment for covariates (Table 4) and has been implicated in at least four other lipid-related genome scans (9,41,62,63).

Conclusions
In conclusion, our findings suggest that one or more genes on chromosome 19q regulate LDL-C concentration. This region harbors the APOE/C1/C4/C2 gene cluster, an excellent candidate for LDL-C variability because of its accepted role in lipid metabolism. Future research should pursue these positional candidate genes to determine whether polymorphisms in these genes are the source of this highly replicated linkage signal.
Our study also provides corroboration of genomic regions that possibly influence interindividual variation in LDL-C concentrations on chromosomes 3q, 4q, 7p, 9q, and 17q and new information about genomic regions on 10p and 14q. The identification and confirmation of QTLs for LDL-C will bring us closer to the identification of the functional genes that influence LDL-C and aid in disentangling the complex causes of cardiovascular disease.
The authors thank the SHFS participants. Without their participation, this project would not have been possible. In addition, the cooperation of the Indian Health Service hospitals and clinics, and the directors of the SHS clinics, Betty Jarvis, Marcia O'Leary, Dr. Tauqeer Ali, and the many collaborators and staff of the SHS, have made this project possible. The authors also thank Dr. Thomas Welty for his contribution to the SHS Dakota center. This research was funded by a cooperative agreement that includes Grants U01 HL-65520, U01 HL-41642, U01 HL-41652, U01 HL-41654, and U01 HL-65521 from the National Heart, Lung, and Blood Institute. Development of SOLAR and the methods implemented in it are supported by U.S. Public Health Service Grant MH-059490 from the National Institutes of Health. The views expressed herein are those of the authors and do not necessarily reflect those of the Indian Health Service.

Linkage criteria were as suggested by Rao and Gu (35). Apo, apolipoprotein; CHOL, total serum cholesterol; FH, familial hypercholesterolemia; HDL-C, high density lipoprotein cholesterol; LDL-PPD, low density lipoprotein peak particle diameter; LDL-1, -2, and -3, cholesterol concentration in three LDL size fractions.
a Model 1 was adjusted for age, sex, age squared, and age-sex interaction in all centers.
b Model 2 in the Dakotas was additionally adjusted for METs, ever smoking, and current smoking.
c Model 2 in Arizona was additionally adjusted for METs, diabetes status, and ever smoking.
d Model 2 in Oklahoma was additionally adjusted for current alcohol consumption and current estrogen use.