Genetic determinants of infectious disease susceptibility

1. Introduction

Since the beginning of the last century, modernization and improved health care have reduced mortality from infectious diseases, mostly in developed countries, but the burden of infectious diseases in developing countries such as Indonesia remains high[14]. Surprisingly, TB incidence has resurged in recent years[15], even in developed nations, and this phenomenon has sparked renewed interest in epidemiological and other studies of infectious diseases. The World Health Organization (WHO) and other organizations are reaching out to developing countries where infectious diseases are endemic[16, 17]. Despite massive efforts to introduce drugs and vaccines for treatment and prevention, many common infectious diseases, such as TB and hepatitis B, have yet to be brought under sufficient control, and they continue to spread[15, 18-21]. Surveillance of infections has revealed that certain individuals can be exposed to large doses of an infectious agent yet remain resistant to infection[22]. Furthermore, a large percentage of those who are infected clear the infection naturally without disease symptoms[6]. This suggests that host defense is often an effective means of controlling infection. Heritability studies also indicate a strong genetic component in the variable immune response to infection and in disease susceptibility among individuals[1, 2]. The burden of infectious diseases has been tremendous throughout human history, not only in economic terms but especially as a major selective pressure on our genome[23, 24]. A diverse immunological response to a wide range of infectious agents translates into an evolutionary advantage, so it is not surprising that immunity genes are among the most abundant and diverse in the human genome[25].
Given these multiple adverse social, economic, and evolutionary effects, there is strong motivation to study how host genetics controls the immune response to infectious diseases[5-7]. Global initiatives such as the Human Genome Project, the International HapMap Consortium and the recent 1000 Genomes Project have cataloged most of the genetic variation that is common in humans[26-30]. These efforts have spurred the advent of high-throughput genotyping technology for conducting genetic association studies cost-effectively and efficiently[31]. This advance in genomics has made it possible to pursue the main objective of the current PhD project, which is to identify genetic variation that influences susceptibility to infection or the variable outcomes of infection. Two genetic mapping studies are used to achieve this aim: the first is a case-control study of pulmonary TB susceptibility, and the second is a study of the post-vaccination antibody titer response against hepatitis B virus infection. The study populations for both studies were drawn from Indonesia, where these diseases are endemic[16, 20] (Figures 1 and 2) and where there is significant interest in elucidating the genomic mechanisms of disease control.

Figure 1(A): Annual number of new reported TB cases[16].
Figure 1(B)[16]
Figure 2: Worldwide distribution of chronic HBV infection and annual incidence of primary hepatocellular carcinoma[20].

2. Literature review

2.1 Human genome

The human genome is a sequence of deoxyribonucleic acid (DNA) roughly 3 billion base pairs long, packed as sets of 23 chromosomes in a single cell. Only a small percentage of this sequence contains the genetic code for genes that function in biological processes. Although humans are highly complex organisms, we require fewer than 30,000 genes to function, a number similar to that of simpler organisms.
In addition, although the genomes of any two individuals are surprisingly similar (~99%), individuals differ vastly in many phenotypic aspects, from outward appearance to internal responses to environmental assaults such as pathogen infections. Interestingly, small DNA variations contribute to these differences. They make up the remaining small percentage of differences that renders each of us unique, and they are important in determining our well-being and our response to disease.

2.2 Genes and immunity

Over the course of human evolutionary and migratory history, selection and adaptation to environmental challenges have shaped our genome. Natural selection has influenced the collection of genetic variation in modern populations and may instill a population-specific genetic response to environmental triggers[15, 24, 32]. Since the beginning of our species, we have been constantly confronted with massive numbers of microbes, and the burden of infectious diseases can be tremendous. It is therefore not surprising that current studies of natural selection in humans have highlighted genes related to host defense and immunity as among the most strongly selected[23, 24]. This implies that immunity and host defense in individuals today are strongly influenced by our ancestors' historical experience of, and genetic responses to, infection, and by the consequent selection that shaped our inherited gene pool[25, 33].

2.3 Types of genetic variation

Genetic variants are primarily ancestral mutations that arose many generations ago, were successfully passed on to offspring, and thus occur at increased frequency in the current population. When each of a variant's alleles reaches a frequency of more than 1 percent in the population, the variant is termed a polymorphism.
Although many of these variants may confer functional changes on a protein, most of them act as markers for mapping specific genomic loci of interest and help to identify differences between individuals.

Restriction fragment length polymorphisms (RFLPs)

RFLPs were among the earliest types of DNA variant/polymorphism used for genetic studies. An RFLP is characterized by an altered gel electrophoresis pattern. A base change (the variant form) in the DNA leaves a restriction endonuclease unable to recognize and cut its specific target sequence, which produces fragments of different lengths; these were identified by gel electrophoresis followed by Southern blotting. Although useful in the earlier days, RFLPs had many drawbacks: they are relatively scarce in the genome, and the tedious and time-consuming laboratory steps were difficult to automate.

Microsatellites

Microsatellites are composed of multiple tandem repeats of a short DNA sequence motif, and alleles are distinguished by the number of repeats. Unlike SNPs, they have an extremely high mutation rate, which gives rise to their high variability and made them a highly informative and popular choice for linkage studies, especially in the 1990s. Owing to their highly polymorphic nature, they are not ideal for population-based association studies, which have now become the mainstay of genetic epidemiological study design. In addition, they are less amenable to cost-efficient high-throughput genotyping technology, and their finite number in the genome limits the density of microsatellite-based genetic maps.

Single nucleotide polymorphisms (SNPs)

SNPs are variants in the form of single base substitutions in which each allele has a population frequency of at least 1%. SNPs are the most abundant and well-studied form of genetic variation in our genome.
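The 1% frequency criterion in the SNP definition can be illustrated with a short sketch. The genotype encoding (two-character strings such as "AG"), the function names and the toy samples below are assumptions made purely for illustration; they are not taken from the studies described in this thesis.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of diploid genotypes.

    Each genotype is a two-character string such as "AG"
    (a hypothetical encoding chosen for this illustration).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def is_polymorphism(genotypes, threshold=0.01):
    """True if at least two alleles are observed and each reaches `threshold`."""
    freqs = allele_frequencies(genotypes)
    return len(freqs) >= 2 and min(freqs.values()) >= threshold


# Toy sample of 100 individuals (made-up counts, not study data).
common_variant = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
# Toy sample of 200 individuals carrying a single copy of the G allele.
rare_variant = ["AA"] * 199 + ["AG"]

print(allele_frequencies(common_variant))
print(is_polymorphism(common_variant), is_polymorphism(rare_variant))
```

In the first toy sample the G allele accounts for 11 of 200 chromosomes, so the variant meets the criterion; in the second it accounts for 1 of 400 and does not. In practice the frequencies would be estimated from a much larger reference sample.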
The Human Genome Project identified at least 1.8 million SNPs and attempted to map them to their physical positions in the genome, where they were found to be widespread[26]. Most SNPs are located in introns or between genes and are therefore non-coding, while those in protein-coding regions are either non-synonymous or synonymous, depending on whether they change an amino acid. Nevertheless, all classes of SNPs could produce a change in phenotype, since intronic and intergenic SNPs may also affect the regulation and expression of genes. To date, more than 10 million common SNPs in humans have been cataloged in databases, and many of them have been independently validated (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi). This is a treasure trove of information that can be mined for genetic studies. In addition, SNPs are usually bi-allelic and relatively stable owing to their low mutation rate, which makes them easy to genotype and interpret and thus well suited to high-throughput technology[34]. They have therefore been the variant of choice for the population-based association studies that have become popular in recent years.

2.4 Infection and host defense

Harmful microorganisms that we constantly encounter in our environment, such as pathogenic bacteria and parasitic viruses, cause infectious diseases. An attack by such infectious agents typically triggers a cascade of tightly regulated immune responses to control the infection, which often succeeds in eradicating the pathogen. In natural infections, there are two basic lines of defense: innate and adaptive immunity. Although it usually succeeds in controlling infections, our immune system is complex and multi-factorial, which leaves it vulnerable to persistent pathogens; when it fails in its protective role, disease sets in.
In addition, variability in our genome could cause variation in the immune responses of different individuals. Although many people may be exposed to the same infectious agents in the environment, only certain individuals develop disease from the infection[22, 35]. This is supported by heritability studies on twins and by other familial aggregation and segregation studies, which indicate a genetic component in the variable immunity between individuals[3, 4, 36, 37]. This points to a potential area of study: identifying genetic variation in genes that may influence immune responses, in order to better understand the relationship between pathogen exposure and actual infection[2, 4, 6, 35, 38].

2.5 Human genetic traits

2.5.1 Simple monogenic traits

Mendelian traits are controlled by a single locus and show a simple Mendelian inheritance pattern. We have been relatively successful in identifying the genetic causes of such monogenic traits, which tend to be rare single mutations that often produce a severe phenotype early in life and are therefore infrequent in the general population[39]. However, many of the common diseases that affect the general population are far more complex. Even though common traits like diabetes may still cluster in families, they do not follow simple inheritance patterns and hence have a different genetic architecture[40].

2.5.2 Multi-factorial common complex traits

Most common complex traits have been shown to result from the complex interplay of multiple genes with environmental factors[41]. Even though each factor contributes only subtly, together they can effect phenotypic changes if they accumulate sufficiently to tip the balance.
Evidence from archeology and population genetics suggests that our current population of billions of people is the result of a fairly rapid expansion, over just the last 100,000 years or less, from a relatively small [...]

In addition, we adopted robust quality control measures for the process. For example, DNA samples of different phenotype status were randomly assigned to each DNA plate, so that any genotyping errors would not be differential between comparison groups. Strict standard operating procedures were also adhered to during the assays to ensure that data were generated with consistent quality. In keeping with good laboratory practice, appropriate controls were included to allow assessment of contamination and validation through concordance checks within and between assays. Phenotype status was also blinded to prevent bias in the ascertainment of genotype calls. Furthermore, as detailed in the methods sections, the raw genotype data were subjected to strict quality control filtering before a dataset was accepted for the actual association analysis. Although all data are invaluable to us, data must still be rigorously discarded if there are concerns about quality. For example, the TB samples from Hong Kong used two different types of DNA template (genomic versus whole-genome amplified) for cases and controls, which produced bias in the genotype calls. Although this cohort would have provided important validation of our findings, the results from this dataset were disregarded and had to be excluded from Study 1. Nevertheless, if there is any unknown misclassification of exposure (genotyping error) in our samples, it is likely to be non-differential, which would bias the risk estimate towards the null and make it more conservative.
6.2.2 Confounding

Epidemiologic research is mainly conducted among free-living human subjects, so the investigator has no control over how variables are distributed. The relationship between the exposure and the trait of interest may therefore be distorted by a third, extraneous factor, called a confounder. All types of epidemiologic study are susceptible to confounding, which, if unaccounted for, could mask the true effect or yield a spurious association. Nevertheless, confounding can be neutralized at the design stage (e.g. by randomization or matching) and at the analysis stage through adjustment or stratification. Fortunately, as previously discussed, lifestyle and environmental factors are unlikely to influence the genotypes of participants in genetic association studies. However, like all population-based studies, genetic association studies can still be confounded by differences in ethnicity, because certain traits may be influenced by ethnicity. Therefore, where our study used cohorts of different ethnic populations for validation, we adjusted for this in the analysis by stratifying the data by ethnicity.

Population stratification

Nevertheless, even within a cohort there may be sub-groups that differ in the trait, and population sub-structure arises when the comparison groups over-represent particular sub-groups to different degrees. Furthermore, some loci in the genome show sub-group (ethnic) specific differences in allele frequency, most of which have no effect on the trait; association tests conducted on such loci would therefore yield results confounded by population stratification[64, 84].
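A small numerical sketch may help to show how this kind of confounding arises and why stratifying by sub-group removes it. The allele counts below are invented for illustration only (they are not data from this thesis): the allele has no effect within either sub-group, but the two sub-groups differ in allele frequency and are unevenly represented among cases and controls.

```python
def odds_ratio(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic odds ratio for allele A versus allele B, cases versus controls."""
    return (case_a * ctrl_b) / (case_b * ctrl_a)


# Hypothetical allele counts per sub-group, ordered as
# (case_A, case_B, control_A, control_B).
# Sub-group 1: allele A frequency 0.8; 80 cases and 20 controls.
# Sub-group 2: allele A frequency 0.2; 20 cases and 80 controls.
strata = {
    "subgroup_1": (128, 32, 32, 8),
    "subgroup_2": (8, 32, 32, 128),
}

# Within each sub-group the allele is unrelated to disease status.
per_stratum = {name: odds_ratio(*counts) for name, counts in strata.items()}

# Ignoring sub-group membership pools the counts column by column.
pooled = tuple(sum(counts[i] for counts in strata.values()) for i in range(4))
pooled_or = odds_ratio(*pooled)

print(per_stratum)  # odds ratio of 1.0 in each sub-group
print(pooled_or)    # pooled odds ratio well above 1
```

Here the pooled analysis suggests a strong association even though none exists within either sub-group, which is the spurious signal that stratified analysis is meant to prevent.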
Population stratification is a serious problem for genetic studies, so it needs to be adequately assessed and controlled for using comprehensive (possibly genome-wide) genotype data or, if necessary, additional genotyping of ancestry informative markers (AIMs). This can be difficult in a candidate gene design, because the small number of loci usually typed in this kind of study is insufficient for an adequate assessment. Furthermore, it is not always easy to identify an appropriate set of AIMs, so population stratification may often go undetected in candidate gene studies. In recent years, however, it has become the norm for a large number of randomly selected markers to be typed in a GWAS, which directly allows accurate assessment of possible genomic inflation and, if necessary, appropriate adjustment for inflation due to population stratification[116, 148].

Cryptic relatedness

Genetic association studies compare the frequencies of genetic markers between unrelated cases and controls sampled from the outbred population at large. Unknown kinship within the comparison groups violates this assumption, because the genotypes are then no longer drawn independently from the population frequencies. Hence, cryptic relatedness among subjects is another source of confounding, unique to genetic studies, that may inflate the false positive rate[84, 117]. In recent years, robust tools have been developed for diagnosing and controlling population stratification and cryptic relatedness in large-scale genome-wide data[84, 116, 119, 148]. The studies conducted in this thesis also applied these measures in their quality control practices, as detailed in the methods section.

6.2.3 Random error

Random error is a possible alternative explanation for the findings of a study.
Despite efforts to exclude possible bias and confounders, and despite the use of robust assays for accurate genotyping, an observation may still have arisen by chance. Like most studies, we accepted a 5 percent chance that a finding is incorrect and adopted an alpha (α) threshold of 0.05 to determine the significance of a statistical test. However, large-scale studies typically perform a large number of tests (one per SNP), so adjustment for multiple testing is necessary. As with most GWAS, our studies followed the usual practice of adopting the Bonferroni-corrected significance threshold (α / number of tests), which controls the “family-wise error rate” at α, even though this practice has been criticized as too strict[153]. It is also conservative because there is substantial linkage disequilibrium (LD) within our genome, so most of the SNPs tested are not independent of one another. Hence, for an exploratory study, a less conservative α threshold might be acceptable to suggest association.

7. Overall conclusions

We were fortunate to begin this project at the opening of a new era of genome-wide association studies (GWAS), which to date have brought an exponential increase in the number of indisputable associations that are important for many complex traits[50]. Even though the resources and knowledge for conducting GWAS were limited when we first started, the learning process was an exciting experience. The case-control study on pulmonary TB susceptibility was the first GWAS conducted at the Genome Institute of Singapore, and it paved the way for many of the institute's subsequent successful GWAS. This study was performed with the Affymetrix 100K mapping set, which was then the only GWAS product available on the market for high-throughput genotyping.
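To give a sense of scale for an array of this size, the Bonferroni-corrected threshold described under random error (Section 6.2.3) can be worked through in a few lines. This is a minimal sketch: the round figure of 100,000 tests is an assumption for illustration (the number of SNPs actually analyzed after quality control is not stated here), and the function names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_tests(p_values, alpha=0.05):
    """Indices of p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


# Assumed round number of SNP tests for a 100K-scale array (illustrative only).
n_snps = 100_000
threshold = bonferroni_threshold(0.05, n_snps)
print(threshold)  # approximately 5e-07

# Toy example with four made-up p-values: the threshold is 0.05 / 4 = 0.0125.
toy_p_values = [0.04, 1e-4, 2e-9, 0.5]
hits = significant_tests(toy_p_values)
print(hits)  # indices of the tests that remain significant
```

Because the correction treats every test as independent, correlated SNPs in LD make the effective number of tests smaller than the nominal count, which is one reason the threshold is regarded as conservative.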
Owing also to cost constraints, we adopted an initial GWAS of only about a hundred cases and a hundred controls[72], similar in size to the study by Klein et al. on age-related macular degeneration (Science, 2005)[67]. With just 96 cases and 50 controls genotyped on the same type of 100K array, the study on age-related macular degeneration found genome-wide significant (p [...]

... other stages

3. Objectives

The main objective of my thesis project is to identify genetic variation that influences susceptibility to infectious disease. This was conducted in two studies that specifically address the following objectives:

I. To understand host genetic susceptibility to the natural infection of pulmonary tuberculosis
II. To identify genetic determinants influencing hepatitis B vaccine antibody...

... use genetic mapping for discovering relevant genes in pulmonary TB susceptibility. For this kind of common disease, the genetic mode of action is frequently attributed to multiple genetic factors, which individually produce modest effects and are common in the population[41, 46]. In this case, population-based association studies that compare allele frequency differences between disease and non-disease...

... a breakdown of LD. D’ and r2 are the two basic measures of the extent of pairwise LD between two loci. D’ estimates the history of recombination between a pair of variants[57]: a value of 1 indicates the absence of recombination, whereas values less than 1 indicate recombination that has separated the two loci. On the other hand, r2 is determined by dividing the square of the disequilibrium coefficient D by the product of the four...
Incidentally, because a familial design is needed to detect transmission, linkage studies tend to discover highly penetrant single-gene defects of severe effect size, which seldom cover the polygenic spectrum of common complex traits. Therefore, it is not surprising that most success stories of linkage analysis involve Mendelian diseases, of which Huntington disease is the...

... study on pulmonary tuberculosis susceptibility

4.1 Background

With evidence dating as far back as 9,000 years ago, TB is a disease with a long history of causing much suffering, and it remains one of the top infectious causes of mortality[74]. It is primarily an infection of the lungs that is spread by inhaling droplets from the coughing, sneezing or even speech of an ill individual. TB has close...

... tuberculosis) has infected around a third of the world’s population, only 3-10% of those infected develop active disease during their lifetime[76]. More than 90% of infected individuals remain asymptomatic with a latent infection. This indicates that host immune/defense pathways are often highly effective in controlling this disease. Because the infection causes such a burden of disease in those unable to contain...

... trait of interest. In general, a non-synonymous SNP in the coding region of a gene is a promising candidate, because an obvious functional role is easy to predict from databases[51]. However, owing to evolutionary pressure, most of the functional coding sequences of genes are highly conserved, so such variants occur at low frequency in the genome[52]. In addition, many of the causal variants in common complex diseases...
Accra, Ghana

4.2 Genetic study

4.2.1 Candidate gene approach

The infectious nature of M. tuberculosis makes experimental infection studies, in which the pathogen is inoculated directly into humans, ethically impossible. Hence, most TB candidate genes come from immunology studies that are frequently modeled on animals, or from etiologies shared among infectious diseases[2, 62]. Most of the previous candidate...

... statistic of p = 9.6 x 10^-8 as their evidence of significant association, but this was based on a joint analysis of all four infectious diseases. On the other hand, the evidence specifically for TB (p 0.04) was only marginally significant; hence it is not unexpected that the association of rs8177374 with TB was not replicated in our study. Besides, it is obvious that TB, invasive pneumococcal disease,...

... multiple specific pathways of infection and host defense. Even if TIRAP were involved in each of these diseases in a similar way, its relative contribution should be expected to vary. On the basis of the results of this study, we advise against combining datasets of distinct diseases in a single association test, to avoid false conclusions. Nevertheless, we want to encourage a unified effort of combining well-powered...
