© Indian Academy of Sciences

RESEARCH ARTICLE

LIG1 polymorphisms: the Indian scenario

AMIT KUMAR MITRA1,2, ASHOK SINGH2, INDIAN GENOME VARIATION CONSORTIUM3 and SRIKANTA KUMAR RATH2∗

1 Institute of Human Genetics, Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA
2 CSIR-Central Drug Research Institute, Lucknow 226 031, India
3 Nodal Laboratory, CSIR-Institute of Genomics and Integrative Biology, New Delhi 110 007, India

Abstract

Elucidation of the genetic diversity and relatedness of the subpopulations of India may provide a unique resource for future analysis of the genetic association of several critical community-specific complex diseases. We performed a comprehensive exploration of single nucleotide polymorphisms (SNPs) within the gene DNA ligase 1 (LIG1) among a multiethnic panel of Indian subpopulations representative of the ethnic, linguistic and geographical diversity of India, using a two-stage design involving DNA resequencing-based SNP discovery followed by SNP validation using Sequenom-based genotyping. Thirty SNPs were identified in the LIG1 gene using DNA resequencing, including three promoter SNPs and one coding SNP. Following SNP validation, the SNPs rs20580/C19008A and rs3730862/C8804T were found to have the most widespread prevalence, with noticeable variations in minor allele frequencies both between the Indian subpopulation groups and from those reported for other major world populations. Subsequently, SNPs found in Indian subpopulations were analysed using bioinformatics-based approaches and compared with SNP data available on major world populations. Further, we performed genotype–phenotype association analysis of LIG1 SNPs with publicly available data on LIG1 mRNA expression in HapMap samples. The results showed that polymorphisms in LIG1 affect its expression and may therefore change its function.
Our results stress the uniqueness of the Indian population with respect to the worldwide scenario and suggest that any epidemiological study undertaken on the global population should take this distinctiveness into consideration and avoid making generalized conclusions.

[Mitra A. K., Singh A., Indian Genome Variation Consortium and Rath S. K. 2014 LIG1 polymorphisms: the Indian scenario. J. Genet. 93, xx–xx]

∗ For correspondence. E-mail: [email protected]; [email protected].

Introduction

India occupies only 2.4% of the world's land area, while it supports over 17.5% of the world's population (Census of India 2001). India has more than 2000 ethnic groups, all major religions of the world, four major language families (Indo-European, Dravidian, Austro-Asiatic and Tibeto-Burman) and the morphological subgroups Caucasoid, Mongoloid, Australoid and Negritos (Grierson 1927; Malhotra 1978). Further complexity is lent by the great variation that occurs across this population in social parameters such as income and education. Such a rich repertoire of diversity provides valuable repositories for genetic association studies. Elucidation of the genetic diversity of the numerous, large, localized, isolated subpopulation groups within India is therefore a primary prerequisite for trying to understand the genetic basis of several critical complex disorders. However, genome projects undertaken on the world populations so far have historically overlooked the major Indian population subgroups (Genomes Project Consortium 2010, 2012; International HapMap Consortium 2003, 2010).
DNA ligase I (LIG1) is the main replicative ligase of eukaryotes. It is active at the replication forks of dividing chromosomes, where it is critical for joining Okazaki fragments and completing DNA synthesis. It also completes the base excision repair (BER) pathway by sealing the gaps in the resynthesized regions, since unsealed nicks in genomic DNA are potentially dangerous for the cell (Ma et al. 1995; Timson et al. 2000). DNA ligase is therefore an essential enzyme not only in the synthesis of new DNA during cell division, but also in maintaining the integrity of the genome. Defects in DNA repair pathways are considered to play a central role in cancer biology, whereby some individuals are at very high risk of cancer due to

Keywords. LIG1; MAF; SNP; HapMap.

Journal of Genetics, Vol. 93, No. 2, August 2014
Amit Kumar Mitra et al.

Table 1. Detailed information on the Indian subpopulations included in the study (each population was labelled on the basis of linguistic group).
Three letter code Subpopulation type Kurku Sahariya Baiga Santhal Munda KKU SHY BAI STL MUN Tribe Tribe Tribe Tribe Caste AA-E-IP3 AA-NE-IP1 AA-W-IP1 DR-C-IP1 DR-C-IP2 Juang Khasi Kolis Gond Vaidiki Brahmin JNG KHS KLS GND VKB Tribe Caste Caste Tribe Tribe DR-C-LP1 DR-E-IP1 DR-S-IP1 DR-S-IP2 DR-S-IP3 DR-S-IP4 IE-E-LP1 IE-E-LP2 IE-E-LP3 IE-E-LP4 IE-NE-IP1 IE-NE-LP1 Bison Horn Maria Madia Paniyan Chenchu Halakki Kuruman Chik Baraik Kayastha (WB) Mahishya Oriya Brahmin Hajong Namsudra BHM MDA PNY CNC HLK KRM CIB KWB MHA ORB HJG NSD Tribe Tribe Tribe Tribe Tribe Tribe Caste Caste Tribe Caste Tribe Caste IE-N-IP1 IE-N-IP2 IE-N-LP1 Kannet (HP) Tharu Chamar (Hp) KNT THR CMR Tribe Tribe Caste IE-N-LP10 IE-N-LP11 Saryuparin Brahmin Khatri SPB KHT Caste Tribe IE-N-LP2 Jats JAT Caste IE-N-LP3 Kanyakubj Brahmin KKB Caste IE-N-LP5 IE-N-LP6 Kashmiri Pandit Kayastha (Up) KSP KUP Caste Caste IE-N-LP7 Koli (Hp) KOL Caste IE-N-LP8 IE-N-LP9 Rajput (Uttarakhand) Rajput (HP) RJU RJH Caste Caste IE-N-SP1 IE-N-SP2 IE-N-SP3 IE-N-SP4 IE-N-SP5 IE-S-IP1 IE-W-IP1 IE-W-IP2 IE-W-LP1 IE-W-LP2 IE-W-LP3 IE-W-LP4 TB-NE-LP1 TB-N-IP1 TB-N-SP1 Aggarwals Ramgariah Sikh Sunni Shia Syed (Sunni) Hakkipikki Bhil Dongri Bhil Deshastha Brahmins Kokanastha Brahmins Paliwal Brahmin Patidar Meitei Spiti (HP) Buddhists AGL RAM SUI SHI SYD HPK BHL DBH DEB KOB PAL PTD MEI SPT BUD Caste Religious group Religious group Religious group Religious group Tribe Tribe Tribe Caste Caste Caste Caste Caste Tribe Religious group TB-N-SP2 Buddhist heterogenous BUH Religious group Population ID Subpopulation name AA-C-IP1 AA-C-IP4 AA-C-IP5 AA-E-IP1 AA-E-IP2 Geographical location within India Madhya Pradesh, Maharashtra Madhya Pradesh, Rajasthan, Uttar Pradesh Madhya Pradesh, Chhattisgarh Jharkhand, Bihar, Orissa, West Bengal, Tripura, Assam Bihar, Jharkhand, West Bengal, Orissa, Madhya Pradesh, Tripura Orissa Meghalaya, Tripura, Assam Madhya Pradesh, Gujarat, Maharashtra, Daman & Diu Chhattisgarh Chhattisgarh, Madhya Pradesh, 
Andhra Pradesh, Maharashtra, Bihar, Orissa, Assam Chattisgarh Orissa Kerala, Tamil Nadu Andhra Pradesh Karnataka Kerala, Tamil Nadu Jharkhand, West Bengal West Bengal, Delhi, Jharkhand, Tripura, Assam, Orissa West Bengal, Bihar, Orissa Orissa Meghalaya, Assam Assam, West Bengal, Tripura, Orissa, Mizoram, Manipur, Meghalaya, Bihar, Madhya Pradesh Himachal Pradesh, Punjab, Haryana Uttarakhand, Uttar Pradesh, Bihar Himachal Pradesh, Uttar Pradesh, Bihar, Delhi, Haryana, Punjab, Tripura, Gujarat, Rajasthan, Madhya Pradesh, Orissa, West Bengal, Goa, Daman and Diu Uttar Pradesh, Madhya Pradesh, Bihar, Delhi Punjab, Haryana, Delhi, Himachal Pradesh, Jammu and Kashmir, Uttar Pradesh, Bihar, Rajasthan, Gujarat, Maharashtra, Tamil Nadu Delhi, Haryana, Uttar Pradesh, Punjab, Rajasthan, Himachal Pradesh, Jammu and Kashmir, Madhya Pradesh Uttar Pradesh, Jharkhand, Bihar, Madhya Pradesh, Chhattisgarh, Delhi Jammu and Kashmir, Delhi Uttar Pradesh, Delhi, Bihar, Madhya Pradesh, Rajasthan, Punjab Himachal Pradesh, Punjab, Haryana, Delhi, Maharashtra, Uttar Pradesh, Rajasthan, Karnataka Uttarakhand, Uttar Pradesh, Gujarat, Rajasthan Himachal Pradesh, Punjab, Haryana, Delhi, Orissa, Bihar, West Bengal, Assam, Karnataka, Andhra Pradesh Haryana Punjab, Delhi, Rajasthan Uttar Pradesh Uttar Pradesh Jammu and Kashmir Karnataka Gujarat, Rajasthan, Madhya Pradesh, Maharashtra, Tripura Gujarat, Rajasthan Maharashtra Maharashtra, Goa, Karnataka Rajasthan Gujarat Manipur, Assam Himachal Pradesh Jammu and Kashmir, Himachal Pradesh, Sikkim, Arunachal Pradesh, Assam Jammu and Kashmir

IE, Indo-European; TB, Tibeto-Burman; DR, Dravidian; AA, Austro-Asiatic; followed by geographical zone (E, east; W, west; N, north; NE, northeast; S, south) and size and nature of population (LP, large population; SP, small population; IP, isolated population). OG-W-IP, derived from an African population, was used as an outlier.

inherited single-nucleotide polymorphisms (SNPs) in genes controlling DNA repair, which may impair their function and contribute to genetic susceptibility to various cancers (Bohr 1995; Ma et al. 1995; Tlsty et al. 1995; Cheng et al. 1998).

The present study focusses on the identification and validation of SNPs within the exonic and flanking regions of the gene LIG1 (gene name, ligase I, DNA, ATP-dependent; gene ID, 3978; OMIM ID, 126391; gene length, 54858 bp; chromosomal location, 19q13.2–q13.3) within selected Indian subpopulations representative of the ethnic, linguistic and geographic diversity of India, to generate an India-specific information resource of the allele and genotype frequencies of SNPs in LIG1.

Materials and methods

Population identification and sample collection

Details on population identification, study design and sample collection are available elsewhere (Indian Genome Variation Consortium 2005, 2008). Briefly, a two-stage study design was employed. The first stage was SNP discovery within all 27 exons and flanking regions of the gene LIG1, carried out by bidirectional DNA sequencing on a set of 43 samples representative of the linguistic, morphological, regional and religious diversity of India. This was followed by validation of the SNPs identified in the discovery phase using Sequenom's MassArray genotyping technology (Sequenom, San Diego, USA) on a larger multiethnic Indian panel of 576 representative samples from 24 Indian subpopulations, which included 14 from Indo-European (IE), three from Tibeto-Burman (TB), four from Dravidian (DR) and two from Austro-Asiatic (AA) linguistic subpopulation clusters drawn from geographically and ethnically diverse subpopulations. Each population was labelled on the basis of language, followed by geographical zone and ethnic category, e.g. IE-E-LP1, where IE stands for the Indo-European linguistic subgroup, E stands for the eastern part of India and LP signifies large population. Details of the subpopulations included in our study are provided in table 1. The study was approved by the Medical Ethics Committee of the Central Drug Research Institute (CDRI).

DNA isolation

Genomic DNA from population samples was extracted from peripheral blood leucocytes using a modified salting out procedure (Miller et al. 1988), quantitated and stored at −20 °C.

Primer designing and PCR

For sequence analysis among population samples, the LIG1 DNA sequence was obtained from the NCBI RefSeq sequence database (GenBank accession no. NT_011109.13). PCR primers were designed for all the exons and flanking regions of the gene LIG1 using the PrimerSelect module of Lasergene ver. 6.0 (DNASTAR, Madison, USA), as shown in figure 1. A detailed list of the primers designed is available on request. The primer sequences were verified using NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and UCSC in silico PCR (http://genome-mirror.duhs.duke.edu/cgi-bin/hgPcr) to rule out amplification of any nonspecific DNA sequences.

Figure 1. Snapshot from the UCSC genome browser depicting the human DNA ligase 1 (LIG1) gene and the PCR amplicons (AMP).

All PCR reactions were carried out using reagents from Fermentas Life Sciences (Fisher Scientific, Pittsburgh, USA) in a total reaction volume of 50 μL containing nearly 100 ng genomic DNA, 1.5 U Taq polymerase in 1× PCR buffer, 1.5 mM MgCl2, 0.2 mM dNTPs and 15 pM of each primer. Thermal cycling conditions were as follows: an initial denaturation step at 95 °C for 10 min; 31 cycles of denaturation at 94 °C for 1 min, annealing for 1 min at temperatures optimized using gradient PCR, and extension at 72 °C for 1 min; followed by a final extension step at 72 °C for 5 min.
The PCR products were visualized by electrophoresis on 1.5% agarose gel.

DNA resequencing and SNP detection

Bidirectional DNA sequencing was performed on ABI 3100 and ABI 3730 capillary-based sequencers (Applied Biosystems, Foster City, USA). Sequence assembly and SNP analysis were performed using the Phred-Phrap-Consed package (University of Washington, Seattle, WA; http://www.phrap.org/phredphrapconsed.html), which detects the presence of heterozygous single nucleotide substitutions following fluorescence-based sequencing of PCR products (Nickerson et al. 1997), and the multiple sequence alignment module SeqMan of Lasergene 6.0 software (DNASTAR, Madison, USA).

SNP validation

SNPs were selected for the next phase of the study (SNP validation) based on strict selection criteria: minor allele frequency (MAF) greater than 10% in the first stage of SNP exploration, location within exonic or regulatory regions, and spacing between SNPs (> 1 kb). Some reported SNPs were force-included in the SNP validation procedure, owing to their potential importance for the function of the gene or to fill additional gaps in the genome to achieve uniform spacing between SNPs. Genotyping was performed using matrix-assisted laser desorption ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS) chemistry on Sequenom's MassArray platform (Sequenom, San Diego, USA). Quality control (QC) performed prior to consideration of each SNP for analysis included Hardy–Weinberg checks using Fisher's exact test at the 5% significance level.

Bioinformatic analysis

The SNPs investigated in the discovery and validation phases of the study were analysed bioinformatically using the following tools:

(i) Transcription factor binding sites: promoter SNPs were evaluated for the loss/gain of known cis regulatory motif-binding sites using MatInspector ver. 8.0 software (Genomatix Software GmbH, Munich, Germany) (Cartharius et al. 2005).

(ii) Prediction of consequences on protein structure and/or function: phenotypic implications of the nonsynonymous SNPs (nsSNPs) found within LIG1 were predicted using SIFT (Sorting Intolerant From Tolerant, http://sift.bii.a-star.edu.sg/), which predicts the impact of any amino acid substitution on protein function and computes a score representing the likelihood of mutability at the site of the substitution (Ng and Henikoff 2003), and PolyPhen (Polymorphism Phenotyping, http://genetics.bwh.harvard.edu/pph/index.html), which predicts the functional importance of the amino acid substitution by merging the conservation score with physicochemical differences and structural features of the polymorphic variants (Ramensky et al. 2002).

Comparison with HapMap database

LIG1 SNP data obtained in our study on the Indian subpopulations were further compared with genotype data on the major world populations available through the HapMap database, belonging to Chinese (CHB), Japanese (JPT), African (YRI) or CEPH (European) ancestries (The International HapMap Consortium 2003). The major world populations from the International HapMap project were included so that the publicly available SNP data, as well as genomewide gene expression data (GSE7761) on HapMap samples, could be used for subsequent genotype–phenotype association analysis.

Analysis of association of LIG1 SNPs with its expression in HapMap populations

To further investigate the role of LIG1 SNPs, SNP information for all the LIG1 SNPs genotyped in the CEPH populations was retrieved from the HapMap SNP database.
The CEPH population subgroup consists of 90 samples from 30 trios (two parents and one child) of European ancestry. LIG1 gene expression data generated using the Affymetrix ExonArray were publicly available and were downloaded from NCBI's Gene Expression Omnibus database (GSE7761). Genotype–phenotype association analysis was performed between LIG1 SNPs and LIG1 expression within all the unrelated (parents only) samples.

Statistical analysis

Genotype frequencies were calculated using the program Gencount, while Maxlik1 was used to calculate maximum likelihood estimates of allele frequencies, standard deviations and Hardy–Weinberg chi-square values. Allhet was used to generate chi-square values for testing the homogeneity of the SNP data between the Indian subpopulations included in the study. The Gencount, Maxlik1 and Allhet programs were designed in-house by collaborators for the analysis. Dispan (http://iubio.bio.indiana.edu/soft/molbio/ibmpc/), a genetic distance and phylogenetic analysis program, was used to calculate average heterozygosity (H0) and its standard error for every SNP in each subpopulation under study, as well as genetic diversity (Ht) and its associated parameters such as interpopulation gene variation (GST) and intrapopulation gene variation (Hs). The program also has inbuilt options for performing bootstrapping (Saitou and Nei 1987).

Haplotype analysis was performed on phase-unknown genotype data using the software Phase ver. 2.1, which uses a Bayesian Markov chain Monte Carlo algorithm (Stephens and Scheet 2005). The population OG-W-IP, derived from an African population (Singh 2002), was used as an outlier in the phylogenetic tree. The Kruskal–Wallis test and Wilcoxon's rank-sum test were used to perform genotype–phenotype association analysis between HapMap SNPs and LIG1 gene expression.
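The quantities named in the statistical analysis can be illustrated with a minimal Python sketch. It is not the in-house Gencount, Maxlik1 or Allhet code, nor Dispan, whose internals are not described here. It assumes that, for a biallelic codominant SNP, the maximum likelihood allele frequency reduces to simple gene counting, that the Hardy–Weinberg check is a one-degree-of-freedom chi-square test, and that GST follows the standard definition (Ht − Hs)/Ht. All function names and the genotype counts are hypothetical.

```python
import math


def allele_frequencies(n_cc, n_ca, n_aa):
    """Gene-counting estimates of p (C) and q (A) from genotype counts."""
    n = n_cc + n_ca + n_aa
    p = (2 * n_cc + n_ca) / (2 * n)
    return p, 1.0 - p


def hwe_chi_square(n_cc, n_ca, n_aa):
    """Chi-square statistic (1 df) and p value for Hardy-Weinberg proportions."""
    n = n_cc + n_ca + n_aa
    p, q = allele_frequencies(n_cc, n_ca, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_cc, n_ca, n_aa)
    # Cells with zero expectation (monomorphic SNPs) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def gst(h_t, h_s):
    """Coefficient of gene differentiation from total and within-group diversity."""
    return (h_t - h_s) / h_t


# Hypothetical genotype counts (CC, CA, AA), for illustration only.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")

# Ht and Hs as reported in this paper for the two LIG1 SNPs combined.
print(f"GST = {gst(0.491, 0.431):.3f}")
```

With the combined Ht and Hs values reported in the Results (0.491 and 0.431), the last line reproduces the reported GST of 0.122, which suggests that the standard definition is the one Dispan applied here.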
Results

Results of two-stage SNP exploration study

Thirty SNPs, including three promoter SNPs and one SNP in the coding region, were identified in the LIG1 gene during the first stage of SNP exploration using resequencing (table 2). Eleven SNPs were present in the HapMap database for major world populations, while seven SNPs were present in NCBI's dbSNP database but not in HapMap. A detailed comparison of the minor allele frequencies of the SNPs identified in the discovery phase with the SNP data available on major HapMap subpopulations is provided in figure 2. Twelve novel SNPs were found that were not present in HapMap or NCBI's dbSNP database at the time of commencement of this study.

Table 2. Frequencies of LIG1 SNPs identified during the discovery phase following DNA sequencing. Results were analysed using the Phred/Phrap/Polyphred/Consed pipeline and DNAStar.

SNP ID       Position from ATG  Location in LIG1 gene  Alleles  MAF    Consequence to transcript  Validation status
rs439132     –132               4647                   A:G      0.167  5PRIME_UTR                 HapMap
rs20579      –48                4731                   C:T      0.045  5PRIME_UTR                 HapMap
rs3730849    –13                4766                   C:T      0.208  5PRIME_UTR                 HapMap
rs3730853    2897               7675                   T:A      0.071  Intronic                   Not in HapMap
rs56224917   2966               7744                   T:G      0.024  Intronic                   Not in HapMap
rs3730861    3954               8732                   C:T      0.093  Intronic                   Not in HapMap
rs3730862    3985               8763                   C:T      0.426  Intronic                   HapMap
rs20580      14230              19008                  A:C      0.462  Synonymous_coding          HapMap
rs3730931    21542              26320                  A:G      0.045  Intronic                   HapMap
rs2288878    21900              26678                  G:A      0.281  Intronic                   HapMap
rs2288880    28549              33327                  G:C      0.167  Intronic                   HapMap
28668C/T     28668              33446                  C:T      0.05   Intronic                   Novel
rs3730976    29311              34089                  A:A      0.5    Intronic                   HapMap
rs2288882    29548              34326                  C:T      0.3    Intronic                   Not in HapMap
29629A/G     29629              34407                  A:G      0.028  Intronic                   Novel
31601G/C     31601              36379                  G:C      0.042  Intronic                   Novel
34247C/A     34247              39025                  C:A      0.056  Intronic                   Novel
rs392891     34464              39242                  T:A      0.365  Intronic                   HapMap
rs3730944    34485              39263                  G:A      0.058  Intronic                   Not in HapMap
34547T/G     34547              39325                  T:G      0.019  Intronic                   Novel
34569C/T     34569              39347                  C:T      0.019  Intronic                   Novel
34607G/T     34607              39385                  G:T      0.071  Intronic                   Novel
37611C/T     37611              42389                  C:T      0.017  Intronic                   Novel
rs3731004    37652              42430                  G:C      0.017  Intronic                   Not in HapMap
37782A/T     37782              42560                  A:T      0.158  Intronic                   Novel
38112C/T     38112              42890                  C:T      0.024  Intronic                   Novel
38188G/A     38188              42966                  G:A      0.02   Intronic                   Novel
rs3731007    42179              46957                  C:T      0.175  Intronic                   HapMap
42189C/A     42189              46967                  C:A      0.083  Intronic                   Novel
rs3731009    42394              47172                  G:A      0.083  Intronic                   Not in HapMap

Location in LIG1 gene refers to NCBI GenBank accession no. NT_011109.13 (reverse complement); MAF, minor allele frequency.

Figure 2. Comparison of minor allele frequencies (MAFs) of LIG1 SNPs between discovery phase (DSNP) results and data available on major HapMap subpopulations. [Chart; y-axis: minor allele frequencies (0 to 1); series: MAF_DSNP, MAF_CEPH, MAF_YRI, MAF_CHB, MAF_JPT; x-axis: rs20579, rs20580, rs2288878, rs3730849, rs3730862, rs3730931, rs3730976, rs3731007, rs392891, rs439132.]

Based on the criteria for selection of SNPs described in the Materials and methods section, the final list for the SNP validation study using Sequenom-based genotyping included 10 SNPs: rs3730862, rs4987181, rs4987070, rs12981963, rs20580, rs11879148, rs3730933, rs3730966, rs3731003 and rs11666150. During validation, four of the LIG1 SNPs (rs3730933, rs3730966, rs11879148 and rs12981963) were found to be monomorphic. Four SNPs (rs11666150/A54472C, rs3731003/C42344T, rs4987070/A8945G and rs4987181/C8885T) were restricted to less than three subpopulations of India. Minor alleles of rs11666150 were found in the subpopulations IE-E-LP2 and OG-W-IP, the rs3731003 mutant was found in OG-W-IP, the rs4987070 mutant was found in AA-C-IP5, while the SNP rs4987181 was observed in only one sample from the subpopulation DR-S-LP3. Consequently, two SNPs (rs20580/C19008A and rs3730862/C8804T) were found to be present in more than three subpopulations across the country.

Allele and genotype frequencies of the SNPs rs20580 and rs3730862 are provided in tables 3 and 4, respectively. The overall MAFs of the SNPs rs20580 (A) and rs3730862 (T) in the Indian population were 0.48 and 0.39, respectively. Considerable variations in MAFs were noted across the four linguistic lineages and geographical zones of India, ranging from 0.24 in IE-N-LP9 to 0.75 in IE-NE-IP1 for rs20580 (A) and from as low as 0.11 in IE-W-LP3 to 0.77 in IE-NE-IP1 for rs3730862 (T). Among the linguistic groups, the highest MAF for rs20580 (A) was observed in the Austro-Asiatic and Tibeto-Burman populations (0.63), whereas the highest MAF for rs3730862 (T) was observed in the Austro-Asiatic (0.58) subpopulation. The Dravidian linguistic cluster exhibited the lowest MAF for both polymorphisms, 0.43 and 0.37 for rs20580 (A) and rs3730862 (T), respectively.

Table 3. Genotype and allele frequencies for the SNP rs20580 (LIG1) among the Indian subpopulations following SNP validation.

                         Genotype frequencies          Allele frequencies
Subpopulation    n       CC     CA     AA      2n      p (C)   q (A)   SD
AA-C-IP5         17      0.00   0.71   0.29    34      0.35    0.65    0.014
AA-E-IP3         21      0.14   0.52   0.33    42      0.40    0.60    0.012
DR-C-IP2         19      0.37   0.32   0.32    38      0.53    0.47    0.013
DR-S-IP4         23      0.30   0.30   0.39    46      0.46    0.54    0.011
DR-S-LP2         23      0.43   0.30   0.26    46      0.59    0.41    0.011
DR-S-LP3         23      0.52   0.35   0.13    46      0.70    0.30    0.010
IE-E-IP1         22      0.14   0.55   0.32    44      0.41    0.59    0.011
IE-E-LP2         22      0.36   0.45   0.18    44      0.59    0.41    0.011
IE-E-LP4         23      0.43   0.35   0.22    46      0.61    0.39    0.011
IE-NE-IP1        22      0.05   0.41   0.55    44      0.25    0.75    0.010
IE-NE-LP1        23      0.39   0.35   0.26    46      0.57    0.43    0.011
IE-N-IP2         23      0.09   0.35   0.57    46      0.26    0.74    0.010
IE-N-LP1         18      0.56   0.33   0.11    36      0.72    0.28    0.012
IE-N-LP5         22      0.27   0.50   0.23    44      0.52    0.48    0.011
IE-N-LP9         23      0.61   0.30   0.09    46      0.76    0.24    0.009
IE-N-SP4         22      0.32   0.55   0.14    44      0.59    0.41    0.011
IE-W-LP1         21      0.29   0.38   0.33    42      0.48    0.52    0.012
IE-W-LP2         20      0.45   0.40   0.15    40      0.65    0.35    0.012
IE-W-LP3         23      0.52   0.43   0.04    46      0.74    0.26    0.010
IE-W-LP4         23      0.30   0.39   0.30    46      0.50    0.50    0.011
OG-W-IP          19      0.42   0.42   0.16    38      0.63    0.37    0.013
TB-NE-LP1        22      0.09   0.36   0.55    44      0.27    0.73    0.010
TB-N-IP1         22      0.23   0.36   0.41    44      0.41    0.59    0.011
TB-N-SP1         23      0.13   0.57   0.30    46      0.41    0.59    0.011
Overall (India)  519     0.31   0.41   0.28    1038    0.52    0.48    0.000

Table 4. Genotype and allele frequencies for the SNP rs3730862 (LIG1) within the Indian subpopulations following SNP validation.

                         Genotype frequencies                  Allele frequencies
Subpopulation    n       WW (CC)   WM (CT)   MM (TT)   2n      p (C)   q (T)   SD
AA-C-IP5         11      0.18      0.36      0.45      22      0.36    0.64    0.022
AA-E-IP3         21      0.19      0.52      0.29      42      0.45    0.55    0.012
DR-C-IP2         16      0.56      0.44      0.00      32      0.78    0.22    0.013
DR-S-IP4         21      0.38      0.24      0.38      42      0.50    0.50    0.012
DR-S-LP2         20      0.50      0.25      0.25      40      0.63    0.38    0.012
DR-S-LP3         21      0.57      0.33      0.10      42      0.74    0.26    0.010
IE-E-IP1         20      0.25      0.50      0.25      40      0.50    0.50    0.013
IE-E-LP2         17      0.53      0.41      0.06      34      0.74    0.26    0.013
IE-E-LP4         22      0.64      0.32      0.05      44      0.80    0.20    0.009
IE-NE-IP1        22      0.05      0.36      0.59      44      0.23    0.77    0.010
IE-NE-LP1        22      0.50      0.32      0.18      44      0.66    0.34    0.011
IE-N-IP2         22      0.09      0.45      0.45      44      0.32    0.68    0.011
IE-N-LP1         21      0.67      0.29      0.05      42      0.81    0.19    0.009
IE-N-LP5         21      0.38      0.38      0.24      42      0.57    0.43    0.012
IE-N-LP9         23      0.70      0.26      0.04      46      0.83    0.17    0.008
IE-N-SP4         20      0.40      0.40      0.20      40      0.60    0.40    0.012
IE-W-LP1         23      0.43      0.39      0.17      46      0.63    0.37    0.010
IE-W-LP2         22      0.41      0.41      0.18      44      0.61    0.39    0.011
IE-W-LP3         22      0.77      0.23      0.00      44      0.89    0.11    0.007
IE-W-LP4         22      0.45      0.50      0.05      44      0.70    0.30    0.010
OG-W-IP          23      0.91      0.09      0.00      46      0.96    0.04    0.004
TB-NE-LP1        21      0.19      0.38      0.43      42      0.38    0.62    0.012
TB-N-IP1         22      0.23      0.32      0.45      44      0.39    0.61    0.011
TB-N-SP1         23      0.26      0.52      0.22      46      0.52    0.48    0.011
Overall (India)  498     0.43      0.36      0.21      996     0.61    0.39    0.000

The mean heterozygosity values, considering both LIG1 polymorphisms together, ranged from 0.296 (IE-E-LP4) to 0.512 (TB-NE-LP1) (table 5). The Gst, Ht and Hs values for the LIG1 SNPs taken together were 0.122, 0.491 and 0.431, respectively, while the Gst, Ht and Hs values for the individual SNPs were 0.102, 0.497 and 0.446, respectively, for rs3730862 and 0.142, 0.486 and 0.417, respectively, for rs20580.

Table 5. Average heterozygosity (H0) values for the gene LIG1 among Indian subpopulations.

Subpopulation   rs3730862   rs20580   LIG1 SNPs combined (rs3730862, rs20580)
AA-C-IP5        0.485       0.512     0.499
AA-E-IP3        0.315       0.334     0.324
DR-C-IP2        0.401       0.460     0.431
DR-S-IP4        0.353       0.426     0.389
DR-S-LP2        0.359       0.294     0.326
DR-S-LP3        0.413       0.487     0.45
IE-E-IP1        0.510       0.483     0.497
IE-E-LP2        0.496       0.406     0.451
IE-E-LP4        0.507       0.084     0.296
IE-NE-IP1       0.495       0.502     0.499
IE-NE-LP1       0.384       0.372     0.378
IE-N-IP2        0.511       0.394     0.452
IE-N-LP1        0.513       0.481     0.497
IE-N-LP5        0.511       0.394     0.453
IE-N-LP9        0.433       0.495     0.464
IE-N-SP4        0.467       0.495     0.481
IE-W-LP1        0.495       0.496     0.495
IE-W-LP2        0.477       0.207     0.342
IE-W-LP3        0.494       0.478     0.486
IE-W-LP4        0.396       0.492     0.444
OG-W-IP         0.485       0.485     0.485
TB-NE-LP1       0.512       0.511     0.512
TB-N-IP1        0.502       0.444     0.473
TB-N-SP1        0.471       0.507     0.489

Comparison with HapMap database

When compared to the world population data in the HapMap database, the overall MAF of rs20580 (A) was found to be in close agreement with the central European [CEU] (0.46) population but very different from the Japanese [JPT] (0.65), Han Chinese [CHB] (0.58) and Yoruba [YRI] (0.53) populations. Consistently, the MAF of rs3730862 (T) observed in the present study (0.37) was also found to be closer to that of the central European [CEU] (0.33) population and very different from the MAFs observed in the Japanese [JPT] (0.58), Han Chinese [CHB] (0.51) and Yoruba [YRI] (0.01) populations (The International HapMap Consortium 2003) (table 6). Further analysis using the Indo-European linguistic subpopulation cluster alone demonstrated comparable MAFs for both rs20580A and rs3730862T among LPs, while the isolated populations (IPs/tribals) showed strikingly high MAFs.

Table 6. Comparison of mutant allele frequencies within the Indian population and between the Indian linguistic subpopulation clusters and the world populations (data obtained from the HapMap project).

Population   rs20580 (LIG1)   SD      rs3730862 (LIG1)   SD
Populations in the current experiment
AA           0.62             0.056   0.58               0.062
DR           0.43             0.037   0.35               0.038
IE           0.46             0.020   0.37               0.020
TB           0.63             0.042   0.57               0.043
IND          0.48             0.016   0.39               0.015
Populations in the HapMap database
CEU          0.46             0.047   0.33               0.043
CHB          0.58             0.053   0.51               0.053
JPT          0.65             0.051   0.58               0.053
YRI          0.53             0.054   0.01               0.008

AA, Austro-Asiatic; DR, Dravidian; IE, Indo-European; TB, Tibeto-Burman; IND, overall mutant allele frequency of the Indian population; CEU, CEPH (Utah residents with ancestry from northern and western Europe); YRI, Yoruba in Ibadan, Nigeria; JPT, Japanese in Tokyo, Japan; CHB, Han Chinese in Beijing, China.

Results of bioinformatic analysis

Bioinformatic analysis of the promoter SNPs rs439132A/G, rs20579C/T and rs3730849C/T using MatInspector showed differential transcription factor binding potential between the wild type and mutant alleles (figure 3). Analysis of the consequences of the nonsynonymous SNPs on protein function using the protein prediction software SIFT and PolyPhen showed that rs4987070 (D72G), rs4987181 (P52L) and rs11666150 (Q892H) were potentially damaging, while rs3731003 (T614I) was benign/tolerated.

Figure 3. Analysis of transcription factor-binding potential of the promoter SNPs rs439132A/G, rs20579C/T and rs3730849C/T using MatInspector 8.0. (SNPs +/− 10 bp were considered for analysis. Detailed information and colour keys for matrix families are provided.)

Figure 4. Association of LIG1 SNPs with LIG1 mRNA expression among unrelated HapMap CEPH samples. (LIG1 SNP information was obtained from the HapMap database (http://www.HapMap.org); LIG1 mRNA expression values for unrelated HapMap CEPH samples were obtained from Affymetrix ExonArray data downloaded from NCBI's Gene Expression Omnibus database (GSE7761).) [Two panels; y-axis: log (LIG1 mRNA expression); panel annotations: p = 0.02 and p = 0.024.]
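The rank-based comparison behind an association plot such as figure 4 can be sketched in a few lines of Python. This is a hedged illustration rather than the analysis pipeline actually used: it computes the tie-corrected Kruskal–Wallis H statistic and, assuming exactly three genotype groups (two degrees of freedom), uses the closed form exp(−H/2) for the chi-square upper tail. The function names, genotype labels and expression values are hypothetical, and the chi-square approximation is rough for groups this small.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def p_value_three_groups(h):
    """Chi-square upper tail with 2 df (three genotype groups): exp(-H/2)."""
    return math.exp(-h / 2.0)


# Hypothetical log-expression values per genotype (illustration only).
expression_by_genotype = {
    "CC": [7.9, 8.1, 8.3, 8.0],
    "CT": [7.6, 7.8, 7.7],
    "TT": [7.2, 7.4],
}
h_stat = kruskal_wallis(list(expression_by_genotype.values()))
print(f"H = {h_stat:.3f}, p = {p_value_three_groups(h_stat):.4f}")
```

When only two genotype classes are observed for a SNP, the same ranks feed Wilcoxon's rank-sum test instead; a statistics library would normally be preferred over this hand-rolled version for real data.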
Analysis of genotype–phenotype association among HapMap samples

The results of genotype–phenotype correlation in unrelated CEPH samples, using LIG1 ExonArray gene expression data obtained from NCBI's Gene Expression Omnibus database (GSE7761), are provided in figure 4. At a cutoff of P < 0.05, the SNPs rs16981519 and rs3730837 were found to be significantly associated with LIG1 expression. Genotypes with the minor alleles rs16981519T and rs3730837C were associated with lower LIG1 gene expression compared to the homozygous wild-type genotype.

Discussion

DNA ligases are a large family of evolutionarily related proteins that play important roles in a wide range of DNA transactions, including chromosomal DNA replication, DNA repair and recombination, in all kingdoms of life. LIG1 is an ATP-dependent DNA ligase that catalyses the joining of single-stranded breaks (nicks) in the phosphodiester backbone of double-stranded DNA through a condensation reaction (Timson et al. 2000). Evidence from LIG1 deficiency syndromes indicates its distinct role in the DNA damage repair pathway (Petrini et al. 1995). Although LIG1 polymorphisms in various world populations (YRI, JPT, CHB, CEU) are covered by NCBI's dbSNP and HapMap databases, no study has so far been carried out on the overall distribution of LIG1 SNPs in the Indian population, which has huge sociocultural, linguistic and biological diversity (Genomes Project Consortium 2010, 2012; The International HapMap Consortium 2003, 2010). We therefore saw the need to create an India-specific SNP resource providing detailed information on polymorphisms in the gene LIG1, including subpopulation-specific allele and genotype frequencies. We believe this is a prerequisite for future studies on the association of LIG1 polymorphisms with any complex disorder within Indian subpopulations.
Therefore, with the help of trained anthropologists and cues from published literature on the structure and variability of the Indian population (Risley 1915; Malhotra 1978; Gadgil et al. 1998; Pattanayak et al. 1998; Roychoudhury et al. 2001), a carefully planned two-stage study was designed to identify and validate SNPs in LIG1 among representative samples from the Indian subpopulations belonging to the major population clusters of India.

Among the 27 amplicons (PCR products of primers designed for exonic and flanking intronic regions) studied within the gene LIG1, 30 SNPs were identified following DNA resequencing. SNP validation studies were performed on 10 SNPs, selected for their potential importance to the function of the gene or to fill additional gaps in the genome, as described earlier. SNP validation showed that the SNPs rs3730862 and rs20580 were abundant among Indian subpopulations, while four other SNPs were restricted to three or fewer subpopulations. MAFs showed huge variation across Indian subpopulations, exemplifying the enormous human population diversity of India. Further, the MAFs of both LIG1 SNPs, rs20580 and rs3730862, showed a reversal of status from minor to major allele in some subpopulations, a feature possibly indicative of gradual stabilization of the SNP over time (Miller and Kwok 2001).

A stark difference in allele frequencies was observed both within and between the linguistic subpopulation clusters of India. Higher MAFs were mostly observed in the Austro-Asiatic subpopulation clusters and in the IPs, or isolated tribal populations, across different linguistic subgroups. Moreover, the MAFs of the Austro-Asiatic and Tibeto-Burman linguistic clusters were comparable with each other and also mostly higher than those of their Dravidian and Indo-European counterparts.
The MAFs of the isolated tribal populations were strikingly higher than those of the LPs, or large endogamous populations, which suggested a faster rate of stabilization of the polymorphism in highly endogamous isolated subpopulations. The higher MAFs among tribal populations might be due to genetic drift, which occurs rapidly in small isolated populations and results in quicker accumulation of distinctive allele frequencies (Rosenberg et al. 2001). The MAFs for most of the polymorphisms seem to follow a similar pattern across the subpopulation clusters, with the Dravidian and Indo-European subpopulation clusters being very different from the Tibeto-Burman and Austro-Asiatic clusters.

Subsequently, the MAF data generated on the Indian linguistic subpopulation clusters in the current study were compared with the corresponding data on major world populations included in the International HapMap database (The International HapMap Consortium 2003). The minor allele frequencies of the large endogamous populations among the Indo-European clusters were consistently comparable with the data available for the population CEU, which consists of individuals of European (Caucasian) ancestry. On the other hand, the MAFs of the Tibeto-Burman linguistic subpopulation cluster matched those of JPT (Japanese) quite closely.

Bioinformatic analysis involving prediction of the change in protein function owing to the identified nonsynonymous SNPs revealed that the amino acid substitutions resulting from the SNPs rs4987070 (D72G), rs4987181 (P52L) and rs11666150 (Q892H) have a potentially damaging effect on protein function, which is also supported by a recent publication (Singh et al. 2011). Since these SNPs were found to be private polymorphisms restricted to a few subpopulations, this finding may be significant when investigating related complex disorders in these subpopulations.
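The drift explanation offered above for the higher MAFs in isolated tribal populations can be illustrated with a toy Wright–Fisher simulation. This is a sketch under stated assumptions (a neutral biallelic locus, constant population size, random mating, no migration or mutation). The population sizes, starting frequency and function names are hypothetical, and no such simulation was part of the analysis reported here.

```python
import random


def wright_fisher_final_freq(pop_size, start_freq, generations, rng):
    """Allele frequency after `generations` of pure drift in a diploid population."""
    n_alleles = 2 * pop_size
    freq = start_freq
    for _ in range(generations):
        # Each of the 2N allele copies in the next generation is drawn
        # independently from the current allele frequency (binomial sampling).
        copies = sum(1 for _ in range(n_alleles) if rng.random() < freq)
        freq = copies / n_alleles
    return freq


def mean_abs_shift(pop_size, start_freq=0.2, generations=20, replicates=100, seed=1):
    """Average absolute change in allele frequency across independent replicates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shifts = [
        abs(wright_fisher_final_freq(pop_size, start_freq, generations, rng) - start_freq)
        for _ in range(replicates)
    ]
    return sum(shifts) / replicates


if __name__ == "__main__":
    # Hypothetical sizes: a small isolated group versus a larger population.
    for n in (20, 500):
        print(f"N={n}: mean |change in frequency| = {mean_abs_shift(n):.3f}")
```

Under these assumptions the average shift in allele frequency over the same number of generations is much larger in the small population, which is consistent with distinctive allele frequencies accumulating more quickly in small isolated groups.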
Genotype–phenotype association analysis between LIG1 SNPs and LIG1 gene expression among the unrelated CEPH samples showed a significant association of the SNPs rs16981519 and rs3730837 with LIG1 expression. This suggests that polymorphisms in LIG1 may affect its expression and may therefore change its function.

Conclusion

The results obtained from this two-stage SNP exploration study point towards the uniqueness of the Indian subpopulation clusters with respect to the presence or absence of certain SNPs and the genotype and allele frequency patterns among various populations, and also highlight the differences between the linguistic subdivisions of the country. The findings of the present study underscore that the Indian population must be investigated for its plausible existence as a separate entity from the commonly inferred major global population clusters, including Africa, Eurasia (Europe, Middle East and Central/South Asia), East Asia, Oceania and America (Rosenberg et al. 2002). This is the first report studying the presence of LIG1 polymorphisms on such a large scale in any subpopulation of the world, and in the Indian population in particular. The strength of the current study lies in the comprehensive collection of information on various factors through a well-prepared, robust questionnaire, designed with the help of trained anthropologists, clinicians and scientists, which eliminates the potential for selection bias. The data generated from this study may have wide-ranging applications for further epidemiological and public health related research on the Indian population.

Acknowledgements

AKM and AS are recipients of Senior Research Fellowships from the Council of Scientific and Industrial Research, India. The work was supported by CSIR network projects CMM0016, CMM0018 and NWP0034. This paper bears communication number 7511 of CDRI.

References

Bohr V. A. 1995 DNA repair fine structure and its relations to genomic instability. Carcinogenesis 16, 2885–2892.
Cartharius K., Frech K., Grote K., Klocke B., Haltmeier M., Klingenhoff A. et al. 2005 MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 21, 2933–2942.
Cheng L., Eicher S. A., Guo Z., Hong W. K., Spitz M. R. and Wei Q. 1998 Reduced DNA repair capacity in head and neck cancer patients. Cancer Epidemiol. Biomarkers Prev. 7, 465–468.
Gadgil M., Joshi N. V., Prasad U. V., Manoharan S. and Patil S. 1998 In the Indian human heritage. pp. 100–129, Universities Press, Hyderabad, India.
Genomes Project Consortium 2010 A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073.
Genomes Project Consortium 2012 An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65.
Grierson G. A. 1927 A linguistic survey of India. Superintendent of Government Printing, Calcutta, India.
Indian Genome Variation Consortium 2005 The Indian Genome Variation database (IGVdb): a project overview. Hum. Genet. 118, 1–11.
Indian Genome Variation Consortium 2008 Genetic landscape of the people of India: a canvas for disease gene exploration. J. Genet. 87, 3–20.
International HapMap Consortium 2010 Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58.
The International HapMap Consortium 2003 The International HapMap Project. Nature 426, 789–796.
Ma L., Hoeijmakers J. H. and van der Eb A. J. 1995 Mammalian nucleotide excision repair. Biochim. Biophys. Acta 1242, 137–163.
Malhotra K. C. 1978 Morphological composition of the people of India. J. Hum. Evol. 7, 45–63.
Miller R. D. and Kwok P. Y. 2001 The birth and death of human single-nucleotide polymorphisms: new experimental evidence and implications for human history and medicine. Hum. Mol. Genet. 10, 2195–2198.
Miller S. A., Dykes D. D. and Polesky H. F. 1988 A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 16, 1215.
Ng P. C. and Henikoff S. 2003 SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814.
Nickerson D. A., Tobe V. O. and Taylor S. L. 1997 PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 25, 2745–2751.
Pattanayak D. P., Balasubramanian D. and Rao N. A. 1998 The language heritage of India. The Indian human heritage. pp. 95–99, Universities Press, Hyderabad, India.
Petrini J. H., Xiao Y. and Weaver D. T. 1995 DNA ligase I mediates essential functions in mammalian cells. Mol. Cell Biol. 15, 4303–4308.
Ramensky V., Bork P. and Sunyaev S. 2002 Human nonsynonymous SNPs: server and survey. Nucleic Acids Res. 30, 3894–3900.
Risley H. H. 1915 The people of India. Thacker Spink, Calcutta, India.
Rosenberg N. A., Burke T., Elo K., Feldman M. W., Freidlin P. J., Groenen M. A. et al. 2001 Empirical evaluation of genetic clustering methods using multilocus genotypes from 20 chicken breeds. Genetics 159, 699–713.
Rosenberg N. A., Pritchard J. K., Weber J. L., Cann H. M., Kidd K. K., Zhivotovsky L. A. et al. 2002 Genetic structure of human populations. Science 298, 2381–2385.
Roychoudhury S., Roy S., Basu A., Banerjee R., Vishwanathan H., Usha Rani M. V. et al. 2001 Genomic structures and population histories of linguistically distinct tribal groups of India. Hum. Genet. 109, 339–350.
Saitou N. and Nei M. 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Singh A. A., Sivakumar D. and Somvanshi P. 2011 Cataloguing functionally relevant polymorphisms in gene DNA ligase I: a computational approach. 3 Biotech. 1, 47–56.
Singh K. S. 2002 People of India: introduction national series. Oxford University Press, Delhi, India.
Stephens M. and Scheet P. 2005 Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am. J. Hum. Genet. 76, 449–462.
Timson D. J., Singleton M. R. and Wigley D. B. 2000 DNA ligases in the repair and replication of DNA. Mutat. Res. 460, 301–318.
Tlsty T. D., Briot A., Gualberto A., Hall I., Hess S., Hixon M. et al. 1995 Genomic instability and cancer. Mutat. Res. 337, 1–7.

Received 6 August 2013, accepted 10 April 2014
Published online: 14 August 2014
Journal of Genetics, Vol. 93, No. 2, August 2014