© Indian Academy of Sciences
RESEARCH ARTICLE
LIG1 polymorphisms: the Indian scenario
AMIT KUMAR MITRA1,2, ASHOK SINGH2, INDIAN GENOME VARIATION CONSORTIUM3 and SRIKANTA KUMAR RATH2∗
1 Institute of Human Genetics, Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA
2 CSIR-Central Drug Research Institute, Lucknow 226 031, India
3 Nodal Laboratory, CSIR-Institute of Genomics and Integrative Biology, New Delhi 110 007, India
Abstract
Elucidating the genetic diversity and relatedness of the subpopulations of India may provide a unique resource for future genetic association analyses of critical community-specific complex diseases. We comprehensively explored single nucleotide polymorphisms (SNPs) within the gene DNA ligase 1 (LIG1) in a multiethnic panel of Indian subpopulations representative of the ethnic, linguistic and geographical diversity of India. We used a two-stage design: SNP discovery by DNA resequencing, followed by SNP validation using Sequenom-based genotyping. Thirty SNPs were identified in the LIG1 gene by DNA resequencing, including three promoter SNPs and one coding SNP. Following validation, the SNPs rs20580/C19008A and rs3730862/C8804T were found to be the most widely prevalent, with noticeable variation in minor allele frequencies both among the Indian subpopulation groups and relative to those reported for other major world populations. The SNPs found in Indian subpopulations were then analysed using bioinformatics-based approaches and compared with SNP data available for major world populations. We also performed genotype–phenotype association analysis of LIG1 SNPs using publicly available data on LIG1 mRNA expression in HapMap samples. The results showed that polymorphisms in LIG1 affect its expression and may therefore alter its function. Our results underscore the distinctiveness of the Indian population relative to the rest of the world and suggest that any epidemiological study of the global population should take this distinctiveness into consideration and avoid generalized conclusions.
[Mitra A. K., Singh A., Indian Genome Variation Consortium and Rath S. K. 2014 LIG1 polymorphisms: the Indian scenario. J. Genet. 93,
xx–xx ]
Introduction
India occupies only 2.4% of the world’s land area, while
it supports over 17.5% of the world’s population (Census
of India 2001). India has more than 2000 ethnic groups,
all major religions of the world, four major language families (Indo-European, Dravidian, Austro-Asiatic and Tibeto-Burman) and the morphological subgroups Caucasoid, Mongoloid, Australoid and Negritos (Grierson 1927; Malhotra
1978). Further complexity is lent by the great variation that
occurs across this population on social parameters such as
income and education. Such a rich repertoire of diversity
provides valuable repositories for genetic association studies. Elucidation of the genetic diversity of the numerous, large, localized, isolated subpopulation groups within India is therefore a primary prerequisite for understanding the genetic basis of several critical complex disorders. However, genome projects undertaken on world populations so far have overlooked the major Indian population subgroups (Genomes Project Consortium 2010, 2012; International HapMap Consortium 2003, 2010).
∗ For correspondence. E-mail: [email protected]; [email protected].
DNA ligase I (LIG1) is the main replicative ligase of eukaryotes. It is active at the replication forks of dividing cells, where it is critical for joining Okazaki fragments and completing DNA synthesis. It also completes the base excision repair (BER) pathway by sealing the gaps in the resynthesized regions, since unsealed nicks in genomic DNA are potentially dangerous for the cell (Ma et al. 1995; Timson et al. 2000). DNA ligase is therefore an essential enzyme not only in the synthesis of new DNA during cell division, but also in maintaining the integrity of the genome. Defects in DNA repair pathways are considered to play a central role in cancer biology: some individuals are at very high risk of cancer because inherited single-nucleotide polymorphisms (SNPs) in genes controlling DNA repair may impair their function and contribute to genetic susceptibility to various cancers (Bohr 1995; Ma et al. 1995; Tlsty et al. 1995; Cheng et al. 1998).
The present study focuses on the identification and validation of SNPs within the exonic and flanking regions of the gene LIG1 (gene name, ligase I, DNA, ATP-dependent; gene ID, 3978; OMIM ID, 126391; gene length, 54858 bp; chromosomal location, 19q13.2–q13.3) in selected Indian subpopulations representative of the ethnic, linguistic and geographic diversity of India, to generate an India-specific information resource of the allele and genotype frequencies of SNPs in LIG1.
Keywords. LIG1; MAF; SNP; HapMap.
Journal of Genetics, Vol. 93, No. 2, August 2014
Table 1. Detailed information on the Indian subpopulations included in the study (each population was labelled on the basis of linguistic group).
Population ID | Subpopulation name | Three letter code | Subpopulation type | Geographical location within India
AA-C-IP1 | Kurku | KKU | Tribe | Madhya Pradesh, Maharashtra
AA-C-IP4 | Sahariya | SHY | Tribe | Madhya Pradesh, Rajasthan, Uttar Pradesh
AA-C-IP5 | Baiga | BAI | Tribe | Madhya Pradesh, Chhattisgarh
AA-E-IP1 | Santhal | STL | Tribe | Jharkhand, Bihar, Orissa, West Bengal, Tripura, Assam
AA-E-IP2 | Munda | MUN | Caste | Bihar, Jharkhand, West Bengal, Orissa, Madhya Pradesh, Tripura
AA-E-IP3 | Juang | JNG | Tribe | Orissa
AA-NE-IP1 | Khasi | KHS | Caste | Meghalaya, Tripura, Assam
AA-W-IP1 | Kolis | KLS | Caste | Madhya Pradesh, Gujarat, Maharashtra, Daman & Diu
DR-C-IP1 | Gond | GND | Tribe | Chhattisgarh
DR-C-IP2 | Vaidiki Brahmin | VKB | Tribe | Chhattisgarh, Madhya Pradesh, Andhra Pradesh, Maharashtra, Bihar, Orissa, Assam
DR-C-LP1 | Bison Horn Maria | BHM | Tribe | Chhattisgarh
DR-E-IP1 | Madia | MDA | Tribe | Orissa
DR-S-IP1 | Paniyan | PNY | Tribe | Kerala, Tamil Nadu
DR-S-IP2 | Chenchu | CNC | Tribe | Andhra Pradesh
DR-S-IP3 | Halakki | HLK | Tribe | Karnataka
DR-S-IP4 | Kuruman | KRM | Tribe | Kerala, Tamil Nadu
IE-E-LP1 | Chik Baraik | CIB | Caste | Jharkhand, West Bengal
IE-E-LP2 | Kayastha (WB) | KWB | Caste | West Bengal, Delhi, Jharkhand, Tripura, Assam, Orissa
IE-E-LP3 | Mahishya | MHA | Tribe | West Bengal, Bihar, Orissa
IE-E-LP4 | Oriya Brahmin | ORB | Caste | Orissa
IE-NE-IP1 | Hajong | HJG | Tribe | Meghalaya, Assam
IE-NE-LP1 | Namsudra | NSD | Caste | Assam, West Bengal, Tripura, Orissa, Mizoram, Manipur, Meghalaya, Bihar, Madhya Pradesh
IE-N-IP1 | Kannet (HP) | KNT | Tribe | Himachal Pradesh, Punjab, Haryana
IE-N-IP2 | Tharu | THR | Tribe | Uttarakhand, Uttar Pradesh, Bihar
IE-N-LP1 | Chamar (Hp) | CMR | Caste | Himachal Pradesh, Uttar Pradesh, Bihar, Delhi, Haryana, Punjab, Tripura, Gujarat, Rajasthan, Madhya Pradesh, Orissa, West Bengal, Goa, Daman and Diu
IE-N-LP10 | Saryuparin Brahmin | SPB | Caste | Uttar Pradesh, Madhya Pradesh, Bihar, Delhi
IE-N-LP11 | Khatri | KHT | Tribe | Punjab, Haryana, Delhi, Himachal Pradesh, Jammu and Kashmir, Uttar Pradesh, Bihar, Rajasthan, Gujarat, Maharashtra, Tamil Nadu
IE-N-LP2 | Jats | JAT | Caste | Delhi, Haryana, Uttar Pradesh, Punjab, Rajasthan, Himachal Pradesh, Jammu and Kashmir, Madhya Pradesh
IE-N-LP3 | Kanyakubj Brahmin | KKB | Caste | Uttar Pradesh, Jharkhand, Bihar, Madhya Pradesh, Chhattisgarh, Delhi
IE-N-LP5 | Kashmiri Pandit | KSP | Caste | Jammu and Kashmir, Delhi
IE-N-LP6 | Kayastha (Up) | KUP | Caste | Uttar Pradesh, Delhi, Bihar, Madhya Pradesh, Rajasthan, Punjab
IE-N-LP7 | Koli (Hp) | KOL | Caste | Himachal Pradesh, Punjab, Haryana, Delhi, Maharashtra, Uttar Pradesh, Rajasthan, Karnataka
IE-N-LP8 | Rajput (Uttarakhand) | RJU | Caste | Uttarakhand, Uttar Pradesh, Gujarat, Rajasthan
IE-N-LP9 | Rajput (HP) | RJH | Caste | Himachal Pradesh, Punjab, Haryana, Delhi, Orissa, Bihar, West Bengal, Assam, Karnataka, Andhra Pradesh
IE-N-SP1 | Aggarwals | AGL | Caste | Haryana
IE-N-SP2 | Ramgariah Sikh | RAM | Religious group | Punjab, Delhi, Rajasthan
IE-N-SP3 | Sunni | SUI | Religious group | Uttar Pradesh
IE-N-SP4 | Shia | SHI | Religious group | Uttar Pradesh
IE-N-SP5 | Syed (Sunni) | SYD | Religious group | Jammu and Kashmir
IE-S-IP1 | Hakkipikki | HPK | Tribe | Karnataka
IE-W-IP1 | Bhil | BHL | Tribe | Gujarat, Rajasthan, Madhya Pradesh, Maharashtra, Tripura
IE-W-IP2 | Dongri Bhil | DBH | Tribe | Gujarat, Rajasthan
IE-W-LP1 | Deshastha Brahmins | DEB | Caste | Maharashtra
IE-W-LP2 | Kokanastha Brahmins | KOB | Caste | Maharashtra, Goa, Karnataka
IE-W-LP3 | Paliwal Brahmin | PAL | Caste | Rajasthan
IE-W-LP4 | Patidar | PTD | Caste | Gujarat
TB-NE-LP1 | Meitei | MEI | Caste | Manipur, Assam
TB-N-IP1 | Spiti (HP) | SPT | Tribe | Himachal Pradesh
TB-N-SP1 | Buddhists | BUD | Religious group | Jammu and Kashmir, Himachal Pradesh, Sikkim, Arunachal Pradesh, Assam
TB-N-SP2 | Buddhist heterogeneous | BUH | Religious group | Jammu and Kashmir
IE, Indo-European; TB, Tibeto-Burman; DR, Dravidian; AA, Austro-Asiatic, followed by geographical zone (E, east; W, west; N, north; NE, northeast; S, south) and size and nature of population (LP, large population; SP, small population; IP, isolated population). The OG-W-IP population, derived from an African population, was used as an outlier.
Materials and methods
Population identification and sample collection
Details on population identification, study design and sample collection are available elsewhere (Indian Genome Variation Consortium 2005, 2008). Briefly, a two-stage study design was employed. The first stage was SNP discovery within all 27 exons and flanking regions of the gene LIG1, carried out by bidirectional DNA sequencing on a set of 43 samples representative of the linguistic, morphological, regional and religious diversity of India. The second stage was validation of the SNPs identified in the discovery phase using Sequenom’s MassArray genotyping technology (Sequenom, San Diego, USA) on a larger multiethnic Indian panel of 576 representative samples from 24 Indian subpopulations, which included 14 from Indo-European (IE), three from Tibeto-Burman (TB), four from Dravidian (DR) and two from Austro-Asiatic (AA) linguistic subpopulation clusters drawn from geographically and ethnically diverse subpopulations. Each population was labelled on the basis of language, followed by geographical zone and ethnic category, e.g. IE-E-LP1, where IE stands for the Indo-European linguistic subgroup, E stands for the eastern part of India and LP signifies large population. Details of the subpopulations included in our study are provided in table 1. The study was approved by the Medical Ethics Committee of the Central Drug Research Institute (CDRI).
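The labelling scheme described above can be illustrated with a minimal Python sketch that splits a population ID into its three parts. The function and class names are hypothetical and are not part of the study. The code tables contain only the expansions given in the text and in the footnote of table 1; any code not listed there (for example the zone code C, or the outlier prefix OG) is returned unexpanded rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# Expansions taken from the text and the footnote of table 1.
LINGUISTIC = {"IE": "Indo-European", "TB": "Tibeto-Burman",
              "DR": "Dravidian", "AA": "Austro-Asiatic"}
ZONE = {"E": "east", "W": "west", "N": "north", "NE": "northeast", "S": "south"}
CATEGORY = {"LP": "large population", "SP": "small population",
            "IP": "isolated population"}


@dataclass(frozen=True)
class PopulationLabel:
    linguistic_group: str
    zone: str
    category: str
    serial: Optional[int]  # None when the ID has no trailing number


def parse_population_id(label: str) -> PopulationLabel:
    """Split an ID such as 'IE-E-LP1' into language, zone and category."""
    parts = label.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated parts: {label!r}")
    lang, zone, cat = parts
    cat_code, serial_text = cat[:2], cat[2:]
    return PopulationLabel(
        # Codes without a documented expansion are passed through unchanged.
        linguistic_group=LINGUISTIC.get(lang, lang),
        zone=ZONE.get(zone, zone),
        category=CATEGORY.get(cat_code, cat_code),
        serial=int(serial_text) if serial_text else None,
    )


print(parse_population_id("IE-E-LP1"))
```

Keeping unknown codes unexpanded means the sketch never asserts a meaning that the source does not document.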
DNA isolation
Genomic DNA from population samples was extracted from peripheral blood leucocytes using a modified salting-out procedure (Miller et al. 1988), quantitated and stored at −20°C.
Primer designing and PCR
For sequence analysis among population samples, the LIG1 DNA sequence was obtained from the NCBI RefSeq sequence database (GenBank accession no. NT_011109.13). PCR primers were designed for all the exons and flanking regions of the gene LIG1 using the PrimerSelect module of Lasergene ver. 6.0 (DNASTAR, Madison, USA) as shown in figure 1. A detailed list of the primers designed is available on request. The primer sequences were verified using NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and UCSC in silico PCR (http://genome-mirror.duhs.duke.edu/cgi-bin/hgPcr) to rule out amplification of any nonspecific DNA sequences. All PCR reactions were carried out using reagents from Fermentas Life Sciences (Fisher Scientific, Pittsburgh, USA) in a total reaction volume of 50 μL containing nearly 100 ng genomic DNA, 1.5 U Taq polymerase in 1× PCR buffer, 1.5 mM MgCl2, 0.2 mM dNTPs and 15 pM of each primer. Thermal cycling conditions were as follows: initial denaturation at 95°C for 10 min; 31 cycles of denaturation at 94°C for 1 min, annealing for 1 min at temperatures optimized using gradient PCR, and extension at 72°C for 1 min; and a final extension at 72°C for 5 min. The PCR products were visualized by electrophoresis on 1.5% agarose gel.
Figure 1. Snapshot from the UCSC genome browser depicting the human DNA ligase 1 (LIG1) gene and the PCR amplicons (AMP).
DNA resequencing and SNP detection
Bidirectional DNA sequencing was performed on ABI 3100 and ABI 3730 capillary-based sequencers (Applied Biosystems, Foster City, USA). Sequence assembly and SNP analysis were performed using the Phred-Phrap-Consed package (University of Washington, Seattle, WA; http://www.phrap.org/phredphrapconsed.html), which detects heterozygous single nucleotide substitutions following fluorescence-based sequencing of PCR products (Nickerson et al. 1997), and the multiple sequence alignment module SeqMan of Lasergene 6.0 software (DNASTAR, Madison, USA).
SNP validation
SNPs were selected for the next phase of the study (SNP validation) based on strict selection criteria: minor allele frequency (MAF) greater than 10% in the first stage of SNP exploration, location within exonic or regulatory regions, and spacing between SNPs (> 1 kb). Some reported SNPs were force-included in the validation procedure, owing to their potential importance to the function of the gene or to fill additional gaps in the genome and achieve uniform spacing between SNPs. Genotyping was performed using matrix-assisted laser desorption ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS)-based chemistry on Sequenom’s MassArray platform (Sequenom, San Diego, USA). Quality control (QC) performed before each SNP was considered for analysis included Hardy–Weinberg checks using Fisher’s exact test at the 5% significance level.
Bioinformatic analysis
The SNPs investigated in the discovery and validation phases of the study were analysed bioinformatically using the following tools: (i) Transcription factor binding sites: promoter SNPs were evaluated for the loss or gain of known cis-regulatory motif-binding sites using MatInspector ver. 8.0 software (Genomatix Software GmbH, Munich, Germany) (Cartharius et al. 2005). (ii) Prediction of consequences on protein structure and/or function: phenotypic implications of the nonsynonymous SNPs (nsSNPs) found within LIG1 were predicted using SIFT (Sorting Intolerant From Tolerant, http://sift.bii.a-star.edu.sg/), which predicts the impact of an amino acid substitution on protein function and computes a score representing the likelihood of mutability at the site of the substitution (Ng and Henikoff 2003), and PolyPhen (Polymorphism Phenotyping, http://genetics.bwh.harvard.edu/pph/index.html), which predicts the functional importance of an amino acid substitution by merging the conservation score with physicochemical differences and structural features of the polymorphic variants (Ramensky et al. 2002).
Comparison with HapMap database
LIG1 SNP data obtained in our study on the Indian subpopulations were further compared with genotype data on the major world populations available through the HapMap database, belonging to Chinese (CHB), Japanese (JPT), African (YRI) and European (CEPH) ancestries (The International HapMap Consortium 2003). Major world populations from the International HapMap project were included so that the publicly available SNP data and genomewide gene expression data (GSE7761) on HapMap samples could be used for subsequent genotype–phenotype association analysis.
Analysis of association of LIG1 SNPs with its expression in HapMap populations
To further investigate the role of LIG1 SNPs, SNP information for all the LIG1 SNPs genotyped in the CEPH population was retrieved from the HapMap SNP database. The CEPH population subgroup consists of 90 samples from 30 trios (two parents and one child) of European ancestry. LIG1 gene expression data generated using the Affymetrix ExonArray were publicly available and were downloaded from NCBI’s Gene Expression Omnibus database (GSE7761). Genotype–phenotype association analysis was performed between LIG1 SNPs and LIG1 expression in all the unrelated (parents only) samples.
Statistical analysis
Genotype frequencies were calculated using the program Gencount, while Maxlik1 was used to calculate maximum likelihood estimates of allele frequencies, standard deviations and Hardy–Weinberg chi-square values. Allhet was used to generate chi-square values for testing the homogeneity of the SNP data between the Indian subpopulations included in the study. The Gencount, Maxlik1 and Allhet programs were designed in-house by collaborators for the analysis. Dispan (http://iubio.bio.indiana.edu/soft/molbio/ibmpc/), a genetic distance and phylogenetic analysis program, was used to calculate average heterozygosity (H0) and its standard error for every SNP in each subpopulation under study, as well as genetic diversity (Ht) and its associated parameters such as interpopulation gene variation (GST) and intrapopulation gene variation (Hs). The program also has built-in options for bootstrapping (Saitou and Nei 1987).
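The in-house programs are not described in detail, so the following Python sketch only illustrates the kind of calculation involved and should not be read as their implementation. It assumes a biallelic, codominant SNP, for which gene counting gives the maximum likelihood allele frequencies, and it uses a Pearson chi-square test with one degree of freedom for Hardy–Weinberg proportions. The function names and the genotype counts in the usage line are hypothetical.

```python
import math


def allele_frequencies(n_cc, n_ca, n_aa):
    """Gene-counting estimates (p, q) of the C and A allele frequencies."""
    total = n_cc + n_ca + n_aa
    if total == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_cc + n_ca) / (2 * total)
    return p, 1.0 - p


def hardy_weinberg_chi_square(n_cc, n_ca, n_aa):
    """Pearson chi-square statistic and p value (1 df) for HWE."""
    total = n_cc + n_ca + n_aa
    p, q = allele_frequencies(n_cc, n_ca, n_aa)
    observed = (n_cc, n_ca, n_aa)
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    # Genotype classes with zero expectation (monomorphic SNPs) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 CC, 40 CA and 30 AA individuals.
print(allele_frequencies(30, 40, 30))
print(hardy_weinberg_chi_square(30, 40, 30))
```

For small subpopulation samples an exact test, such as the Fisher's exact test used for quality control in this study, is generally preferable to the chi-square approximation.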
Haplotype analysis was performed on phase-unknown genotype data using the software Phase ver. 2.1, which uses a Bayesian Markov chain Monte Carlo algorithm (Stephens and Scheet 2005). The population OG-W-IP, derived from an African population (Singh 2002), was used as an outlier in the phylogenetic tree.
The Kruskal–Wallis test and Wilcoxon’s rank-sum test were used to perform genotype–phenotype association analysis between HapMap SNPs and LIG1 gene expression.
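As an illustration of how a Kruskal–Wallis test compares expression across the three genotype classes of a SNP, the sketch below computes the H statistic in plain Python. It is not the software used in the study, which is not named. The sketch assumes exactly three groups, so that the chi-square approximation has two degrees of freedom and the p value reduces to exp(−H/2), and it omits the tie correction for simplicity. The function names and the expression values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """H statistic and approximate p value for exactly three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("exactly three non-empty groups are required")
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    # Chi-square survival function with 2 df; no tie correction applied.
    p_value = math.exp(-h / 2.0)
    return h, p_value


# Hypothetical log expression values grouped by genotype (e.g. CC, CT, TT).
expression = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(kruskal_wallis_three_groups(expression))
```

When only two genotype classes are observed, the two-sample Wilcoxon rank-sum test mentioned above is the natural counterpart.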
Results
Results of two-stage SNP exploration study
Thirty SNPs, including three promoter SNPs and one SNP in the coding region, were identified in the LIG1 gene during the first stage of SNP exploration by resequencing (table 2). Eleven SNPs were present in the HapMap database for major world populations, while seven SNPs were present in NCBI’s dbSNP database but not in HapMap. A detailed comparison of the minor allele frequencies of the SNPs identified in the discovery phase with the SNP data available on major HapMap subpopulations is provided in figure 2. Twelve novel SNPs were found that were not present in HapMap or NCBI’s dbSNP database at the time this study commenced.
Based on the SNP selection criteria described in the Materials and methods section, the final list for the SNP validation study using Sequenom-based genotyping included 10 SNPs: rs3730862, rs4987181, rs4987070, rs12981963, rs20580, rs11879148, rs3730933, rs3730966, rs3731003 and rs11666150.
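The selection criteria from the Materials and methods section (MAF above 10%, exonic or regulatory location, spacing above 1 kb, plus force-included SNPs) can be sketched as a simple filter. This is an illustrative assumption about how such criteria might be applied, not the procedure the authors ran: the greedy left-to-right spacing rule, the record layout and every SNP in the example are hypothetical.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    snp_id: str
    position: int         # position in bp along the gene
    maf: float            # minor allele frequency from the discovery phase
    region: str           # e.g. "exonic", "regulatory", "intronic"
    forced: bool = False  # force-included regardless of the other criteria


ELIGIBLE_REGIONS = {"exonic", "regulatory"}


def select_for_validation(snps: List[Snp], min_maf: float = 0.10,
                          min_spacing: int = 1000) -> List[Snp]:
    """Greedy filter: keep SNPs that pass the MAF, region and spacing rules."""
    selected: List[Snp] = []
    last_position = None
    for snp in sorted(snps, key=lambda s: s.position):
        if not snp.forced:
            if snp.maf <= min_maf or snp.region not in ELIGIBLE_REGIONS:
                continue
            too_close = (last_position is not None
                         and snp.position - last_position <= min_spacing)
            if too_close:
                continue
        selected.append(snp)
        last_position = snp.position
    return selected


# Entirely hypothetical candidates, for illustration only.
candidates = [
    Snp("snpA", 100, 0.20, "regulatory"),
    Snp("snpB", 600, 0.30, "exonic"),        # within 1 kb of snpA
    Snp("snpC", 2000, 0.05, "exonic"),       # MAF too low
    Snp("snpD", 3000, 0.25, "intronic"),     # region not eligible
    Snp("snpE", 5000, 0.40, "exonic"),
    Snp("snpF", 5200, 0.02, "exonic", True)  # force-included
]
print([s.snp_id for s in select_for_validation(candidates)])
```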
During validation, four of the LIG1 SNPs (rs3730933, rs3730966, rs11879148 and rs12981963) were found to be monomorphic. Four SNPs (rs11666150/A54472C, rs3731003/C42344T, rs4987070/A8945G and rs4987181/C8885T) were restricted to fewer than three subpopulations of India. Minor alleles of rs11666150 were found in the subpopulations IE-E-LP2 and OG-W-IP, the rs3731003 mutant was found in OG-W-IP, the rs4987070 mutant was found in AA-C-IP5, and rs4987181 was observed in only one sample from the subpopulation DR-S-LP3. Consequently, two SNPs (rs20580/C19008A and rs3730862/C8804T) were found to be present in more than three subpopulations across the country. Allele and genotype frequencies of the SNPs rs20580 and rs3730862 are provided in tables 3 and 4, respectively.
Table 2. Frequencies of LIG1 SNPs identified during discovery phase following DNA sequencing. Results were analysed using Phred/Phrap/Polyphred/Consed pipeline and DNAStar.
SNP ID | Position from ATG | Location in LIG1 gene (NT_011109.13 reverse complement) | Alleles | MAF | Consequence to transcript | Validation status
rs439132 | –132 | 4647 | A:G | 0.167 | 5PRIME_UTR | HapMap
rs20579 | –48 | 4731 | C:T | 0.045 | 5PRIME_UTR | HapMap
rs3730849 | –13 | 4766 | C:T | 0.208 | 5PRIME_UTR | HapMap
rs3730853 | 2897 | 7675 | T:A | 0.071 | Intronic | Not in HapMap
rs56224917 | 2966 | 7744 | T:G | 0.024 | Intronic | Not in HapMap
rs3730861 | 3954 | 8732 | C:T | 0.093 | Intronic | Not in HapMap
rs3730862 | 3985 | 8763 | C:T | 0.426 | Intronic | HapMap
rs20580 | 14230 | 19008 | A:C | 0.462 | Synonymous_coding | HapMap
rs3730931 | 21542 | 26320 | A:G | 0.045 | Intronic | HapMap
rs2288878 | 21900 | 26678 | G:A | 0.281 | Intronic | HapMap
rs2288880 | 28549 | 33327 | G:C | 0.167 | Intronic | HapMap
28668C/T | 28668 | 33446 | C:T | 0.05 | Intronic | Novel
rs3730976 | 29311 | 34089 | A:A | 0.5 | Intronic | HapMap
rs2288882 | 29548 | 34326 | C:T | 0.3 | Intronic | Not in HapMap
29629A/G | 29629 | 34407 | A:G | 0.028 | Intronic | Novel
31601G/C | 31601 | 36379 | G:C | 0.042 | Intronic | Novel
34247C/A | 34247 | 39025 | C:A | 0.056 | Intronic | Novel
rs392891 | 34464 | 39242 | T:A | 0.365 | Intronic | HapMap
rs3730944 | 34485 | 39263 | G:A | 0.058 | Intronic | Not in HapMap
34547T/G | 34547 | 39325 | T:G | 0.019 | Intronic | Novel
34569C/T | 34569 | 39347 | C:T | 0.019 | Intronic | Novel
34607G/T | 34607 | 39385 | G:T | 0.071 | Intronic | Novel
37611C/T | 37611 | 42389 | C:T | 0.017 | Intronic | Novel
rs3731004 | 37652 | 42430 | G:C | 0.017 | Intronic | Not in HapMap
37782A/T | 37782 | 42560 | A:T | 0.158 | Intronic | Novel
38112C/T | 38112 | 42890 | C:T | 0.024 | Intronic | Novel
38188G/A | 38188 | 42966 | G:A | 0.02 | Intronic | Novel
rs3731007 | 42179 | 46957 | C:T | 0.175 | Intronic | HapMap
42189C/A | 42189 | 46967 | C:A | 0.083 | Intronic | Novel
rs3731009 | 42394 | 47172 | G:A | 0.083 | Intronic | Not in HapMap
NCBI GenBank accession no. NT_011109.13 (reverse complement); MAF, minor allele frequency.
[Figure 2: bar chart. y-axis: minor allele frequencies (0 to 1); series: MAF_DSNP, MAF_CEPH, MAF_YRI, MAF_CHB, MAF_JPT; x-axis: rs20579, rs20580, rs2288878, rs3730849, rs3730862, rs3730931, rs3730976, rs3731007, rs392891, rs439132.]
Figure 2. Comparison of minor allele frequencies (MAFs) of LIG1 SNPs between discovery phase (DSNP) results and data available on major HapMap subpopulations.
The overall MAFs of the SNPs rs20580 (A) and rs3730862 (T) in the Indian population were 0.48 and 0.39, respectively. Considerable variation in MAFs was noted across the four linguistic lineages and geographical zones of India, ranging from 0.24 in IE-N-LP9 to 0.75 in IE-NE-IP1 for rs20580 (A) and from as low as 0.11 in IE-W-LP3 to 0.77 in IE-NE-IP1 for rs3730862 (T). Among the linguistic groups, the highest MAF for rs20580 (A) was observed in the Austro-Asiatic and Tibeto-Burman populations (0.63), whereas the highest MAF for rs3730862 (T) was observed in the Austro-Asiatic subpopulation (0.58). The Dravidian linguistic cluster exhibited the lowest MAF for both polymorphisms, 0.43 and 0.35 for rs20580 (A) and rs3730862 (T), respectively. The mean heterozygosity values, considering both LIG1 polymorphisms together, ranged from 0.296 (IE-E-LP4) to 0.512 (TB-NE-LP1) (table 5).
Table 3. Genotype and allele frequencies for the SNP rs20580 (LIG1) among the Indian subpopulations following SNP validation.
Subpopulations | n | Genotype frequency CC | Genotype frequency CA | Genotype frequency AA | 2n | Allele frequency p (C) | Allele frequency q (A) | SD
AA-C-IP5 | 17 | 0.00 | 0.71 | 0.29 | 34 | 0.35 | 0.65 | 0.014
AA-E-IP3 | 21 | 0.14 | 0.52 | 0.33 | 42 | 0.40 | 0.60 | 0.012
DR-C-IP2 | 19 | 0.37 | 0.32 | 0.32 | 38 | 0.53 | 0.47 | 0.013
DR-S-IP4 | 23 | 0.30 | 0.30 | 0.39 | 46 | 0.46 | 0.54 | 0.011
DR-S-LP2 | 23 | 0.43 | 0.30 | 0.26 | 46 | 0.59 | 0.41 | 0.011
DR-S-LP3 | 23 | 0.52 | 0.35 | 0.13 | 46 | 0.70 | 0.30 | 0.010
IE-E-IP1 | 22 | 0.14 | 0.55 | 0.32 | 44 | 0.41 | 0.59 | 0.011
IE-E-LP2 | 22 | 0.36 | 0.45 | 0.18 | 44 | 0.59 | 0.41 | 0.011
IE-E-LP4 | 23 | 0.43 | 0.35 | 0.22 | 46 | 0.61 | 0.39 | 0.011
IE-NE-IP1 | 22 | 0.05 | 0.41 | 0.55 | 44 | 0.25 | 0.75 | 0.010
IE-NE-LP1 | 23 | 0.39 | 0.35 | 0.26 | 46 | 0.57 | 0.43 | 0.011
IE-N-IP2 | 23 | 0.09 | 0.35 | 0.57 | 46 | 0.26 | 0.74 | 0.010
IE-N-LP1 | 18 | 0.56 | 0.33 | 0.11 | 36 | 0.72 | 0.28 | 0.012
IE-N-LP5 | 22 | 0.27 | 0.50 | 0.23 | 44 | 0.52 | 0.48 | 0.011
IE-N-LP9 | 23 | 0.61 | 0.30 | 0.09 | 46 | 0.76 | 0.24 | 0.009
IE-N-SP4 | 22 | 0.32 | 0.55 | 0.14 | 44 | 0.59 | 0.41 | 0.011
IE-W-LP1 | 21 | 0.29 | 0.38 | 0.33 | 42 | 0.48 | 0.52 | 0.012
IE-W-LP2 | 20 | 0.45 | 0.40 | 0.15 | 40 | 0.65 | 0.35 | 0.012
IE-W-LP3 | 23 | 0.52 | 0.43 | 0.04 | 46 | 0.74 | 0.26 | 0.010
IE-W-LP4 | 23 | 0.30 | 0.39 | 0.30 | 46 | 0.50 | 0.50 | 0.011
OG-W-IP | 19 | 0.42 | 0.42 | 0.16 | 38 | 0.63 | 0.37 | 0.013
TB-NE-LP1 | 22 | 0.09 | 0.36 | 0.55 | 44 | 0.27 | 0.73 | 0.010
TB-N-IP1 | 22 | 0.23 | 0.36 | 0.41 | 44 | 0.41 | 0.59 | 0.011
TB-N-SP1 | 23 | 0.13 | 0.57 | 0.30 | 46 | 0.41 | 0.59 | 0.011
Overall (India) | 519 | 0.31 | 0.41 | 0.28 | 1038 | 0.52 | 0.48 | 0.000
Table 4. Genotype and allele frequencies for the SNP rs3730862 (LIG1) within the Indian subpopulations following SNP validation.
Subpopulations | n | Genotype frequency WW (CC) | Genotype frequency WM (CT) | Genotype frequency MM (TT) | 2n | Allele frequency p (C) | Allele frequency q (T) | SD
AA-C-IP5 | 11 | 0.18 | 0.36 | 0.45 | 22 | 0.36 | 0.64 | 0.022
AA-E-IP3 | 21 | 0.19 | 0.52 | 0.29 | 42 | 0.45 | 0.55 | 0.012
DR-C-IP2 | 16 | 0.56 | 0.44 | 0.00 | 32 | 0.78 | 0.22 | 0.013
DR-S-IP4 | 21 | 0.38 | 0.24 | 0.38 | 42 | 0.50 | 0.50 | 0.012
DR-S-LP2 | 20 | 0.50 | 0.25 | 0.25 | 40 | 0.63 | 0.38 | 0.012
DR-S-LP3 | 21 | 0.57 | 0.33 | 0.10 | 42 | 0.74 | 0.26 | 0.010
IE-E-IP1 | 20 | 0.25 | 0.50 | 0.25 | 40 | 0.50 | 0.50 | 0.013
IE-E-LP2 | 17 | 0.53 | 0.41 | 0.06 | 34 | 0.74 | 0.26 | 0.013
IE-E-LP4 | 22 | 0.64 | 0.32 | 0.05 | 44 | 0.80 | 0.20 | 0.009
IE-NE-IP1 | 22 | 0.05 | 0.36 | 0.59 | 44 | 0.23 | 0.77 | 0.010
IE-NE-LP1 | 22 | 0.50 | 0.32 | 0.18 | 44 | 0.66 | 0.34 | 0.011
IE-N-IP2 | 22 | 0.09 | 0.45 | 0.45 | 44 | 0.32 | 0.68 | 0.011
IE-N-LP1 | 21 | 0.67 | 0.29 | 0.05 | 42 | 0.81 | 0.19 | 0.009
IE-N-LP5 | 21 | 0.38 | 0.38 | 0.24 | 42 | 0.57 | 0.43 | 0.012
IE-N-LP9 | 23 | 0.70 | 0.26 | 0.04 | 46 | 0.83 | 0.17 | 0.008
IE-N-SP4 | 20 | 0.40 | 0.40 | 0.20 | 40 | 0.60 | 0.40 | 0.012
IE-W-LP1 | 23 | 0.43 | 0.39 | 0.17 | 46 | 0.63 | 0.37 | 0.010
IE-W-LP2 | 22 | 0.41 | 0.41 | 0.18 | 44 | 0.61 | 0.39 | 0.011
IE-W-LP3 | 22 | 0.77 | 0.23 | 0.00 | 44 | 0.89 | 0.11 | 0.007
IE-W-LP4 | 22 | 0.45 | 0.50 | 0.05 | 44 | 0.70 | 0.30 | 0.010
OG-W-IP | 23 | 0.91 | 0.09 | 0.00 | 46 | 0.96 | 0.04 | 0.004
TB-NE-LP1 | 21 | 0.19 | 0.38 | 0.43 | 42 | 0.38 | 0.62 | 0.012
TB-N-IP1 | 22 | 0.23 | 0.32 | 0.45 | 44 | 0.39 | 0.61 | 0.011
TB-N-SP1 | 23 | 0.26 | 0.52 | 0.22 | 46 | 0.52 | 0.48 | 0.011
Overall (India) | 498 | 0.43 | 0.36 | 0.21 | 996 | 0.61 | 0.39 | 0.000
Table 5. Average heterozygosity (H0) values for the gene LIG1 among Indian subpopulations.
Subpopulations | rs3730862 | rs20580 | LIG1 SNPs_combined (rs3730862, rs20580)
AA-C-IP5 | 0.485 | 0.512 | 0.499
AA-E-IP3 | 0.315 | 0.334 | 0.324
DR-C-IP2 | 0.401 | 0.460 | 0.431
DR-S-IP4 | 0.353 | 0.426 | 0.389
DR-S-LP2 | 0.359 | 0.294 | 0.326
DR-S-LP3 | 0.413 | 0.487 | 0.45
IE-E-IP1 | 0.510 | 0.483 | 0.497
IE-E-LP2 | 0.496 | 0.406 | 0.451
IE-E-LP4 | 0.507 | 0.084 | 0.296
IE-NE-IP1 | 0.495 | 0.502 | 0.499
IE-NE-LP1 | 0.384 | 0.372 | 0.378
IE-N-IP2 | 0.511 | 0.394 | 0.452
IE-N-LP1 | 0.513 | 0.481 | 0.497
IE-N-LP5 | 0.511 | 0.394 | 0.453
IE-N-LP9 | 0.433 | 0.495 | 0.464
IE-N-SP4 | 0.467 | 0.495 | 0.481
IE-W-LP1 | 0.495 | 0.496 | 0.495
IE-W-LP2 | 0.477 | 0.207 | 0.342
IE-W-LP3 | 0.494 | 0.478 | 0.486
IE-W-LP4 | 0.396 | 0.492 | 0.444
OG-W-IP | 0.485 | 0.485 | 0.485
TB-NE-LP1 | 0.512 | 0.511 | 0.512
TB-N-IP1 | 0.502 | 0.444 | 0.473
TB-N-SP1 | 0.471 | 0.507 | 0.489
The Gst, Ht and Hs values for the LIG1 SNPs taken together were 0.122, 0.491 and 0.431, respectively, while the values for the individual SNPs were 0.102, 0.497 and 0.446, respectively, for rs3730862 and 0.142, 0.486 and 0.417, respectively, for rs20580.
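The reported diversity parameters can be cross-checked with a few lines of Python, assuming that Dispan reports Nei's coefficient of gene differentiation in its standard form, Gst = (Ht − Hs)/Ht. That definition is an assumption here, since the text does not spell it out, and the function name is hypothetical. The input values are the ones quoted above.

```python
def gene_differentiation(h_t: float, h_s: float) -> float:
    """Nei's Gst: share of total gene diversity found between populations."""
    if h_t <= 0:
        raise ValueError("total gene diversity must be positive")
    return (h_t - h_s) / h_t


# (Ht, Hs, reported Gst) as quoted in the text.
reported = {
    "combined": (0.491, 0.431, 0.122),
    "rs3730862": (0.497, 0.446, 0.102),
    "rs20580": (0.486, 0.417, 0.142),
}

for name, (h_t, h_s, gst_reported) in reported.items():
    gst = gene_differentiation(h_t, h_s)
    print(f"{name}: computed {gst:.4f}, reported {gst_reported}")
```

Under this definition the recomputed values agree with the reported Gst to within rounding of the three-decimal inputs, which indicates that roughly 10 to 14% of the gene diversity at these SNPs lies between subpopulations.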
Comparison with HapMap database
When compared with the world population data in the HapMap database, the overall MAF of rs20580 (A) was found to be in close agreement with that of the central European [CEU] population (0.46) but very different from those of the Japanese [JPT] (0.65), Han Chinese [CHB] (0.58) and Yoruba [YRI] (0.53) populations. Similarly, the MAF of rs3730862 (T) observed in the present study (0.39) was closer to that of the central European [CEU] population (0.33) and very different from the MAFs observed in the Japanese [JPT] (0.58), Han Chinese [CHB] (0.51) and Yoruba [YRI] (0.01) populations (The International HapMap Consortium 2003) (table 6). Further analysis of the Indo-European linguistic subpopulation cluster alone showed comparable MAFs for both rs20580A and rs3730862T among LPs, while the isolated populations (IPs/tribals) showed strikingly high MAFs.
Results of bioinformatic analysis
Bioinformatic analysis of the promoter SNPs rs439132A/G, rs20579C/T and rs3730849C/T using MatInspector showed differential transcription factor binding potential between the wild-type and mutant alleles (figure 3). Analysis of the consequences of the nonsynonymous SNPs on protein function using the prediction software SIFT and PolyPhen showed that rs4987070 (D72G), rs4987181 (P52L) and rs11666150 (Q892H) were potentially damaging, while rs3731003 (T614I) was benign/tolerated.
Table 6. Comparison of mutant allele frequencies within the Indian population and between the Indian linguistic subpopulation clusters and the world populations (data obtained from HapMap project).
Population | rs20580 (LIG1) | SD | rs3730862 (LIG1) | SD
Populations in the current experiment
AA | 0.62 | 0.056 | 0.58 | 0.062
DR | 0.43 | 0.037 | 0.35 | 0.038
IE | 0.46 | 0.020 | 0.37 | 0.020
TB | 0.63 | 0.042 | 0.57 | 0.043
IND | 0.48 | 0.016 | 0.39 | 0.015
Populations in the HapMap database
CEU | 0.46 | 0.047 | 0.33 | 0.043
CHB | 0.58 | 0.053 | 0.51 | 0.053
JPT | 0.65 | 0.051 | 0.58 | 0.053
YRI | 0.53 | 0.054 | 0.01 | 0.008
(AA, Austro-Asiatic; DR, Dravidian; IE, Indo-European; TB, Tibeto-Burman; IND, overall mutant allele frequency of the Indian population; CEU, CEPH (Utah residents with ancestry from northern and western Europe); YRI, Yoruba in Ibadan, Nigeria; JPT, Japanese in Tokyo, Japan; CHB, Han Chinese in Beijing, China)
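The article does not state how the SD values in table 6 were calculated. As a hedged observation, the rs20580 entries for the Indian clusters appear consistent with the binomial standard error of an allele frequency, sqrt(q(1 − q)/2n), when 2n is taken as the summed chromosome counts of the corresponding subpopulations in table 3. The sketch below checks this; the function name is hypothetical and the interpretation is an assumption, not a statement of the authors' method.

```python
import math


def allele_frequency_se(q: float, n_chromosomes: int) -> float:
    """Binomial standard error of an allele frequency estimate."""
    if not 0.0 <= q <= 1.0 or n_chromosomes <= 0:
        raise ValueError("need 0 <= q <= 1 and a positive chromosome count")
    return math.sqrt(q * (1.0 - q) / n_chromosomes)


# rs20580: (MAF from table 6, 2n summed from table 3, SD from table 6).
clusters = {
    "AA": (0.62, 34 + 42, 0.056),
    "DR": (0.43, 38 + 46 + 46 + 46, 0.037),
    "TB": (0.63, 44 + 44 + 46, 0.042),
    "IND": (0.48, 1038, 0.016),
}

for name, (q, two_n, sd_reported) in clusters.items():
    se = allele_frequency_se(q, two_n)
    print(f"{name}: computed {se:.4f}, reported {sd_reported}")
```

The computed values fall within 0.001 of the reported SDs for these four rows. The per-subpopulation SD columns of tables 3 and 4 do not follow the same formula, so this reading applies to table 6 only.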
Figure 3. Analysis of transcription factor-binding potential of the promoter SNPs
rs439132A/G, rs20579C/T and rs3730849C/T using MatInspector 8.0. (SNPs +/−
10 bp were considered for analysis. Detailed information and colour keys for matrix
families are provided).
[Figure 4: two panels; y-axis of each panel: Log (LIG1 mRNA expression); panel p values: p = 0.02 and p = 0.024.]
Figure 4. Association of LIG1 SNPs with LIG1 mRNA expression among unrelated HapMap CEPH samples. LIG1 SNP information was obtained from the HapMap database (http://www.HapMap.org); LIG1 mRNA expression values for unrelated HapMap CEPH samples were obtained from Affymetrix ExonArray data downloaded from NCBI’s Gene Expression Omnibus database (GSE7761).
Analysis of genotype–phenotype association among HapMap
samples
The results of genotype–phenotype correlation in unrelated CEPH samples, using LIG1 ExonArray gene expression data obtained from NCBI’s Gene Expression Omnibus database (GSE7761), are provided in figure 4. At a significance cutoff of P < 0.05, the SNPs rs16981519 and rs3730837 were significantly associated with LIG1 expression. Genotypes carrying the minor alleles rs16981519T and rs3730837C were associated with lower LIG1 gene expression than the homozygous wild-type genotype.
Discussion
DNA ligases are a large family of evolutionarily related proteins that play important roles in a wide range of DNA
transactions, including chromosomal DNA replication, DNA
repair and recombination, in all kingdoms of life. LIG1
is an ATP-dependent DNA ligase that catalyses the joining of single-stranded breaks (nicks) in the phosphodiester
backbone of double-stranded DNA through a condensation
reaction (Timson et al. 2000). Evidence from LIG1 deficiency syndromes indicates its distinct role in the DNA damage repair pathway (Petrini et al. 1995). Although LIG1
polymorphisms have been catalogued in NCBI’s dbSNP and
HapMap databases for various world populations (YRI, JPT,
CHB, CEU), no study has so far examined the
overall distribution of LIG1 SNPs in the Indian population,
which has huge sociocultural, linguistic and biological diversity (1000 Genomes Project Consortium 2010, 2012; The International HapMap Consortium 2003, 2010). We therefore
saw the need to create an India-specific SNP resource
providing detailed information on polymorphisms existing
in the gene LIG1, including subpopulation-specific allele and
genotype frequencies. We believe this is a prerequisite for future
studies on the association of LIG1 polymorphisms with any
complex disorder within Indian subpopulations. Accordingly,
with the help of trained anthropologists and guided by the published literature on the structure and variability of the Indian
population (Risley 1915; Malhotra 1978; Gadgil et al. 1998;
Pattanayak et al. 1998; Roychoudhury et al. 2001), a carefully
planned two-stage study was designed to discover and
validate SNPs in LIG1 among representative samples from
the Indian subpopulations belonging to the major population
clusters of India.
Among the 27 amplicons (PCR products of primers
designed for exonic and flanking intronic regions) studied
within the gene LIG1, 30 SNPs were identified by
DNA resequencing. SNP validation was performed
on 10 SNPs selected for their potential importance in the context
of the function of the gene or to fill additional gaps in the
genome, as described earlier. Validation showed that the
SNPs rs3730862 and rs20580 were widespread among Indian
subpopulations, while four other SNPs were restricted
to three or fewer subpopulations. MAFs varied
widely across Indian subpopulations, exemplifying
the enormous human population diversity of India. Further, for both rs20580 and rs3730862,
the minor allele became the major allele in
some subpopulations, a feature possibly indicative of gradual
stabilization of the SNP over time (Miller and Kwok
2001). Marked differences in allele frequencies were observed
both within and between the linguistic subpopulation clusters of
India. Higher MAFs were mostly observed in the Austro-Asiatic subpopulation clusters and in the isolated tribal
populations (IPs) across different linguistic subgroups. Moreover,
the MAFs of the Austro-Asiatic and Tibeto-Burman linguistic clusters were comparable with each other and
mostly higher than those of their Dravidian and Indo-European counterparts. The MAFs of the isolated tribal populations were strikingly higher than those of the large endogamous populations (LPs),
which suggests a faster rate of stabilization of the polymorphism in highly endogamous isolated subpopulations.
The higher MAFs among tribal populations might be due to genetic drift, which acts rapidly in
small isolated populations and results in quicker accumulation of distinctive allele frequencies (Rosenberg et al. 2001).
The MAFs for most of the polymorphisms seem to follow
a similar pattern across the subpopulation clusters with the
Dravidian and Indo-European subpopulation clusters being
very different from the Tibeto-Burman and Austro-Asiatic
clusters.
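The drift argument made above for small isolated populations can be illustrated with a neutral Wright-Fisher simulation. This is a minimal sketch, not an analysis performed in the study; the population sizes, starting frequency, number of generations and function names are hypothetical choices made for the example.

```python
import random


def wright_fisher(p0, n_individuals, generations, rng):
    """Allele frequency after neutral drift in a diploid population of fixed size."""
    n_copies = 2 * n_individuals
    p = p0
    for _ in range(generations):
        # Each allele copy in the next generation is drawn independently
        # with probability equal to the current frequency (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
    return p


def frequency_variance(p0, n_individuals, generations=20, replicates=200, seed=1):
    """Variance of the final allele frequency across replicate populations."""
    rng = random.Random(seed)
    finals = [wright_fisher(p0, n_individuals, generations, rng)
              for _ in range(replicates)]
    m = sum(finals) / replicates
    return sum((f - m) ** 2 for f in finals) / replicates


# Hypothetical settings: a small isolated group versus a larger population
small = frequency_variance(0.2, n_individuals=25)
large = frequency_variance(0.2, n_individuals=250)
print(f"variance, N=25: {small:.4f}; variance, N=250: {large:.4f}")
```

Under these assumptions the allele frequency spreads far more widely among replicate small populations than among large ones, consistent with the quicker accumulation of distinctive allele frequencies described for isolated groups.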
Subsequently, the MAF data generated on the Indian linguistic subpopulation clusters in the current study were
compared with the corresponding data on major world populations included in the International HapMap database (The
International HapMap Consortium 2003). The minor allele
frequencies of the large endogamous populations among
the Indo-European clusters were consistently comparable
with those of the CEU population, which consists of individuals of European (Caucasian) ancestry. On
the other hand, the MAFs of the Tibeto-Burman linguistic subpopulation
cluster matched those of JPT (Japanese) quite closely.
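The frequency comparisons discussed above rest on simple allele counting. The sketch below, offered only as an illustration, derives the alternate-allele frequency from genotype counts, flags whether the alternate allele has become the major allele, and picks the reference population with the nearest frequency. The genotype counts, reference frequencies and function names are hypothetical and are not data from this study or from HapMap.

```python
def alt_allele_frequency(hom_ref, het, hom_alt):
    """Frequency of the alternate allele from diploid genotype counts."""
    total_alleles = 2 * (hom_ref + het + hom_alt)
    return (het + 2 * hom_alt) / total_alleles


def summarize(hom_ref, het, hom_alt):
    """Return (MAF, flipped); flipped is True when the alternate allele is the major one."""
    q = alt_allele_frequency(hom_ref, het, hom_alt)
    return min(q, 1 - q), q > 0.5


def closest_reference(freq, reference_freqs):
    """Name of the reference population whose allele frequency is nearest to freq."""
    return min(reference_freqs, key=lambda name: abs(reference_freqs[name] - freq))


# Hypothetical genotype counts for one subpopulation (NOT study data)
maf, flipped = summarize(hom_ref=10, het=20, hom_alt=30)

# Hypothetical alternate-allele frequencies for reference panels (NOT HapMap values)
reference = {"CEU": 0.45, "JPT": 0.70, "YRI": 0.10}
nearest = closest_reference(alt_allele_frequency(10, 20, 30), reference)
print(f"MAF = {maf:.3f}, allele status reversed: {flipped}, nearest panel: {nearest}")
```

Tracking the alternate-allele frequency rather than the MAF alone is what makes a minor-to-major reversal visible, since the MAF by definition never exceeds 0.5.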
Bioinformatic analysis predicting the change
in protein function caused by the identified nonsynonymous SNPs revealed that the amino acid
substitutions resulting from the SNPs rs4987070 (D72G),
rs4987181 (P52L) and rs11666150 (Q892H) have a potentially damaging effect on protein function, a prediction that is
also supported by a recent publication (Singh et al. 2011).
Since these SNPs were found to be private polymorphisms
restricted to a few subpopulations, this finding may be significant when investigating related complex disorders in these
subpopulations.
Genotype–phenotype association analysis between LIG1
SNPs and LIG1 gene expression among the unrelated
CEPH samples showed significant association of the SNPs
rs16981519 and rs3730837 with LIG1 expression. This
suggests that polymorphisms in LIG1 may affect its expression
and may therefore alter its function.
Conclusion
The results obtained from this two-stage SNP exploration
study point towards the uniqueness of the Indian subpopulation clusters with respect to the presence or absence
of certain SNPs and the genotype and allele frequency patterns among various populations, and also highlight the differences between the linguistic subdivisions of the country.
The findings of the present study underscore the fact that
the Indian population should be investigated as a possibly separate entity from the commonly inferred major
global population clusters including Africa, Eurasia (Europe,
Middle East and Central/South Asia), East Asia, Oceania and
America (Rosenberg et al. 2002).
This is the first report studying the presence of LIG1
polymorphisms on such a large scale in any population
of the world, and in the Indian population in particular. The
strength of the current study lies in the comprehensive collection of information on various factors through a well-prepared, robust questionnaire, designed with the help of trained
anthropologists, clinicians and scientists, which eliminates
the potential for selection bias. The data generated from this
study may have wide-ranging applications for further epidemiological and public health related research on the Indian
population.
Acknowledgements
AKM and AS are recipients of Senior Research Fellowships from the
Council of Scientific and Industrial Research, India. The work
was supported by CSIR network projects CMM0016, CMM0018 and
NWP0034. This paper bears communication number 7511 of CDRI.
References
Bohr V. A. 1995 DNA repair fine structure and its relations to
genomic instability. Carcinogenesis 16, 2885–2892.
Cartharius K., Frech K., Grote K., Klocke B., Haltmeier M.,
Klingenhoff A. et al. 2005 MatInspector and beyond: promoter
analysis based on transcription factor binding sites. Bioinformatics 21, 2933–2942.
Cheng L., Eicher S. A., Guo Z., Hong W. K., Spitz M. R. and Wei
Q. 1998 Reduced DNA repair capacity in head and neck cancer
patients. Cancer Epidemiol. Biomarkers Prev. 7, 465–468.
Gadgil M., Joshi N. V., Prasad U. V., Manoharan S. and Patil S. 1998
In The Indian human heritage, pp. 100–129. Universities Press,
Hyderabad, India.
1000 Genomes Project Consortium 2010 A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073.
1000 Genomes Project Consortium 2012 An integrated map of genetic
variation from 1,092 human genomes. Nature 491, 56–65.
Grierson G. A. 1927 A linguistic survey of India. Superintendent of
Government Printing, Calcutta, India.
Indian Genome Variation Consortium 2005 The Indian Genome
Variation database (IGVdb): a project overview. Hum. Genet.
118, 1–11.
Indian Genome Variation Consortium 2008 Genetic landscape of
the people of India: a canvas for disease gene exploration. J.
Genet. 87, 3–20.
The International HapMap Consortium 2010 Integrating common and
rare genetic variation in diverse human populations. Nature 467,
52–58.
The International HapMap Consortium 2003 The International
HapMap Project. Nature 426, 789–796.
Ma L., Hoeijmakers J. H. and van der Eb A. J. 1995 Mammalian
nucleotide excision repair. Biochim. Biophys. Acta 1242, 137–163.
Malhotra K. C. 1978 Morphological composition of the people of
India. J. Hum. Evol. 7, 45–63.
Miller R. D. and Kwok P. Y. 2001 The birth and death of human
single-nucleotide polymorphisms: new experimental evidence
and implications for human history and medicine. Hum. Mol.
Genet. 10, 2195–2198.
Miller S. A., Dykes D. D. and Polesky H. F. 1988 A simple salting
out procedure for extracting DNA from human nucleated cells.
Nucleic Acids Res. 16, 1215.
Ng P. C. and Henikoff S. 2003 SIFT: Predicting amino acid changes
that affect protein function. Nucleic Acids Res. 31, 3812–3814.
Nickerson D. A., Tobe V. O. and Taylor S. L. 1997 PolyPhred:
automating the detection and genotyping of single nucleotide
substitutions using fluorescence-based resequencing. Nucleic
Acids Res. 25, 2745–2751.
Pattanayak D. P., Balasubramanian D. and Rao N. A. 1998 The language heritage of India. In The Indian human heritage, pp. 95–99.
Universities Press, Hyderabad, India.
Petrini J. H., Xiao Y. and Weaver D. T. 1995 DNA ligase I mediates
essential functions in mammalian cells. Mol. Cell Biol. 15, 4303–4308.
Ramensky V., Bork P. and Sunyaev S. 2002 Human nonsynonymous SNPs: server and survey. Nucleic Acids Res. 30,
3894–3900.
Risley H. H. 1915 The people of India. Thacker Spink, Calcutta,
India.
Rosenberg N. A., Burke T., Elo K., Feldman M. W., Freidlin P.
J., Groenen M. A. et al. 2001 Empirical evaluation of genetic
clustering methods using multilocus genotypes from 20 chicken
breeds. Genetics 159, 699–713.
Rosenberg N. A., Pritchard J. K., Weber J. L., Cann H. M., Kidd
K. K., Zhivotovsky L. A. et al. 2002 Genetic structure of human
populations. Science 298, 2381–2385.
Roychoudhury S., Roy S., Basu A., Banerjee R., Vishwanathan H.,
Usha Rani M. V. et al. 2001 Genomic structures and population
histories of linguistically distinct tribal groups of India. Hum.
Genet. 109, 339–350.
Saitou N. and Nei M. 1987 The neighbor-joining method: a new
method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4,
406–425.
Singh A. A., Sivakumar D. and Somvanshi P. 2011 Cataloguing
functionally relevant polymorphisms in gene DNA ligase I: a
computational approach. 3 Biotech. 1, 47–56.
Singh K. S. 2002 People of India: introduction national series.
Oxford University Press, Delhi, India.
Stephens M. and Scheet P. 2005 Accounting for decay of linkage
disequilibrium in haplotype inference and missing-data imputation. Am. J. Hum. Genet. 76, 449–462.
Timson D. J., Singleton M. R. and Wigley D. B. 2000 DNA ligases in the repair and replication of DNA. Mutat. Res. 460, 301–318.
Tlsty T. D., Briot A., Gualberto A., Hall I., Hess S., Hixon M.
et al. 1995 Genomic instability and cancer. Mutat. Res. 337,
1–7.
Received 6 August 2013, accepted 10 April 2014
Published online: 14 August 2014
Journal of Genetics, Vol. 93, No. 2, August 2014