The origins and affinities of the ∼1 billion people living on the subcontinent
of India have long been contested. This is owing, in part, to the many
different waves of immigrants that have influenced the genetic structure of
India. In the most recent of these waves, Indo-European-speaking people from
West Eurasia entered India from the Northwest and diffused throughout the
subcontinent. They purportedly admixed with or displaced indigenous
Dravidic-speaking populations. Subsequently they may have established the Hindu
caste system and placed themselves primarily in castes of higher rank. To
explore the impact of West Eurasians on contemporary Indian caste populations,
we compared mtDNA (400 bp of hypervariable region 1 and 14 restriction site
polymorphisms) and Y-chromosome (20 biallelic polymorphisms and 5 short tandem
repeats) variation in ∼265 males from eight castes of different rank to ∼750 Africans,
Asians, Europeans, and other Indians. For maternally inherited mtDNA, each
caste is most similar to Asians. However, 20%–30% of Indian
mtDNA haplotypes belong to West Eurasian haplogroups, and the frequency
of these haplotypes is proportional to caste rank, the highest frequency of
West Eurasian haplotypes being found in the upper castes. In contrast, for paternally
inherited Y-chromosome variation each caste is more similar to Europeans than
to Asians. Moreover, the affinity to Europeans is proportionate to caste rank,
the upper castes being most similar to Europeans, particularly East Europeans.
These findings are consistent with greater West Eurasian male admixture with
castes of higher rank. Nevertheless, the mitochondrial genome and the Y
chromosome each represents only a single haploid locus and is more susceptible
to large stochastic variation, bottlenecks, and selective sweeps. Thus, to
increase the power of our analysis, we assayed 40 independent, biparentally
inherited autosomal loci (1 LINE-1 and 39 Alu
elements) in all of the caste and continental populations (∼600 individuals). Analysis of these data
demonstrated that the upper castes have a higher affinity to Europeans
than to Asians, and the upper castes are significantly more similar to
Europeans than are the lower castes. Collectively, all five datasets show a
trend toward upper castes being more similar to Europeans, whereas lower castes
are more similar to Asians. We conclude that Indian castes are most likely to
be of proto-Asian origin with West Eurasian admixture resulting in rank-related
and sex-specific differences in the genetic affinities of castes to Asians and
Europeans.
Shared Indo-European languages (i.e., Hindi and most
European languages) suggested to linguists of the nineteenth and twentieth
centuries that contemporary Hindu Indians are descendants of primarily West
Eurasians who migrated from Europe, the Near East, Anatolia, and the Caucasus 3000–8000
years ago (Poliakov 1974; Renfrew 1989a,b).
These nomadic migrants may have consolidated their power by admixing with
native Dravidic-speaking (e.g., Telugu) proto-Asian populations who controlled
regional access to land, labor, and resources (Cavalli-Sforza et al. 1994),
and subsequently established the Hindu caste hierarchy to legitimize and
maintain this power (Poliakov 1974; Cavalli-Sforza et al. 1994).
It is plausible that these West Eurasian immigrants also appointed
themselves predominantly to castes of higher rank. However, archaeological
evidence of the diffusion of material culture from Western Eurasia into India
has been limited (Shaffer 1982). Therefore, information on the
genetic relationships of Indians to Europeans and Asians could contribute
substantially to understanding the origins of Indian populations.
Previous genetic studies of Indian castes have failed to
achieve a consensus on Indian origins and affinities. Various results have
supported closer affinity of Indian castes either with Europeans or with
Asians, and several factors underlie this inconsistency. First, erratic or
limited sampling of populations has constrained inferences about the relationships
between caste and continental populations (i.e., Africans, Asians, Europeans).
These relationships are further confounded by the wide geographic dispersal of
caste populations. Genetic affinities among caste populations are, in part,
inversely correlated with the geographic distance between them (Malhotra and
Vasulu 1993), and it is likely that affinities between caste and
continental populations are also geographically dependent (e.g., different
between North and South Indian caste populations). Second, it has been
suggested that castes of different rank may have originated from or admixed
with different continental groups (Majumder and Mukherjee 1993). Third,
the size of caste populations varies widely, and the effects of genetic drift
on some small, geographically isolated castes may have been substantial.
Fourth, most of the polymorphisms assayed over the last 30 years
are indirect measurements of genetic variation (e.g., ABO typing), have been
sampled from only a few loci, and may not be selectively neutral. Finally, only
rarely have systematic comparisons been made with continental populations using
a large, uniform set of DNA polymorphisms (Majumder 1999).
To investigate the origin of contemporary castes, we
compared the genetic affinities of caste populations of differing rank (i.e.,
upper, middle, and lower) to worldwide populations. We analyzed mtDNA
(hypervariable region 1 [HVR1] sequence and 14
restriction-site polymorphisms [RSPs]), Y-chromosome (5
short-tandem repeats [STRs] and 20 biallelic polymorphisms),
and autosomal (1 LINE-1 and 39 Alu inserts) variation in ∼265 males from eight different
Telugu-speaking caste populations from the state of Andhra Pradesh in
South India (Bamshad et al. 1998). Comparisons were made to ∼400 individuals from tribal and
Hindi-speaking caste populations distributed across the Indian subcontinent
(Mountain et al. 1995; Kivisild et al. 1999) and to ∼350 Africans, Asians, and Europeans
(Jorde et al. 1995, 2000; Seielstad et al. 1999).
RESULTS
Analysis of mtDNA Suggests a Proto-Asian Origin of Indians
MtDNA HVR1 genetic distances between caste
populations and Africans, Asians, and Europeans are significantly different
from zero (p < 0.001) and reveal that, regardless of
rank, each caste group is most closely related to Asians and is most dissimilar
from Africans (Table 1). The genetic distances from
major continental populations (e.g., Europeans) differ among the three caste
groups, and the comparison reveals an intriguing pattern. As one moves from
lower to upper castes, the distance from Asians becomes progressively larger.
The distance between Europeans and lower castes is larger than the distance
between Europeans and upper castes, but the distance between Europeans and
middle castes is smaller than the upper caste-European distance. These trends
are the same whether the Kshatriya and Vysya are included in the upper castes,
the middle castes, or excluded from the analysis. This may be owing, in part,
to the small sample size (n = 10) of each of these castes. Among the
upper castes the genetic distance between Brahmins and Europeans (0.10)
is smaller than that between either the Kshatriya and Europeans (0.12)
or the Vysya and Europeans (0.16). Assuming that
contemporary Europeans reflect West Eurasian affinities, these data indicate
that the amount of West Eurasian admixture with Indian populations may have
been proportionate to caste rank.
Conventional estimates of the standard errors of genetic
distances assume that polymorphic sites are independent of each other, that is,
unlinked. Because mtDNA polymorphisms are in complete linkage disequilibrium
(as are polymorphisms on the nonrecombining portions of the Y chromosome), this
assumption is violated. Alternatively, the mtDNA genome can be treated as a
single locus with multiple haplotypes. However, even if this assumption is
made, mtDNA distances do not differ significantly from one another even at the
level of the three major continental populations (Nei and Livshits 1989),
the standard errors being greater than the genetic distances. Considering that
the distances between castes and continental populations are less than those
between different continental populations, the estimated mtDNA genetic
distances between upper castes and Europeans versus lower castes and Europeans
would not be significantly different from each other. Therefore, to resolve
further the relationships of Europeans and Asians to contemporary Indian
populations, we defined the identities of specific mtDNA restriction-site
haplotypes.
The presence of the mtDNA restriction sites DdeI10,394
and AluI10,397 defines a haplogroup (a group of haplotypes that
share some sequence variants), M, that was originally identified in populations
that migrated from mainland Asia to Southeast Asia and Australia (Ballinger et al. 1992;
Chen et al. 1995; Passarino et al. 1996) and is found at much
lower frequency in European and African populations. Most of the common
haplotypes found in Telugu- and Hindi-speaking caste populations belong to
haplogroup M (Table 2) and do not differentiate into
language-specific clusters in a phylogenetic reconstruction (Fig. 1).
Furthermore, these Indian haplogroup-M haplotypes are distinct from
those found in other Asian populations (Fig. 2) and
indicate the existence of Indian-specific subsets of haplogroup M (e.g., M3).
As expected if the lower castes are more similar to Asians than to
Europeans, and the upper castes are more similar to Europeans than to Asians,
the frequencies of M and M3 haplotypes are inversely
proportional to caste rank (Table 2).
Of the non-Asian mtDNA haplotypes found in Indian
populations, most are of West Eurasian origin (Table 2; Torroni et al. 1994;
Richards et al. 1998). However, most of these Indian West Eurasian haplotypes
belong to U2i (Kivisild et al. 1999), an Indian-specific subset of haplogroup U,
which is the oldest and second most common mtDNA haplogroup found in Europe
(Torroni et al. 1994).
In agreement with the HVR1 results, the frequency of
West Eurasian mtDNA haplotypes is significantly higher in upper castes than in
lower castes (p < 0.05), the frequency of U2i
haplotypes increasing as one moves from lower to higher castes. In addition,
the frequency of mtDNA haplogroups with a more recent coalescence estimate
(i.e., H, I, J, K, T) was fivefold higher in upper castes (6.8%) than
in lower castes (1.4%). These haplotypes are derivatives of haplogroups
found throughout Europe (Richards et al. 1998), the Middle East (Di
Rienzo and Wilson 1991), and to a lesser extent Central
Asia (Comas et al. 1998). Collectively, the mtDNA
haplotype evidence indicates that contemporary Indian mtDNA evolved largely from
proto-Asian ancestors with Western Eurasian admixture accounting for 20%–30%
of mtDNA haplotypes.
Y-Chromosome Variation Confirms Indo-European Admixture
Genetic distances estimated from Y-chromosome STR
polymorphisms differ significantly from zero (p < 0.001) and
reveal a distinctly different pattern of population relationships (Table 3).
In contrast to the mtDNA distances, the Y-chromosome
STR data do not demonstrate a closer affinity to Asians for each caste group.
Upper castes are more similar to Europeans than to Asians, middle castes are
equidistant from the two groups, and lower castes are most similar to Asians.
The genetic distance between caste populations and Africans is progressively larger
moving from lower to middle to upper caste groups (Table 3).
Genetic distances estimated from Y-chromosome biallelic
polymorphisms differ significantly from zero (p < 0.05),
and the patterns differ from the mtDNA results even more strikingly than the
Y-chromosome STRs. For Y-chromosome biallelic polymorphism data, each caste
group is more similar to Europeans than to Asians (Table 4),
and as one moves from lower to middle to higher castes the genetic distance to
Europeans diminishes progressively. This pattern is further accentuated by
separating the European population into Northern, Southern, and Eastern
Europeans; each caste group is most closely related to Eastern Europeans.
Moreover, the genetic distance between upper castes and Eastern Europeans is
approximately half the distance between Eastern Europeans and middle or lower
castes. These results suggest that Indian Y chromosomes, particularly upper
caste Y chromosomes, are more similar to European than to Asian Y chromosomes.
This underscores the close affinities between Hindu Indian and Indo-European Y
chromosomes based on a previously reported analysis of three Y-chromosome
polymorphisms (Quintana-Murci et al. 1999b).
Overall, these results indicate that the affinities of
Indians to continental populations vary according to caste rank and depend
on whether mtDNA or Y-chromosome data are analyzed. However, conclusions drawn
from these data are limited because mtDNA and the Y chromosome are each
effectively a single haploid locus and are more sensitive to genetic drift,
bottlenecks, and selective sweeps than are autosomal loci. These limitations
of our analysis can be overcome, in part, by analyzing a larger set of
independent autosomal loci. Consequently, we assayed 1 LINE-1 and 39
unlinked Alu polymorphisms.
Affinities to Europeans and Asians Stratified by Caste Rank
Genetic distances estimated from autosomal Alu elements
correspond to caste rank, the genetic distance between the upper and lower
castes being more than 2.5 times larger than the distance
between upper and middle or middle and lower castes (upper to middle, 0.0069;
upper to lower, 0.018; middle to lower, 0.0071). These trends are the
same whether the Kshatriya and Vysya are included in the upper castes, the
middle castes, or excluded from the analysis (data not shown). Furthermore, a
neighbor-joining network of genetic distances between separate castes (Fig. 3)
clearly differentiates castes of different rank into
separate clusters. This is similar to the relationship between genetic distances
and caste rank estimated from mtDNA (Bamshad et al. 1998). It
is important to note, however, that the autosomal genetic distances are
estimated from 40 independent loci. This afforded us the opportunity to
test the statistical significance of the correspondence between genetic
distance and caste status. The Mantel correlation between interindividual
genetic distances and distances based on social rank was low but highly significant
for individuals ranked into upper, middle, and lower groups (r = 0.08;
p < 0.001) and into eight separate castes (r = 0.07;
p < 0.001). Given the resolving power of this autosomal
dataset, we next tested whether we could reconcile the results of the analysis
of mtDNA and Y-chromosome markers in castes and continental populations.
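As an illustration of the kind of test reported above, the following is a minimal sketch of a permutation-based Mantel test in standard-library Python. It is not the software used in this study: the function names, the one-sided permutation scheme, the number of permutations, and the toy matrices are all assumptions chosen for illustration.

```python
import math
import random


def _upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, rank, permutations=999, seed=0):
    """One-sided permutation Mantel test; returns (r, p)."""
    rng = random.Random(seed)
    n = len(genetic)
    rank_flat = _upper_triangle(rank)
    observed = _pearson(_upper_triangle(genetic), rank_flat)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of one matrix together, keeping it symmetric.
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if _pearson(_upper_triangle(permuted), rank_flat) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up toy data: six individuals, two per rank group (0, 1, 2).
ranks = [0, 0, 1, 1, 2, 2]
rank_dist = [[abs(a - b) for b in ranks] for a in ranks]
genetic_dist = [
    [0.0 if i == j
     else 0.01 * abs(ranks[i] - ranks[j]) + 0.002 * ((i + j) % 3)
     for j in range(6)]
    for i in range(6)
]
r, p = mantel_test(genetic_dist, rank_dist)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Because pairwise distances within a matrix are not independent, significance is judged by permuting individuals rather than by a standard correlation test; with many individuals, even a low correlation can be highly significant.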
Genotypic differentiation was significantly different from
zero (p < 0.0001) between each pair of caste populations and between
each caste and continental population. Similar to the results of both the mtDNA
and Y-chromosome analyses, the distance between upper castes and European
populations is smaller than the distance between lower castes and Europeans
(Table 5). However, in contrast to the mtDNA results but similar
to the Y-chromosome results, the affinity between upper castes and Europeans is
higher than that of upper castes and Asians (Table 5). If
the Kshatriya and Vysya are excluded from the analysis or included in the
middle castes, the genetic distance between the upper caste (Brahmins) and
Europeans remains smaller than the distance between the lower castes and
Europeans and the distance between upper castes and Asians (Table 5).
Analysis of each caste separately reveals that the genetic distance
between the Brahmins and Europeans (0.013) is less than the
distance between Europeans and Kshatriya (0.030) or Vysya (0.020).
Nevertheless, each separate upper caste is more similar to Europeans
than to Asians.
Because historical evidence suggests greater affinity
between upper castes and Europeans than between lower castes and Europeans
(Balakrishnan 1978, 1982; Cavalli-Sforza et al. 1994),
it is appropriate to use a one-tailed test of the difference between the
corresponding genetic distances. The 90% confidence limits of Nei's
standard distances estimated between upper castes and Europeans (0.006–0.016)
versus lower castes and Europeans (0.017–0.037) do
not overlap, indicating statistical significance at the 0.05 level.
Significance at 0.05 is not achieved if the Kshatriya and Vysya are
excluded. These results offer statistical support for differences in the
genetic affinity of Europeans to caste populations of differing rank, with
greater European affinity to upper castes than to lower castes.
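For reference, Nei's standard distance for biallelic loci such as Alu insertions can be sketched as follows. This is an illustrative Python sketch, not the software used here: the function names and the toy insertion frequencies are assumptions, and the text does not specify how the confidence limits were obtained, so the sketch only checks whether the two reported intervals overlap.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard distance D = -ln(J_xy / sqrt(J_x * J_y)).

    Each argument lists, per locus, the frequency of one allele (for example
    the insertion-present allele) at a biallelic locus.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("need the same non-zero number of loci in both populations")
    n = len(freqs_x)
    j_x = sum(p * p + (1 - p) ** 2 for p in freqs_x) / n
    j_y = sum(q * q + (1 - q) ** 2 for q in freqs_y) / n
    j_xy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y)) / n
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def intervals_overlap(a, b):
    """True when closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up insertion frequencies at three loci, for illustration only.
toy_pop_a = [0.60, 0.35, 0.80]
toy_pop_b = [0.50, 0.45, 0.70]
print(nei_standard_distance(toy_pop_a, toy_pop_b))

# Confidence limits as reported in the text above.
upper_vs_europeans = (0.006, 0.016)
lower_vs_europeans = (0.017, 0.037)
print(intervals_overlap(upper_vs_europeans, lower_vs_europeans))
```

The second check reproduces only the comparison of the reported limits; estimating such limits from genotype data would require a resampling or analytical procedure that is not shown here.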
DISCUSSION
Previous genetic studies have found evidence to support
either a European or an Asian origin of Indian caste populations, with
occasional indications of admixture with African or proto-Australoid
populations (Chen et al. 1995; Mountain et al. 1995; Bamshad
et al. 1996, 1997; Majumder et al. 1999;
Quintana-Murci et al. 1999a). Our results demonstrate that
for biparentally inherited autosomal markers, genetic distances between upper,
middle, and lower castes are significantly correlated with rank; upper castes
are more similar to Europeans than to Asians; and upper castes are
significantly more similar to Europeans than are lower castes. This result
appears to be owing to the amalgamation of two different patterns of
sex-specific genetic variation.
The majority of Indian mtDNA restriction-site haplotypes
belong to Indian-specific subsets (e.g., M3) of a predominantly Asian
haplogroup M, although a substantial minority of mtDNA restriction site
haplotypes belong to West Eurasian haplogroups. A higher proportion of
proto-Asian mtDNA restriction-site haplotypes is found in lower castes compared
to middle or upper castes, whereas the frequency of West Eurasian haplotypes is
positively correlated with caste rank, that is, is highest in the upper castes.
For Y-chromosome STR variation the upper castes exhibit greatest similarity
with Europeans, whereas the lower caste groups are most similar to Asians. For
Y biallelic polymorphism variation, each caste group is more similar to
Europeans than to Asians, and the affinity to Europeans is proportional to
caste rank, that is, is highest in the upper castes.
Importantly, five different types of data (mtDNA HVR1
sequence, mtDNA RSPs, Y-chromosome STRs, Y-chromosome biallelic polymorphisms,
and autosomal Alu polymorphisms) support the same general pattern: relatively
smaller genetic distances from European populations as one moves from lower to
middle to upper caste populations. Genetic distances from Asian populations
become larger as one moves from lower to middle to upper caste populations. It
is especially noteworthy that the analysis of Y biallelic polymorphisms, which
involved an independent set of comparative Asian, European, and African
populations, again indicated the same pattern. Additional support is offered by
the fact that the autosomal polymorphisms yielded a statistically significant
difference between the upper-caste–European and lower-caste–European genetic
distances. With additional loci, other differences (e.g., the distances between
different caste groups and Asians) may also reach statistical significance.
The most likely explanation for these findings, and the one
most consistent with archaeological data, is that contemporary Hindu Indians
are of proto-Asian origin with West Eurasian admixture. However, admixture with
West Eurasian males was greater than admixture with West Eurasian females,
resulting in a higher affinity to European Y chromosomes. This supports an
earlier suggestion of Passarino et al. (1996), which was based on a
comparison of mtDNA and blood group results. Furthermore, the degree of West
Eurasian admixture was proportional to caste rank. This explanation is
consistent with either the hypothesis that proportionately more West Eurasians
became members of the upper castes at the inception of the caste hierarchy or
that social stratification preceded the West Eurasian incursion and that West
Eurasians tended to insert themselves into higher-ranking positions. One
consequence is that shared Indo-European languages may not reflect a common
origin of Europeans and most Indians, but rather underscore the transfer of
language mediated by contact between West Eurasians and native proto-Indians.
West Eurasian admixture in Indian populations may have been
the result of more than one wave of immigration into India. Kivisild et al. (1999)
determined the coalescence (∼50,000 years before present) of the Indian-specific subset of the
West Eurasian haplotypes (i.e., U2i) and suggested that West
Eurasian admixture may have been much older than the purported Dravidian and
Indo-European incursions. Our analysis of Indian mtDNA restriction-site
haplotypes that do not belong to the U2i subset of West Eurasian
haplotypes (i.e., H, I, J, K, T) is consistent with more recent West Eurasian
admixture. It is also possible that haplotypes with an older coalescence were
introduced by Dravidians, whereas haplotypes with a more recent coalescence
belonged to Indo-Europeans. This hypothesis can be tested by a more detailed
comparison to West Eurasian mtDNA haplotypes from Iran, Anatolia, and the
Caucasus. Alternatively, the coalescence dates of these haplotypes may predate
the entry of West Eurasian populations into India. Regardless of their origin,
West Eurasian admixture resulted in rank-related differences in the genetic
affinities of castes to Europeans and Asians. Furthermore, the frequency of
West Eurasian haplotypes in the founding middle and upper castes may be
underestimated because of the upward social mobility of women from lower castes
(Bamshad et al. 1998). These women were presumably more likely to introduce
proto-Asian mtDNA haplotypes into the middle and upper castes.
Our analysis of 40 autosomal markers indicates
clearly that the upper castes have a higher affinity to Europeans than to
Asians. The high affinity of caste Y chromosomes with those of Europeans
suggests that the majority of immigrating West Eurasians may have been males.
As might be expected if West Eurasian males appropriated the highest positions
in the caste system, the upper caste group exhibits a lower genetic distance to
Europeans than the middle or lower castes. This is underscored by the
observation that the Kshatriya (an upper caste), whose members served as
warriors, are closer to Europeans than any other caste (data not shown).
Furthermore, the 32-bp deletion polymorphism in CC chemokine receptor 5,
whose frequency peaks in populations of Eastern Europe, is found only in two
Brahmin males (M. Bamshad and S.K. Ahuja, unpubl.). The stratification of
Y-chromosome distances with Europeans could also be caused by male-specific
gene flow among caste populations of different rank. However, we and others
have demonstrated that there is little sharing of Y-chromosome haplotypes among
castes of different rank (Bamshad et al. 1998; Bhattacharyya et al. 1999).
The affinity of caste populations to Europeans is more
apparent for Y-chromosome biallelic polymorphisms than Y-chromosome STRs. This
could be attributed to the use of different European populations in comparisons
using STRs and biallelic polymorphisms. Alternatively, it may reflect, in part,
the effects of high mutation rates for the Y-chromosome STRs, which would tend
to obscure relationships between caste and continental populations. A lack of
consistent clustering at the continental level has been observed in several
studies of Y-chromosome STRs (Deka et al. 1996; Torroni et al. 1996;
de Knijff et al. 1997). The autosomal Alu and biallelic Y-chromosome
polymorphisms, in contrast, have a slower rate of drift than Y-chromosome STRs
because of a higher effective population size, and their mutation rate is very
low. Thus, the Y-chromosome biallelic polymorphisms and autosomal Alu markers
may serve as more stable markers of worldwide population affinities.
Our analysis may help to explain why estimates of the
affinities of caste groups to worldwide populations have varied so widely among
different studies. Analyses of recent caste history based on only mtDNA or
Y-chromosome polymorphisms clearly would suggest that castes are more closely
related to Asians or to Europeans, respectively. Furthermore, we attempted to
minimize the confounding effect of geographic differences between populations
by sampling from a highly restricted region of South India. Because of the
ubiquity of the caste system in India's history, it is reasonable to predict
similar patterns in caste populations living in other areas. Indeed, any
genetic result becomes more compelling when it is replicated in other
populations. Therefore, comparable studies in caste populations from other
regions of India must be completed to test the generality of these results.
The dispersal and subsequent growth of Indian populations
since the Neolithic Age is one of the most important events to shape the
history of South Asia. However, the origin and dispersal route of the
aboriginal inhabitants of the Indian subcontinent is unclear. Our findings
suggest a proto-Asian origin of the Indian-specific haplogroup-M haplotypes.
Haplogroup-M haplotypes are also found at appreciable frequencies in some East
African populations: ∼18% of Ethiopians (Quintana-Murci et al. 1999a) and 16% of Kenyans
(M. Bamshad and L.B. Jorde, unpubl.). A comparison of haplogroup-M
haplotypes from East Africa and India has suggested that this southern route
may have been one of the original dispersal pathways of anatomically modern
humans out of Africa (Quintana-Murci et al. 1999a). Together, these data
support our previous suggestion (Kivisild et al. 1999) that
India may have been inhabited by at least two successive late Pleistocene
migrations, consistent with the hypothesis of Lahr and Foley (1994). They
also add to the growing evidence that the subcontinent of India has been a
major corridor for the migration of people between Africa, Western Asia, and
Southeast Asia (Cavalli-Sforza et al. 1994).
It should be emphasized that the DNA variation studied here
is thought to be selectively neutral and thus represents only the effects of
population history. These results permit no inferences about phenotypic
differences between populations. In addition, alleles and haplotypes are shared
by different caste populations, reflecting a shared history. Indeed, these
findings underscore the longstanding appreciation that the distribution of
genetic polymorphisms in India is highly complex. Further investigation of the
spread of anatomically modern humans throughout South Asia will need to
consider that such complex patterns may be the norm rather than the exception.
METHODS
Sample Collection
All studies of South Indian populations were performed with
the approval of the Institutional Review Board of the University of Utah,
Andhra University, and the government of India. Adult males living in the
district of Visakhapatnam, Andhra Pradesh, were questioned about their caste
affiliations and surnames and the birthplaces of their parents. Those who were
unrelated to any other subject by at least three generations were considered
eligible to participate.
We classified caste populations based upon the traditional
ranking of these castes by varna (defined below), occupation, and socioeconomic
status. According to various Sanskrit texts, Hindu populations were partitioned
originally into four categories or varna: Brahmin, Kshatriya, Vysya, and Sudra
(Tambia 1973; Elder 1996). Those in each varna
performed occupations assigned to their category. Brahmins were priests;
Kshatriya were warriors; Vysya were traders; and Sudra were to serve the three
other varna (Tambia 1973; Elder 1996). Each
varna was assigned a status; the Brahmin, Kshatriya, and Vysya were considered of
higher status than the Sudra because they are
considered the twice-born castes and are differentiated from all other castes
in the caste hierarchy. This is the rationale behind classifying them as the
upper group of castes (Tambia 1973).
The Kapu and the Yadava are called once-born castes that
have traditionally been classified in the Sudra, the lowest of the original
four varna. However, the status of the Sudra was actually higher than that of a
fifth varna, the Panchama. This fifth varna was added at a later date to
include the so-called untouchables, who were excluded from the other four varna
(Elder 1996). The untouchable varna includes the Mala and Madiga.
The position of the Relli in the caste hierarchy is somewhat ambiguous, but
they have usually been classified in the lower caste group. Therefore, prior to
the collection of any data, males from eight different Telugu-speaking castes
(n = 265) were ranked into upper (Niyogi and Vydiki Brahmin,
Kshatriya, Vysya [n = 80]), middle (Telega and Turpu Kapu,
Yadava [n = 111]), and lower (Relli, Madiga, Mala [n = 74]) groups
(Bamshad et al. 1998). This ranking has been used by previous investigators
(Krishnan and Reddy 1994).
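The three-tier ranking described above can be written down as a small lookup structure, which also gives a quick check that the group sizes add up to the stated total. This is a sketch in Python; the variable names and layout are assumptions for illustration, while the caste labels and sample sizes are taken from the text.

```python
# Rank groups and sample sizes as stated in the text. Niyogi and Vydiki are
# listed under Brahmin, and Telega and Turpu under Kapu, which is how eight
# castes arise from ten labels. Names and structure here are illustrative.
CASTE_GROUPS = {
    "upper": {
        "castes": ["Niyogi Brahmin", "Vydiki Brahmin", "Kshatriya", "Vysya"],
        "n": 80,
    },
    "middle": {
        "castes": ["Telega Kapu", "Turpu Kapu", "Yadava"],
        "n": 111,
    },
    "lower": {
        "castes": ["Relli", "Madiga", "Mala"],
        "n": 74,
    },
}

# Reverse lookup from caste label to rank group.
RANK_OF = {
    caste: group
    for group, info in CASTE_GROUPS.items()
    for caste in info["castes"]
}

total_males = sum(info["n"] for info in CASTE_GROUPS.values())
print(total_males)        # 80 + 111 + 74
print(RANK_OF["Yadava"])
```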
After obtaining informed consent, ∼8 mL of whole blood or 5 plucked scalp hairs were collected from
each participant. Extractions were performed at Andhra University using
established methods (Bell et al. 1981).
MtDNA Polymorphisms
The mtDNA data consisted of 68, 116, and 73 HVR1 sequences and 79, 159, and 72
restriction-site haplotypes from largely the same individuals in upper, middle,
and lower castes, respectively. These data were compared to data from 143
Africans (15 Sotho-Tswana, 7 Tsonga, 14 Nguni, 24 San, 5 Biaka Pygmies, 33 Mbuti
Pygmies, 9 Alur, 18 Hema, and 18 Nande), 78 Asians (12 Cambodians, 17 Chinese,
19 Japanese, 6 Malay, 9 Vietnamese, 2 Koreans, and 13 Asians of mixed ancestry),
and 99 Europeans (20 unrelated males of the French CEPH kindreds, 69
unrelated Utah males of Northern European descent, and 10
Poles) (Jorde et al. 1995, 1997). Mitochondrial sequence
data from these 597 individuals are available at:
http://www.genome.org/supplemental/.
In addition to our samples, the phylogenetic analyses also
included data from 98 published HVR1
sequences from two castes (48 Havlik and 43
Mukri), and a tribal population (7 Kadar) living in
south-western India (Mountain et al. 1995) and restriction-site
haplotypes from one caste (62 Lobana) from Northern India, three
tribal populations from Northern (12 Tharu and 18
Bhoksa) and Southern (86 Lambadi) India, and 122
individuals from various caste populations in Uttar Pradesh (Kivisild et al. 1999).
Phylogenetic relationships of HVR1 sequences assigned to
haplogroup M were estimated for Indians (this study), Turks (this study),
Central Asian populations (Comas et al. 1998), Mongolians (Kolman et al. 1996),
Chinese (Horai et al. 1996), and Japanese (Horai et al. 1996; Seo et al. 1998).
The mtDNA HVR1 sequence was determined by
fluorescent Sanger sequencing using a Dye terminator cycle sequencing kit
(Applied Biosystems) according to the manufacturer's specifications (Bamshad et al. 1998).
Sequencing reactions were resolved on an ABI 377
automated DNA sequencer, and sequence data were analyzed using ABI DNA analysis
software and SEQUENCHER software (Genecodes). To identify mtDNA haplotypes and
haplogroups (a group of haplotypes that share some sequence variants), major
continent-specific genotypes (Torroni et al. 1994, 1996;
Wallace 1995) for the following polymorphic mtDNA restriction sites
were determined: HpaI3592, DdeI10394, AluI10397, AluI13262, BamHI13366,
AluI5176, HaeIII4830, AluI7025, HinfI12308, AccI14465, AvaII8249, AluI10032,
BstOI13704, and HaeII9052.
Y-Chromosome and Autosomal Polymorphisms
Y-chromosome-specific STRs (DYS19, DYS288, DYS388, DYS389A, DYS390)
were amplified using published conditions (Hammer et al. 1998). PCR
products were separated on an ABI 377 automated sequencer and
scored using ABI Genotyper software. Y-chromosome STR data were collected from 622
males including 280 South Indians, ∼200 Africans (Seielstad et al. 1999; this study), 40 Asians, and 102 Europeans.
Autosomal data were collected from 608 individuals including 265
South Indians, 155 Africans, 70 Asians, and 118
Europeans.
The Y-chromosome-specific biallelic polymorphisms tested
included: DYS188792, DYS194469, DYS211105, DYS221136, DYS257108, DYS287, M3,
M4, M9, M12, M15, SRY4064, SRY10831.1, SRY10831.2, p12f2, PN1, PN2, PN3,
RPS4Y711, and Tat (Hammer and Horai 1995; Hammer et al. 1997, 1998, 2000;
Underhill et al. 1997; Zerjal et al. 1997; Karafet et al. 1999).
All individuals tested negative for the Y Alu insert (DYS287). A
complete description of the Y-chromosome STR loci can be found in Kayser et al.
(1997).
A table of the biallelic Y-chromosome haplotype frequencies in the upper,
middle, and lower castes is available at http://www.genome.org/supplemental/.
For the Y-chromosome biallelic dataset, comparisons were
made to a different set of worldwide populations including: East Asians from
Japan, Korea, China, and Vietnam (n = 460); Western Europeans from
Britain and Germany (n = 77); Southern Europeans from Italy and
Greece (n = 148); and Eastern Europeans from Russia and Romania (n = 102)
(M.F. Hammer, unpubl.). The complete dataset of Indians consisted of 55
Brahmin, 111 Yadava and Kapu, and 74 Relli, Mala, and Madiga.
Autosomal polymorphisms were amplified using conditions
specifically optimized for each system. Further information on these conditions
is available at the Web site: http://www.genetics.utah.edu/~swatkins/pub/Alu_data.htm or http://www.genome.org/supplemental. With
minor exceptions owing to typing failures or other causes, the same
individuals from each population were used to create each dataset (i.e., mtDNA,
Y chromosome, and autosomal). The complete dataset of genotypes from all 40
autosomal loci is available at: http://www.genome.org/supplemental/.
Statistical Analyses
Genetic distances for Y-chromosome STRs were estimated using
the method of Shriver et al. (1995), which assumes a stepwise
mutation model. Genetic distances for mitochondrial and autosomal markers were
calculated as pairwise FST distances, using the ARLEQUIN package (Schneider et
al. 1997).
For autosomal polymorphisms, Nei's standard distances and their standard
errors were estimated using DISPAN (http://www.bio.psu.edu/IMEG), and 90%
confidence intervals were estimated by multiplying the standard error by 1.65.
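As an illustration of a distance that assumes a stepwise mutation model, the following minimal Python sketch computes a stepwise-weighted distance in the spirit of Shriver et al. (1995) for a single STR locus: the mean absolute difference in repeat number between alleles drawn from two populations, minus the average of the corresponding within-population values. The function names and the toy allele sizes are hypothetical, small-sample corrections are omitted, and this is not the DISTNEW implementation used for the analyses.

```python
from collections import Counter


def allele_freqs(alleles):
    """Return {repeat_number: frequency} for a list of STR allele sizes."""
    counts = Counter(alleles)
    n = len(alleles)
    return {allele: count / n for allele, count in counts.items()}


def mean_step_difference(freqs_x, freqs_y):
    """Expected |i - j| for one allele drawn from each frequency distribution."""
    return sum(
        px * py * abs(i - j)
        for i, px in freqs_x.items()
        for j, py in freqs_y.items()
    )


def stepwise_weighted_distance(pop_x, pop_y):
    """Between-population step difference minus the mean within-population one.

    Assumption: no small-sample correction is applied, so this is only a
    sketch of the idea, not a reproduction of any published program.
    """
    fx, fy = allele_freqs(pop_x), allele_freqs(pop_y)
    within_x = mean_step_difference(fx, fx)
    within_y = mean_step_difference(fy, fy)
    between = mean_step_difference(fx, fy)
    return between - (within_x + within_y) / 2


# Toy single-locus data (hypothetical repeat numbers, not from this study).
pop_a = [14, 14, 15, 15]
pop_b = [16, 16, 17, 17]
print(stepwise_weighted_distance(pop_a, pop_b))
```

For several loci, one would typically average the per-locus values.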
The significance of the FST distances between populations was estimated
by generating a null distribution of pairwise FST distances by permuting
haplotypes between populations. The p-value of the test is the proportion of
permutations leading to an FST value larger than or equal to the observed one.
Genotypic differentiation was estimated using GENEPOP (Raymond and Rousset 1995)
vers. 3.2 (http://www.cefe.cnrs-mop.fr/). The null hypothesis
tested is that there is a random distribution of K different haplotypes among r
populations (the contingency table). All potential states of the contingency
table are explored with a Markov chain, and the probability of observing a
table as likely as, or less likely than, the observed sample configuration is
estimated.
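The permutation test of FST significance described earlier in this section can be sketched as follows. This is a simplified illustration, not ARLEQUIN's implementation: FST is approximated here from haplotype gene diversities as (HT - HS)/HT, an assumption made for brevity, and the function names and sample data are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(haplotypes):
    """1 - sum of squared haplotype frequencies (no sample-size correction)."""
    n = len(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in Counter(haplotypes).values())


def fst(pop_a, pop_b):
    """Simplified FST = (H_T - H_S) / H_T for two samples (an assumption)."""
    h_s = (gene_diversity(pop_a) + gene_diversity(pop_b)) / 2
    h_t = gene_diversity(list(pop_a) + list(pop_b))
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p_value(pop_a, pop_b, n_perm=1000, seed=0):
    """Permute haplotypes between populations to build a null distribution.

    The p-value is the proportion of permutations whose FST is larger than
    or equal to the observed value.
    """
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(pop_a)], pooled[len(pop_a):]
        # Small tolerance so that ties are not lost to floating-point noise.
        if fst(perm_a, perm_b) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm


# Hypothetical haplotype labels for two fully differentiated samples.
observed, p = permutation_p_value(["A"] * 10, ["B"] * 10)
print(observed, p)
```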
Estimates of significance for the correlation between
interindividual caste rank differences and interindividual autosomal genetic
distances were made by forming two n × n matrices, where n is the number of
individuals. For the first matrix, interindividual genetic distances were based
on the proportion of Alu insertions/deletions shared by each pair of
individuals. To form the second matrix, each individual was assigned a score
according to his rank in the caste hierarchy for caste groups (i.e., upper
caste = 1, middle caste = 2, lower caste = 3) and
also for separate castes (i.e., Brahmin = 1, Kshatriya = 2,
Vysya = 3, Kapu = 4, Yadava = 5,
Relli = 6, Mala = 7, and Madiga = 8). An
interindividual matrix of score distances was formed by taking the absolute
value of the difference between the scores of each pair of individuals. The
matrix of genetic distances was compared to 10,000 permuted matrices of
score distances using a Mantel matrix comparison test (Mantel 1967).
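The matrix comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for the analyses: genetic distance is taken as the proportion of loci at which two individuals' genotypes differ (one minus the proportion shared), the correlation is a Pearson correlation over the upper triangles of the two matrices, and all function names and the toy data are hypothetical.

```python
import random
from math import sqrt


def genetic_distance_matrix(genotypes):
    """Proportion of loci at which each pair of individuals differs."""
    n_loci = len(genotypes[0])
    return [
        [sum(a != b for a, b in zip(gi, gj)) / n_loci for gj in genotypes]
        for gi in genotypes
    ]


def score_distance_matrix(scores):
    """Absolute difference in caste-rank score for each pair of individuals."""
    return [[abs(a - b) for b in scores] for a in scores]


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabeling individuals."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(genetic, score, n_perm=10000, seed=0):
    """One-sided Mantel test: permute individuals in the score matrix."""
    rng = random.Random(seed)
    x = upper_triangle(genetic)
    r_obs = pearson(x, upper_triangle(score))
    order = list(range(len(score)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(x, upper_triangle(score, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / n_perm


# Hypothetical data: six individuals, four loci, three caste groups.
genotypes = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0],
             [1, 1, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
scores = [1, 1, 2, 2, 3, 3]
r, p = mantel_test(genetic_distance_matrix(genotypes),
                   score_distance_matrix(scores), n_perm=2000)
print(r, p)
```

Permuting individuals (rows and columns together) rather than individual matrix cells preserves the dependence among distances that share an individual, which is the reason a Mantel test is used instead of an ordinary correlation test.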
To illustrate phylogenetic relationships we constructed
reduced median (Bandelt et al. 1995) and neighbor-joining
networks (Felsenstein 1989). Coalescence times were
calculated as in Forster et al. (1996), using the estimator ρ,
which is the average transitional distance from the founder haplotype.
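The ρ estimator can be illustrated with a short Python sketch. It is a simplification: ρ is computed here by counting transitions directly between each sampled sequence and the founder, whereas in practice the count is made along the links of the network. The function names, the toy sequences, and the `years_per_transition` value are hypothetical placeholders, not values taken from this study.

```python
# Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def transitions_from_founder(sequence, founder):
    """Count aligned sites at which the sequence differs by a transition."""
    return sum(
        1 for a, b in zip(sequence, founder) if frozenset((a, b)) in TRANSITIONS
    )


def rho(sequences, founder):
    """Average transitional distance of the sample from the founder haplotype."""
    counts = [transitions_from_founder(seq, founder) for seq in sequences]
    return sum(counts) / len(counts)


def coalescence_time(sequences, founder, years_per_transition):
    """Convert rho to years; the calibration must be supplied by the caller."""
    return rho(sequences, founder) * years_per_transition


# Toy aligned sequences (hypothetical); the last differs by a transversion only.
founder = "ACGT"
sample = ["ACGT", "GCGT", "GTGT", "ACGA"]
print(rho(sample, founder))
print(coalescence_time(sample, founder, years_per_transition=10_000))
```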
Acknowledgments
We thank all participants, the faculty and staff of Andhra
University for their discussion and technical assistance, as well as Henry
Harpending for comments and criticisms. We acknowledge the contributions of an
anonymous reviewer who suggested that the Kshatriya and Vysya be analyzed
separately from the other upper castes. Genetic distances between STRs were
estimated by the program DISTNEW, kindly provided by L. Jin. This work was
supported by NSF SBR-9514733, SBR-9700729,
SBR-9818215,
NIH grants GM-59290 and PHS MO1–00064,
the Estonian Science Fund (1669 and 2887),
and the Newcastle University small grants committee.
The publication costs of this article were defrayed in part
by payment of page charges. This article must therefore be hereby marked
“advertisement” in accordance with 18 USC section 1734
solely to indicate this fact.
Footnotes
E-MAIL mike@genetics.utah.edu; FAX (801) 585-9148.
Article published on-line before print: Genome Res., 10.1101/gr.173301.
REFERENCES
Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson
AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, et al. Sequence and
organization of the human mitochondrial genome. Nature. 1981;290:457–465.
Balakrishnan V. A preliminary study of genetic distances
among some populations of the Indian sub-continent. J Hum Evol. 1978;7:67–75.
————— . Admixture as an evolutionary force in populations of
the Indian sub-continent. In: Malhotra KC, Basu A, editors. Proceedings of the
Indian Statistical Institute Golden Jubilee International Conference on Human
Genetics and Adaptation. I. Calcutta: Indian Statistical Institute; 1982.
pp. 103–145.
Ballinger SW, Schurr TG, Torroni A, Gan YY, Hodge JA, Hassan
K, Chen KH, Wallace DC. Southeast Asian mitochondrial DNA analysis reveals
genetic continuity of ancient Mongoloid migrations. Genetics. 1992;130:139–152.
Bamshad M, Fraley AE, Crawford MH, Cann RL, Busi BR, Naidu
JM, Jorde LB. mtDNA variation in caste populations of Andhra Pradesh, India.
Hum Biol. 1996;68:1–28.
Bamshad M, Bhaskara RB, Naidu JM, Prasad BVR, Watkins S,
Jorde L. Letters to the editor. Hum Biol. 1997;69:432–435.
Bamshad MJ, Watkins WS, Dixon ME, Bhaskara BR, Naidu JM,
Rasanayagam A, Hammer MF, Jorde LB. Female gene flow stratifies Hindu castes.
Nature. 1998;395:651–652.
Bandelt HJ, Forster P, Sykes BC, Richards MB. Mitochondrial
portraits of human populations using median networks. Genetics. 1995;141:743–753.
Bell GI, Karam JH, Rutter WJ. Polymorphic DNA region
adjacent to the 5′ end of
the human insulin gene. Proc Natl Acad Sci USA. 1981;78:5759–5763.
Bhattacharyya NP, Basu P, Das M, Pramanik S, Banerjee R, Roy
B, Roychoudhury S, Majumder P. Negligible male gene flow across ethnic
boundaries in India, revealed by analysis of Y-chromosomal DNA polymorphisms.
Genome Res. 1999;9:711–719.
Cavalli-Sforza LL, Menozzi P, Piazza A. The history and
geography of human genes. Princeton, NJ: Princeton University Press; 1994.
Chen YS, Torroni A, Excoffier L, Santachiara-Benerecetti AS,
Wallace DC. Analysis of mtDNA variation in African populations reveals the most
ancient of all human continent-specific haplogroups. Am J Hum Genet. 1995;57:133–149.
Comas D, Calafell F, Mateu E, Perez-Lezaun A, Bosch E,
Martinez-Arias R, Clarimon J, Facchini F, Fiori G, Luiselli D, et al. Trading
genes along the silk road: mtDNA sequences and the origin of Central Asian
populations. Am J Hum Genet. 1998;63:1824–1838.
Deka R, Jin L, Shriver MD, Yu LM, Saha N, Barrantes R,
Chakraborty R, Ferrell RE. Dispersion of human Y chromosome haplotypes based on
five microsatellites in global populations. Genome Res. 1996;6:1177–1184.
de Knijff P, Kayser M, Caglia A, Corach D, Fretwell N,
Gehrig C, Graziosi G, Heidorn F, Herrmann S, Herzog B, et al. Chromosome Y
microsatellites: Population genetic and evolutionary aspects. Int J Legal Med. 1997;110:134–149.
Di Rienzo A, Wilson AC. Branching pattern in the
evolutionary tree for human mitochondrial DNA. Proc Natl Acad Sci. 1991;88:1597–1601.
Elder J. Enduring stereotypes about South Asia: India's
caste system. Edu Asia. 1996;1:20–22.
Felsenstein J. PHYLIP—Phylogeny inference package (version 3.2)
Cladistics. 1989;5:164–166.
Forster P, Harding R, Torroni A, Bandelt HJ. Origin and
evolution of Native American mtDNA variation: A reappraisal. Am J Hum Genet. 1996;59:935–945.
Hammer MF, Horai S. Y chromosomal DNA variation and the
peopling of Japan. Am J Hum Genet. 1995;56:951–962.
Hammer MF, Spurdle AB, Karafet T, Bonner MR, Wood ET,
Novelletto A, Malaspina P, Mitchell RJ, Horai S, Jenkins T, et al. The
geographic distribution of human Y chromosome variation. Genetics. 1997;145:787–805.
Hammer MF, Karafet T, Rasanayagam A, Wood ET, Altheide TK,
Jenkins T, Griffiths RC, Templeton AR, Zegura SL. Out of Africa and back again:
Nested cladistic analysis of human Y chromosome variation. Mol Biol Evol. 1998;15:427–441.
Hammer MF, Redd AJ, Wood ET, Bonner MR, Jarjanazi H, Karafet
T, Santachiara-Benerecetti S, Oppenheim A, Jobling MA, Jenkins T, et al. Jewish
and middle eastern non-Jewish populations share a common pool of Y-chromosome
biallelic haplotypes. Proc Natl Acad Sci. 2000;97:6769–6774.
Horai S, Murayama K, Hayasaka K, Matsubayashi S, Hattori Y,
Fucharoen G, Harihara S, Park KS, Omoto K, Pan IH. mtDNA polymorphism in East
Asian populations, with special reference to the peopling of Japan. Am J Hum
Genet. 1996;59:579–590.
Jorde LB, Bamshad MJ, Watkins WS, Zenger R, Fraley AE,
Krakowiak PA, Carpenter KD, Soodyall H, Jenkins T, Rogers AR. Origins and
affinities of modern humans: A comparison of mitochondrial and nuclear genetic
data. Am J Hum Genet. 1995;57:523–538.
Jorde LB, Rogers AR, Bamshad M, Watkins WS, Krakowiak P, Sung
S, Kere J, Harpending HC. Microsatellite diversity and the demographic history
of modern humans. Proc Natl Acad Sci. 1997;94:3100–3103.
Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE,
Seielstad MT, Batzer MA. The distribution of human genetic diversity: A
comparison of mitochondrial, autosomal, and Y-chromosome data. Am J Hum Genet. 2000;66:979–988.
Karafet TM, Zegura SL, Posukh O, Osipova L, Bergen A, Long
J, Goldman D, Klitz W, Harihara S, de Knijff P, et al. Ancestral Asian
source(s) of New World Y-chromosome founder haplotypes. Am J Hum Genet. 1999;64:817–831.
Kayser M, de Knijff P, Dieltjes P, Krawczak M, Nagy M,
Zerjal T, Pandya A, Tyler-Smith C, Roewer L. Applications of
microsatellite-based Y chromosome haplotyping. Electrophoresis. 1997;18:1602–1607.
Kivisild T, Bamshad MJ, Kaldma K, Metspalu M, Metspalu E,
Reidla M, Laos S, Parik J, Watkins WS, Dixon ME, et al. Deep common ancestry of
Indian and western Eurasian mtDNA lineages. Curr Biol. 1999;9:1331–1334.
Kolman CJ, Sambuughin N, Bermingham E. Mitochondrial DNA
analysis of Mongolian populations and implications for the origin of New World
founders. Genetics. 1996;142:1321–1334.
Krishnan T, Reddy BM. Geographical and ethnic variability of
finger ridge-counts: Biplots of male and female Indian samples. Ann Hum Biol. 1994;21:155–169.
Lahr MM, Foley RA. Multiple dispersals and modern human
origins. Evol Anthr. 1994;3:48–60.
Majumder PP. People of India: Biological diversity and
affinities. Evol Anthr. 1999;6:100–110.
Majumder PP, Mukherjee BN. Genetic diversity and affinities
among Indian populations: An overview. In: Majumder PP, editor. Human
population genetics. New York: Plenum Press; 1993. pp. 255–275.
Majumder PP, Roy B, Banerjee S, Chakraborty M, Dey B,
Mukherjee N, Roy M, Thakurta PG, Sil SK. Human-specific insertion/deletion
polymorphisms in Indian populations and their possible evolutionary implications.
Eur J Human Genet. 1999;7:435–446.
Mantel N. The detection of disease clustering and a
generalized regression approach. Cancer Res. 1967;27:209–220.
Malhotra KC, Vasulu TS. Structure of human populations in
India. In: Majumder PP, editor. Human population genetics. New York: Plenum
Press; 1993. pp. 207–233.
Mountain JL, Hebert JM, Bhattacharyya S, Underhill PA,
Ottolenghi C, Gadgil M, Cavalli-Sforza LL. Demographic history of India and
mtDNA-sequence diversity. Am J Hum Genet. 1995;56:979–992.
Nei M, Livshits G. Genetic relationships of Europeans,
Asians and Africans and the origin of modern Homo sapiens. Hum Hered. 1989;39:276–281.
Passarino G, Semino O, Bernini LF, Santachiara-Benerecetti
AS. Pre-Caucasoid and Caucasoid genetic features of the Indian population
revealed by mtDNA polymorphisms. Am J Hum Genet. 1996;59:927–934.
Poliakov L. The Aryan Myth. New York: Basic Books; 1974.
Quintana-Murci L, Semino O, Poloni ES, Liu A, Van Gijn M,
Passarino G, Brega A, Nasidze IS, Maccioni L, Cossu G, et al. Y-Chromosome
specific YCAII, DYS19 and YAP polymorphisms in human
populations: A comparative study. Ann Hum Genet. 1999a;63:153–166.
Quintana-Murci L, Semino O, Bandelt HJ, Passarino G,
McElreavey K, Santachiara-Benerecetti AS. Genetic evidence of an early exit of
Homo sapiens sapiens from Africa through eastern Africa. Nature Genet. 1999b;23:437–441.
Raymond M, Rousset F. GENEPOP (version 1.2): Population
genetics software for exact tests and ecumenism. J Heredity. 1995;86:248–249.
Renfrew C. Before Babel: Speculations on the origins of
linguistic diversity. Camb Archaeol J. 1989a;1:3–23.
————— The origins of Indo-European languages. Sci Am. 1989b;261:82–90.
Richards MB, Macaulay VA, Bandelt HJ, Sykes BC.
Phylogeography of mitochondrial DNA in Western Europe. Ann Hum Genet. 1998;61:251–254.
Schneider S, Roessli D, Excoffier L. Arlequin ver 2.000:
A software for population genetics data analysis. Geneva: Genetics and
Biometry Laboratory, University of Geneva; 1997.
Seielstad M, Bekele E, Ibrahim M, Toure A, Traore M. A view
of modern human origins from Y chromosome microsatellite variation. Genome Res.
1999;9:558–567.
Seo Y, Stradmann-Bellinghausen B, Rittner C, Takahama K,
Schneider PM. Sequence polymorphism of mitochondrial DNA control region in
Japanese. Forensic Sci. 1998;97:155–164.
Shaffer JG. Harappan culture: A reconsideration. In: Possehl
GL, editor. Harappan civilization: A contemporary perspective. New Delhi,
India: American Institute of Indian Studies, Oxford and IBH Publishers; 1982.
pp. 41–50.
Shriver MD, Jin L, Boerwinkle E, Deka R, Ferrell RE,
Chakraborty R. A novel measure of genetic distance for highly polymorphic
tandem repeat loci. Mol Biol Evol. 1995;12:914–920.
Tambiah SJ. In: The character of kinship. Goody J, editor.
Cambridge, UK: Cambridge University Press; 1973.
Torroni A, Lott MT, Cabell MF, Chen YS, Lavergne L, Wallace
DC. mtDNA and the origin of Caucasians: Identification of ancient
Caucasian-specific haplogroups, one of which is prone to a recurrent somatic
duplication in the D-loop region. Am J Hum Genet. 1994;55:760–776.
Torroni A, Huoponen K, Francalacci P, Petrozzi M, Morelli L,
Scozzari R, Obinu D, Savontaus ML, Wallace DC. Classification of European
mtDNAs from an analysis of three European populations. Genetics. 1996;144:1835–1850.
Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath
D, Davis RW, Cavalli-Sforza LL, Oefner PJ. Detection of numerous Y chromosome
biallelic polymorphisms by denaturing high-performance liquid chromatography.
Genome Res. 1997;7:996–1005.
Wallace DC. 1994 William Allan Award
Address. Mitochondrial DNA variation in human evolution, degenerative disease,
and aging. Am J Hum Genet. 1995;57:201–223.
Zerjal T, Dashnyam B, Pandya A, Kayser M, Roewer L, Santos
FR, Schiefenhovel W, Fretwell N, Jobling MA, Harihara S, et al. Genetic
relationships of Asians and Northern Europeans, revealed by Y-chromosomal DNA
analysis. Am J Hum Genet. 1997;60:1174–1183.