The signature of language and geography on the genetic structure of human populations in Africa


Recent evidence indicates that the serial founder effects model best describes human genetic variation on a global scale. The model provides less insight into genetic variation within Africa, from which the founding population for the rest of the world arose. To help fill this gap, we studied the genetic structure of African populations in relation to geography and language. We examined 652 autosomal microsatellite loci from 2413 individuals in 118 African populations, using data collected and made publicly available by Sarah Tishkoff and colleagues (Science 2009, 324:1035). We analyzed correlations and partial correlations between matrices of Nei's genetic distance, geographic distance, and linguistic distance. For all pairs of populations, we used great circle geographic distances and a linguistic distance based on the positions of languages in a standard classification (i.e., languages in branches, in subgroups, in groups, in primary branches, and in families). Geography independently accounted for 12% of the variation in genetic distance (p < 0.0001). Language independently accounted for 9% of the variation in genetic distance (p < 0.0001). To explore the role of language further, we partitioned the linguistic distance into components defined at each level of the classification. We found that language family membership alone accounted for nearly all of the correlation between linguistic and genetic distance, with modest additions from primary branches and groups. In this light, language families appear to reveal genetic signatures of ancient divisions in African populations, whereas the correlation between genetics and geography, independent of language, appears to reflect more recent gene flow and population movement.
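
The matrix analysis described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a haversine great circle distance on a sphere of radius 6371 km, Pearson correlation over the upper triangles of the distance matrices, and a simple permutation scheme (shuffling the population labels of the genetic matrix) for the partial correlation; all function names are hypothetical. With a permutation test of this kind the smallest attainable p-value is 1/(n_perm + 1), so a p-value is never exactly zero.

```python
import math
import random


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def upper_triangle(m):
    """Flatten the strict upper triangle of a square symmetric matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def permute(m, order):
    """Reorder rows and columns of a square matrix by the same permutation."""
    return [[m[i][j] for j in order] for i in order]


def partial_mantel(gen, geo, lang, n_perm=999, seed=0):
    """Partial correlation of gen with geo controlling for lang.

    Returns (observed r, one-sided permutation p-value). The permutation
    scheme here is an assumption, not taken from the abstract.
    """
    rng = random.Random(seed)
    y, z = upper_triangle(geo), upper_triangle(lang)
    observed = partial_r(upper_triangle(gen), y, z)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(len(gen)))
    for _ in range(n_perm):
        rng.shuffle(order)
        if partial_r(upper_triangle(permute(gen, order)), y, z) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

Squaring the observed partial correlation gives the share of variation in genetic distance associated with one predictor after controlling for the other, which is the kind of quantity reported above as 12% for geography and 9% for language. Swapping the `geo` and `lang` arguments gives the corresponding value for language controlling for geography.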

