The 84th Annual Meeting of the American Association of Physical Anthropologists (2015)


Group Specific Alleles and Ascertainment Bias in Genomic Diversity Sets

SARA D. NIEDBALSKI and JEFFREY C. LONG.

Department of Anthropology, University of New Mexico

March 26, 2015 2:15, Lindbergh

It is well established that genes can be used to predict ancestry, but for ancestry to accurately predict genotype, there must exist high-frequency ancestry-specific, or group-specific, alleles (GSAs). The purpose of the research presented here is twofold. First, we test whether commonly used race groups contain GSAs, or whether ancestry groups predicted by serial founder effects are more informative. Second, we identify biases in GSA identification that arise when researchers merge databases to increase sample size.

We queried ~500,000 SNP loci across 31 populations in the CEPH-HGDP dataset for GSAs. We identified 1,022 African and 9 American GSAs, but no GSAs in any other biogeographic region. When evolutionary lineages were instead predicted by serial founder effects, the resulting groups proved more useful for identifying GSAs: we found 387 GSAs unique to a non-African group. These results indicate that race groups are not a sufficient proxy for populations when the goal is to capture unique, high-frequency variation. Health scientists interested in using ancestry to predict genotype will therefore be more successful with true evolutionary lineages than with race groups. These results are consistent with data from an original program written to perform coalescent simulations on both microsatellite and SNP data.
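The screening step described above can be illustrated with a minimal sketch. It is not the authors' program: the abstract does not state the frequency threshold or the exact GSA criterion, so the sketch assumes a GSA is an allele observed in exactly one group at a frequency of at least `min_freq` (a hypothetical parameter, set here to 0.5). The population labels, group maps, and allele counts are invented toy values.

```python
from typing import Dict, List, Tuple

# counts[locus][population] = (copies of the allele, chromosomes sampled)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def group_frequencies(counts: Counts, grouping: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Pool population-level allele counts into group-level frequencies."""
    out: Dict[str, Dict[str, float]] = {}
    for locus, by_pop in counts.items():
        totals: Dict[str, Tuple[int, int]] = {}
        for pop, (copies, chromosomes) in by_pop.items():
            group = grouping[pop]
            a, n = totals.get(group, (0, 0))
            totals[group] = (a + copies, n + chromosomes)
        out[locus] = {g: a / n for g, (a, n) in totals.items()}
    return out


def find_gsas(freqs: Dict[str, Dict[str, float]], min_freq: float = 0.5) -> List[Tuple[str, str]]:
    """Return (locus, group) pairs under the ASSUMED criterion:
    the allele is present in exactly one group, at frequency >= min_freq."""
    gsas = []
    for locus, by_group in freqs.items():
        present = [g for g, f in by_group.items() if f > 0.0]
        if len(present) == 1 and by_group[present[0]] >= min_freq:
            gsas.append((locus, present[0]))
    return gsas


# Toy data (invented): three hypothetical populations, two loci.
toy: Counts = {
    "snp1": {"afr1": (16, 20), "eur1": (0, 20), "ame1": (0, 20)},
    "snp2": {"afr1": (0, 20), "eur1": (14, 20), "ame1": (12, 20)},
}

# Two alternative ways of grouping the same populations.
regional = {"afr1": "Africa", "eur1": "Europe", "ame1": "America"}
two_lineage = {"afr1": "African", "eur1": "NonAfrican", "ame1": "NonAfrican"}

regional_gsas = find_gsas(group_frequencies(toy, regional))
lineage_gsas = find_gsas(group_frequencies(toy, two_lineage))
```

In this toy case, `snp2` is shared by two regional groups and so is specific to neither, but it becomes group specific once those populations are pooled into a single non-African lineage. This mirrors the pattern reported above without reproducing its numbers.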

We next compared the identified GSAs to a subset of loci sequenced in all individuals of the HapMap sample. None of the GSAs were present in this subset, indicating an ascertainment bias favoring common variation.
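A comparison of this kind amounts to intersecting the GSA loci with the loci retained in the merged panel. The sketch below assumes loci are matched by a shared identifier; the function name and the locus labels are hypothetical and the data are invented.

```python
from typing import Iterable, Set, Tuple


def gsa_retention(gsa_loci: Iterable[str], panel_loci: Iterable[str]) -> Tuple[Set[str], float]:
    """Return the GSA loci that survive in the panel and the retained fraction."""
    gsas = set(gsa_loci)
    kept = gsas & set(panel_loci)
    fraction = len(kept) / len(gsas) if gsas else 0.0
    return kept, fraction


# Toy data (invented): none of the GSA loci appear in the merged panel.
kept, fraction = gsa_retention(
    ["locus_1", "locus_2", "locus_3"],
    ["locus_7", "locus_8"],
)
```

A retained fraction of zero corresponds to the outcome reported above, where merging removed every GSA.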

This research was supported in part by NSF0850997. Note, all interpretations and errors are those of the presenter.