1 Department of Ecology and Evolutionary Biology, Brown University; 2 Center for Computational Molecular Biology, Brown University
April 14, 2016 2:15, Imperial Ballroom B
Here, we introduce a novel framework for detecting hard selective sweeps, which leave behind three major genomic signatures: long-range haplotype blocks, altered site frequency spectra, and population differentiation. Composite methods generally gain substantial power by drawing on multiple sources of information, but inherent redundancies among statistics that measure similar signatures can introduce bias. Our method uses a machine-learning tool called an Averaged One-Dependence Estimator (AODE) to combine multiple statistics from across all three genomic signatures in a classification framework that returns probabilistically interpretable results, deals naturally with loci for which one or more of the component statistics are undefined, and accounts for the correlations among component statistics.
Our classifier infers the probability that a locus has undergone a hard sweep based on joint distributions of the component statistics. These distributions are learned from extensive demographic simulations and from simulations of hard sweeps of varying strengths and time periods, conditional on a demographic model. In simulation, we show that this classifier vastly outperforms other methods in localizing sweep signals, and that it performs particularly well when identifying completed sweeps and fast sweeps. In data from the 1000 Genomes Project, we recover known sweep regions, with high scores localized near previously validated adaptive mutations, including the genes DARC in West Africans, EDAR in East Asians, and SLC24A5 in Europeans. Our method produces fewer false positives and false negatives than existing approaches, thus identifying promising novel targets of selection.
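To make the classification idea concrete, the following is a minimal sketch of a generic AODE over discretized statistics, not the authors' implementation. It assumes each component statistic has already been binned into integers 0..n_bins-1, that an undefined statistic is encoded as None, and that training rows come from labeled simulations; the class name TinyAODE, the labels "sweep" and "neutral", and the toy data are all hypothetical. Each observed statistic takes a turn as the "super-parent" on which the others are conditioned, and the resulting joint estimates are averaged. Undefined statistics are simply left out of the product. The minimum-frequency threshold for super-parents used in standard AODE is omitted for brevity.

```python
from collections import Counter


class TinyAODE:
    """Toy Averaged One-Dependence Estimator for pre-binned features.

    Assumption: feature values are ints in 0..n_bins-1, or None if undefined.
    """

    def __init__(self, n_bins, alpha=1.0):
        self.n_bins = n_bins  # number of bins per statistic
        self.alpha = alpha    # Laplace smoothing pseudo-count

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_count = Counter(y)
        self.n_obs = Counter()    # i -> rows where statistic i is defined
        self.pair = Counter()     # (c, i, xi) -> count
        self.denom = Counter()    # (c, i, xi, j) -> count with j also defined
        self.triple = Counter()   # (c, i, xi, j, xj) -> count
        for row, c in zip(X, y):
            obs = [(i, v) for i, v in enumerate(row) if v is not None]
            for i, xi in obs:
                self.n_obs[i] += 1
                self.pair[(c, i, xi)] += 1
                for j, xj in obs:
                    if j != i:
                        self.denom[(c, i, xi, j)] += 1
                        self.triple[(c, i, xi, j, xj)] += 1
        return self

    def predict_proba(self, row):
        obs = [(i, v) for i, v in enumerate(row) if v is not None]
        a, k, n_cls = self.alpha, self.n_bins, len(self.classes)
        if not obs:
            # Nothing defined at this locus: fall back to the class prior.
            n = sum(self.class_count.values())
            return {c: self.class_count[c] / n for c in self.classes}
        scores = {}
        for c in self.classes:
            total = 0.0
            for i, xi in obs:  # statistic i acts as the super-parent
                # Smoothed estimate of P(c, x_i)
                term = (self.pair[(c, i, xi)] + a) / (self.n_obs[i] + a * n_cls * k)
                for j, xj in obs:
                    if j != i:
                        # Smoothed estimate of P(x_j | c, x_i)
                        term *= (self.triple[(c, i, xi, j, xj)] + a) / (
                            self.denom[(c, i, xi, j)] + a * k
                        )
                total += term
            scores[c] = total
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}


# Hypothetical toy data: three binned statistics per locus, None = undefined.
X = [[2, 2, 2], [2, 2, 1], [2, None, 2], [0, 0, 0], [0, 1, 0], [None, 0, 0]]
y = ["sweep", "sweep", "sweep", "neutral", "neutral", "neutral"]

model = TinyAODE(n_bins=3).fit(X, y)
p = model.predict_proba([2, None, 2])  # second statistic undefined
print(p)
```

Because the output is a normalized posterior over classes, a locus with an undefined statistic still receives a probability, computed from whichever statistics are defined there.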
Sohini Ramachandran is a Pew Scholar in the Biomedical Sciences, supported by The Pew Charitable Trusts.