The 89th Annual Meeting of the American Association of Physical Anthropologists (2020)


A mixed-methods approach to identify factors influencing non-human primate genomic sequence data generation

MARGARITA HERNANDEZ¹ and GEORGE PERRY¹,²

¹Department of Anthropology, Penn State University; ²Department of Biology, Penn State University

April 17, 2020, 2:45 PM, Diamond 3

Massively parallel genomic sequencing methods have enabled powerful investigations into the phylogenetic relationships, population dynamics, and evolutionary biology of diverse species. Yet for non-human primates (NHP), genomic data appear to be unevenly distributed among taxa, which may limit further advances in knowledge on these topics across the order. Our goal was to test whether variables including publication history, geographic range, and IUCN Red List status were significantly correlated with the amount of publicly available genomic sequence data in the NCBI Sequence Read Archive. We also conducted a qualitative analysis in which we interviewed 33 authors of publications that generated genomic data to learn their motivations when selecting species for study. Of the 180 terabases (Tb) of publicly available genomic sequence data for 519 NHP species, 135 Tb (~75%) come from only five species: rhesus macaques, olive baboons, green monkeys, chimpanzees, and crab-eating macaques. We also found that the total number of publications focusing on each species (R² = 0.37; P = 6.15 × 10⁻¹²) and representation in the medical literature (R² = 0.27; P = 9.27 × 10⁻⁹) were the strongest predictors of the amount of genomic sequence data available. In a grounded theory analysis of the most common themes to emerge across interviews, authors frequently cited sample accessibility, the extent of prior published work, and perceived relevance to humans (especially health-related relevance) as motivations for their choice of species. Our mixed-methods approach identifies and contextualizes the factors that drive species selection when researchers design studies. Through this, we hope to raise awareness of these processes so that they can be considered when aligning research goals with the generation of data for the species where it is needed most.
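The R² values reported above summarize how much of the among-species variation in data volume a single predictor explains. As a minimal sketch, assuming a single-predictor ordinary least-squares fit (the abstract does not state the model, any data transformation, or how P was obtained), R² can be computed as below. The function name `r_squared` and the toy values are hypothetical illustrations and are not the study's data.

```python
# Minimal sketch: coefficient of determination (R^2) for a simple
# one-predictor ordinary least-squares fit y = a + b*x.
# ASSUMPTION: a single-predictor linear model; the study's actual model
# and transformations are not given in the abstract.

def r_squared(x, y):
    """Return R^2 of the least-squares line fitted to paired values x, y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx             # fitted slope
    a = mean_y - b * mean_x   # fitted intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT from the study): a per-species predictor,
# e.g. log10 publication count, and log10 amount of sequence data.
pubs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
data = [0.2, 1.1, 0.9, 2.3, 2.0, 3.1]
print(round(r_squared(pubs, data), 3))
```

For a one-predictor least-squares fit, this value equals the squared Pearson correlation between the two variables, so an R² of 0.37 corresponds to a correlation of roughly 0.61 in magnitude.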

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1255832 to M.H.

