The 89th Annual Meeting of the American Association of Physical Anthropologists (2020)

Performance of adaptive boosting classifier based on incomplete dataset in biological sex prediction using postcranial bones


¹Forensic Medicine and Clinical Toxicology, Faculty of Medicine, Alexandria University; ²Department of Biomedical Engineering, Medical Research Institute, Alexandria University, Egypt; ³Institute for Intelligent Systems Research and Innovation, Deakin University, Australia

April 16, 2020, Platinum Ballroom

In forensic anthropology, group- and period-specific methods are crucial for reliable sex estimation. However, discriminating efficiency has been reported to vary between training and validation datasets even within the same region, and cross-population testing has shown larger fluctuations due to unacceptable sex bias. Data science offers alternatives to classical classifiers, e.g., neural network (NN), support vector machine (SVM), and quadratic discriminant analysis (QDA) classifiers. Flawed outputs may arise from the modeling technique or from poor data quality, e.g., missing values, multicollinearity, and overfitting. As a result, researchers may draw confusing conclusions about interpopulation differences or the need to renew standards. Adaptive boosting (AdaBoost) combines multiple models and improves performance by focusing on incorrectly classified observations. A challenging sex estimation task was designed to compare the performance of AdaBoost with that of other machine learning algorithms. The classifiers were trained on a sample of ancient Native Americans from Goldman’s dataset (n=539) and validated on modern Thai (n=104) and Hong Kong (n=77) samples donated by Christopher King, chosen on the basis of shared historical Asian ancestry. Thirteen standard measurements from the left humerus, femur, and tibia were the input variables. Despite extensive missing data in the training sample, training accuracy ranged from 68% (QDA) to 86% (AdaBoost). In the test populations, average accuracy ranged from 54% (SVM) to 89% (AdaBoost) in the Thai sample and from 54% (SVM) to 88% (AdaBoost) in the Hong Kong sample. AdaBoost outperformed the other classifiers in both datasets and showed notable stability despite the interpopulation differences and temporal separation.
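The reweighting idea behind adaptive boosting can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the base learner or how missing values were treated, so the decision stumps, the median imputation, the helper names, and the toy measurements below are all assumptions made purely for illustration.

```python
import math
from statistics import median

def fit_medians(X):
    """Per-column medians of the observed (non-None) values."""
    return [median(v for v in col if v is not None) for col in zip(*X)]

def impute(X, medians):
    """Assumed missing-data strategy: replace None with the training median."""
    return [[m if v is None else v for v, m in zip(row, medians)] for row in X]

def stump_predict(row, j, t, sign):
    """One-feature threshold rule returning +1 or -1."""
    return sign if row[j] > t else -sign

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) with the lowest error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost_fit(X, y, n_rounds):
    """Discrete AdaBoost: each round upweights the observations just misclassified."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, j, t, sign))
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * stump_predict(row, j, t, s) for a, j, t, s in ensemble)
    return 1 if score >= 0 else -1

# Made-up toy data (NOT from the study): two hypothetical measurements in mm,
# with None marking a missing value; labels are +1 / -1 for the two classes.
X_raw = [[330.0, 48.0], [325.0, None], [None, 47.0], [335.0, 49.0],
         [295.0, 41.0], [300.0, None], [None, 42.0], [290.0, 40.0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

medians = fit_medians(X_raw)
X = impute(X_raw, medians)
ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
accuracy = sum(p == yi for p, yi in zip(predictions, y)) / len(y)
new_case = impute([[None, 48.0]], medians)[0]  # unseen case with a missing value
new_prediction = adaboost_predict(ensemble, new_case)
print(accuracy, new_prediction)
```

In this toy set, imputation leaves each feature with one tied pair of opposite-class rows, so no single stump can classify every observation. The weighted vote of three stumps does, which is the sense in which boosting focuses on incorrectly classified observations.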
