A new study reveals flaws in a common analytical method within population genetics.
According to recent research from Lund University in Sweden, the analytical method most commonly used in population genetics is deeply flawed. This may have led to erroneous results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, influencing results in medical genetics and even commercial ancestry tests. The findings were recently published in the journal Scientific Reports.
The pace at which scientific data can be collected is rapidly increasing, resulting in huge and very complex databases, a development that has been dubbed the “Big Data Revolution”. To make the data more manageable, researchers use statistical techniques that condense and simplify it while keeping most of the important information. PCA (Principal Component Analysis) is perhaps the most widely used approach. Imagine PCA as an oven, with flour, sugar and eggs serving as the input data. The oven always does the same thing, but the final result, a cake, strongly depends on the proportions of the ingredients and how they are mixed.
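The oven analogy can be made concrete with a toy calculation. The sketch below is only an illustration and is not the study's data or method: it assumes three hypothetical populations A, B and C placed at made-up positions in a two-dimensional space, and it uses the closed-form orientation of the leading eigenvector of a 2x2 covariance matrix in place of a full PCA routine. The names `CENTROIDS`, `make_dataset` and `first_pc_angle` are invented for this example. The only thing that changes between the two runs is how many samples are drawn from population C.

```python
import math

# Hypothetical 2-D coordinates for three populations (made-up values).
CENTROIDS = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}


def make_dataset(counts):
    """Repeat each population's centroid once per sampled individual."""
    points = []
    for name, n in counts.items():
        points.extend([CENTROIDS[name]] * n)
    return points


def first_pc_angle(points):
    """Angle in degrees of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the leading eigenvector of [[cxx, cxy], [cxy, cyy]].
    return math.degrees(0.5 * math.atan2(2.0 * cxy, cxx - cyy))


# Same "oven", different proportions of the ingredients.
balanced = first_pc_angle(make_dataset({"A": 10, "B": 10, "C": 10}))
skewed = first_pc_angle(make_dataset({"A": 10, "B": 10, "C": 1}))

print(f"PC1 angle, balanced sampling: {balanced:.1f} degrees")
print(f"PC1 angle, C undersampled:    {skewed:.1f} degrees")
```

With equal sampling, the first axis in this toy setup runs diagonally (about -45 degrees) and contrasts B with C. When C is undersampled, the axis swings to nearly horizontal (roughly -6 degrees) and mostly separates A from B, even though the populations themselves have not moved.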
“This method is expected to give correct results because it is used so frequently. But it is neither a guarantee of reliability nor does it produce statistically solid conclusions,” says Dr Eran Elhaik, Associate Professor of Molecular Cell Biology at Lund University.
According to Elhaik, the method helped shape long-standing beliefs about race and ethnicity. It plays a role in producing historical tales about who people are and where they come from, told not only by the scientific community but also by commercial ancestry companies. A well-known example is when a famous American politician used an ancestry test to back up their ancestral claims prior to the 2020 presidential campaign. Another example is the misconception, driven by PCA results, of Ashkenazi Jews as a race or an isolated group.
“This study shows that those results were not reliable,” says Eran Elhaik.
PCA is used in many scientific fields, but Elhaik’s study focuses on its use in population genetics, where the explosion in dataset size is particularly acute, driven by the reduced costs of
Between 32,000 and 216,000 scientific papers on genetics alone have employed PCA to explore and visualize similarities and differences between individuals and populations and based their conclusions on these findings.
“I believe these results need to be reassessed,” says Elhaik.
He hopes the new study will lead to better approaches that allow such findings to be questioned, and thus help make the science more reliable. He has spent much of the last decade developing methods such as Geographic Population Structure (GPS), which predicts biogeography from DNA, and Pairwise Matcher, which improves the case-control matching used in genetic testing and drug trials.
“Techniques that offer such flexibility encourage bad science and are particularly dangerous in a world where there is intense pressure to publish. If a researcher performs PCA multiple times, the temptation will always be to select the output that makes the story better,” adds Professor William Amos of the University of Cambridge, who was not involved in the study.
Reference: “Principal Component Analysis (PCA) based results in population genetic studies are highly biased and need to be re-evaluated” by Eran Elhaik, 29 August 2022, Scientific Reports.
DOI: 10.1038/s41598-022-14395-4