In a recent study posted to the bioRxiv* preprint server, researchers developed and validated an approach for jointly inferring measurement noise and genetic drift from lineage frequency time-series data.
Random genetic drift in the population-level dynamics of infectious disease epidemics arises from randomness in transmission between hosts and in host death or recovery. Studies have reported strong genetic drift in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences resulting from superspreading events, which is expected to strongly influence the viral evolution and epidemiology of coronavirus disease 2019 (COVID-19). Noise from the measurement process, including biases in how data are collected across locations and over time, could confound estimates of genetic drift.
About the study
In the present study, researchers developed an approach to jointly infer the strength of measurement noise and genetic drift from lineage frequency time series. The approach allowed measurement noise to be overdispersed (rather than assuming uniform sampling) and the degree of overdispersion to vary over time (rather than remaining constant). The researchers also validated the approach's accuracy through simulations.
A hidden Markov model (HMM) was used, with continuous observed and hidden states representing the observed and true lineage frequencies, respectively. Transition probabilities between hidden states were determined by genetic drift, with the mean true frequency at each time point given by the true frequency at the previous time point. For rare lineages (low frequencies), the variance was related to the mean through the effective population size (Ne).
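The drift transition step can be sketched in Python as a Gaussian (central-limit) update. This is a minimal illustration, not the authors' code: it assumes the rare-lineage variance takes the form Var[f_next | f_prev] = f_prev / Ne, with any generation-time scaling folded into Ne, and all function names are hypothetical.

```python
import math
import random


def transition_variance(f_prev: float, ne: float) -> float:
    """Variance of the true frequency after one step of drift.

    Assumed rare-lineage form: variance proportional to the mean, f_prev / ne.
    """
    return f_prev / ne


def transition_logpdf(f_next: float, f_prev: float, ne: float) -> float:
    """Gaussian log-density of the hidden state moving from f_prev to f_next."""
    var = transition_variance(f_prev, ne)
    return -0.5 * math.log(2 * math.pi * var) - (f_next - f_prev) ** 2 / (2 * var)


def simulate_true_frequencies(f0: float, ne: float, steps: int, seed: int = 0) -> list[float]:
    """Draw one trajectory of hidden (true) frequencies under neutral drift."""
    rng = random.Random(seed)
    traj = [f0]
    for _ in range(steps):
        f_prev = traj[-1]
        if f_prev <= 0.0:
            traj.append(0.0)  # lineage lost: treat zero as absorbing
            continue
        sd = math.sqrt(transition_variance(f_prev, ne))
        # The mean equals the previous true frequency (no selection); clip at 0.
        traj.append(max(0.0, rng.gauss(f_prev, sd)))
    return traj


# Example: a lineage at 1% frequency with a hypothetical Ne of 10,000.
trajectory = simulate_true_frequencies(0.01, 1e4, steps=20, seed=3)
```

Because the variance shrinks as Ne grows, a smaller inferred Ne corresponds to larger frequency fluctuations, that is, stronger drift.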
Emission probabilities linking hidden and observed states were determined by measurement noise, with the mean observed frequency equal to the true frequency. For rare lineages, the variance of the observed frequencies was related to the mean through a time-dependent factor capturing deviations from uniform sampling. The model assumed that the numbers of infected individuals and lineage counts were large enough for the central limit theorem to apply.
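A matching sketch of the emission step follows, again under stated assumptions rather than as the study's implementation. The names n_seq (number of sequences sampled at a time point) and c_t (an overdispersion factor, with c_t = 1 corresponding to uniform binomial sampling) are hypothetical, and the form Var[f_obs | f_true] = c_t * f_true / n_seq is an assumed parameterization motivated by the binomial variance f(1 - f)/n, which is close to f/n for small f.

```python
import math
import random


def emission_variance(f_true: float, n_seq: int, c_t: float) -> float:
    """Variance of the observed frequency given the true frequency.

    Uniform (binomial) sampling of n_seq sequences gives f(1 - f) / n_seq,
    about f / n_seq for rare lineages. The hypothetical factor c_t >= 1
    inflates this to represent overdispersed sampling at time t.
    """
    return c_t * f_true / n_seq


def emission_logpdf(f_obs: float, f_true: float, n_seq: int, c_t: float) -> float:
    """Gaussian log-density of observing f_obs when the true frequency is f_true."""
    var = emission_variance(f_true, n_seq, c_t)
    return -0.5 * math.log(2 * math.pi * var) - (f_obs - f_true) ** 2 / (2 * var)


def sample_observed(f_true: float, n_seq: int, c_t: float, rng: random.Random) -> float:
    """Draw a noisy observed frequency whose mean equals the true frequency."""
    sd = math.sqrt(emission_variance(f_true, n_seq, c_t))
    return max(0.0, rng.gauss(f_true, sd))  # observed frequencies cannot be negative


# Example: the same true frequency under uniform vs. overdispersed sampling.
rng = random.Random(0)
uniform_obs = [sample_observed(0.02, 1000, 1.0, rng) for _ in range(5)]
noisy_obs = [sample_observed(0.02, 1000, 4.0, rng) for _ in range(5)]
```

Raising c_t widens the emission distribution; a model that ignored this extra spread would tend to attribute it to drift instead, which is the confounding described above.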
The researchers generated “superlineages” by clustering lineages according to phylogenetic distance so that the total abundance and frequency of each superlineage exceeded a cutoff, yielding 486, 4,083, 6,225, and 24,867 SARS-CoV-2 strains for the pre-B.1.177, B.1.177, Alpha, and Delta variants, respectively. The team hypothesized that the Ne
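The superlineage step could look roughly like the greedy sketch below. The merge rule (repeatedly fold the smallest below-cutoff cluster into the cluster holding its phylogenetically closest lineage), the single-linkage distance, and the cutoff parameters min_count and min_freq are all assumptions for illustration; the study's exact clustering procedure may differ.

```python
from typing import Callable


def build_superlineages(
    counts: dict[str, int],
    dist: Callable[[str, str], float],
    min_count: int,
    min_freq: float,
) -> list[list[str]]:
    """Greedily merge lineages until every cluster passes both cutoffs (hypothetical)."""
    total = sum(counts.values())
    clusters = [[name] for name in counts]

    def size(cluster: list[str]) -> int:
        return sum(counts[name] for name in cluster)

    def too_small(cluster: list[str]) -> bool:
        n = size(cluster)
        return n < min_count or n / total < min_freq

    while len(clusters) > 1:
        small = [c for c in clusters if too_small(c)]
        if not small:
            break
        target = min(small, key=size)
        others = [c for c in clusters if c is not target]
        # Single linkage: closest pair of lineages between the two clusters.
        nearest = min(others, key=lambda c: min(dist(a, b) for a in target for b in c))
        clusters.remove(target)
        nearest.extend(target)  # mutates the cluster in place inside `clusters`
    return sorted(sorted(c) for c in clusters)


# Toy example: positions on a line stand in for phylogenetic distances.
counts = {"A": 50, "B": 3, "C": 2, "D": 40}
pos = {"A": 0.0, "B": 0.1, "C": 5.0, "D": 5.2}
result = build_superlineages(counts, lambda a, b: abs(pos[a] - pos[b]), 10, 0.05)
```

In the toy example, the rare lineages B and C are absorbed by their nearest abundant neighbors, leaving two superlineages.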
Next, the parameters most likely to have generated each dataset were identified. The model was validated by running simulations using Ne
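As a hedged illustration of simulation-based validation, the sketch below simulates trajectories with a known Ne and checks that a maximum-likelihood estimate recovers it. It deliberately simplifies the study's setting: true frequencies are treated as directly observed (no measurement noise), and Var[f_{t+1} | f_t] = f_t / Ne is assumed. Under that Gaussian model, setting the derivative of the log-likelihood to zero gives the closed form Ne_hat = n / sum((f_{t+1} - f_t)^2 / f_t) over n transitions. Function names are hypothetical.

```python
import math
import random


def simulate_lineages(n_lineages: int, steps: int, f0: float, ne: float, seed: int = 1) -> list[list[float]]:
    """Simulate independent rare-lineage trajectories with a known Ne."""
    rng = random.Random(seed)
    trajectories = []
    for _lineage in range(n_lineages):
        traj = [f0]
        for _step in range(steps):
            f_prev = traj[-1]
            # Tiny floor avoids division by zero in the estimator below.
            traj.append(max(1e-12, rng.gauss(f_prev, math.sqrt(f_prev / ne))))
        trajectories.append(traj)
    return trajectories


def mle_ne(trajectories: list[list[float]]) -> float:
    """Closed-form maximum-likelihood Ne under Var[f_next | f_prev] = f_prev / Ne."""
    n = 0
    total = 0.0
    for traj in trajectories:
        for f_prev, f_next in zip(traj, traj[1:]):
            total += (f_next - f_prev) ** 2 / f_prev
            n += 1
    return n / total


# Validation: the estimate should land close to the Ne used to simulate.
true_ne = 1e5
sims = simulate_lineages(200, 10, 0.01, true_ne)
ne_hat = mle_ne(sims)
```

With 2,000 simulated transitions the estimate typically falls within a few percent of the true value; the full joint model would additionally integrate over the hidden states to separate drift from measurement noise.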
The inferred Ne
Even after accounting for measurement noise, the strength of genetic drift was consistently one to three orders of magnitude higher over time than expected from the observed number of SARS-CoV-2-positive individuals in England. Superspreading could not explain this high genetic drift, but community structure (demes) in host contact networks could partially explain it. Corrections accounting for epidemiological dynamics, using susceptible-infected-recovered (SIR) or susceptible-exposed-infected-recovered (SEIR) models, also could not explain the discrepancy.
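The "orders of magnitude" comparison amounts to a base-10 log ratio. Assuming, as a simplification, that drift strength scales as 1/N, the excess drift relative to a population of N infected individuals is log10(N / Ne). The function name and the numbers in the example are hypothetical and illustrative only, not values from the study.

```python
import math


def drift_excess_orders(n_infected: float, ne: float) -> float:
    """Orders of magnitude by which drift (assumed ~ 1/Ne) exceeds 1/n_infected."""
    if n_infected <= 0 or ne <= 0:
        raise ValueError("population sizes must be positive")
    return math.log10(n_infected / ne)


# Illustrative numbers only: 100,000 infections vs. an inferred Ne of 1,000.
excess = drift_excess_orders(1e5, 1e3)
print(excess)
```

A result between 1 and 3 corresponds to the one-to-three-orders-of-magnitude range reported above.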
Sampling of SARS-CoV-2-infected individuals from the English population was largely uniform across the dataset. The team found evidence of spatial structure in the transmission dynamics of the B.1.177, Alpha, and Delta variants. The estimated Ne
Overall, the results showed that genetic drift in SARS-CoV-2 transmission in England was stronger than expected from case counts, and indicated that further modeling studies are needed to better understand the mechanisms underlying these high levels of genetic drift.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.