The genetic diversity of SARS-CoV-2 in the United States

The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the United States in 2020 is believed to have occurred in three "waves" or "phases", each marked by an increase in the number of newly reported cases and a shifting geographical distribution.

During this time, a number of SARS-CoV-2 lineages with higher transmissibility than wild-type were identified. Known as variants of concern, they raised questions about the mutation rate of the virus and its implications for acquired and vaccine-induced immunity.

In a research paper recently uploaded to the preprint server medRxiv* (June 4, 2021), Capoferri et al. examine the genetic diversity of SARS-CoV-2 in detail at each phase, using publicly available genomic data from before 2021. The findings underscore the need to continuously monitor and evaluate the evolution of the virus in order to ensure the future effectiveness of the vaccines currently available.

The stages of the COVID-19 spread

SARS-CoV-2 was introduced to the US from Europe and Asia in the winter of 2019, and cases rose rapidly through spring 2020, a period referred to as Phase 1.

The Northeast was hit particularly hard, with extensive community transmission taking place in this short period of time. Phase 2 began in the summer of 2020, this time with an increased case load in the American Southwest as non-pharmaceutical interventions began to loosen.

The Midwest saw the earliest spike in cases at the start of Phase 3 in the fall of 2020, although cases rose nationwide before widespread vaccine distribution began in early 2021.

The authors note some discrepancies between the distribution of cases in the US and the available SARS-CoV-2 genome sequences. For example, while the South accounted for most of the cases overall, most of the genome sequences were obtained from the West.

Overall, only 1.2% of all reported cases in the country had a matching virus sequence in 2020, compared with 8.1% in the UK and 6.2% in Australia. The median time from sample collection to availability of the complete genomic sequence was approximately 100 days. As a result, many samples from the final stages of Phase 3 were unavailable to the group at the time of writing, although the authors found that the overall sequencing rate improved in 2021 relative to 2020 levels.
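The sequencing rates quoted above are the number of submitted genomes divided by the number of reported cases. The short Python sketch below shows that calculation. The helper name `sequencing_coverage` and all counts are hypothetical placeholders chosen for illustration; they are not figures from the preprint.

```python
def sequencing_coverage(sequences: int, cases: int) -> float:
    """Return the percentage of reported cases that have a matching genome sequence."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * sequences / cases


# Hypothetical counts for illustration only (NOT data from the preprint).
example_counts = {
    "Northeast": {"sequences": 1_200, "cases": 100_000},
    "West": {"sequences": 4_000, "cases": 200_000},
}

# Coverage per region, in percent of reported cases.
coverage_by_region = {
    region: sequencing_coverage(counts["sequences"], counts["cases"])
    for region, counts in example_counts.items()
}

for region, pct in coverage_by_region.items():
    print(f"{region}: {pct:.1f}% of reported cases sequenced")
```

Normalizing by case count in this way is what allows regions or countries with very different epidemic sizes to be compared on the same scale.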

Tracking of SARS-CoV-2 clades

GISAID is an international initiative that tracks influenza and now SARS-CoV-2, and provides open-access genomic data on these viruses. It categorizes SARS-CoV-2 clades and lineages based on differences in genetic sequence and assigns them letter labels for easy identification. The earliest clades assigned by GISAID were G, GH, GR, S, L, and V, each of which was identified in the US during Phase 1.

SARS-CoV-2 epidemic in the US in 2020. (A) Daily COVID-19 cases in the US in 2020. (B) Daily COVID-19 deaths in the US in 2020. (C) Map of US regions, colored by region. (D) Number of COVID-19 cases in the US in 2020 by region: Northeast, South, West and Midwest, respectively. (E) Number of COVID-19 deaths in the US in 2020 by region. (A, B, D, E) Phase boundaries are indicated by vertical dotted red lines; the data were smoothed with a 3-day moving average. (F) Proportion of COVID-19 cases by region during each phase and contribution to the US total in 2020. (G) Proportion of SARS-CoV-2 sequences accessed (submissions as of December 15, 2020) by region during each phase and contribution to the US total in 2020. (H) Number of sequences per case for each region during each phase and for the US total in 2020. (F-H) Phases 1, 2 and 3 are shown, followed by the US total for 2020. (I) Total number of sequences submitted to GISAID by December 15, 2020 from the UK, Australia and the US. (J) Submitted SARS-CoV-2 genomes normalized to the number of COVID-19 cases for the UK, Australia and the US.

G-based clades are defined by the D614G mutation of the spike protein. Viruses carrying this mutation are more infectious and show greater resistance to some monoclonal antibodies than wild-type, although convalescent serum remains effective at neutralization and clinical outcomes are similar to, or even less severe than, those for wild-type SARS-CoV-2.

Over 99% of the sequences collected in Phase 2 were from a G-based clade, demonstrating the rapid rise to dominance of this highly transmissible strain.

The average pairwise distance among G-based clade sequences increased from 0.02% in Phase 1 to 0.06% in Phase 3, with an approximate rate of change of 1.95 nucleotides per month.
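The pairwise measure reported above is, in essence, the mean proportion of differing sites across all pairs of aligned genomes. The Python sketch below illustrates the idea using a simple uncorrected p-distance. This is an assumption made for illustration: the article does not say which distance model or software the authors used, the function names are hypothetical, and the toy sequences are invented rather than real SARS-CoV-2 data.

```python
from itertools import combinations
from typing import Sequence


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or ambiguous base ('N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def average_pairwise_distance(sequences: Sequence[str]) -> float:
    """Mean p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (invented, 10 sites each) for illustration only.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
apd = average_pairwise_distance(toy_alignment)
print(f"Average pairwise distance: {apd:.2%}")
```

On real data, the same calculation would be run separately on the genomes sampled in each phase, so that the change in diversity between phases can be compared.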

The GH and GR clades emerged from the G clade and show even higher average rates of change, at 2.85 and 2.22 nucleotides per month, respectively.

Overall, the number of unique variants in the G clade increased by 14% over the course of 2020, and in the GR clade in particular by 17%.

Interestingly, the GH clade had an 11% decrease in the number of variants while the difference between the variants increased.

A measure of the degree to which random populations of the virus fail to remain divergent over time was also calculated for each clade. G- and S-based clades were found to have diverged greatly during Phases 1 and 2, which suggests that the evolution of the virus was directional. Had there been strong unstructured mixing of different clades, the divergence would have been smaller, which would have indicated that SARS-CoV-2 had thoroughly permeated the human population.

The authors state that about half of the new mutations that appeared in the United States and reached a frequency greater than 5% were unique. Mutations highlighted include the nucleocapsid mutation S194L of clade G and the ORF1a mutations L3352F, N1653D and R2613C of clade G; the small available sample pool may have influenced these observations. Many of the defining mutations of the SARS-CoV-2 variants of concern were identified by the group in sequences from 2020, before the variants were officially recognized as separate lineages. These mutations were present at a frequency of only about 1% in Phase 1, increasing to almost 5% in Phase 3.
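The 5% cut-off mentioned above is a frequency threshold: a mutation's frequency is the share of sequenced genomes that carry it. The Python sketch below shows one way such a filter could be computed. The function names, the `gene:mutation` label format and the toy sample counts are all assumptions for illustration; only the mutation names themselves come from the article, and the frequencies produced here are not results from the preprint.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def mutation_frequencies(samples: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of genomes carrying each mutation; one set of labels per genome."""
    sample_list = list(samples)
    if not sample_list:
        return {}
    counts = Counter(m for mutations in sample_list for m in mutations)
    return {m: n / len(sample_list) for m, n in counts.items()}


def above_threshold(freqs: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Mutations whose frequency is strictly greater than the threshold, sorted by name."""
    return sorted(m for m, f in freqs.items() if f > threshold)


# Toy data for illustration only: 40 hypothetical genomes.
toy_samples = (
    [{"S:D614G"}] * 35
    + [{"S:D614G", "N:S194L"}] * 3
    + [{"S:D614G", "ORF1a:L3352F"}]
    + [set()]
)

freqs = mutation_frequencies(toy_samples)
common = above_threshold(freqs)
print(common)
```

In this toy set, only the mutations carried by more than 2 of the 40 genomes pass the filter, which is the kind of cut used to separate established mutations from rare ones.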

Future development of SARS-CoV-2

While SARS-CoV-2 has a high level of replication fidelity compared to many other RNA viruses, the global spread of the virus has opened up ample opportunity for mutation.

The authors characterize the evolution of SARS-CoV-2 as slow but unstoppable, driven mainly by genetic drift, with slight selection pressure towards higher transmissibility and immune evasion through competition with other strains.

Chronically infected immunocompromised individuals treated with neutralizing antibodies are considered an ideal environment for more significant mutations to occur, since they represent an isolated setting with greater selection pressure. Many of the more concerning variants could have emerged this way.

Similarly, the general genetic diversity of SARS-CoV-2 has been fueled by low adherence to non-pharmaceutical measures in the community: some adherent populations provide isolated conditions suitable for mutation, after which the virus is spread onward by non-adherents.

The more people in the population are vaccinated, the greater the selection pressure favoring strains that are better able to evade the immune system. Continuous monitoring of the virus genome is therefore essential.

*Important note

medRxiv publishes preliminary scientific reports that are not peer-reviewed and therefore should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
