Northeast India was a high-burden area for COVID-19 in both the first and the second waves [
31]. Therefore, we sequenced 92 SARS-CoV-2 samples collected in Assam, India, between March and July 2021 to track mutant variants. All 92 samples belonged to the δ variant clade, which was also observed in other parts of India over the same period [
32]. A comparison with the Pangolin database [
7] revealed that 29.34% of our samples belonged to the B.1.617.2 lineage, 28.26% to the AY.33 lineage, 26.08% to the AY.16 lineage, 13.04% to the AY.4 lineage, and 1.08% each to the AY.34 and AY.37 lineages; the δ variant comprises the lineages B.1.617.2 + AY*. Notably, we found 12 cases of the AY.4 δ variant, indicating that this region carries the SARS-CoV-2 AY.4.2 lineages of δ variants, which are suspected to cause severe illness or deaths in India [
33]. The genome-wide amino acid variant analysis revealed that a group of 13 variants was transmitted together at high frequency in Assam. This suggests that these variants were inherited together and represent a haplotype of the δ variant. So far, the spread of SARS-CoV-2 in different regions of the world has been tracked at the level of variants of concern (VOCs) [
1] and, except in a few studies, little attention has been given to the spread of haplotypes or sub-haplotypes within a region [
11,
34]. A previous study showed that there were clusters of sub-lineages of the δ variant across different regions of Germany and the United Kingdom [
35]. Therefore, we hypothesize that a specific haplotype of the δ variant was highly transmitted in Assam. The set of variants in the haplotype changed in two ways. First, there was a selective sweep of 13 pre-existing haplotype variants, including 4 variants (2 in the S protein and 2 in ORF8) that have a frequency <91% outside Assam in the same lineage and over the same time frame. The increase in frequency of these 4 variants in Assam might be due to a selective advantage in this region, or some of them may be carried as hitchhikers because of their linkage with advantageous variants in the S protein [
36]. As noted above, the selective advantage might instead be due to the variant in ORF8, which is tightly linked to the S-protein locus. However, this needs validation in future studies with more extensive data. Second, 10 variants of the haplotype were reduced in frequency in Assam, likely through mutation or recombination, perhaps because of their weak linkage with the S-protein (and ORF8) locus. Recombination is widespread in coronaviruses because RNA synthesis can switch from one template to another [
37,
38]. Many previous studies have shown that the successive evolution of SARS-CoV-2 variants involved repeated episodes of recombination [
20,
21,
39]. In India, especially during the second wave, most reported infections across states were caused by the δ variant, yet the rate of transmission and pathogenicity differed significantly among states. The reason for this remains unclear, perhaps because haplotype transmissibility in different regions is poorly understood.
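The comparison of variant frequencies inside and outside a region, as described above, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the placeholder variant labels (`v1`, `v2`), the toy samples, and both frequency thresholds are assumptions made purely for illustration.

```python
from collections import Counter


def variant_frequencies(samples):
    """Return {variant: fraction of samples that carry it}."""
    if not samples:
        return {}
    counts = Counter(v for variants in samples for v in set(variants))
    return {v: c / len(samples) for v, c in counts.items()}


def regionally_enriched(region_samples, outside_samples,
                        min_region_freq, max_outside_freq):
    """List variants that are common in the region but rarer outside it.

    Both thresholds are illustrative parameters, not values from the study.
    """
    inside = variant_frequencies(region_samples)
    outside = variant_frequencies(outside_samples)
    return sorted(
        v for v, f in inside.items()
        if f >= min_region_freq and outside.get(v, 0.0) < max_outside_freq
    )


# Toy data with placeholder variant labels (not real mutations).
region = [{"v1", "v2"}, {"v1", "v2"}, {"v1"}, {"v1", "v2"}]
elsewhere = [{"v1"}, {"v1"}, {"v2"}, {"v1"}]
print(regionally_enriched(region, elsewhere, 0.7, 0.5))
```

In this toy input, `v1` is frequent everywhere and is therefore not flagged, whereas `v2` is frequent only in the region of interest. A frequency difference alone does not distinguish selection from hitchhiking or founder effects.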
We calculated dN and dS values for each gene of the SARS-CoV-2 VOCs that emerged at different times. The calculated dN and dS values are smaller than those observed in other RNA viruses [
41]. The low dS values may lead to overestimation of dN/dS. To minimize this risk, we used only the top one-third of the dS values with the additional condition of dS/SE < 1. Our average dN/dS values of SARS-CoV-2 VOCs are not strikingly different from those in previous studies [
42‐
44], implying that, overall, the SARS-CoV-2 genome is evolving under strong purifying selection. The gene-wise dN/dS values for FW, α, δ and omicron revealed that ORF1ab was under positive selection in the FW and omicron variants, whereas the S-protein gene was under positive selection in all the SARS-CoV-2 variants studied. ORF3a may be under relaxed negative selection in FW compared with other genes. Thus, the S protein might have undergone positive selection in some of the VOCs [
45]. ORF1ab and ORF3a have been reported to undergo positive selection that drove the early evolution of SARS-CoV-2 [
44]. However, for ORF1ab, this was true only for the α and omicron variants. The structural genes, except S, and the immune evasion genes in SARS-CoV-2 were under strong purifying selection.
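The filtering step described above (keeping the top one-third of dS values and applying the dS/SE cut-off before averaging dN/dS) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the tuple layout, the toy numbers, and the choice to average per-comparison ratios are all hypothetical, and the cut-off is applied exactly as worded in the text (dS/SE < 1) through an adjustable parameter.

```python
def filtered_mean_dnds(records, max_ds_over_se=1.0):
    """Average dN/dS over a filtered subset of comparisons.

    records: iterable of (dN, dS, SE) tuples, where SE is the standard
    error of dS. The top one-third of records by dS is kept, then the
    dS/SE cut-off is applied as stated in the text. Returns None when
    no record passes the filter.
    """
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    top_third = ranked[: max(1, len(ranked) // 3)] if ranked else []
    ratios = [
        dn / ds
        for dn, ds, se in top_third
        if ds > 0 and se > 0 and ds / se < max_ds_over_se
    ]
    return sum(ratios) / len(ratios) if ratios else None


# Toy values for illustration only (not data from the study).
toy = [
    (0.002, 0.010, 0.020),
    (0.001, 0.008, 0.004),
    (0.003, 0.009, 0.012),
    (0.001, 0.001, 0.001),
    (0.000, 0.002, 0.001),
    (0.001, 0.003, 0.002),
]
print(filtered_mean_dnds(toy))
```

A mean dN/dS below 1 is conventionally read as purifying selection and a value above 1 as positive selection. Whether to average ratios or to divide mean dN by mean dS is a design choice that the text does not specify.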
The expression pattern of the genes of SARS-CoV-2 relates to their characteristic pattern of evolution. Most importantly, we found that ORF6, which showed high evolutionary conservation, was differentially expressed among samples collected from infected persons. Moreover, ORF6 expression correlated negatively with the Ct value of the samples. These data tentatively indicate a positive correlation between upregulation of ORF6 and an increase in virus titre in a host. The cytokine profile and inflammatory response are different in the case of SARS-CoV-2 infection [
46]. Previous studies have shown that ORF6 regulates immune escape in the human host by inhibiting STAT1 nuclear translocation to overcome the interferon-mediated antiviral response and also by binding to the Nup98-Rae1 complex, thereby inhibiting the nuclear import pathway [
26]. Therefore, we may assume that upregulation of ORF6 is an essential determinant of the successful invasion of a human host by SARS-CoV-2; for this reason, this gene shows extraordinary functional conservation in evolution. However, this needs further validation in future studies.
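The relationship between ORF6 expression and Ct value described above can be illustrated with a short correlation sketch. The text does not state which coefficient was used, so Pearson's r is an assumption here, as are the function name and the toy values, which are not measurements from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy values for illustration only: relative ORF6 expression and sample Ct.
orf6_expression = [1.0, 2.0, 3.0, 4.0]
ct_values = [30.0, 27.0, 25.0, 20.0]
r = pearson_r(orf6_expression, ct_values)
print(f"r = {r:.3f}")
```

Because a lower Ct value corresponds to a higher viral load, a negative r between expression and Ct is what underlies the tentative positive association with virus titre.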