Indian Journal of Medical Research

Year : 2020  |  Volume : 151  |  Issue : 2  |  Page : 255--260

Genomic analysis of SARS-CoV-2 strains among Indians returning from Italy, Iran & China, & Italian tourists in India

Varsha Potdar1, Sarah S Cherian2, Gururaj Rao Deshpande1, Padinjaremattathil Thankappan Ullas3, Pragya D Yadav4, Manohar Lal Choudhary4, Rohan Gughe1, Veena Vipat1, Sheetal Jadhav1, Savita Patil1, Dimpal Nyayanit1, Triparna Majumdar1, Atul Walimbe1, Shivshankar Gaikwad1, Hitesh Dighe1, Anita Shete-Aich1, Sreelekshmy Mohandas1, Deepika Chowdhury1, Gajanan Sapkal1, Atanu Basu5, Nivedita Gupta6, Raman R Gangakhedkar6, Sidhartha Giri6, Lalit Dar7, Amita Jain8, Bharati Malhotra9, Priya Abraham10, NIC Team11
1 Influenza Group, ICMR-National Institute of Virology, Pune 411 001, Maharashtra, India
2 Bioinformatics & Data Management Group, ICMR-National Institute of Virology, Pune 411 001, Maharashtra, India
3 Diagnostic Virology Group, ICMR-National Institute of Virology, Pune 411 001, Maharashtra, India
4 Maximum Containment Laboratory, ICMR-National Institute of Virology, Pune 411 001, Maharashtra, India
5 Electron Microscopy & Histopathology Group, ICMR-National Institute of Virology, Pune 411 001, Maharashtra, India
6 Division of Epidemiology & Communicable Diseases, Indian Council of Medical Research, New Delhi 110 029, India
7 Department of Microbiology, All India Institute of Medical Sciences, New Delhi 110 029, India
8 Department of Microbiology, King George's Medical University, Lucknow 226 003, Uttar Pradesh, India
9 Department of Microbiology, SMS Medical College, Jaipur 302 004, Rajasthan, India
10 ICMR-National Institute of Virology, Pune 411 001, Maharashtra, India

Correspondence Address:
Pragya D Yadav
Maximum Containment Laboratory, ICMR-National Institute of Virology, Pune 411 001, Maharashtra

How to cite this article:
Potdar V, Cherian SS, Deshpande GR, Ullas PT, Yadav PD, Choudhary ML, Gughe R, Vipat V, Jadhav S, Patil S, Nyayanit D, Majumdar T, Walimbe A, Gaikwad S, Dighe H, Shete-Aich A, Mohandas S, Chowdhury D, Sapkal G, Basu A, Gupta N, Gangakhedkar RR, Giri S, Dar L, Jain A, Malhotra B, Abraham P, NIC Team. Genomic analysis of SARS-CoV-2 strains among Indians returning from Italy, Iran & China, & Italian tourists in India. Indian J Med Res 2020;151:255-260

Full Text


The single-stranded RNA genome of the 2019 novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), about 29.9 kb in length and encoding about 9860 amino acids, was annotated to possess 14 open reading frames (ORFs) encoding 27 proteins[1],[2]. The orf 1ab and orf 1a genes at the 5´-terminus of the genome encode the pp1ab and pp1a proteins, respectively, which together form 15 non-structural proteins (nsps), nsp1-nsp10 and nsp12-nsp16. The 3´-terminus of the genome encodes four structural proteins: the spike surface glycoprotein (S), the small envelope protein (E), the membrane protein (M) and the nucleocapsid protein (N). There are eight accessory proteins, denoted as 3a, 3b, p6, 7a, 7b, 8b, 9b and ORF14[2].

The epidemiology of SARS-CoV-2 since its emergence in December 2019 has been ever expanding, with an increase in the number of cases and its spread globally[3],[4]. The number of SARS-CoV-2 cases in India as on March 31, 2020 was 1,071, with the number of deaths crossing 29[4]. In this context, it is vital to understand the genetic nature of circulating SARS-CoV-2. In India, as per the guideline of the Ministry of Health and Family Welfare, samples from suspected cases of SARS-CoV-2 were collected and tested at the designated Viral Research and Diagnostic Laboratories (VRDL)[5]. As a part of this activity, a total of 15 SARS-CoV-2 positive specimens were obtained during the first week of March 2020, from Italian tourists and travellers from Italy and their contact cases in India. Further, in an effort to screen Indian nationals in Iran to enable their evacuation, throat swabs were collected from 1,920 individuals during March 5 to 17, 2020, of whom 281 were positive. In addition, a team of Indian doctors visited Italy and collected a total of 380 swabs from Indian citizens, of which four were positive. In an earlier study, the authors identified the first three cases of SARS-CoV-2 in Kerala, India, as imported cases from Wuhan, China, and presented the first two full-genome sequences along with the potential B-cell and T-cell epitopes on the spike protein[6]. Further, in another study, the SARS-CoV-2 viruses were isolated in Vero CCL-81 cells[7]. The present study was undertaken to understand and compare the genetic makeup of representative samples of the imported cases of SARS-CoV-2 to India from Wuhan, China, those of Italian tourists in India and the Indians evacuated from Iran and Italy.

Throat swab/nasal swab specimens collected from the 1,920 individuals in Iran were tested at the Indian Council of Medical Research-National Institute of Virology (ICMR-NIV), Pune, using real-time reverse transcription-polymerase chain reaction (RT-PCR) protocols to detect RdRp (1), RdRp (2), E and N genes as described elsewhere[8]. Next-generation sequencing (NGS) was performed on a total of 41 SARS-CoV-2 positive clinical samples from Italy and Iran. [Table 1] presents the details of the full genomes obtained (n=19) as a part of this study as well as the two earlier genomes retrieved from the Kerala samples (n=2) from those who had a travel history from China[6],[7].{Table 1}

Multiple sequence alignment of the 21 full genomes obtained and 1563 full-genome sequences [Supplementary Table] available at the Global Initiative on Sharing All Influenza Data (GISAID) database (as of March 26, 2020) was carried out in MAFFT v.7.450[9]. The phylogenetic tree was constructed using MEGA v.6[10], employing the neighbour-joining method with the composite likelihood method and 1000 bootstrap replications. An initial tree was constructed based on a total of 1586 sequences. This tree was used to reduce the dataset to 121 sequences, on the basis of country and the genetic variant identified based on the GISAID classification. The sequences of this study were compared with the Wuhan Hu-1 reference strain to identify unique mutations, if any.
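
The comparison against the reference strain can be illustrated with a minimal Python sketch. It is not the procedure used in the study; it assumes the query and the reference have already been aligned (for example, as two rows taken from a MAFFT alignment) and simply lists nucleotide substitutions in reference coordinates. The function name `list_substitutions` and the toy sequences are hypothetical.

```python
def list_substitutions(ref_aln, qry_aln):
    """List nucleotide substitutions between two aligned sequences.

    Positions are 1-based and counted along the reference, so alignment
    columns where the reference has a gap do not advance the counter.
    Gaps and ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    substitutions = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            continue  # insertion relative to the reference: no coordinate
        ref_pos += 1
        if r in "ACGT" and q in "ACGT" and r != q:
            substitutions.append(f"{r}{ref_pos}{q}")
    return substitutions


# Toy example (made-up sequences, not real SARS-CoV-2 data).
print(list_substitutions("AC-GT", "ATAGA"))
```

Labels produced this way follow the same pattern as C28144T, that is, reference base, reference position and observed base.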

Phylogenetic trees based on full-genome sequences deposited and available at GISAID revealed the diversification and the clustering of sequences into groups, based on the genetic variants. Specific amino acid substitutions in the nsp3 region, spike protein and ORF8, in general, lead to the formation of V, G and S genetic variants/clades, respectively. The S clade corresponds to the C28144T nucleotide polymorphism that results in a non-synonymous substitution Leu84Ser in ORF8. Clades V, G and a group of unclassified strains possess mainly C28144 and are referred to as the L type[11]. The phylogenetic analyses of the study strains and the other global sequences revealed that the SARS-CoV-2 sequences derived from Italy (n=8) in this study clustered in clade G, while the SARS-CoV-2 sequences (n=11) of Indians evacuated from Iran belonged to the unclassified group, which also included one of the SARS-CoV-2 sequences imported from Wuhan (hCoV-19/India/1-27/2020) [Figure 1]. The other sequence imported from Wuhan (hCoV-19/India/1-31/2020) possessed Leu84Ser in ORF8b, classifying it in clade S.{Figure 1}
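
The marker-based grouping described above can be sketched as a simple lookup. This is an illustrative simplification, not the GISAID classification procedure: it uses only the two residue-level markers quoted in this article (Leu84Ser in ORF8 for clade S, and D614G in the spike protein for clade G). The V clade marker is not given at residue level in the text, so V cannot be separated from the unclassified group here. The function names are hypothetical.

```python
def classify_clade(orf8_aa84, spike_aa614):
    """Assign a coarse clade label from two marker residues.

    orf8_aa84   : amino acid at ORF8 position 84 ("L" in Wuhan Hu-1)
    spike_aa614 : amino acid at spike position 614 ("D" in Wuhan Hu-1)

    The S marker is checked first; this ordering is an assumption of the
    sketch for the case where both markers are present.
    """
    if orf8_aa84 == "S":
        return "S"
    if spike_aa614 == "G":
        return "G"
    return "V or unclassified"


def l_or_s_type(nt_28144):
    """Return the L/S type from the nucleotide at genome position 28144."""
    if nt_28144 == "T":
        return "S type"
    if nt_28144 == "C":
        return "L type"
    return "undetermined"


# Marker states consistent with the groupings reported in the text.
print(classify_clade("S", "D"))  # sequence carrying Leu84Ser
print(classify_clade("L", "G"))  # sequence carrying D614G
print(l_or_s_type("C"))
```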

The Italy-origin sequences were noted to segregate into at least two subgroups. The percentage nucleotide divergence (PND) within these sequences was found to be 0.01 per cent. The SARS-CoV-2 sequences from the Italian tourists (n=6) showed relatedness to other European SARS-CoV-2 sequences from Scotland, Finland, England, Spain, Ireland and the Czech Republic along with a Shanghai, China, strain as the outgroup [Figure 1]. Two other sequences (hCoV-19/India/3118/2020 and hCoV-19/India/3239/2020) clustered more closely with sequences from Belgium and Switzerland. The two sequences (hCoV-19/India/31/2020 and hCoV-19/India/32/2020) from the Agra contacts of the Italy-returned Delhi-based individuals were more distinct and showed clustering in a strongly supported subgroup consisting of strains from Brazil and the European countries including Switzerland, Germany, France, Hungary and The Netherlands.
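
The article does not state how the PND was computed. One common way to obtain such a figure, shown here only as an assumed definition, is the mean pairwise p-distance (proportion of differing sites) across aligned sequences, expressed as a percentage. The function names and toy sequences below are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Only columns where both sequences carry A, C, G or T are compared;
    gaps and ambiguous bases are ignored.
    """
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_percent_divergence(aligned_seqs):
    """Mean pairwise p-distance over all sequence pairs, in per cent."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (made-up sequences): one difference in 10 sites for two
# of the three pairs, none for the third.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
print(round(mean_percent_divergence(toy), 2))
```

Other definitions (for example, model-corrected distances) would give slightly different values, so the sketch should not be read as reproducing the reported figures.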

The variable amino acid sites based on the alignment of the 21 sequences of this study with respect to Wuhan Hu-1 strain are shown in [Table 2]. All the Italy-origin sequences possessed the substitution D7711G/D614G in the S protein, characteristic of the G clade, along with another mutation P4715L (nsp12-323) that is also shared with many other countries. Mutation S1515F (nsp3-697) was specific to the Italian cohort strain; D8726G (M-3) was specific to hCoV-19/India/3118/2020 and hCoV-19/India/3239/2020 (Indian contacts of an Indian citizen having travel history to Italy), similar to sequences from Scotland, Belgium, Finland, Switzerland and England. The mutations, R9455K and G9456R (N-203 and 204), were found to be specific to the two strains, hCoV-19/India/31/2020 and hCoV-19/India/32/2020, but shared with a few more countries. A recent study has traced the earliest Italian importation of SARS-CoV-2 to a case from Shanghai, China, and has also identified at least two circulating variants in Italy[12]. Thus, it is likely that the former strain (Italian cohort) originated from China, whereas the latter strain (contacts in Agra, n=2) appears to have been from a European cluster involving an entry into Germany that preceded the first cases in Italy by almost a month[12],[13].{Table 2}
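
The paired labels in the paragraph above (for example, D7711G/D614G) give each substitution twice: once in a numbering that runs across the concatenated proteins, and once within the individual protein. The sketch below converts between the two. The start positions are not taken from any annotation file; each is back-calculated from a label pair quoted in the text, and the names `REGION_START`, `to_local` and `relabel` are hypothetical.

```python
import re

# 1-based start of each region in the concatenated numbering, inferred
# from the label pairs in the text (start = concatenated - local + 1).
REGION_START = {
    "nsp3": 1515 - 697 + 1,   # S1515F = nsp3-697
    "nsp12": 4715 - 323 + 1,  # P4715L = nsp12-323
    "S": 7711 - 614 + 1,      # D7711G = S-614
    "M": 8726 - 3 + 1,        # D8726G = M-3
    "N": 9455 - 203 + 1,      # R9455K = N-203
}


def to_local(region, concat_pos):
    """Convert a concatenated position to a position within `region`."""
    local = concat_pos - REGION_START[region] + 1
    if local < 1:
        raise ValueError(f"position {concat_pos} precedes {region}")
    return local


def relabel(mutation, region):
    """Rewrite a label such as 'D7711G' as 'S:D614G'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {mutation}")
    ref_aa, pos, alt_aa = match.group(1), int(match.group(2)), match.group(3)
    return f"{region}:{ref_aa}{to_local(region, pos)}{alt_aa}"


print(relabel("D7711G", "S"))  # S:D614G
print(relabel("G9456R", "N"))  # N:G204R
```

The second call serves as a consistency check: the N start inferred from R9455K (N-203) also maps G9456R to N-204, as stated in the text.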

Analysis of the strains from the SARS-CoV-2 positives in Iran [Figure 1] showed that these sequences (n=11) clustered with other strains having a global spread, with sequences from Canada, USA, several European countries, New Zealand, Australia and Southeast Asian countries noted in this group (moderate support of 64%). The PND among these study sequences was found to be 0.24 per cent. Common mutations shared among SARS-CoV-2 sequences in the group included R207C (nsp2-27), V378I (nsp2-198), M2796I (nsp4-33) and L3606F (nsp6-37). A mutation V9082F (ORF7a-74) was unique to four of the study sequences (hCoV-19/India/1073/2020, hCoV-19/India/1093/2020, hCoV-19/India/1115/2020 and hCoV-19/India/1100/2020) that clustered with a strain from Kuwait, KU12. The KU12 strain was also noted to possess this mutation. To date, there are no other sequences from Iran in the GISAID database. However, a phylogenetic study[14] of full-genome sequences has identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. Some of these representative sequences were included in this study as well.

In terms of the overall divergence of SARS-CoV-2, the strains in this study were 99.97 per cent identical to the earliest strain Wuhan Hu-1. However, it is vital to track the evolutionary dynamics of the strains vis-à-vis the strains circulating globally and monitor any specific changes in the functional sites of the major viral proteins.

Delineation of circulating strains into three major evolving clades has been reflected in GISAID, with clade G apparently being one of the dominant ones. From the start of the pandemic, severity or transmission patterns have not been associated with any clade in particular. A limitation of this study was the non-availability of full genomes from other parts of India; such genomes would have enabled a pan-India comparison of the circulating strains in the country. Overall, the present study revealed genetic variants in India that were similar to strains circulating in the specific regions of their origin. Continued surveillance of SARS-CoV-2 strains in India is warranted to get the complete picture of all circulating strains and identify changes that could be associated with increased virulence.

Acknowledgment: Authors thank Prof. (Dr) Balram Bhargava, Director-General, Indian Council of Medical Research (ICMR) & Secretary, Department of Health Research (DHR), Ministry of Health & Family Welfare (MoHFW), New Delhi for the support. Authors acknowledge the support from Dr P. Ravindran, Director, Emergency Medical Response (EMR), MoHFW, Dr R. Lakshminarayan, ICMR and the team from the DHR, MoHFW, for the logistic support. The National Centre for Disease Control (NCDC) team is acknowledged for sample collection from Italy. Shri Santosh Jadhav, Bioinformatics Group, ICMR-National Institute of Virology, Pune, is thanked for his inputs.

Financial support & sponsorship: Financial support provided by the Indian Council of Medical Research, New Delhi, is acknowledged.

Conflicts of Interest: None.


1. Chan JF, Kok KH, Zhu Z, Chu H, To KK, Yuan S, et al. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg Microbes Infect 2020; 9 : 221-36.
2. Wu A, Peng Y, Huang B, Ding X, Wang X, Niu P, et al. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe 2020; 27 : 325-8.
3. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020; 579 : 270-3.
4. World Health Organization. Novel coronavirus (2019-nCoV) situation reports. WHO; 2020. Available from:, accessed on March 31, 2020.
5. Gupta N, Potdar V, Praharaj I, Giri S, Sapkal G, Yadav P, et al. Laboratory preparedness for SARS-CoV-2 testing in India: Harnessing a network of virus research & diagnostic laboratories. Indian J Med Res 2020; 151. doi: 10.4103/ijmr.IJMR_594_20.
6. Yadav PD, Potdar VA, Choudhary ML, Nyayanit DA, Agrawal M, Jadhav SM, et al. Full-genome sequences of the first two SARS-CoV-2 viruses from India. Indian J Med Res 2020; 151. doi: 10.4103/ijmr.IJMR_663_20.
7. Sarkale P, Patil S, Yadav PD, Nyayanit DA, Sapkal G, Baradkar S, et al. First isolation of SARS-CoV-2 from clinical samples in India. Indian J Med Res 2020; 151. doi: 10.4103/ijmr.IJMR_1029_20.
8. Choudhary ML, Vipat V, Jadhav S, Basu A, Cherian S, Abraham P, et al. Development of in vitro transcribed RNA as positive control for laboratory diagnosis of SARS-CoV-2 in India. Indian J Med Res 2020; 151. doi: 10.4103/ijmr.IJMR_671_20.
9. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002; 30 : 3059-66.
10. Kumar S, Stecher G, Tamura K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 2016; 33 : 1870-4.
11. Tang X, Wu C, Li X, Song Y, Yao X, Wu X, et al. On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev 2020.
12. Zehender G, Lai A, Bergna A, Meroni L, Riva A, Balotta C, et al. Genomic characterisation and phylogenetic analysis of SARS-CoV-2 in Italy. medRxiv 2020. doi: 2020.03.15.20032870.
13. Spiteri G, Fielding J, Diercke M, Campese C, Enouf V, Gaymard A, et al. First cases of coronavirus disease 2019 (COVID-19) in the WHO European Region, 24 January to 21 February 2020. Euro Surveill 2020; 25 : 2000178.
14. Eden J-S, Rockett R, Carter I, Rahman H, et al. An emergent clade of SARS-CoV-2 linked to returned travellers from Iran. bioRxiv. doi: 2020.03.15.992818.