Indian Journal of Medical Research
Year : 2020  |  Volume : 151  |  Issue : 1  |  Page : 93-103

Construction & assessment of a unified curated reference database for improving the taxonomic classification of bacteria using 16S rRNA sequence data

1 Biomedical Informatics Centre, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India
2 Biomedical Informatics Centre; Department of Gastroenterology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Correspondence Address:
Dr Rakesh Aggarwal
Director, Jawaharlal Institute of Postgraduate Medical Education & Research, Dhanvantri Nagar, Puducherry 605 006

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/ijmr.IJMR_220_18

Background & objectives: For bacterial community analysis, 16S rRNA sequences are subjected to taxonomic classification through comparison with one of the three commonly used databases [Greengenes, SILVA and Ribosomal Database Project (RDP)]. It was hypothesized that a unified database containing fully annotated, non-redundant sequences from all three databases might provide better taxonomic classification during analysis of 16S rRNA sequence data. Hence, a unified 16S rRNA database was constructed and its performance was assessed by using it with four different taxonomic assignment methods and with data from various hypervariable regions (HVRs) of the 16S rRNA gene.

Methods: We constructed a unified 16S rRNA database (16S-UDb) by merging non-ambiguous, fully annotated, full-length 16S rRNA sequences from the three databases and compared its performance in taxonomy assignment with that of the three original databases. This was done using four different taxonomy assignment methods [mothur Naïve Bayesian Classifier (mothur-nbc), RDP Naïve Bayesian Classifier (rdp-nbc), UCLUST, SortMeRNA] and data from 13 regions of 16S rRNA [seven HVRs (V2-V8) and six pairs of adjacent HVRs].

Results: Our unified 16S rRNA database contained 13,078 full-length, fully annotated 16S rRNA sequences. When used with the mothur-nbc classifier and the V2+V3 region, it assigned genus and species to larger proportions of sequences in the test database (90.05 and 46.82%, respectively) than did the three original 16S rRNA databases (70.88-87.20% and 10.23-24.28%, respectively, with the same classifier and region).

Interpretation & conclusions: Our results indicate that for analysis of bacterial mixtures, sequencing of the V2-V3 region of 16S rRNA followed by analysis of the data using the mothur-nbc classifier and our 16S-UDb database may be preferred.
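The merging step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (a sequence string paired with a rank-to-name dictionary), the rank list, the 1,200-base length cutoff and the function names are all assumptions made for illustration, and the abstract does not say how taxonomy labels were reconciled across Greengenes, SILVA and RDP. The sketch only shows the three filters named in the abstract (non-ambiguous, fully annotated, full-length) followed by removal of redundant sequences.

```python
# Illustrative sketch only; names, record layout and cutoff are assumptions,
# not details taken from the article.

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")
UNAMBIGUOUS_BASES = set("ACGT")
MIN_FULL_LENGTH = 1200  # assumed threshold for "full-length"


def is_unambiguous(seq):
    """True if the sequence contains only A, C, G and T."""
    return set(seq) <= UNAMBIGUOUS_BASES


def is_fully_annotated(taxonomy):
    """True if every rank from kingdom to species has a non-empty name."""
    return all(taxonomy.get(rank) for rank in RANKS)


def merge_databases(*databases, min_length=MIN_FULL_LENGTH):
    """Merge (sequence, taxonomy) records, keeping one entry per sequence.

    Records that are too short, contain ambiguous bases, or lack a name at
    any rank are skipped. When the same sequence occurs more than once, the
    first annotation seen is kept (an arbitrary choice for this sketch).
    """
    merged = {}
    for records in databases:
        for seq, taxonomy in records:
            seq = seq.upper().replace("U", "T")  # normalise RNA to DNA letters
            if len(seq) < min_length:
                continue
            if not is_unambiguous(seq):
                continue
            if not is_fully_annotated(taxonomy):
                continue
            merged.setdefault(seq, taxonomy)
    return merged
```

Keying the merged dictionary on the normalised sequence is what makes the result non-redundant; a real workflow would also need a rule for resolving conflicting annotations of the same sequence, which this sketch sidesteps by keeping the first one.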
