      Is Open Access

      AcetoBase Version 2: a database update and re-analysis of formyltetrahydrofolate synthetase amplicon sequencing data from anaerobic digesters

      research-article


          Abstract

          AcetoBase is a public repository and database of formyltetrahydrofolate synthetase (FTHFS) sequences. It is the first systematic collection of bacterial FTHFS nucleotide and protein sequences from genomes and metagenome-assembled genomes and of sequences generated by clone library sequencing. At its publication in 2019, AcetoBase (Version 1) was also the first database to establish connections between the FTHFS gene, the Wood–Ljungdahl pathway and 16S ribosomal RNA genes. Since the publication of AcetoBase, there have been significant improvements in the taxonomy of many bacterial lineages and accessibility/availability of public genomics and metagenomics data. The update to the AcetoBase reference database described here (Version 2) provides new sequence data and taxonomy, along with improvements in web functionality and user interface. The evaluation of this latest update by re-analysis of publicly accessible FTHFS amplicon sequencing data previously analysed with AcetoBase Version 1 revealed significant improvements in the taxonomic assignment of FTHFS sequences.

          Database URL: https://acetobase.molbio.slu.se

          Most cited references (51)

          Is Open Access

          FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

          Background: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability.
          Methodology/Principal Findings: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory.
          Conclusions/Significance: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.
            Is Open Access

            VSEARCH: a versatile open source tool for metagenomics

            Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use.
            Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads.
            Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0.
            Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
              Is Open Access

              GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database

              Summary: The Genome Taxonomy Database Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the GTDB. GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in parallel. Here we demonstrate the accuracy of the GTDB-Tk taxonomic assignments by evaluating its performance on a phylogenetically diverse set of 10 156 bacterial and archaeal metagenome-assembled genomes.
              Availability and implementation: GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.
              Supplementary information: Supplementary data are available at Bioinformatics online.

                Author and article information

                Contributors
                Journal
                Title: Database: The Journal of Biological Databases and Curation
                Abbreviated title: Database (Oxford)
                Publisher: Oxford University Press (UK)
                ISSN: 1758-0463
                Published: 16 June 2022
                Volume: 2022
                Article number: baac041
                Affiliations
                Department of Molecular Sciences, BioCenter, Anaerobic Microbiology and Biotechnology Group, Swedish University of Agricultural Sciences, Almas Allé 5, Uppsala SE-750 07, Sweden
                Author notes
                *Correspondence may also be addressed to Anna Schnürer. Tel: +46 18671000; Fax: +46 18672000; Email: anna.schnurer@slu.se and Abhijeet Singh. Tel: +46 18671000; Fax: +46 18672000; Email: abhijeet.singh@slu.se
                Author information
                https://orcid.org/0000-0002-0807-6080
                Article
                Article ID: baac041
                DOI: 10.1093/database/baac041
                PMCID: 9216588
                PMID: 35708586
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 20 January 2022
                : 03 May 2022
                : 04 May 2022
                : 04 May 2022
                : 16 June 2022
                Page count
                Pages: 9
                Funding
                Funded by: Sveriges Lantbruksuniversitet, DOI 10.13039/501100004360
                Categories
                Original Article
                AcademicSubjects/SCI00960

                Bioinformatics & Computational biology
