
      PyDamage: automated ancient damage identification and estimation for contigs in ancient DNA de novo assembly

      research-article


          Abstract

          DNA de novo assembly can be used to reconstruct longer stretches of DNA (contigs), including genes and even genomes, from short DNA sequencing reads. Applying this technique to metagenomic data derived from archaeological remains, such as paleofeces and dental calculus, we can investigate past microbiome functional diversity that may be absent or underrepresented in the modern microbiome gene catalogue. However, compared to modern samples, ancient samples are often burdened with environmental contamination, resulting in metagenomic datasets that represent mixtures of ancient and modern DNA. The ability to rapidly and reliably establish the authenticity and integrity of ancient samples is essential for ancient DNA (aDNA) studies, and the ability to distinguish between ancient and modern sequences is particularly important for ancient microbiome studies. Characteristic patterns of ancient DNA damage, namely DNA fragmentation and cytosine deamination (observed as C-to-T transitions), are typically used to authenticate ancient samples and sequences, but existing tools for inspecting and filtering aDNA damage either compute it at the read level, which leads to high data loss and lower quality when used in combination with de novo assembly, or require manual inspection, which is impractical for ancient assemblies that typically contain tens to hundreds of thousands of contigs. To address these challenges, we designed PyDamage, a robust, automated approach for aDNA damage estimation and authentication of de novo assembled aDNA. PyDamage uses a likelihood-ratio-based approach to discriminate between truly ancient contigs and contigs originating from modern contamination. We test PyDamage on both simulated aDNA data and archaeological paleofeces, and we demonstrate its ability to reliably and automatically identify contigs bearing DNA damage characteristic of aDNA. Coupled with aDNA de novo assembly, PyDamage opens up new doors to explore functional diversity in ancient metagenomic datasets.
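
To make the likelihood-ratio idea in the abstract concrete, the following is a minimal sketch and not PyDamage's actual implementation or API. It assumes that, for one contig, we already have per-position counts of C-to-T substitutions (`k`) and of reference cytosines covered (`n`) at increasing distance from the 5' read end. The null model uses a single flat substitution rate; the hypothetical damage model lets the rate decay geometrically from `pmax` toward a baseline `pmin`. The function names, the parameter grid, and the chi-square approximation with 2 degrees of freedom are all assumptions made for illustration.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood summed over positions (constant term dropped)."""
    eps = 1e-12
    total = 0.0
    for ki, ni, pi in zip(k, n, p):
        pi = min(max(pi, eps), 1.0 - eps)  # keep log() finite
        total += ki * math.log(pi) + (ni - ki) * math.log(1.0 - pi)
    return total

def damage_curve(pmin, pmax, q, length):
    """Hypothetical damage model: rate decays geometrically from pmax to pmin."""
    return [pmin + (pmax - pmin) * (1.0 - q) ** i for i in range(length)]

def damage_lrt(k, n):
    """Return (test statistic, approximate p-value) for damage vs. flat model."""
    length = len(k)
    # Null model: one flat rate, whose maximum-likelihood estimate is closed form.
    p0 = sum(k) / sum(n)
    ll_null = binom_loglik(k, n, [p0] * length)
    # Damage model: coarse grid search stands in for a proper optimizer.
    pmin_grid = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1]
    pmax_grid = [0.05 * j for j in range(1, 13)]
    q_grid = [0.1 * j for j in range(1, 10)]
    ll_damage = ll_null  # the flat model is the pmax == pmin special case
    for pmin in pmin_grid:
        for pmax in pmax_grid:
            if pmax <= pmin:
                continue
            for q in q_grid:
                curve = damage_curve(pmin, pmax, q, length)
                ll_damage = max(ll_damage, binom_loglik(k, n, curve))
    stat = 2.0 * (ll_damage - ll_null)
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Synthetic example: 20 positions, 1000 reference cytosines covered at each.
n = [1000] * 20
ancient_k = [round(1000 * (0.01 + 0.29 * 0.7 ** i)) for i in range(20)]
modern_k = [10] * 20  # flat 1% substitution rate, no damage signal
print("ancient-like contig:", damage_lrt(ancient_k, n))
print("modern-like contig:", damage_lrt(modern_k, n))
```

In this sketch a small p-value indicates that the decaying curve explains the counts much better than a flat rate, which is the signature expected of an ancient contig. When many contigs are tested, the resulting p-values would normally need a multiple-testing correction before any filtering decision.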


                Author and article information

                Journal
                PeerJ
                PeerJ Inc. (San Diego, USA)
                ISSN: 2167-8359
                Published: 27 July 2021
                Volume 9, article e11845
                Affiliations
                [1] Microbiome Sciences Group, Max Planck Institute for the Science of Human History, Department of Archaeogenetics, Jena, Germany
                [2] Faculty of Biological Sciences, Friedrich-Schiller Universität Jena, Jena, Germany
                [3] Population Genetics Group, Max Planck Institute for the Science of Human History, Department of Archaeogenetics, Jena, Germany
                [4] ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, Australia
                [5] Department of Anthropology, Harvard University, Cambridge, MA, United States of America
                Article
                Article ID: 11845
                DOI: 10.7717/peerj.11845
                PMC ID: 8323603
                PMID: 34395085
                34728c59-3672-4567-932a-83cbafe6c5bc
                ©2021 Borry et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                History
                Received: 29 March 2021
                Accepted: 1 July 2021
                Funding
                Funded by: DFG, German Research Foundation
                Award ID: 390713860
                Funded by: European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program
                Award ID: 771234 – PALEoRIDER
                Funded by: Werner Siemens Foundation (“Paleobiotechnology”)
                Alexander Hübner was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (EXC 2051 – Project-ID 390713860). Adam B Rohrlach was funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program under grant agreement no. 771234 – PALEoRIDER. Maxime Borry and Christina Warinner were funded by the Werner Siemens Foundation (“Paleobiotechnology”). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Anthropology
                Bioinformatics
                Computational Biology
                Genomics
                Paleontology

                metagenomics, adna, ancient dna, assembly, damage, de novo, automated
