      Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration

      research-article


          Abstract

          Background

          To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analysis or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of the many different file formats in use and the lack of clarity about which genomic strand is used as the reference.

          Findings

          Genotype Harmonizer (GH) is a command-line tool that harmonizes genetic datasets by automatically resolving issues concerning genomic strand and file format. GH solves the unknown-strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns and requiring no prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats, including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java, and a large part of its functionality can also be used through the Java ‘Genotype-IO’ API. All software is open source under the LGPLv3 license and available from http://www.molgenis.org/systemsgenetics.
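
The idea of using linkage disequilibrium to place an ambiguous SNP on the right strand can be illustrated with a minimal sketch. This is not GH's implementation: the function names, the dosage encoding (0, 1 or 2 copies of the first-listed allele per sample) and the `min_r` threshold are assumptions made for illustration only. The sketch compares the sign of the correlation between an ambiguous SNP and nearby, already aligned SNPs in the study data against the same correlations in the reference. If the signs mostly disagree, the study SNP is probably reported on the opposite strand.

```python
from math import sqrt

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose allele pair looks identical on both strands."""
    return COMPLEMENT[allele1] == allele2


def pearson(xs, ys):
    """Pearson correlation of two equal-length dosage vectors (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def needs_strand_swap(study_snp, study_flanks, ref_snp, ref_flanks, min_r=0.3):
    """Vote on whether an ambiguous study SNP sits on the opposite strand.

    study_snp / ref_snp: dosages of the ambiguous SNP in each dataset.
    study_flanks / ref_flanks: dosage vectors of the same flanking SNPs,
    assumed to be non-ambiguous and already aligned to the reference.
    Returns True (swap alleles), False (keep) or None (LD is uninformative).
    """
    agree = disagree = 0
    for study_flank, ref_flank in zip(study_flanks, ref_flanks):
        r_study = pearson(study_snp, study_flank)
        r_ref = pearson(ref_snp, ref_flank)
        # Skip flanking SNPs in weak LD: their correlation sign is unreliable.
        if abs(r_study) < min_r or abs(r_ref) < min_r:
            continue
        if (r_study > 0) == (r_ref > 0):
            agree += 1
        else:
            disagree += 1
    if agree == disagree:
        return None
    return disagree > agree


# Toy data: the reference shows positive LD between the SNP and its neighbour.
ref_snp = [0, 1, 2, 0, 1, 2]
ref_flank = [0, 1, 2, 0, 1, 1]
study_flank = [0, 1, 2, 0, 1, 2]
flipped_study_snp = [2, 1, 0, 2, 1, 0]  # dosages counted for the other allele

print(is_ambiguous("A", "T"), is_ambiguous("A", "G"))
print(needs_strand_swap(flipped_study_snp, [study_flank], ref_snp, [ref_flank]))
```

Voting across several flanking SNPs, rather than trusting a single pair, keeps one noisy correlation from deciding the outcome. Returning `None` when the evidence is tied or absent leaves the caller free to exclude such SNPs instead of guessing.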

          Conclusions

          GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.


                Author and article information

                Contributors
                patrickdeelen@gmail.com
                bonder.m.j@gmail.com
                joeriv@gmail.com
                westra.harmjan@outlook.com
                erwinwinder@gmail.com
                d.hendriksen@umcg.nl
                lude@ludesign.nl
                m.a.swertz@gmail.com
                Journal
                BMC Research Notes (BMC Res Notes)
                BioMed Central (London)
                ISSN: 1756-0500
                Published: 11 December 2014
                Volume 7, Issue 1, Article 901
                Affiliations
                University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
                Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
                Article
                Publisher ID: 3473
                DOI: 10.1186/1756-0500-7-901
                PMC ID: 4307387
                PubMed ID: 25495213
                196ec8be-6f5a-4a45-9a0d-6f327f77a10a
                © Deelen et al.; licensee BioMed Central. 2014

                This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 7 November 2014
                Accepted: 3 December 2014
                Categories
                Technical Note
                Custom metadata
                © The Author(s) 2014

                Medicine
                gwas, imputation, meta-analysis, linkage disequilibrium
