Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration


Abstract

Background

To gain statistical power or to enable fine mapping, researchers typically want to pool data before meta-analysis or genotype imputation. However, the harmonization of genetic datasets that this requires is currently error-prone, because many different file formats are in use and it is often unclear which genomic strand serves as the reference.

Findings

Genotype Harmonizer (GH) is a command-line tool that harmonizes genetic datasets by automatically resolving genomic strand and file format issues. GH resolves the unknown-strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference using linkage disequilibrium patterns, without requiring prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats, including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java, and a large part of its functionality is also available as the Java ‘Genotype-IO’ API. All software is open source under the LGPLv3 license and available from http://www.molgenis.org/systemsgenetics.
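The idea of aligning ambiguous SNPs by linkage disequilibrium can be illustrated with a small sketch. This is not GH's implementation (GH is written in Java). It is a simplified Python illustration, and all function names, the sign-voting rule and the toy data are assumptions made for this example. It assumes genotypes are coded as dosages (0, 1 or 2 copies of the same named allele) and that the neighbouring SNPs are non-ambiguous and already aligned to the reference.

```python
from statistics import mean

# Watson-Crick complements; an A/T or G/C SNP reads the same on both strands.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(allele1, allele2):
    """Return True for A/T and G/C SNPs, whose strand cannot be inferred from the alleles."""
    return COMPLEMENT[allele1] == allele2


def correlation(x, y):
    """Pearson correlation of two equally long dosage vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def needs_strand_swap(study_target, study_neighbours, ref_target, ref_neighbours):
    """Compare the sign of LD between the ambiguous SNP and each neighbour.

    Correlations are computed within each dataset, so the study and the
    reference may contain different samples. If the sign differs for most
    neighbours, the study alleles are probably reported on the other strand.
    """
    agree = disagree = 0
    for study_nb, ref_nb in zip(study_neighbours, ref_neighbours):
        product = correlation(study_target, study_nb) * correlation(ref_target, ref_nb)
        if product > 0:
            agree += 1
        elif product < 0:
            disagree += 1
    return disagree > agree


# Toy data: dosages of allele "A" at an A/T SNP and at one aligned neighbour.
study_target = [0, 1, 2, 1, 0, 2]
study_neighbours = [[0, 1, 2, 1, 0, 2]]  # positive LD in the study
ref_target = [2, 1, 0, 0, 2]
ref_neighbours = [[0, 1, 2, 2, 0]]       # negative LD in the reference

swap = needs_strand_swap(study_target, study_neighbours, ref_target, ref_neighbours)
```

In this toy case the LD sign is opposite in the two datasets, so the sketch flags the SNP for a strand swap. A real tool would likely consider many neighbouring variants and apply thresholds before deciding.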

Conclusions

GH can be used to harmonize genetic datasets across different file formats and can easily be integrated as a step in routine meta-analysis and imputation pipelines.


Author and article information

Contributors

Patrick Deelen: patrickdeelen@gmail.com

Marc Jan Bonder: bonder.m.j@gmail.com

K Joeri van der Velde: joeriv@gmail.com

Harm-Jan Westra: westra.harmjan@outlook.com

Erwin Winder: erwinwinder@gmail.com

Dennis Hendriksen: d.hendriksen@umcg.nl

Lude Franke: lude@ludesign.nl

Morris A Swertz: m.a.swertz@gmail.com

Journal

Journal ID (nlm-ta): BMC Res Notes

Journal ID (iso-abbrev): BMC Res Notes

Title: BMC Research Notes

Publisher: BioMed Central (London)

ISSN (Electronic): 1756-0500

Publication date (Electronic): 11 December 2014

Publication date Collection: 2014

Volume: 7

Issue: 1

Electronic Location Identifier: 901

Affiliations

University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands

Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands

Article

Publisher ID: 3473

DOI: 10.1186/1756-0500-7-901

PMC ID: 4307387

PubMed ID: 25495213

SO-VID: 196ec8be-6f5a-4a45-9a0d-6f327f77a10a

License:

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 7 November 2014

Date accepted: 3 December 2014

Custom metadata

ScienceOpen disciplines: Medicine

Keywords: gwas, imputation, meta-analysis, linkage disequilibrium

