Month: April 2016

Rail-dbGaP

[PMID:27153614] [Bioinformatics]

Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce

Work from Ben Langmead. “To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.” “The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce.”

rnaseqcomp

[PMID:27107712] [Genome Biology]

A benchmark for RNA-seq quantification pipelines

“Note that for the ROC analysis we show results for both gene level and transcript level analysis and the transcript level metrics were substantially worse. Previous publications [PMID:26201343] focusing on abundance found that all algorithms performed well. Here we found that if your focus is differential expression, then results are not as impressive and differences are found across algorithms.” “Finally, note that our method is meant to assess the quantification method specifically. Because, in general, our method does not consider biological replicates, it is not meant to be used for comparisons of statistical methods such as DESeq2 and edgeR.”
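For readers unfamiliar with ROC analysis in this setting, here is a minimal, generic sketch in Python. It is not the rnaseqcomp implementation. The function names, the idea of scoring each feature by something like an absolute log fold change, and the toy labels are all assumptions made for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: hypothetical per-feature scores (e.g. absolute log fold change).
    labels: True if the feature is assumed to be truly differential.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, is_pos in pairs if is_pos)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Features with tied scores cross the threshold together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, invented for illustration only.
toy_scores = [3.0, 2.5, 2.0, 1.0, 0.5, 0.1]
toy_labels = [True, True, False, True, False, False]
toy_points = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_points)
```

Running the same calculation once on gene-level scores and once on transcript-level scores is one way to see the kind of gap the authors describe.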

CNVkit

[PMID:27100738] [PLoS Computational Biology]

CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing

The CNVkit paper is finally out. CNVkit “uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes.”

GraphMap

[PMID:27079541] [Nature Communications]

Fast and sensitive mapping of nanopore sequencing reads with GraphMap

“Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%).” “GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100bp to 4kbp, and species and strain-specific identification of pathogens using MinION reads.”

Heat*seq

[PMID:27378302] [Bioinformatics]

Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data

“Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments (ChIP-seq, RNA-seq and CAGE) provided by a user, to the data in the public domain. Heat*seq currently contains over 12,000 experiments across diverse tissue and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualise user experiments.”
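To make the idea of comparing a user experiment against public data by correlation concrete, here is a small Python sketch. It is not Heat*seq's code. It assumes each experiment has already been summarised as a vector of values over the same genomic features, and it uses Pearson correlation as an assumption; the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(user_profile, public_profiles):
    """Sort public experiments from most to least correlated with the user's."""
    scores = {name: pearson(user_profile, profile)
              for name, profile in public_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles over four features, invented for illustration only.
user_profile = [1.0, 2.0, 3.0, 4.0]
public_profiles = {
    "exp_a": [2.0, 4.0, 6.0, 8.0],
    "exp_b": [4.0, 3.0, 2.0, 1.0],
    "exp_c": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_by_correlation(user_profile, public_profiles)
```

A heatmap is then just the matrix of these pairwise values, restricted to whichever subset of public experiments is of interest.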

Monovar

[PMID:27088313] [Nature Methods]

Monovar: single-nucleotide variant detection in single cells

“Monovar, a statistical method for detecting and genotyping single-nucleotide variants in single-cell data.” “These variant callers, designed for bulk tissue samples, make many assumptions regarding the underlying properties of the data. This is problematic for SCS data, which, on account of extensive whole-genome amplification (WGA), have unique properties and error profiles, including nonuniform coverage depth, allelic dropout (ADO) events, false-positive (FP) errors and false-negative (FN) errors, making it difficult to call SNVs accurately. Consequently, these studies have been challenged by a large number of FP and FN variant calls, and they require extensive orthogonal validation.”