RNA-Seq Normalization

[PMID:26176014] [BioMed Research International]

The Impact of Normalization Methods on RNA-Seq Data Analysis

“we suggest the application of the following workflow to determine which normalization method is optimal for a specific data set: (i) normalize the data using considered methods, (ii) calculate the “bias” and “variance” and rank the methods based on these values, (iii) after each normalization perform differential analysis and determine DEG lists found by each normalization method, (iv) select a subset of genes that can serve as positive and negative controls to investigate the sensitivity and specificity of normalization methods and rank the methods based on these criteria, (v) calculate the percentage of the mean of the prediction errors obtained using chosen classifiers for DEGs found by each normalization method and rank them, (vi) draw Venn diagrams or balloon plots based on the number of differentially expressed genes and rank the methods based on the number of common DEG values, and (vii) based on the summary of ranks choose the most appropriate normalization method of the investigated data set.”
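The rank-and-summarize part of this workflow, steps (ii) and (iv) through (vii), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quote does not say how the "summary of ranks" is computed, so a plain rank sum is assumed here, and the method names and scores are made-up placeholders, not values from the paper.

```python
from typing import Dict, List, Tuple


def rank_methods(scores: Dict[str, float], lower_is_better: bool = True) -> Dict[str, float]:
    """Rank methods on one criterion (1 = best); tied methods share the average rank."""
    ordered = sorted(scores, key=lambda m: scores[m], reverse=not lower_is_better)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def summarize_ranks(criteria: List[Tuple[Dict[str, float], bool]]) -> Dict[str, float]:
    """Sum each method's ranks over all criteria (assumed summary rule: plain rank sum)."""
    totals: Dict[str, float] = {}
    for scores, lower_is_better in criteria:
        for method, rank in rank_methods(scores, lower_is_better).items():
            totals[method] = totals.get(method, 0.0) + rank
    return totals


# Hypothetical placeholder scores for three unnamed methods (illustration only).
bias = {"method_A": 0.10, "method_B": 0.20, "method_C": 0.30}         # lower is better
variance = {"method_A": 0.50, "method_B": 0.40, "method_C": 0.60}     # lower is better
sensitivity = {"method_A": 0.90, "method_B": 0.80, "method_C": 0.80}  # higher is better

totals = summarize_ranks([(bias, True), (variance, True), (sensitivity, False)])
best = min(totals, key=totals.get)  # smallest rank sum wins under this assumed rule
```

The other criteria in the quoted workflow (specificity, classifier prediction error, number of common DEGs) would enter the same way, each as one more `(scores, lower_is_better)` pair.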

See also today's article in BMC Bioinformatics, “Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data”, which “compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish).” Its conclusions: “Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. And we suggest ignoring poly-A tail during differential gene expression analysis.”
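For readers unfamiliar with the abbreviations, here is a minimal Python sketch of two of the listed methods, RPKM and upper-quartile (UQ) scaling, for a single sample. It follows the commonly used definitions and is not taken from either paper: the library size is assumed to be the sum of the sample's counts, the UQ factor is assumed to be the 75th percentile of the non-zero counts, and the counts and gene lengths are made-up toy values.

```python
from statistics import quantiles
from typing import List


def rpkm(counts: List[float], lengths_bp: List[float]) -> List[float]:
    """Reads per kilobase of transcript per million mapped reads.

    Assumption: the library size is the sum of the counts passed in.
    """
    total_reads = sum(counts)
    return [c * 1e9 / (length * total_reads) for c, length in zip(counts, lengths_bp)]


def upper_quartile_scale(counts: List[float]) -> List[float]:
    """Divide counts by the upper quartile of the non-zero counts.

    Assumption: the 75th percentile is taken with the 'inclusive' method;
    other tools may interpolate differently or rescale the factors afterwards.
    Needs at least two non-zero counts.
    """
    nonzero = sorted(c for c in counts if c > 0)
    upper_quartile = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / upper_quartile for c in counts]


# Toy values for illustration only (not real data).
toy_counts = [100, 300, 600]
toy_lengths_bp = [1000, 2000, 3000]
toy_rpkm = rpkm(toy_counts, toy_lengths_bp)

toy_uq = upper_quartile_scale([10, 20, 30, 40, 50, 0])
```

RPKM corrects for both gene length and sequencing depth within a sample, whereas UQ only rescales each sample by a single factor, which is one reason the two can behave differently in comparisons like the one quoted above.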


