Evaluating single-subject study methods for personal transcriptomic interpretations to advance precision medicine

Samir Rachid Zaim^1-3, Colleen Kenost^1,2, Joanne Berghout^1,2,4, Helen Hao Zhang^3,5, Yves A. Lussier^1-4,6,*

1- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences

2- The Department of Medicine, College of Medicine Tucson

3- The Graduate Interdisciplinary Program in Statistics

4- The Center for Applied Genetic and Genomic Medicine

5- The Department of Mathematics, College of Sciences

6- The Arizona Cancer Center

The University of Arizona, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA

Abstract

Background: Gene expression profiling has benefited medicine by providing clinically relevant insights at both the molecular candidate and systems levels. However, to adopt a more ‘precision’ approach that integrates individual variability, including ‘omics data, into risk assessment, diagnosis, and therapeutic decision making, whole transcriptome expression analysis requires methodological advances. In particular, users need to be able to make confident individual-level inferences from whole transcriptome data. We propose that biological replicates in isogenic conditions can provide a framework for testing for differentially expressed genes (DEGs) in a single subject (ss) in the absence of an appropriate external reference standard or replicates.

Methods: Eight ss methods for identifying DEGs (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) were compared on two RNA-Seq datasets: Yeast (parental line versus snf2 deletion mutant; n=42/condition) and MCF7 breast cancer cells (baseline versus stimulated with estradiol; n=7/condition). In both datasets, replicate analysis with NOISeq, DEGseq, edgeR, DESeq, and DESeq2 was used to build reference standards (RSs). Each dataset was randomly partitioned so that approximately two-thirds of the paired samples were used to construct the RSs; the remaining pairs were treated separately as single-subject sample pairs, in which DEGs were identified using the ss methods. Receiver operating characteristic (ROC) and precision-recall curves were computed for all ss methods against each RS in both datasets (525 combinations).
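
The random partitioning step can be sketched as follows. This is a minimal Python illustration, not the authors' R code (see the Supplements): the helper name `partition_pairs`, the fixed seed, and the use of `round` to turn "approximately two-thirds" into a count are assumptions, and each index is assumed to denote one paired sample (one replicate per condition).

```python
import random


def partition_pairs(n_pairs, ref_fraction=2 / 3, seed=0):
    """Randomly split paired-sample indices into reference-standard and held-out sets.

    Hypothetical helper: index i is assumed to denote one paired sample
    (condition A replicate i, condition B replicate i). About `ref_fraction`
    of the pairs go to the reference standard (RS); the rest are analysed
    one pair at a time as single-subject (ss) comparisons.
    """
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    indices = list(range(n_pairs))
    rng.shuffle(indices)
    n_ref = round(n_pairs * ref_fraction)
    reference_pairs = sorted(indices[:n_ref])
    single_subject_pairs = sorted(indices[n_ref:])
    return reference_pairs, single_subject_pairs


# Yeast: n=42 per condition; MCF7: n=7 per condition
yeast_ref, yeast_ss = partition_pairs(42)
mcf7_ref, mcf7_ss = partition_pairs(7)
print(len(yeast_ref), len(yeast_ss))  # 28 14
print(len(mcf7_ref), len(mcf7_ss))    # 5 2
```

Passing a different seed for each repetition yields a different random partition; how many repetitions the original analysis used is not stated here.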

Results: Consistent with prior analyses of these data, approximately 50% and 15% of genes were identified as DEGs in the Yeast and MCF7 reference standard datasets, respectively, regardless of the analytical method. NOISeq, edgeR, and DESeq were the most concordant and robust methods for creating a reference standard. The single-subject versions of NOISeq and DEGseq, together with an ensemble learner, achieved the best median area under the ROC curve when comparing two transcriptomes without replicates, regardless of the type of reference standard (>0.90 in Yeast, >0.75 in MCF7).
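
Given a per-gene score from a single-subject method and the binary DEG calls of a reference standard, the area under the ROC curve can be computed from ranks (the Mann-Whitney formulation), and a median taken across held-out pairs. The sketch below is an illustrative Python version with hypothetical names and toy data; it does not reproduce the authors' R scripts, and the kind of score each ss method outputs (for instance 1 - p-value) is an assumption.

```python
from statistics import median


def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    scores: per-gene evidence of differential expression from an ss method
            (higher = more likely a DEG).
    labels: 1 if the reference standard calls the gene a DEG, else 0.
    Tied scores receive their average rank, so ties count as half a win.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("reference standard must contain both DEGs and non-DEGs")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: one AUC per held-out single-subject pair, then the median.
reference = [1, 1, 0, 0]
aucs = [
    roc_auc([0.9, 0.8, 0.3, 0.1], reference),  # perfect ranking -> 1.0
    roc_auc([0.9, 0.4, 0.6, 0.1], reference),  # one mis-ordered pair -> 0.75
    roc_auc([0.5, 0.5, 0.5, 0.5], reference),  # all tied -> 0.5
]
print(median(aucs))  # 0.75
```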

Conclusion: An ensemble method applied to single-subject studies yields better and more consistent accuracy across different conditions. In addition, different single-subject methods perform best depending on the proportion of DEGs. Single-subject methods for identifying DEGs from paired samples still need improvement, as no method achieves both precision >90% and recall >90%.
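
The precision and recall criterion can be made concrete with a small check that compares one method's DEG calls for a single-subject pair against an RS. This is a hypothetical Python sketch: the set-based representation, the helper names, and the gene identifiers are assumptions, and the 0.90 target simply mirrors the 90% figure above.

```python
def precision_recall(called, reference):
    """Precision and recall of ss DEG calls against a reference standard (RS).

    called, reference: sets of gene identifiers declared differentially expressed.
    """
    true_positives = len(called & reference)
    # Both ratios are undefined for empty sets; 0.0 is a convention chosen here.
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def meets_target(called, reference, target=0.90):
    """True only if both precision and recall exceed `target` (hypothetical helper)."""
    precision, recall = precision_recall(called, reference)
    return precision > target and recall > target


called = {"GENE_A", "GENE_B", "GENE_C"}  # hypothetical gene IDs
reference = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
print(precision_recall(called, reference))  # (0.6666666666666666, 0.5)
print(meets_target(called, reference))      # False
```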

http://www.lussiergroup.org/publications/EnsembleBiomarker

Keywords: Single-subject studies, precision medicine, genomic medicine, medical genomics, n-of-1, transcriptome, N-of-1 studies

Supplements

confidenceRegionPrecRecCurves_.R
different_refstandards_yeast.R
generate_boxplots_aucs.R
preprocess_yeastData.R
produce sample precrec roc curves.R
tabular_comparisons_cell.R
tabular_comparisons_yeast.R