Evaluating single-subject study methods for personal transcriptomic interpretations to advance precision medicine
Samir Rachid Zaim1-3, Colleen Kenost1,2, Joanne Berghout1,2,4, Helen Hao Zhang3,5, Yves A. Lussier1-4,6,*
1- The Center for Biomedical Informatics & Biostatistics of the University of Arizona Health Sciences
2- The Department of Medicine, College of Medicine Tucson
3- The Graduate Interdisciplinary Program in Statistics
4- The Center for Applied Genetic and Genomic Medicine
5- The Department of Mathematics, College of Sciences
6- The Arizona Cancer Center
The University of Arizona, 1230 N. Cherry Ave, Tucson, AZ, 85721, USA
Abstract
Background: Gene expression profiling has benefited medicine by providing clinically relevant insights at the molecular candidate and systems levels. However, to adopt a more ‘precision’ approach that integrates individual variability, including ‘omics data, into risk assessments, diagnoses, and therapeutic decision making, whole transcriptome expression analysis requires methodological advancements. One need is for users to be able to make individual-level inferences from whole transcriptome data with confidence. We propose that biological replicates in isogenic conditions can provide a framework for testing differentially expressed genes (DEGs) in a single subject (ss) in the absence of an appropriate external reference standard or replicates.
Methods: Eight ss methods for identifying genes with differential expression (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) were compared in Yeast (parental line versus snf2 deletion mutant; n=42/condition) and MCF7 breast-cancer cell (baseline and stimulated with estradiol; n=7/condition) RNA-Seq datasets, where replicate analysis was used to build reference standards (RSs) from NOISeq, DEGseq, edgeR, DESeq, and DESeq2. Each dataset was randomly partitioned so that approximately two-thirds of the paired samples were used to construct reference standards, and the remainder were treated separately as single-subject sample pairs in which DEGs were assayed using ss methods. Receiver-operator characteristic (ROC) and precision-recall plots were determined for all ss methods against each RS in both datasets (525 combinations).
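The partition-and-score design described in Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `partition_pairs` and `roc_auc`, the fixed seed, and the toy scores and labels are all hypothetical, and the AUC is computed with the generic rank-based (Mann-Whitney) identity rather than any specific package.

```python
import random


def partition_pairs(pair_ids, reference_fraction=2 / 3, seed=0):
    """Randomly split paired samples into a reference-standard set
    (about two-thirds) and a held-out single-subject set (the rest)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = list(pair_ids)
    rng.shuffle(shuffled)
    n_ref = round(len(shuffled) * reference_fraction)
    return shuffled[:n_ref], shuffled[n_ref:]


def roc_auc(scores, labels):
    """ROC AUC as the probability that a reference DEG outscores a
    reference non-DEG, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("reference standard needs both DEGs and non-DEGs")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 42 paired samples, as in the Yeast dataset size quoted above.
reference_pairs, single_subject_pairs = partition_pairs(range(42))

# Toy data (hypothetical): per-gene DEG scores from one single-subject
# method, and 0/1 DEG calls from one reference standard.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(len(reference_pairs), len(single_subject_pairs), round(auc, 3))
```

In a full evaluation, the scoring step would be repeated for every single-subject method, held-out pair, and reference standard, which is what produces a grid of combinations like the one reported above.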
Results: Consistent with prior analyses of these data, ~50% and ~15% DEGs were obtained in the Yeast and MCF7 reference standard datasets, respectively, regardless of the analytical method. NOISeq, edgeR, and DESeq were the most concordant and robust methods for creating a reference standard. Single-subject versions of NOISeq, DEGseq, and an ensemble learner achieved the best median ROC area under the curve when comparing two transcriptomes without replicates, regardless of the type of reference standard (>0.90 in Yeast, >0.75 in MCF7).
Conclusion: An ensemble method applied to single-subject studies yields better and more consistent accuracies across different conditions. In addition, distinct single-subject methods perform better at different proportions of DEGs. Single-subject methods for identifying DEGs from paired samples need improvement, as no method achieves both precision >90% and recall >90%.
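To make the precision and recall criterion concrete, here is a small Python sketch that scores one set of predicted DEGs against a reference standard. The function name `precision_recall` and the gene identifiers are hypothetical placeholders, not data from the study.

```python
def precision_recall(predicted_degs, reference_degs):
    """Precision and recall of predicted DEGs against a reference standard."""
    predicted = set(predicted_degs)
    reference = set(reference_degs)
    true_pos = len(predicted & reference)
    # Empty sets yield 0.0 instead of a division error (a convention chosen here).
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gene identifiers, for illustration only.
predicted = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
reference = {"GENE_A", "GENE_B", "GENE_E"}

precision, recall = precision_recall(predicted, reference)
# The bar discussed above: both metrics above 90% at the same time.
meets_both = precision > 0.9 and recall > 0.9
print(precision, round(recall, 3), meets_both)
```

Two of the four predicted genes are in the reference (precision 0.5) and two of the three reference genes are recovered (recall about 0.667), so this toy prediction falls short of the bar on both counts.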
http://www.lussiergroup.org/publications/EnsembleBiomarker
Keywords: Single-subject studies, precision medicine, genomic medicine, medical genomics, n-of-1, transcriptome, N-of-1 studies
confidenceRegionPrecRecCurves_.R
different_refstandards_yeast.R
generate_boxplots_aucs.R
preprocess_yeastData.R
produce sample precrec roc curves.R
tabular_comparisons_cell.R
tabular_comparisons_yeast.R