Convergent downstream candidate mechanisms of independent intergenic polymorphisms between co-classified diseases implicate epistasis among noncoding elements
Jiali Han†, Jianrong Li† and Ikbel Achour†,&
Center for Biomedical Informatics and Biostatistics (CB2)
Departments of Medicine and of Systems and Industrial Engineering,
Lorenzo Pesce and Ian Foster
Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, IL 60637, USA
Email: lpesce@cs.uchicago.edu, foster@cs.uchicago.edu
Haiquan Li* and Yves A. Lussier*
CB2, BIO5 Institute, UACC, and Dept. of Medicine, The University of Arizona, Tucson, AZ 85721, USA
Email: haiquan@email.arizona.edu, yves@email.arizona.edu
† Authors contributed equally to this work conducted at The Universities of Arizona and of Illinois
& Now employed at AstraZeneca MedImmune
* Corresponding authors contributed equally to this work
Eighty percent of DNA outside protein-coding regions was shown to be biochemically
functional by the ENCODE project, enabling studies of interactions among noncoding elements.
Studies have since explored how convergent downstream mechanisms arise from
independent genetic risks of one complex disease. However, the cross-talk and
epistasis between intergenic risks associated with distinct complex diseases
have not been comprehensively characterized. Our recent integrative genomic
analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we
then validated their genetic synergy and antagonism in distinct GWAS. We extend
this approach to characterize convergent downstream candidate mechanisms of distinct
intergenic SNPs across distinct diseases within the same clinical
classification. We construct a multipartite network consisting of 467 diseases
organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated
mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional
similarity between two SNPs (similar SNP pairs) is imputed using a nested
information-theoretic distance model for which p-values are assigned by
conservative scale-free permutation of network edges without replacement (node
degrees held constant). At FDR≤5%, we prioritized 3,870 intergenic SNP pairs,
among which 755 are associated with distinct diseases sharing
the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs,
and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized
than those of distinct classes, confirming a noncoding genetic
underpinning to clinical classification (odds ratio ~3.8; p≤10^-25).
The prioritized pairs were also enriched in regions bound by the
same or interacting transcription factors and/or engaged in long-range
chromatin interactions, suggestive of epistasis (odds
ratio ~2,500; p≤10^-25). This prioritized network implicates
complex epistasis between intergenic polymorphisms of co-classified diseases
and offers a roadmap for a novel therapeutic paradigm: repositioning
medications that target proteins within downstream mechanisms of intergenic
disease-associated SNPs.
Supplementary information and software: http://lussiergroup.org/publications/disease_class
Keywords: SNP; Intergenic; Noncoding; Disease class; Biological similarity; Enrichment.
Disease classes curated from disease/trait terms in the NHGRI GWAS catalog
Calculation of SNP ITS
Permutation of eQTL network
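
The abstract describes permuting network edges without replacement while keeping node degrees constant. One common way to do this is the double-edge swap on a bipartite SNP-mRNA edge list, sketched below. This is only an illustration under stated assumptions, not the authors' released software: the function name, the toy SNP and mRNA identifiers, and the number of swaps are all hypothetical.

```python
import random
from collections import Counter


def permute_edges_keep_degrees(edges, n_swaps, seed=0):
    """Shuffle (snp, mrna) edges by double-edge swaps.

    Each accepted swap turns (s1, m1), (s2, m2) into (s1, m2), (s2, m1),
    so every SNP and every mRNA keeps its original degree and no edge is
    duplicated. Illustrative sketch only (hypothetical helper).
    """
    rng = random.Random(seed)
    edge_list = list(dict.fromkeys(edges))  # drop duplicate edges, keep order
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (s1, m1), (s2, m2) = edge_list[i], edge_list[j]
        new1, new2 = (s1, m2), (s2, m1)
        # Reject swaps that change nothing or would create an existing edge.
        if s1 == s2 or m1 == m2 or new1 in edge_set or new2 in edge_set:
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


# Toy eQTL edges (made-up identifiers, for illustration only).
toy_edges = [
    ("rs1", "A"), ("rs1", "B"), ("rs2", "B"), ("rs2", "C"),
    ("rs3", "A"), ("rs3", "C"), ("rs4", "D"),
]
shuffled = permute_edges_keep_degrees(toy_edges, n_swaps=50, seed=1)
snp_degrees = Counter(s for s, _ in shuffled)
mrna_degrees = Counter(m for _, m in shuffled)
```

Repeating such a shuffle many times and recomputing the SNP-pair similarity on each shuffled network would give an empirical null distribution from which p-values could be derived; how the original study parameterized this step is not stated here.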