A method to predict the impact of regulatory variants from DNA sequence

Dongwon Lee*, David U. Gorkin*, Maggie Baker, Benjamin J. Strober, Alessandro L. Asoni, Andrew S. McCallion, Michael A. Beer

*These authors contributed equally to this work.
Correspondence should be addressed to: Michael A. Beer (mbeer AT jhu DOT edu) or Andrew S. McCallion (andy AT jhmi DOT edu)

Most variants implicated in common human disease by Genome-Wide Association Studies (GWAS) lie in non-coding sequence intervals. Despite the suggestion that regulatory element disruption represents a common theme, identifying causal risk variants within implicated genomic regions remains a major challenge. Here we present a new sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) which encodes cell-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic context and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. Previously validated GWAS SNPs yield large deltaSVM scores, and we predict new risk-conferring SNPs for several autoimmune diseases. Thus, deltaSVM provides a powerful computational approach to systematically identify functional regulatory variants.
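To make the idea of a score change concrete, here is a minimal Python sketch. It assumes the classifier's score for a sequence can be approximated by summing a weight for each k-mer the sequence contains, so that deltaSVM is the difference between the summed weights of the alternate and reference alleles. The function names, the toy weight table, and the choice of k = 3 are hypothetical and for illustration only. Real weights would come from a trained gkm-SVM model, and details such as reverse-complement handling are omitted.

```python
from typing import Dict


def score_sequence(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum the weights of every k-mer in seq; k-mers missing from the table count as 0."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def delta_svm(ref_seq: str, alt_seq: str, weights: Dict[str, float], k: int) -> float:
    """Score change caused by a substitution: score(alt) minus score(ref).

    Both sequences are the same window around the variant and differ only
    at the variant position(s).
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("reference and alternate windows must have equal length")
    return score_sequence(alt_seq, weights, k) - score_sequence(ref_seq, weights, k)


# Hypothetical weight table (NOT taken from any trained model).
toy_weights = {"ACG": 0.5, "CGT": 1.0, "GTA": -0.25, "ATA": -1.0, "CAT": 0.25}

ref_window = "ACGTA"  # reference allele G at the center
alt_window = "ACATA"  # alternate allele A at the center

print(delta_svm(ref_window, alt_window, toy_weights, k=3))  # -2.0
```

In this toy case the reference window scores 1.25 and the alternate window scores -0.75, so the variant's score change is -2.0. Under the stated assumption, a large negative value would suggest that the alternate allele disrupts sequence features the classifier associates with regulatory activity.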

Citation

If you use deltaSVM, please cite as:
Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, Beer MA. 2015. A method to predict the impact of regulatory variants from DNA sequence. Nat Genet. advance online publication. doi:10.1038/ng.3331

Notes

Supplementary data

If you have any questions, please contact Dongwon Lee at dwlee AT jhu DOT edu.

last updated: 9/5/2015