COPE-TFBS predicts variant effects on transcription factor binding sites (TFBSs) using a context-sensitive approach. Variants are annotated against personalized genomic sequences reconstructed from the input variants and their phase information. COPE-TFBS provides two categories of annotation: TFBS breaking and TFBS gain. For TFBS breaking, COPE-TFBS identifies variants within known TFBSs that decrease the position weight matrix (PWM) score, which could weaken the binding affinity of the transcription factor (TF) or even abolish binding. For TFBS gain, COPE-TFBS scans and statistically evaluates all potential novel TFBSs created by mutated alleles in gene promoter regions. Unlike traditional methods, COPE-TFBS considers all variants within a known TFBS simultaneously.
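The idea of scoring a personalized sequence rather than one variant at a time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the COPE-TFBS implementation: the 4-bp motif `TOY_PFM`, the uniform background, the log-odds scoring, and the helper names `pwm_score`, `apply_variants`, and `breaking_delta` are all hypothetical, and only single-nucleotide variants on one haplotype are handled.

```python
import math

# Hypothetical 4-bp motif; the probabilities are invented for illustration.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_score(seq, pfm=TOY_PFM):
    """Sum of per-position log2 odds of the motif against the background."""
    if len(seq) != len(pfm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pfm, seq))


def apply_variants(ref_seq, variants):
    """Build the personalized sequence for one haplotype.

    variants: list of (offset, ref_base, alt_base) SNVs that are phased
    onto the same haplotype; offsets are 0-based within the site.
    """
    seq = list(ref_seq)
    for offset, ref_base, alt_base in variants:
        if seq[offset] != ref_base:
            raise ValueError(f"reference mismatch at offset {offset}")
        seq[offset] = alt_base
    return "".join(seq)


def breaking_delta(ref_seq, variants):
    """Score change when all variants in the site are applied together."""
    return pwm_score(apply_variants(ref_seq, variants)) - pwm_score(ref_seq)


if __name__ == "__main__":
    site_variants = [(0, "A", "T"), (2, "G", "C")]
    print(apply_variants("ACGT", site_variants))
    print(round(breaking_delta("ACGT", site_variants), 3))
```

A negative `breaking_delta` corresponds to the TFBS breaking case described above. The point of the sketch is that the score is computed once on the fully substituted haplotype, so every variant in the site contributes to the same comparison.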
By systematically reanalyzing the resequencing data from both the 1000 Genomes Project and the GTEx Project, we found that multiple variants within the same TFBS may interfere with each other and produce compound effects that differ from their individual effects. We distinguish three cases: (1) the putative novel TFBS, which is created by multiple variants within a non-TFBS locus; (2) the transformed TFBS, which is transformed from a known TFBS by multiple variants; and (3) the discordantly annotated TFBS, in which multiple variants influence the binding affinity of the TF-TFBS interaction differently. For the discordantly annotated TFBS, the brown arrows indicate each variant's effect on the PWM score (up for an increase, down for a decrease).
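Two of these compound effects can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the criteria COPE-TFBS actually applies: the motif `TOY_PFM`, the score threshold, and the helpers `is_discordant` and `gained_only_jointly` are hypothetical. It treats a site as discordant when single-variant score changes have opposite signs, and as jointly gained when only the haplotype carrying all variants passes the threshold.

```python
import math

# Hypothetical 4-bp motif; the probabilities are invented for illustration.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def score(seq):
    """Log2-odds PWM score against an assumed uniform background (0.25)."""
    return sum(math.log2(col[base] / 0.25) for col, base in zip(TOY_PFM, seq))


def substitute(seq, offset, alt_base):
    """Return seq with the base at a 0-based offset replaced by alt_base."""
    return seq[:offset] + alt_base + seq[offset + 1:]


def single_variant_deltas(ref_seq, variants):
    """Score change of each (offset, alt_base) variant applied on its own."""
    base = score(ref_seq)
    return [score(substitute(ref_seq, off, alt)) - base for off, alt in variants]


def is_discordant(deltas):
    """True when some variants raise the score and others lower it."""
    return any(d > 0 for d in deltas) and any(d < 0 for d in deltas)


def gained_only_jointly(ref_seq, variants, threshold):
    """True when only the haplotype carrying all variants passes threshold."""
    joint = ref_seq
    for off, alt in variants:
        joint = substitute(joint, off, alt)
    singles = [score(substitute(ref_seq, off, alt)) for off, alt in variants]
    return (
        score(joint) >= threshold
        and score(ref_seq) < threshold
        and all(s < threshold for s in singles)
    )


if __name__ == "__main__":
    # One variant weakens the match, the other strengthens it.
    print(is_discordant(single_variant_deltas("ACGA", [(0, "T"), (3, "T")])))
    # Neither variant alone reaches the (arbitrary) threshold of 5.0.
    print(gained_only_jointly("TCGA", [(0, "A"), (3, "T")], threshold=5.0))
```

In both toy cases an annotation made one variant at a time would miss or misreport the outcome, which is the motivation for evaluating the phased haplotype as a whole.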
Many tools (Coetzee et al., 2015; Fu et al., 2014; Kumar et al., 2017; Ward and Kellis, 2016; Zuo et al., 2015) that aim to predict variant effects on TFBSs have been widely used to interpret the influence of variants within known TFBSs. However, these tools assume that each variant acts in isolation, annotate each variant independently, and therefore fail to handle the compound effects caused by multiple variants at the same site. Compared with FunSeq2, COPE-TFBS identified a total of 1,502 putative novel TFBSs, 266 transformed events and 85,825 discordantly annotated TFBSs in the 1000 Genomes Project and GTEx Project data (Supplementary File).
Cheng SJ, Jiang S, Shi FY, Ding Y, Gao G. Systematically identify and annotate multiple-variant compound effect at transcription factor binding sites in human genome. (Under Review)