COPE-PCG predicts variant effects on protein-coding genes using a context-sensitive approach. First, genetic variants are mapped to protein-coding genes derived from a user-supplied reference gene model such as RefSeq. Using the phase information, COPE-PCG handles the two haplotypes separately. Then, COPE-PCG tries to reconstruct the “mutant peptide” from the entire input variant set. Briefly, COPE-PCG attempts to identify splicing-changing variants (i.e., variants that disrupt existing splice sites or create novel splice sites) and, if one is found, infers new isoforms accordingly. Finally, COPE-PCG translates all coding sequences into amino acid sequences and compares them against the reference sequence to obtain the final amino acid alterations.
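The haplotype-aware steps described above can be illustrated with a minimal Python sketch. It is not COPE-PCG's actual code: it assumes SNVs only, ignores splicing entirely, and uses a toy coding sequence. The function names (`translate`, `build_haplotypes`, `peptide_changes`) and the `(position, alt, haplotype)` tuple layout are hypothetical choices made for this illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon ('*')."""
    peptide = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        peptide.append(aa)
        if aa == "*":
            break
    return "".join(peptide)


def build_haplotypes(ref_cds, phased_snvs):
    """Apply each SNV (0-based CDS position, alt base, haplotype 0 or 1)
    to its own copy of the reference, so the haplotypes stay separate."""
    haplotypes = [list(ref_cds), list(ref_cds)]
    for pos, alt, hap in phased_snvs:
        haplotypes[hap][pos] = alt
    return ["".join(h) for h in haplotypes]


def peptide_changes(ref_pep, alt_pep):
    """List (1-based position, ref residue, alt residue) differences.
    '-' marks a residue that is absent from the shorter peptide."""
    length = max(len(ref_pep), len(alt_pep))
    ref_pad = ref_pep.ljust(length, "-")
    alt_pad = alt_pep.ljust(length, "-")
    return [(i + 1, r, a)
            for i, (r, a) in enumerate(zip(ref_pad, alt_pad)) if r != a]


# Toy reference CDS (assumption): ATG CAG TGG TAA, i.e. M Q W *
ref_cds = "ATGCAGTGGTAA"
# Two SNVs in the same codon, but on opposite haplotypes.
phased_snvs = [(3, "T", 0), (4, "G", 1)]

ref_pep = translate(ref_cds)
results = [peptide_changes(ref_pep, translate(hap))
           for hap in build_haplotypes(ref_cds, phased_snvs)]
```

In this toy case the first haplotype gains a premature stop at residue 2, while the second carries only a Q-to-R substitution. Merging the two SNVs into one sequence without regard to phase would give a different, incorrect peptide.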
Compared with current variant-centric annotation tools, such as Variant Effect Predictor (VEP), COPE-PCG can avoid several types of loss-of-function annotation errors by integrating the entire sequence context when annotating variants, as shown in the following figure. A rescued stop-gained SNV is a stop-gained variant that is rescued by another SNV in the same codon. A rescued frameshift indel is a frameshift indel that is rescued by other frameshift indels that restore the reading frame. A splicing-rescued stop-gained or frameshift variant is a stop-gained or frameshift variant that is rescued by a novel splicing isoform. A rescued splice-disrupting variant is a splice-disrupting variant that is rescued by nearby cryptic sites or novel splice sites. A stop-gained MNV is a multi-nucleotide variant (MNV) whose component variants together create a stop codon. The asterisk in the figure stands for a stop codon.
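Two of these categories, the rescued stop-gained SNV and the stop-gained MNV, can be shown at the level of a single codon. The sketch below is only an illustration under simplifying assumptions (substitutions within one codon, all on the same haplotype). The helpers `mutate`, `gains_stop`, and `annotate` are hypothetical names and are not part of COPE-PCG or VEP.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def mutate(codon, changes):
    """Apply {offset: alt_base} substitutions to a three-base codon."""
    bases = list(codon)
    for offset, alt in changes.items():
        bases[offset] = alt
    return "".join(bases)


def gains_stop(ref_codon, changes):
    """True if the substitutions turn a sense codon into a stop codon."""
    return (ref_codon not in STOP_CODONS
            and mutate(ref_codon, changes) in STOP_CODONS)


def annotate(ref_codon, changes):
    """Return (per-variant calls, combined call).
    The per-variant calls judge each SNV in isolation, as a
    variant-centric annotator would; the combined call judges the
    codon after all SNVs on the haplotype have been applied."""
    per_variant = [gains_stop(ref_codon, {offset: alt})
                   for offset, alt in sorted(changes.items())]
    return per_variant, gains_stop(ref_codon, changes)


# Rescued stop-gained SNV: CAG -> TAG is a stop on its own,
# but together with the third-base change the codon becomes TAC (Tyr).
rescued = annotate("CAG", {0: "T", 2: "C"})

# Stop-gained MNV: CAC -> TAC (Tyr) and CAC -> CAA (Gln) are both
# harmless alone, but together they give TAA, a stop codon.
mnv = annotate("CAC", {0: "T", 2: "A"})
```

In the first case the isolated calls flag a stop gain that the combined codon does not contain. In the second case neither isolated call flags anything, yet the combined codon is a stop.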
Cheng SJ, Shi FY, Liu H, Ding Y, Jiang S, Liang N, Gao G: Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Research 2017.