COPE is mainly written in Perl. Several Perl modules are required, as follows.
In addition, one function in COPE is implemented with the Python module NetworkX. Needle and bedtools are also used in the COPE pipeline. For variants without phase information, we recommend our phasing script, which is based on HapCUT and SHAPEIT2.
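Before installing, it can help to confirm that the external programs are reachable. The following is a minimal sketch, not part of COPE: `missing_tools` is a hypothetical helper, and the executable names (`perl`, `python`, `needle`, `bedtools`) are assumed to be the usual ones; your system may use `python3` or install the tools outside `PATH`.

```sh
# Hypothetical helper: print the name of every listed tool not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names; adjust them to your installation.
missing=$(missing_tools perl python needle bedtools)
if [ -n "$missing" ]; then
    echo "Not found on PATH: $missing" >&2
fi

# NetworkX is a Python module, so it is checked with an import instead.
if command -v python >/dev/null 2>&1; then
    python -c 'import networkx' 2>/dev/null ||
        echo "Python module networkx is not importable" >&2
fi
```

A tool reported as missing may still be installed elsewhere; in that case only its directory is needed for the path settings in COPE.pl.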
Unpack the package you have downloaded.
tar -zxvf COPE.tar.gz
Several paths must then be configured to match your environment.
Replace the following paths in COPE.pl
with the actual locations on your own system.
my $bedtools_path="/lustre/tools_centos6.3/tools";
my $needle_path="/rd/build/EMBOSS/install/bin";
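To find the values to put into these two variables, the directory of each executable can be looked up. This is a sketch under the assumption that `$bedtools_path` and `$needle_path` expect the directory containing the executable (the example value ending in `/bin` suggests this, but the source does not state it). `tool_dir` is a hypothetical helper.

```sh
# Hypothetical helper: print the directory containing an executable on PATH.
tool_dir() {
    exe=$(command -v "$1") || return 1
    dirname "$exe"
}

# Show the directory for each tool, or a note if it is not on PATH.
for tool in bedtools needle; do
    if dir=$(tool_dir "$tool"); then
        echo "$tool: $dir"
    else
        echo "$tool: not found on PATH" >&2
    fi
done
```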
Usage:
perl COPE.pl --sample sampleName --variant example.vcf [--unphase] [--thread <int>] [--context <int>] [--splicegain] [--ensembl] [--hg38]
There should be three subdirectories in your working directory: input, resource and script. All input files should be stored in the input subdirectory. The annotation resources used by COPE are all stored in the resource subdirectory. In particular, users should put a FASTA file containing the reference genome sequence, named hg19.fa or hg38.fa, into the resource subdirectory. The scripts are stored in the script subdirectory.
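The layout described above can be prepared and checked as follows. This is only a sketch: `init_workdir`, `has_reference` and the `COPE_WORKDIR` variable are hypothetical names, and the resource and script subdirectories may already exist after unpacking the package, in which case `mkdir -p` leaves them untouched.

```sh
# Hypothetical helper: create the three subdirectories COPE expects.
init_workdir() {
    mkdir -p "$1/input" "$1/resource" "$1/script"
}

# Hypothetical helper: succeed if hg19.fa or hg38.fa is in resource/.
has_reference() {
    [ -f "$1/resource/hg19.fa" ] || [ -f "$1/resource/hg38.fa" ]
}

workdir=${COPE_WORKDIR:-./cope_work}   # hypothetical location
init_workdir "$workdir"
has_reference "$workdir" ||
    echo "Put hg19.fa or hg38.fa into $workdir/resource" >&2
```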
Flag | Alternate | Description |
---|---|---|
--sample | -s | The name of the project. COPE creates a directory named after this option, and the annotation output is stored in that folder. |
--variant | -v | The VCF must be sorted by chromosome and position. Besides, the VCF file should be put in the |
--unphase | -u | Forces COPE to handle variants without phase information. Warning: we do not recommend this option. Default: off |
--thread | -t | The number of CPUs used for predicting novel protein sequences. Default: 10 |
--context | -c | The region used for searching for cryptic splice sites. Default: 100 |
--splicegain | | Decides whether to predict novel splice site gain in the whole gene region. Default: off |
--ensembl | | Chooses the Ensembl gene model. Default: RefSeq |
--hg38 | | Chooses the hg38 version of the human genome assembly. Default: hg19 |
--bed | | Write output in BED format. Default: off |
--score | | Report the splice score change for splice-disrupting variants. Default: off |
--splicing_confident_score | | Report the absolute splice motif score for identified novel potential splice sites. Default: off |
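
Because the VCF passed to --variant must be sorted by chromosome and position, an unsorted file can be reordered before running COPE. The sketch below assumes an uncompressed, tab-delimited VCF and sorts chromosome names in plain byte order; whether COPE expects a particular chromosome order is not stated here, so treat that as an assumption. `sort_vcf` is a hypothetical helper, not part of COPE.

```sh
# Hypothetical helper: print header lines first, then records sorted by
# chromosome (byte order) and position (numeric).
sort_vcf() {
    tab=$(printf '\t')
    awk '/^#/' "$1"
    awk '!/^#/' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}
```

For example, `sort_vcf unsorted.vcf > input/example.vcf` would write the sorted copy into the input subdirectory; both file names are placeholders.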
    NM_001005484 OR4F5 0|1 S 305 113-113:F->C; 1:69428:69428:T:G:1|1
    NM_001005484 OR4F5 1|0 S 305 113-113:F->C; 1:69428:69428:T:G:1|1
    NM_152228 TAS1R3 0|1 O 852 757-757:C->R; 1:1269554:1269554:T:C:1|1,1:1268847:1268847:T:G:1|1
    NM_152228 TAS1R3 1|0 O 852 757-757:C->R; 1:1269554:1269554:T:C:1|1,1:1268847:1268847:T:G:1|1
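
As a rough illustration of working with these records, the sketch below pulls the transcript, haplotype and amino acid change for one gene. It assumes whitespace-separated columns in the order shown in the example (transcript, gene, haplotype, splicing code, protein length, amino acid change, variants); the real delimiter may be a tab, which `awk` handles the same way. `records_for_gene` is a hypothetical helper.

```sh
# Hypothetical helper: read COPE records on stdin and print transcript,
# haplotype and amino acid change for the gene given as the first argument.
records_for_gene() {
    awk -v gene="$1" '$2 == gene { sub(/;$/, "", $6); print $1, $3, $6 }'
}

# Two records copied from the example output.
records_for_gene TAS1R3 <<'EOF'
NM_001005484 OR4F5 0|1 S 305 113-113:F->C; 1:69428:69428:T:G:1|1
NM_152228 TAS1R3 0|1 O 852 757-757:C->R; 1:1269554:1269554:T:C:1|1,1:1268847:1268847:T:G:1|1
EOF
# prints: NM_152228 0|1 757-757:C->R
```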
Each line contains the following data:
--splicegain.
With the --bed option, the output is written in BED format. This is an example:
    1 69090 70008 Transcript=NM_001005484;Gene=OR4F5;Haplotype=0|1;Splicing_Code=S;Protein_Length=305;Amino_Acid=113-113:F->C;Variant=1:69428:69428:T:G:1|1;
    1 69090 70008 Transcript=NM_001005484;Gene=OR4F5;Haplotype=1|0;Splicing_Code=S;Protein_Length=305;Amino_Acid=113-113:F->C;Variant=1:69428:69428:T:G:1|1;
    1 1266725 1269844 Transcript=NM_152228;Gene=TAS1R3;Haplotype=0|1;Splicing_Code=O;Protein_Length=852;Amino_Acid=757-757:C->R;Variant=1:1269554:1269554:T:C:1|1,1:1268847:1268847:T:G:1|1;
    1 1266725 1269844 Transcript=NM_152228;Gene=TAS1R3;Haplotype=1|0;Splicing_Code=O;Protein_Length=852;Amino_Acid=757-757:C->R;Variant=1:1269554:1269554:T:C:1|1,1:1268847:1268847:T:G:1|1;
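
In the BED example, the fourth column carries `key=value` pairs separated by semicolons. A single value can be extracted with a short `awk` filter; this is a sketch that assumes the layout shown in the example, and `bed_attr` is a hypothetical helper rather than something shipped with COPE.

```sh
# Hypothetical helper: read BED lines on stdin and print the value of the
# key given as the first argument, taken from the key=value list in column 4.
bed_attr() {
    awk -v key="$1" '{
        n = split($4, pairs, ";")
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")
            if (eq > 0 && substr(pairs[i], 1, eq - 1) == key)
                print substr(pairs[i], eq + 1)
        }
    }'
}

# One record copied from the BED example.
bed_attr Gene <<'EOF'
1 69090 70008 Transcript=NM_001005484;Gene=OR4F5;Haplotype=0|1;Splicing_Code=S;Protein_Length=305;Amino_Acid=113-113:F->C;Variant=1:69428:69428:T:G:1|1;
EOF
# prints: OR4F5
```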
Each line contains the following data: