CTK usage
From Zhang Laboratory
fastq_filter.pl
Usage: fastq_filter.pl [options] <in.fq> <out.fq (or out.fa)>
Arguments:
Argument | Description |
---|---|
<in.fq> | input file in FASTQ format |
<out.fq (or out.fa)> | output file of filtered reads in FASTQ (or FASTA) format |
Options:
Option | Description |
---|---|
-v | verbose |
-if [string] | fastq format (solexa or [sanger]) |
-index [string] | index position and sequence (e.g. 0:CAGT) |
-f [string] | quality score filter string (method1:start1-end1:score1,method2:start2-end2:score2, zero-based) |
-maxN [int] | max number of N in sequence (default off) |
-of [string] | output format (fasta or [fastq]) |
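The options above can be combined into a concrete command line. The following Python sketch only assembles the argument list and does not run the script. The method name `mean`, the file names, and the helper names are assumptions for illustration; the table specifies only the layout of the `-f` string.

```python
import shlex

def build_filter_string(rules):
    """Join (method, start, end, score) rules into the -f layout:
    method1:start1-end1:score1,method2:start2-end2:score2 (zero-based)."""
    return ",".join(f"{method}:{start}-{end}:{score}"
                    for method, start, end, score in rules)

def fastq_filter_argv(in_file, out_file, rules, in_format="sanger",
                      out_format="fastq", max_n=None):
    """Assemble an argument list for fastq_filter.pl from the documented options."""
    argv = ["fastq_filter.pl", "-v", "-if", in_format,
            "-f", build_filter_string(rules), "-of", out_format]
    if max_n is not None:  # -maxN is off by default, so add it only on request
        argv += ["-maxN", str(max_n)]
    return argv + [in_file, out_file]

# "mean" is an assumed method name; check the script's help text for valid ones.
example_rules = [("mean", 0, 24, 20)]
print(shlex.join(fastq_filter_argv("in.fq", "out.fq", example_rules)))
```

Building the list first and printing it with `shlex.join` makes it easy to inspect the command before passing it to `subprocess.run`.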
fastq2collapse.pl
Usage: fastq2collapse.pl [options] <in.fq> <out.fq>
Arguments:
Argument | Description |
---|---|
<in.fq> | input file in FASTQ format of filtered and trimmed reads |
<out.fq> | output file in FASTQ format in which exact PCR duplicates have been collapsed |
Options:
Option | Description |
---|---|
--tmp-dir [string] | temporary directory |
-v | verbose |
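As an illustration of what collapsing exact duplicates means, here is a minimal Python sketch that keeps one record per distinct sequence and counts its copies. It is not the script's implementation: which copy is kept as the representative and how the copy number is reported in the output are assumptions.

```python
def collapse_exact_duplicates(records):
    """Collapse reads with identical sequences.

    records: iterable of (read_id, sequence, quality) tuples.
    Returns one (read_id, sequence, quality, copies) tuple per distinct
    sequence, keeping the first record seen (an assumption of this sketch).
    """
    kept = {}
    for read_id, seq, qual in records:
        if seq in kept:
            kept[seq][3] += 1          # another copy of a known sequence
        else:
            kept[seq] = [read_id, seq, qual, 1]
    # dicts preserve insertion order, so output follows first appearance
    return [tuple(entry) for entry in kept.values()]

reads = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACGT", "HHHH"),
    ("r3", "TTTT", "IIII"),
]
for record in collapse_exact_duplicates(reads):
    print(record)
```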
stripBarcode.pl
Usage: stripBarcode.pl [options] <in.fastq> <out.fastq>
Arguments:
Argument | Description |
---|---|
<in.fastq> | input file in FASTQ format that contains 5' linker degenerate barcodes |
<out.fastq> | output file in FASTQ format with barcodes removed and attached to sequence ids |
Options:
Option | Description |
---|---|
-len [int] | length of barcode sequences |
-format [string] | input format ([fasta] or fastq) |
--barcode-start-with [string] | filter sequences based on the starting nucleotides in the barcode |
--barcode-end-with [string] | filter sequences based on the ending nucleotides in the barcode |
--tmp-dir [string] | temporary directory |
--keep-cache | keep cache when the job is done |
-v | verbose |
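The effect described above, removing the barcode from the 5' end and attaching it to the sequence id, can be sketched in Python as follows. The `#` separator and the handling of reads that are not longer than the barcode are assumptions of this sketch, not documented behavior.

```python
def strip_barcode(read_id, seq, qual, barcode_len, sep="#"):
    """Move the first barcode_len nucleotides from the read to its id.

    Returns (new_id, trimmed_seq, trimmed_qual), or None when the read is
    not longer than the barcode (assumed handling).
    """
    if len(seq) <= barcode_len:
        return None
    barcode = seq[:barcode_len]
    # Trim sequence and quality string by the same amount to keep them aligned.
    return (f"{read_id}{sep}{barcode}", seq[barcode_len:], qual[barcode_len:])

print(strip_barcode("read1", "ACGTATTTGGCC", "IIIIIHHHHHHH", 5))
```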
parseAlignment.pl
Usage: parseAlignment.pl [options] <in.sam> <tag.bed>
Arguments:
Argument | Description |
---|---|
<in.sam> | input file in SAM format after alignment/mapping |
<tag.bed> | parsed output file in BED format of tag alignment |
Options:
Option | Description |
---|---|
--mutation-file [string] | file name to save mutations |
--map-qual [int] | MAPQ score (e.g. to keep only uniquely mapped reads) |
--min-len [int] | minimal read length to report |
--indel-to-end [int] | nucleotides from indel to end |
--split-del | split oligo deletion into single nucleotides |
--indel-in-score | count indels as mismatches reported in the score column |
-v | verbose |
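To illustrate the coordinate conversion involved in turning a SAM alignment into a BED tag, the Python sketch below uses only standard SAM fields (FLAG, POS, MAPQ, CIGAR). It is a simplified stand-in, not the script's logic: the output columns, the use of the sequence length for the minimum-length check, and the function name are assumptions.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def sam_line_to_bed(line, min_mapq=0, min_len=0):
    """Convert one SAM alignment line to (chrom, start, end, name, strand).

    Returns None for header lines, unmapped reads, and reads failing the
    MAPQ or length thresholds.
    """
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    name, flag, chrom = fields[0], int(fields[1]), fields[2]
    pos, mapq, cigar, seq = int(fields[3]), int(fields[4]), fields[5], fields[9]
    if flag & 0x4 or cigar == "*":           # 0x4 marks an unmapped read
        return None
    if mapq < min_mapq or len(seq) < min_len:
        return None
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    start = pos - 1                           # SAM is 1-based, BED is 0-based
    strand = "-" if flag & 0x10 else "+"      # 0x10 marks a reverse-strand hit
    return (chrom, start, start + ref_len, name, strand)

sam = "read1\t16\tchr1\t101\t30\t10M2D5M\t*\t0\t0\tACGTACGTACGTACG\tIIIIIIIIIIIIIII"
print(sam_line_to_bed(sam, min_mapq=1))
```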
tag2collapse.pl
For more information about the model, see:
Darnell JC, et al. FMRP Stalls Ribosomal Translocation on mRNAs Linked to Synaptic Function and Autism. Cell. 2011; 146:247–261
Usage: tag2collapse.pl [options] <tag.bed> <tag.uniq.bed>
Arguments:
Argument | Description |
---|---|
<tag.bed> | input file in BED format of CLIP tags |
<tag.uniq.bed> | output file in BED format of unique CLIP tags |
Options:
Option | Description |
---|---|
Input options | |
-big | set when the input file is big |
-weight | consider the weight of each tag |
--weight-in-name | find weight in name |
EM options | |
--random-barcode | random barcode exists, no collapse for different barcodes |
-EM [int] | EM threshold to infer reliability of each collapsed read (when a random linker is present; -1 = no EM) |
--seq-error-model [string] | sequencing error model to use (alignment or [em-local] or em-global or fix=0.01) |
--output-seq-error [file] | output sequencing errors estimated by the EM algorithm |
Output options | |
--keep-max-score | keep the tag with the most weight (instead of the longest one) as representative |
--keep-tag-name | do not change tag name (no extra information) |
Other options | |
-c [string] | cache directory |
--keep-cache | keep cache when the job is done |
-d | debug (on or [off]) |
-v | verbose |
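The collapsing step, without the EM model, can be illustrated with a short Python sketch. It rests on several assumptions that the tables above do not state: tags are treated as duplicates when they share chromosome, strand, and 5' end; the barcode is taken as the text after the last `#` in the tag name; and the longest tag is kept as the representative (the default implied by the `--keep-max-score` description).

```python
def collapse_tags(tags, random_barcode=False):
    """Keep one representative per group of duplicate BED tags.

    tags: iterable of (chrom, start, end, name, score, strand).
    With random_barcode=True, tags carrying different barcodes are never
    collapsed together.
    """
    best = {}
    for tag in tags:
        chrom, start, end, name, score, strand = tag
        five_prime = start if strand == "+" else end   # strand-aware 5' end
        barcode = ""
        if random_barcode and "#" in name:
            barcode = name.rsplit("#", 1)[-1]          # assumed name layout
        key = (chrom, five_prime, strand, barcode)
        current = best.get(key)
        if current is None or (end - start) > (current[2] - current[1]):
            best[key] = tag                            # prefer the longest tag
    return list(best.values())

tags = [
    ("chr1", 100, 130, "r1#ACGTA", 0, "+"),
    ("chr1", 100, 140, "r2#ACGTA", 0, "+"),
    ("chr1", 100, 135, "r3#TTTTT", 0, "+"),
]
print(collapse_tags(tags))
print(collapse_tags(tags, random_barcode=True))
```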
selectRow.pl
Usage: selectRow.pl [options] <mutation.txt> <tag.uniq.bed> > <tag.uniq.mutation.txt>
Arguments:
Argument | Description |
---|---|
<mutation.txt> | input file in txt format of mutations from alignment parsing |
<tag.uniq.bed> | input file in BED format of unique CLIP tags |
<tag.uniq.mutation.txt> | output file in txt format of unique mutations |
Options:
Option | Description |
---|---|
-h | the file has a header to be included |
-q [int] | query column id (zero-based) (default=0) |
-f [int] | filter column id (zero-based) (default=0) |
-i | ignore case (default=off) |
-p | print the key without matches (default=off) |
-pt | the text to print when no match exists (No match found) |
-s | print in single line if there are multiple matches (default=off) |
-d | delimiter when -s is specified |
-ss | print only one match if there are multiple matches (default=off) |
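The key-based selection controlled by `-q` and `-f` can be sketched in Python as below. The reading that keys from the second file are looked up in the first file is an assumption drawn from the option descriptions; the function and variable names are hypothetical.

```python
def select_rows(input_rows, filter_rows, q=0, f=0, ignore_case=False):
    """Return rows of input_rows whose column q matches column f of a filter row.

    Rows are lists of strings; q and f are zero-based column ids.
    Output follows the order of the keys in filter_rows.
    """
    def norm(key):
        return key.lower() if ignore_case else key

    by_key = {}
    for row in input_rows:
        by_key.setdefault(norm(row[q]), []).append(row)

    selected = []
    for row in filter_rows:
        selected.extend(by_key.get(norm(row[f]), []))
    return selected

# Mutations and unique tags both carry the tag name in column 3.
mutations = [
    ["chr1", "10", "11", "tagA"],
    ["chr1", "50", "51", "tagB"],
    ["chr2", "7", "8", "tagA"],
]
unique_tags = [
    ["chr1", "0", "30", "tagA"],
    ["chr3", "5", "40", "tagC"],
]
for row in select_rows(mutations, unique_tags, q=3, f=3):
    print("\t".join(row))
```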
bed2rgb.pl
Usage: bed2rgb.pl [options] <tag.uniq.bed> <tag.uniq.rgb.bed>
Arguments:
Argument | Description |
---|---|
<tag.uniq.bed> | input file in BED format of unique CLIP tags |
<tag.uniq.rgb.bed> | output file in BED format of rgb colored unique CLIP tags |
Options:
Option | Description |
---|---|
-col [string] | color by name or rgb (blue or red or green...r,g,b) |
-v | verbose |
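In the standard BED layout, color is stored as an `r,g,b` string in the ninth column (itemRgb), after thickStart and thickEnd. The Python sketch below shows one way to fill that column for a six-column tag. The small color table and the choice to set thickStart and thickEnd to the tag boundaries are assumptions; the script's own output layout is not described here.

```python
NAMED_COLORS = {"red": "255,0,0", "green": "0,255,0", "blue": "0,0,255"}

def add_item_rgb(fields, color):
    """Return a nine-column BED row with itemRgb set.

    fields: list of strings with at least chrom, start, end.
    color: a name from NAMED_COLORS or an explicit "r,g,b" string.
    """
    rgb = NAMED_COLORS.get(color, color)
    parts = rgb.split(",")
    if len(parts) != 3 or not all(p.isdigit() and int(p) <= 255 for p in parts):
        raise ValueError(f"not a color name or r,g,b triple: {color!r}")
    chrom, start, end = fields[:3]
    name = fields[3] if len(fields) > 3 else "."
    score = fields[4] if len(fields) > 4 else "0"
    strand = fields[5] if len(fields) > 5 else "."
    # thickStart/thickEnd (columns 7 and 8) span the whole tag in this sketch
    return [chrom, start, end, name, score, strand, start, end, rgb]

print("\t".join(add_item_rgb(["chr1", "100", "130", "tag1", "1", "+"], "blue")))
```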
bed2annotation.pl
Usage: bed2annotation.pl [options] <tag.uniq.annot.summary.txt (optional)> <tag.uniq.rgb.bed (or tag.uniq.bed)> <tag.uniq.annot.txt>
Arguments:
Argument | Description |
---|---|
<tag.uniq.annot.summary.txt> (optional) | output file in txt format of a summary of the annotation if -summary option is used |
<tag.uniq.rgb.bed (or tag.uniq.bed)> | input file in BED format of unique CLIP tags (with or without rgb color) |
<tag.uniq.annot.txt> | output file in txt format of annotation |
Options:
Option | Description |
---|---|
-conf [file] | configuration file with input datasets |
-dbkey [string] | genome build name (hg19 or mm10) |
-ss | consider the two strands separately when possible |
-big | big file |
-gene | annotate overlapping gene (id and symbol) |
-rmsk | annotate overlapping RepeatMasked sequences (type and %) |
-miRNA | annotate miRNA (miRNA_name) |
-region | annotate genomic breakdown |
-custom [file] | annotate custom features in the provided BED file |
--custom-name [string] | naming the custom feature |
--custom-summary [string] | method to summarize custom annotation ([all] or max_num or min_num or max_overlap) |
-summary [file] | print summary information |
-c [string] | cache directory |
--keep-cache | keep cache when the job is done |
-v | verbose |
tag2peak.pl
Usage: tag2peak.pl [options] <tag.uniq.bed> <peak.bed>
Arguments:
Argument | Description |
---|---|
<tag.uniq.bed> (with or without rgb) | input file in BED format of unique CLIP tags |
<peak.bed> | output file in BED format of peaks called |
Optional arguments | |
<tag.uniq.peak.boundary.bed> | output file in BED format of cluster boundaries (if --out-boundary option used) |
<tag.uniq.peak.halfPH.bed> | output file in BED format of half peak height boundaries (if --out-half-PH used) |
Options:
Option | Description |
---|---|
-big | big input file |
-ss | separate the two strands |
--valley-seeking | find candidate peaks by valley seeking |
--valley-depth [float] | depth of valley if valley seeking (between 0.5 and 1, default=0.9) |
--out-boundary [string] | output cluster boundaries |
--out-half-PH [string] | output half peak height boundaries |
--dbkey [string] | species to retrieve the default gene bed file |
--gene [string] | custom gene bed file for scan statistics (will override --dbkey) |
--use-expr | use expression levels given in the score column in the gene bed file for normalization |
-p [float] | threshold of p-value to call peak (e.g. 0.01) |
--multi-test | do Bonferroni multiple test correction |
-minPH [int] | min peak height |
-maxPH [int] | max peak height |
-gap [int] | merge cluster peaks closer than the gap (-1, no merge if < 0) |
--prefix [string] | prefix of peak id (Peak), so that peak ids look like Peak1, Peak2, etc. |
-c [dir] | cache dir |
--keep-cache | keep cache when the job is done |
-v | verbose |
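The interaction between the `-p` threshold and `--multi-test` can be illustrated with a generic Bonferroni correction in Python. This is the textbook procedure, not code from the script; in particular, what the script counts as the number of tests is not stated here, so the sketch simply uses the number of p-values supplied.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Flag p-values that stay at or below alpha after Bonferroni correction.

    Each p-value is multiplied by the number of tests and capped at 1.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) <= alpha for p in p_values]

candidate_p_values = [0.001, 0.004, 0.2]
print(bonferroni_significant(candidate_p_values, alpha=0.01))
```

With three tests, a raw p-value of 0.004 becomes 0.012 after correction and no longer passes a 0.01 threshold, which is why the correction yields fewer peaks than the uncorrected threshold.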
tag2profile.pl
Usage: tag2profile.pl [options] <tag.uniq.bed> <peak.sig.boundary.count.bed>
Arguments:
Argument | Description |
---|---|
<tag.uniq.bed> | input file in BED format of unique CLIP tags (with or without rgb color) |
<peak.sig.boundary.count.bed> | output file in BED format of peak profile |
<peak2.sig.boundary.count.bed> (Optional argument) | for wig format output, specify two files to separate the two strands |
Options:
Option | Description |
---|---|
Input options | |
-big | big file |
-minBlockSize [int] | minimum number of lines to read in each block for a big file |
-weight | weight counts according to the score of each tag |
-weight-avg | compute the weighted average of the score of each tag |
-ss | separate strand |
-ext5 [int] | extension of tags at the 5' end |
-ext3 [int] | extension of tags at the 3' end |
-chrLen [string] | chrom length file |
Profile options | |
-region [file] | a bed file with regions to count tag numbers. If not specified, count in moving windows |
-exact | exact count at each nucleotide |
-w [int] | window size |
-s [int] | step size |
--normalize [string] | normalization ([none] or rpkm or multiply={1.3}) |
Output options | |
-of [string] | output format ([bed] or bedgraph or sgr) |
-nz | don't print zeroes (works for sgr and bed) |
-n [string] | track name |
-c [string] | cache directory |
--keep-cache | keep cache when the job is done |
-v | verbose |
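When `-region` is not given, tags are counted in moving windows set by `-w` and `-s`. The Python sketch below shows the idea for a single chromosome. Counting a tag in every window it overlaps is an assumption of this sketch; the script may assign tags to windows differently.

```python
def window_counts(tags, chrom_len, window, step):
    """Count tags overlapping each moving window on one chromosome.

    tags: list of (start, end) in zero-based, half-open coordinates.
    Returns a list of (window_start, window_end, count).
    """
    profile = []
    for w_start in range(0, chrom_len, step):
        w_end = min(w_start + window, chrom_len)   # clip the last windows
        count = sum(1 for start, end in tags
                    if start < w_end and end > w_start)
        profile.append((w_start, w_end, count))
    return profile

for row in window_counts([(5, 15), (12, 30)], chrom_len=40, window=20, step=10):
    print(row)
```

The nested scan is quadratic and only meant to make the windowing explicit; sorting the tags first would be the usual choice for real data.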
CIMS.pl
Usage: CIMS.pl [options] <tag.uniq.bed> <CIMS.mutation.bed> <CIMS.mutation.txt>
Arguments:
Argument | Description |
---|---|
<tag.uniq.bed> | input file in BED format of unique CLIP tags (with or without rgb) |
<CIMS.mutation.bed> | output file in BED format of deletions, insertions, or substitutions (CIMS) |
<CIMS.mutation.txt> | output file in txt format of CIMS |
Options:
Option | Description |
---|---|
-big | big file |
-w [int] | mutation size |
-n [int] | number of iterations for permutation |
-p | track mutation position relative to read start |
--no-sparse-correct | no sparsity correction |
-FDR [float] | threshold of FDR |
-mkr [float] | threshold of m-over-k-ratio |
-c [dir] | cache directory |
--keep-cache | keep cache when the job is done |
-v | verbose |
removeRow.pl
Usage: removeRow.pl [options] <tag.uniq.bed> <mutation.bed> > <tag.uniq.clean.bed>
Arguments:
Argument | Description |
---|---|
<tag.uniq.bed> | input file in BED format of unique CLIP tags |
<mutation.bed> | input file in BED format of CIMS |
<tag.uniq.clean.bed> | output file in BED format in which CLIP tags with deletions are removed |
Options:
Option | Description |
---|---|
-q [int] | query column id (zero-based) (default=0) |
-f [int] | filter column id (zero-based) (default=0) |
-i | ignore case (default=off) |
-r | reverse mode |
-v | verbose |
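Removing rows by key is the complement of selecting them. A minimal Python sketch, assuming rows of the first file are dropped when their key in column `-q` appears in column `-f` of the second file (the `-r` reverse mode is not modeled, and the names are hypothetical):

```python
def remove_rows(input_rows, filter_rows, q=0, f=0, ignore_case=False):
    """Drop rows of input_rows whose column q appears in column f of filter_rows.

    Rows are lists of strings; q and f are zero-based column ids.
    """
    def norm(key):
        return key.lower() if ignore_case else key

    blocked = {norm(row[f]) for row in filter_rows}
    return [row for row in input_rows if norm(row[q]) not in blocked]

# Both files carry the tag name in column 3.
unique_tags = [
    ["chr1", "0", "30", "tagA"],
    ["chr1", "40", "70", "tagB"],
]
deletions = [["chr1", "12", "13", "tagA"]]
for row in remove_rows(unique_tags, deletions, q=3, f=3):
    print("\t".join(row))
```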
bedExt.pl
Usage: bedExt.pl [options] <tag.uniq.clean.bed> <tag.uniq.clean.trunc.bed>
Arguments:
Argument | Description |
---|---|
<tag.uniq.clean.bed> | input file in BED format in which CLIP tags with deletions were removed |
<tag.uniq.clean.trunc.bed> | output file in BED format extended around start site as a potential cross link site that causes truncation |
Options:
Option | Description |
---|---|
-n | get neighbor region relative to <up or down or r=100> |
-l | extension on the left |
-r | extension on the right |
-chrLen [string] | chromosome length file |
-v | verbose |
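The output described above is a window around the start site of each tag. The Python sketch below shows a strand-aware version of that idea with clamping to the chromosome length. The parameters `upstream` and `downstream` are hypothetical and do not reproduce the semantics of the script's `-n`, `-l`, and `-r` options.

```python
def start_site_window(tag, upstream, downstream, chrom_len=None):
    """Return a BED tag replaced by a window around its 5' end.

    tag: (chrom, start, end, name, score, strand), zero-based half-open.
    upstream/downstream: nucleotides to include on each side of the start
    site, measured along the strand of the tag.
    """
    chrom, start, end, name, score, strand = tag
    if strand == "-":
        site = end - 1                      # 5' end of a minus-strand tag
        new_start, new_end = site - downstream, site + upstream + 1
    else:
        site = start
        new_start, new_end = site - upstream, site + downstream + 1
    new_start = max(0, new_start)           # do not run off the chromosome start
    if chrom_len is not None:
        new_end = min(new_end, chrom_len)   # nor past its end
    return (chrom, new_start, new_end, name, score, strand)

print(start_site_window(("chr1", 100, 130, "t1", 0, "+"), 10, 10))
print(start_site_window(("chr1", 100, 130, "t2", 0, "-"), 10, 10, chrom_len=135))
```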
tag2cluster.pl
Usage: tag2cluster.pl [options] <tag.uniq.bed> <cluster.bed>
Arguments:
Argument | Description |
---|---|
<tag.uniq.bed> | input file in BED format of unique CLIP tags |
<cluster.bed> | output file in BED format of clustered overlapping CLIP tags |
Options:
Option | Description |
---|---|
-big | big file |
-s | same strand required |
-weight | consider the weight of each tag |
--weight-in-name | find weight in name |
-maxgap [int] | the max gap to be considered as an overlap |
-collapse [int] | collapse mode (0: no collapse; 1: collapse if match both ends; 2: collapse if one is in another) |
-overlap [float] | overlap fraction (has no effect in collapse mode) |
-of [string] | output format ([bed] or wig) |
-c [string] | cache directory |
--keep-cache | keep cache when the job is done |
-d | debug (on or [off]) |
-v | verbose |
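Clustering overlapping tags amounts to merging sorted intervals. The Python sketch below illustrates this with a maximum gap and an optional same-strand requirement, loosely mirroring `-maxgap` and `-s`. Measuring the gap as the distance between a tag's start and the current cluster end (so 0 joins adjacent tags) is an assumption; weights, collapse modes, and overlap fractions are not modeled.

```python
def cluster_tags(tags, max_gap=0, same_strand=True):
    """Merge tags into clusters.

    tags: iterable of (chrom, start, end, strand), zero-based half-open.
    A tag joins the current cluster when its start lies at most max_gap
    nucleotides past the cluster end.
    Returns a list of (chrom, start, end, strand, tag_count).
    """
    def sort_key(tag):
        chrom, start, end, strand = tag
        return (chrom, strand if same_strand else "", start, end)

    clusters = []
    for chrom, start, end, strand in sorted(tags, key=sort_key):
        label = strand if same_strand else "."
        last = clusters[-1] if clusters else None
        if (last is not None and last[0] == chrom and last[3] == label
                and start - last[2] <= max_gap):
            last[2] = max(last[2], end)     # extend the open cluster
            last[4] += 1
        else:
            clusters.append([chrom, start, end, label, 1])
    return [tuple(cluster) for cluster in clusters]

tags = [
    ("chr1", 100, 120, "+"),
    ("chr1", 110, 140, "+"),
    ("chr1", 200, 220, "+"),
    ("chr1", 115, 130, "-"),
]
print(cluster_tags(tags))
print(cluster_tags(tags, same_strand=False))
```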