Difference between revisions of "CTK usage"

From Zhang Laboratory

Revision as of 10:57, 5 October 2016

fastq_filter.pl

Usage: fastq_filter.pl [options] <in.fq> <out.fq (or out.fa)>

Arguments:

Argument Description
<in.fq> input file in FASTQ format
<out.fq (or out.fa)> output file of filtered reads in FASTQ (or FASTA) format


Options:

Option Description
-v verbose
-if [string] fastq format (solexa or [sanger])
-index [string] index position and sequence (e.g. 0:CAGT)
-f [string] quality score filter string (method1:start1-end1:score1,method2:start2-end2:score2, zero-based)
-maxN [int] max number of N in sequence (default off)
-of [string] output format (fasta or [fastq])
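
The -f filter string packs several filters into one argument. As a rough illustration of that layout only (this is not code from fastq_filter.pl, and the method names m1 and m2 are placeholders, since the supported method names are not listed on this page), a string of this shape could be split apart as follows:

```python
from typing import List, NamedTuple


class QualityFilter(NamedTuple):
    method: str   # scoring method name (placeholder values in the example)
    start: int    # zero-based start position
    end: int      # zero-based end position
    score: float  # score threshold for this span


def parse_filter_string(spec: str) -> List[QualityFilter]:
    """Split 'method1:start1-end1:score1,method2:...' into structured filters."""
    filters = []
    for item in spec.split(","):
        method, span, score = item.split(":")
        start, end = span.split("-")
        filters.append(QualityFilter(method, int(start), int(end), float(score)))
    return filters


# Hypothetical filter string; only the layout follows the option description.
for quality_filter in parse_filter_string("m1:0-24:20,m2:25-39:10"):
    print(quality_filter)
```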

fastq2collapse.pl

Usage: fastq2collapse.pl [options] <in.fq> <out.fq>

Arguments:

Argument Description
<in.fq> input file in FASTQ format of filtered and trimmed reads
<out.fq> output file in FASTQ format in which exact PCR duplicates have been collapsed

Options:

Option Description
--tmp-dir [string] temporary directory
-v verbose
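
To make "exact PCR duplicates have been collapsed" concrete, the sketch below keeps one record per distinct read sequence. It is an illustration only, not the script's implementation; in particular, keeping the first record seen and leaving read names unchanged are assumptions.

```python
from typing import Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from four-line FASTQ text."""
    lines = [line for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        # lines[i] is '@name', lines[i + 2] is the '+' separator
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def collapse_exact_duplicates(records: Iterator[FastqRecord]) -> List[FastqRecord]:
    """Keep the first record for each distinct sequence (assumed policy)."""
    seen = {}
    for name, seq, qual in records:
        if seq not in seen:
            seen[seq] = (name, seq, qual)
    return list(seen.values())


example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nHHHH\n@r3\nTTTT\n+\nIIII\n"
for record in collapse_exact_duplicates(read_fastq(example)):
    print(record)
```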


stripBarcode.pl

Usage: stripBarcode.pl [options] <in.fastq> <out.fastq>

Arguments:

Argument Description
<in.fastq> input file in FASTQ format that contains 5' linker degenerate barcodes
<out.fastq> output file in FASTQ format with barcodes removed and attached to sequence ids

Options:

Option Description
-len [int] length of barcode sequences
-format [string] input format ([fasta] or fastq)
--barcode-start-with [string] filter sequences based on the starting nucleotides in the barcode
--barcode-end-with [string] filter sequences based on the ending nucleotides in the barcode
--tmp-dir [string] temporary directory
--keep-cache keep cache when the job is done
-v verbose
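
The effect of barcode stripping on a single read can be sketched as below. This is a hedged illustration rather than the script itself: the "#" separator used to attach the barcode to the sequence id is an assumption, as is the choice to keep only reads whose barcode matches the start/end filters.

```python
from typing import Optional, Tuple


def strip_barcode(
    name: str,
    seq: str,
    qual: str,
    length: int,
    start_with: str = "",
    end_with: str = "",
    sep: str = "#",  # assumed separator between read id and barcode
) -> Optional[Tuple[str, str, str]]:
    """Move the first `length` nucleotides of a read into its id."""
    barcode = seq[:length]
    # Assumed filter semantics: drop reads whose barcode does not match.
    if not (barcode.startswith(start_with) and barcode.endswith(end_with)):
        return None
    return f"{name}{sep}{barcode}", seq[length:], qual[length:]


print(strip_barcode("read1", "ACGTTTGGCC", "IIIIIHHHHH", length=5))
```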


parseAlignment.pl

Usage: parseAlignment.pl [options] <in.sam> <tag.bed>

Arguments:

Argument Description
<in.sam> input file in SAM format after alignment/mapping
<tag.bed> parsed output file in BED format of tag alignment

Options:

Option Description
--mutation-file [string] file name to save mutations
--map-qual [int] MAPQ score (e.g. to keep only uniquely mapped reads)
--min-len [int] minimal read length to report
--indel-to-end [int] nucleotides from indel to end
--split-del split oligo deletion into single nucleotides
--indel-in-score count indels as mismatches reported in the score column
-v verbose
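
The --map-qual and --min-len options act as filters on individual alignments. A minimal sketch of such a filter on one SAM line is shown below; the column positions follow the SAM format (MAPQ is the 5th field, SEQ the 10th), while treating both options as "greater than or equal to" thresholds is an assumption, not a statement about parseAlignment.pl.

```python
def keep_alignment(sam_line: str, min_mapq: int = 0, min_len: int = 0) -> bool:
    """Decide whether one SAM alignment line passes the two filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # SAM FLAG bit 0x4: read is unmapped
        return False
    mapq = int(fields[4])  # 5th SAM column
    seq = fields[9]        # 10th SAM column
    # Assumed semantics: keep reads at or above both thresholds.
    return mapq >= min_mapq and len(seq) >= min_len


line = "r1\t0\tchr1\t100\t30\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
print(keep_alignment(line, min_mapq=1, min_len=10))
```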


tag2collapse.pl

For more information about the model, see:

Darnell JC, et al. FMRP Stalls Ribosomal Translocation on mRNAs Linked to Synaptic Function and Autism. Cell. 2011; 146:247–261

Usage: tag2collapse.pl [options] <tag.bed> <tag.uniq.bed>

Arguments:

Argument Description
<tag.bed> input file in BED format of CLIP tags
<tag.uniq.bed> output file in BED format of unique CLIP tags

Options:

Option Description
Input options
-big set when the input file is big
-weight consider the weight of each tag
--weight-in-name find weight in name
EM options
--random-barcode random barcodes are present; tags with different barcodes are not collapsed
-EM [int] EM threshold to infer reliability of each collapsed read (used when a random linker is present; -1=no EM)
--seq-error-model [string] sequencing error model to use (alignment or [em-local] or em-global or fix=0.01)
--output-seq-error [file] output sequencing errors estimated by the EM algorithm
Output options
--keep-max-score keep the tag with the most weight (instead of the longest one) as representative
--keep-tag-name do not change tag name (no extra information)
Other options
-c [string] cache directory
--keep-cache keep cache when the job is done
-d debug (on or [off])
-v verbose
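
How --random-barcode and --keep-max-score change the outcome can be seen in a simplified sketch. Everything structural here is an assumption for illustration: tags are grouped by chromosome, strand and 5' end position, the barcode is read from the tag name after a "#" separator, and the EM error modeling is left out entirely.

```python
from typing import Dict, List, NamedTuple, Tuple


class Tag(NamedTuple):
    chrom: str
    start: int   # BED start (zero-based)
    end: int     # BED end (exclusive)
    name: str
    score: float
    strand: str


def collapse_tags(
    tags: List[Tag],
    random_barcode: bool = False,
    keep_max_score: bool = False,
    sep: str = "#",  # assumed separator between read id and barcode
) -> List[Tag]:
    """Keep one representative tag per group (no EM step)."""
    groups: Dict[Tuple[str, str, int, str], List[Tag]] = {}
    for tag in tags:
        five_prime = tag.start if tag.strand == "+" else tag.end
        # Different barcodes stay in separate groups only if requested.
        barcode = tag.name.rsplit(sep, 1)[-1] if random_barcode else ""
        groups.setdefault((tag.chrom, tag.strand, five_prime, barcode), []).append(tag)
    if keep_max_score:
        rank = lambda t: t.score
    else:
        rank = lambda t: t.end - t.start  # longest tag as representative
    return [max(group, key=rank) for group in groups.values()]


tags = [
    Tag("chr1", 100, 130, "r1#AAC", 1, "+"),
    Tag("chr1", 100, 140, "r2#AAC", 1, "+"),
    Tag("chr1", 100, 135, "r3#GGT", 1, "+"),
]
print([t.name for t in collapse_tags(tags)])
print([t.name for t in collapse_tags(tags, random_barcode=True)])
```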

selectRow.pl

Usage: selectRow.pl [options] <mutation.txt> <tag.uniq.bed> > <tag.uniq.mutation.txt>

Arguments:

Argument Description
<mutation.txt> input file in txt format of mutations from alignment parsing
<tag.uniq.bed> input file in BED format of unique CLIP tags
<tag.uniq.mutation.txt> output file in txt format of unique mutations

Options:

Option Description
-h the file has a header to be included
-q [int] query column id (zero-based) (default=0)
-f [int] filter column id (zero-based) (default=0)
-i ignore case (default=off)
-p print the key without matches (default=off)
-pt the text to print when no match exists (No match found)
-s print in single line if there are multiple matches (default=off)
-d delimiter when -s is specified
-ss print only one match if there are multiple matches (default=off)

bed2rgb.pl

Usage: bed2rgb.pl [options] <tag.uniq.bed> <tag.uniq.rgb.bed>

Arguments:

Argument Description
<tag.uniq.bed> input file in BED format of unique CLIP tags
<tag.uniq.rgb.bed> output file in BED format of rgb colored unique CLIP tags

Options:

Option Description
-col [string] color by name or rgb (blue or red or green ... or r,g,b)
-v verbose


bed2annotation.pl

Usage: bed2annotation.pl [options] <tag.uniq.annot.summary.txt (optional)> <tag.uniq.rgb.bed (or tag.uniq.bed)> <tag.uniq.annot.txt>

Arguments:

Argument Description
<tag.uniq.annot.summary.txt> (optional) output file in txt format of a summary of the annotation if -summary option is used
<tag.uniq.rgb.bed (or tag.uniq.bed)> input file in BED format of unique CLIP tags (with or without rgb color)
<tag.uniq.annot.txt> output file in txt format of annotation

Options:

Option Description
-conf [file] configuration file with input datasets
-dbkey [string] genome build name (hg19 or mm10)
-ss consider the two strands separately when possible
-big big file
-gene annotate overlapping gene (id and symbol)
-rmsk annotate overlapping RepeatMasked sequences (type and %)
-miRNA annotate miRNA (miRNA_name)
-region annotate genomic breakdown
-custom [file] annotate custom features in the provided BED file
--custom-name [string] naming the custom feature
--custom-summary [string] method to summarize custom annotation ([all] or max_num or min_num or max_overlap)
-summary [file] print summary information
-c [string] cache directory
--keep-cache keep cache when the job is done
-v verbose

tag2peak.pl

Usage: tag2peak.pl [options] <tag.uniq.bed> <peak.bed>

Arguments:

Argument Description
<tag.uniq.bed> input file in BED format of unique CLIP tags (with or without rgb color)
<peak.bed> output file in BED format of peaks called
Optional arguments
<tag.uniq.peak.boundary.bed> output file in BED format of cluster boundaries (if --out-boundary option used)
<tag.uniq.peak.halfPH.bed> output file in BED format of half peak height boundaries (if --out-half-PH used)

Options:

Option Description
-big big input file
-ss separate the two strands
--valley-seeking find candidate peaks by valley seeking
--valley-depth [float] depth of valley if valley seeking (between 0.5 and 1, default=0.9)
--out-boundary [string] output cluster boundaries
--out-half-PH [string] output half peak height boundaries
--gene [string] gene bed file for scan statistics
--use-expr use expression levels given in the score column in the gene bed file for normalization
-p [float] threshold of p-value to call peak (e.g. 0.01)
--multi-test do Bonferroni multiple test correction
-minPH [int] min peak height
-maxPH [int] max peak height
-gap [int] merge cluster peaks closer than the gap (-1, no merge if < 0)
--prefix [string] prefix of peak id (Peak), so that peak ids in the output look like Peak1, Peak2, etc.
-c [dir] cache dir
--keep-cache keep cache when the job is done
-v verbose
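
The interaction between -p and --multi-test can be illustrated with the standard Bonferroni rule: each p-value is multiplied by the number of tests before being compared with the threshold. The sketch below shows that rule in general form; how tag2peak.pl counts the number of tests is not specified here, so using the number of candidate peaks is an assumption.

```python
from typing import List, Tuple


def bonferroni_call(p_values: List[float], alpha: float = 0.01) -> List[Tuple[float, bool]]:
    """Return (adjusted p-value, passes threshold) for each candidate peak."""
    n_tests = len(p_values)  # assumed: one test per candidate peak
    results = []
    for p in p_values:
        adjusted = min(1.0, p * n_tests)  # Bonferroni adjustment, capped at 1
        results.append((adjusted, adjusted <= alpha))
    return results


for adjusted, significant in bonferroni_call([0.001, 0.004, 0.5], alpha=0.01):
    print(adjusted, significant)
```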

tag2profile.pl

Usage: tag2profile.pl [options] <tag.uniq.bed> <peak.sig.boundary.count.bed> 

Arguments:

Argument Description
<tag.uniq.bed> input file in BED format of unique CLIP tags (with or without rgb color)
<peak.sig.boundary.count.bed> output file in BED format of peak profile
<peak2.sig.boundary.count.bed> (Optional argument) for wig format output, specify two files to separate the two strands

Options:

Option Description
Input options
-big big file
-minBlockSize [int] minimum number of lines to read in each block for a big file
-weight weight counts according to the score of each tag
-weight-avg compute the weighted average of the score of each tag
-ss separate strand
-ext5 [int] extension of tags at the 5' end
-ext3 [int] extension of tags at the 3' end
-chrLen [string] chrom length file
Profile options
-region [file] a bed file with regions to count tag numbers. If not specified, count in moving windows
-exact exact count at each nucleotide
-w [int] window size
-s [int] step size
--normalize [string] normalization ([none] or rpkm or multiply={1.3})
Output options
-of [string] output format ([bed] or bedgraph or sgr)
-nz don't print zeroes (works for sgr and bed)
-n [string] track name
-c [string] cache directory
--keep-cache keep cache when the job is done
-v verbose
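
When no -region file is given, tags are counted in moving windows defined by -w and -s. A minimal sketch of that kind of counting is shown below. It assumes BED-style half-open coordinates and counts a tag in every window it overlaps by at least one nucleotide; the script's own counting rule may differ.

```python
from typing import List, Tuple


def window_counts(
    tags: List[Tuple[int, int]],
    region_start: int,
    region_end: int,
    window: int,
    step: int,
) -> List[Tuple[int, int, int]]:
    """Count tags overlapping each window of size `window`, moved by `step`."""
    counts = []
    for w_start in range(region_start, region_end - window + 1, step):
        w_end = w_start + window
        # Half-open intervals overlap when each starts before the other ends.
        n = sum(1 for start, end in tags if start < w_end and end > w_start)
        counts.append((w_start, w_end, n))
    return counts


tags = [(0, 10), (5, 15), (20, 30)]
for row in window_counts(tags, 0, 30, window=10, step=10):
    print(row)
```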


CIMS.pl

Usage: CIMS.pl [options] <tag.uniq.bed> <CIMS.mutation.bed> <CIMS.mutation.txt>

Arguments:

Argument Description
<tag.uniq.bed> input file in BED format of unique CLIP tags (with or without rgb)
<CIMS.mutation.bed> output file in BED format of deletions, insertions, or substitutions (CIMS)
<CIMS.mutation.txt> output file in txt format of CIMS

Options:

Option Description
-big big file
-w [int] mutation size
-n [int] number of iterations for permutation
-p track mutation position relative to read start
--no-sparse-correct no sparsity correction
-FDR [float] threshold of FDR
-mkr [float] threshold of m-over-k-ratio
-c [dir] cache directory
--keep-cache keep cache when the job is done
-v verbose


removeRow.pl

Usage: removeRow.pl [options] <tag.uniq.bed> <mutation.bed> > <tag.uniq.clean.bed>

Arguments:

Argument Description
<tag.uniq.bed> input file in BED format of unique CLIP tags
<mutation.bed> input file in BED format of CIMS
<tag.uniq.clean.bed> output file in BED format in which CLIP tags with deletions are removed

Options:

Option Description
-q [int] query column id (zero-based) (default=0)
-f [int] filter column id (zero-based) (default=0)
-i ignore case (default=off)
-r reverse mode
-v verbose

bedExt.pl

Usage: bedExt.pl [options] <tag.uniq.clean.bed> <tag.uniq.clean.trunc.bed>

Arguments:

Argument Description
<tag.uniq.clean.bed> input file in BED format in which CLIP tags with deletions were removed
<tag.uniq.clean.trunc.bed> output file in BED format extended around the start site as a potential cross-link site that causes truncation

Options:

Option Description
-n get neighbor region relative to <up or down or r=100>
-l extension on the left
-r extension on the right
-chrLen [string] chromosome length file
-v verbose

tag2cluster.pl

Usage: tag2cluster.pl [options] <tag.uniq.bed> <cluster.bed>

Arguments:

Argument Description
<tag.uniq.bed> input file in BED format of unique CLIP tags
<cluster.bed> output file in BED format of clustered overlapping CLIP tags

Options:

Option Description
-big big file
-s same strand required
-weight consider the weight of each tag
--weight-in-name find weight in name
-maxgap [int] the max gap to be considered as an overlap
-collapse [int] collapse mode (0: no collapse; 1: collapse if match both ends; 2: collapse if one is in another)
-overlap [float] overlap fraction (takes effect only when tags are not collapsed)
-of [string] output format ([bed] or wig)
-c [string] cache directory
--keep-cache keep cache when the job is done
-d debug (on or [off])
-v verbose
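
Clustering overlapping tags with -maxgap and -s can be pictured as a sort-and-merge pass over intervals. The sketch below is one way to do this and is not the script's code: it assumes half-open BED coordinates, defines the gap as the next tag's start minus the current cluster's end, and uses -1 as the default so that only tags sharing at least one nucleotide are merged.

```python
from typing import List, Tuple

BedTag = Tuple[str, int, int, str]  # (chrom, start, end, strand)


def cluster_tags(
    tags: List[BedTag], max_gap: int = -1, same_strand: bool = False
) -> List[Tuple[str, int, int, str, int]]:
    """Merge tags into clusters; returns (chrom, start, end, strand, tag_count)."""

    def group_of(tag: BedTag) -> Tuple[str, str]:
        chrom, _, _, strand = tag
        return (chrom, strand if same_strand else ".")

    clusters: List[list] = []
    for tag in sorted(tags, key=lambda t: (group_of(t), t[1], t[2])):
        chrom, strand = group_of(tag)
        _, start, end, _ = tag
        last = clusters[-1] if clusters else None
        if last and (last[0], last[3]) == (chrom, strand) and start - last[2] <= max_gap:
            last[2] = max(last[2], end)  # extend the current cluster
            last[4] += 1
        else:
            clusters.append([chrom, start, end, strand, 1])
    return [tuple(c) for c in clusters]


tags = [("chr1", 0, 10, "+"), ("chr1", 5, 20, "+"), ("chr1", 20, 30, "-"), ("chr2", 0, 5, "+")]
for cluster in cluster_tags(tags):
    print(cluster)
```

With max_gap=0 in this sketch, directly adjacent tags (gap of zero) are merged as well.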