Revision as of 08:09, 5 May 2014

Introduction

Crosslinking-induced mutation site (CIMS) analysis is a computational method for HITS-CLIP data analysis that determines the exact protein-RNA crosslink sites and thereby maps protein-RNA interactions at single-nucleotide resolution. The method is based on the observation that UV-crosslinked amino-acid-RNA adducts introduce reverse transcription errors in cDNAs at certain frequencies. These errors are captured by sequencing and detected by comparing CLIP tags with the reference genome. More details can be found in the following references:

Zhang, C.†, Darnell, R.B.† 2011. Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat. Biotech. 29:607-614.

Moore, J.*, Zhang, C.*, Gantman, E.C., Mele, A., Darnell, J.C., Darnell, R.B. 2014. Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat. Protocols. 9(2):263-293. doi:10.1038/nprot.2014.012.

This brief document provides only the most critical information on how to run the program. It complements the more detailed, step-by-step guide in the second reference above.

Versions

v1.0.1 ( 5-22-2013 ), current
- Minor internal extension
- Included joinWrapper.py which was missing in the previous version
v1.0.0 ( 12-14-2012 )
- The initial public release

Download

Source code:

czplib (perl): a perl library with various functions for genomic/bioinformatic analysis. (download from SourceForge.net)
CIMS (perl): the core algorithm. (download from SourceForge.net)

Installation

Prerequisites

This software is implemented in Perl. It also relies on several standard Linux/Unix tools such as grep, cat, and sort. We have tested the software on RedHat Linux, and it is expected to work on most Unix-like systems, including Mac OS X.

Steps to install the software

Download the perl library czplib, if you have not already done so.

Decompress it and move it to a location of your choice:

$tar zxvf czplib.v1.0.x.tgz
$mv czplib /usr/local/lib

Add the library path to the PERL5LIB environment variable so that perl can find it:

export PERL5LIB=/usr/local/lib/czplib

Download the CIMS code, if you have not already done so.

Decompress it and move it to a location of your choice:

$tar zxvf CIMS.v1.0.x.tgz
$cd CIMS
$chmod 755 *.pl
$cd ..
$mv CIMS /usr/local/CIMS

Add the dir to your $PATH environment variable.
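
For a Bourne-style shell such as bash, this step can be sketched as follows. The directory is the install location used in the commands above; adjust it if you moved CIMS elsewhere. The check simply avoids adding the same entry twice.

```shell
# Install location assumed from the steps above.
CIMS_DIR=/usr/local/CIMS

# Prepend the directory to PATH unless it is already listed there.
case ":$PATH:" in
  *":$CIMS_DIR:"*) ;;                          # already present, nothing to do
  *) export PATH="$CIMS_DIR:$PATH" ;;
esac
```

Putting these lines in a shell startup file such as ~/.bashrc makes the setting apply to new sessions as well.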

CIMS analysis

Input files

The key script is CIMS.pl, which takes two BED files as input: a list of unique CLIP tags (properly mapped to the reference genome), and the coordinates of mutations (deletions, insertions, or substitutions), given both in the reference genome and relative to the CLIP tags. It is critical to make sure that:

only one type of mutation is analyzed at a time.
the 4th column of the mutation BED file matches the name of the CLIP tag in the first BED file.
the coordinates of mutations relative to the CLIP tag (from the 5' end of the Watson strand, 0-based) are correctly specified in the 5th column of the second BED file.
only mutations in unique CLIP tags are included.
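
Two of these requirements can be checked mechanically before running CIMS.pl. The following is a minimal sketch, not part of the CIMS package. It assumes simple tab-delimited, unspliced 6-column BED files, so that a valid relative position lies between 0 and the tag length minus one. The toy files and their names are made up for illustration, and two of the three mutation records are deliberately wrong.

```shell
# Toy input files in a scratch directory (illustrative data only).
work=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 100 150 tag1 1 + \
  chr1 200 245 tag2 1 - > "$work/tags.bed"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 120 121 tag1 20 + \
  chr1 210 211 tag2 60 - \
  chr1 300 301 tag9 5 + > "$work/mut.bed"

# First file: remember each tag's length, keyed by tag name (column 4).
# Second file: flag mutations whose tag name is unknown, or whose
# relative position (column 5) falls outside the tag.
report=$(awk -F'\t' '
  NR == FNR               { len[$4] = $3 - $2; next }
  !($4 in len)            { print "unknown tag: " $4; next }
  $5 < 0 || $5 >= len[$4] { print "offset out of range: " $4 " (" $5 ")" }
' "$work/tags.bed" "$work/mut.bed")

# For the toy data this reports tag2 (offset 60 in a 45-nt tag) and tag9.
printf '%s\n' "$report"
```

An empty report means no problem was found by these two checks. It does not verify that the relative positions themselves are correct.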

You can then run a command such as:

perl /usr/local/CIMS/CIMS.pl -v -n 5 -p -FDR 0.001 -c ./cache_del  test.uniq.bed test.uniq.del.bed test.uniq.del.CIMS.txt
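
Because each mutation type is analyzed separately, the same command is typically repeated once per type. The sketch below wraps that in a loop. The helper name run_cims_by_type, the suffixes ins and sub, and the matching file and cache names are assumptions modeled on the deletion example above, and reusing identical options for every type is a simplification. By default the commands are only printed, so the loop can be inspected before anything is run.

```shell
# Hypothetical helper: one CIMS.pl command per mutation type.
# RUN defaults to "echo", so commands are printed rather than executed;
# call it as `RUN= run_cims_by_type` to execute them for real.
run_cims_by_type() {
  cims_dir=/usr/local/CIMS            # install location used above
  for type in del ins sub; do         # assumed file-name suffixes
    ${RUN-echo} perl "$cims_dir/CIMS.pl" -v -n 5 -p -FDR 0.001 \
      -c "./cache_$type" \
      test.uniq.bed "test.uniq.$type.bed" "test.uniq.$type.CIMS.txt"
  done
}

run_cims_by_type
```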

The output is a list of CIMS at FDR<0.001, one per line.

The first six columns of this file follow the definition of a BED file, including the coordinates and strand of each CIMS. Columns 7-10 are k (the total number of tags at that position), m (the number of those tags with mutations), the FDR, and the number of sites with m or more tags with mutations given k tags at that position in total. The last value is the denominator used to calculate the FDR and gives an idea of the precision of the FDR value.

This file can be reordered by increasing FDR, then decreasing m, then increasing k, with the following command:

sort -k 9,9n -k 8,8nr -k 7,7n test.uniq.del.CIMS.txt > test.uniq.del.CIMS.sort.txt
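
To see what these sort keys do, and how the k and m columns can be used for further filtering, here is a small sketch on made-up data. The four rows, the site names s1 to s4, and the 0.2 cutoff on m/k are purely illustrative and are not taken from a real CIMS run.

```shell
# Toy CIMS output: BED columns 1-6, then k, m, FDR, and the denominator.
cims_out=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 1000 1001 s1 1 + 20 5 0.0005 12 \
  chr2 500  501  s2 1 - 10 5 0.0001 30 \
  chr3 42   43   s3 1 + 8  5 0.0001 30 \
  chr4 7    8    s4 1 + 50 9 0.0001 4 > "$cims_out"

# Same keys as above: FDR ascending, then m descending, then k ascending.
# LC_ALL=C keeps the decimal point handling independent of the locale.
sorted=$(LC_ALL=C sort -k 9,9n -k 8,8nr -k 7,7n "$cims_out")

# Keep only sites where at least 20% of the tags carry the mutation.
filtered=$(printf '%s\n' "$sorted" | awk -F'\t' '$7 > 0 && $8 / $7 >= 0.2')

printf '%s\n' "$sorted" | cut -f4      # order: s4, s3, s2, s1
printf '%s\n' "$filtered" | cut -f4    # s4 is dropped (9/50 = 0.18)
```

If the FDR column contains values in scientific notation (for example 1e-05), the n key type will not order them correctly. GNU sort offers the g (general numeric) key type for that case.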

Usage

CIMS.pl [options] <tag.bed> <mutation.bed> <out.txt>

Arguments:

Argument	Description
<tag.bed>	BED file of unique CLIP tags
<mutation.bed>	BED file of mutations in unique CLIP tags. See the notes on input files above
<out.txt>	output file with the list of CIMS

Options:

Option	Description
-big	input files are big (e.g. over 6 million lines)
-n [int]	number of iterations for permutation (default: 5)
-p	track mutation position relative to read start
--no-sparse-correct	no sparsity correction *
-FDR [float]	threshold of FDR (default: 1)
-mkr [float]	threshold of m-over-k-ratio (default: 0)

*This option should not be used in general; it is included only to reproduce our earlier analysis. We introduced sparsity correction to eliminate an additional filtering step based on mutation frequency (i.e., the "m" value).


From Zhang Laboratory