Difference between revisions of "Test"

From Zhang Laboratory

Revision as of 08:09, 5 May 2014

Introduction

Crosslinking-induced mutation site (CIMS) analysis is a computational method for HITS-CLIP data analysis that determines the exact protein-RNA crosslink sites and thereby maps protein-RNA interactions at single-nucleotide resolution. The method is based on the observation that UV-crosslinked amino-acid-RNA adducts introduce reverse transcription errors in cDNAs at certain frequencies; these errors are captured by sequencing and by comparing CLIP tags with the reference genome. More details can be found in the following references:

Zhang, C. †, Darnell, R.B. † 2011. Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data.  Nat. Biotech. 29:607-614. 

Moore, J.*, Zhang, C.*, Grantman, E.C., Mele, A., Darnell, J.C., Darnell, R.B. 2014. Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat. Protocols. 9(2):263-93. doi:10.1038/nprot.2014.012.

This brief document covers only the most critical information about how to run the program. It complements the more detailed, step-by-step guide in the second reference above.

Versions

  • v1.0.1 (5-22-2013), current
    • Minor internal extension
    • Included joinWrapper.py, which was missing in the previous version
  • v1.0.0 (12-14-2012)
    • The initial public release

Download

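Source code:

  • czplib (perl): a perl library with various functions for genomic/bioinformatic analysis (download from SourceForge.net: http://sourceforge.net/p/czplib/).
  • CIMS (perl): the core algorithm (download from SourceForge.net: http://sourceforge.net/p/ngs-cims/).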

Installation

Prerequisites

This software is implemented in Perl. It also relies on several standard Linux/Unix tools such as grep, cat, and sort. We have tested the software on RedHat Linux, and it is expected to work on most Unix-like systems, including Mac OS X.

Steps to install the software

  • Download the perl library czplib, if you have not already done so.

Decompress it and move it to a location of your choice:

$ tar zxvf czplib.v1.0.x.tgz
$ mv czplib /usr/local/lib

Add the library path to the PERL5LIB environment variable so that perl can find it:

$ export PERL5LIB=/usr/local/lib/czplib

  • Download the CIMS code, if you have not already done so.

Decompress it, make the scripts executable, and move it to a location of your choice:

$ tar zxvf CIMS.v1.0.x.tgz
$ chmod 755 CIMS/*.pl
$ mv CIMS /usr/local/CIMS

Add the directory to your PATH environment variable.
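
The environment settings from the installation steps can be collected in one place. The following is a minimal sketch, assuming a Bourne-style shell (such as bash) and the install locations used in the steps above. CIMS_HOME and CZPLIB_HOME are hypothetical helper variables introduced only for this example; they are not names used by the software.

```shell
# Install locations assumed from the steps above; adjust to your system.
CIMS_HOME=/usr/local/CIMS
CZPLIB_HOME=/usr/local/lib/czplib

# Put the CIMS scripts on the command search path.
export PATH="$CIMS_HOME:$PATH"

# Let perl find czplib, keeping any PERL5LIB entries that already exist.
export PERL5LIB="$CZPLIB_HOME${PERL5LIB:+:$PERL5LIB}"
```

Placing these lines in a shell startup file (for example ~/.bashrc, if bash is your login shell) should keep the settings across sessions.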

CIMS analysis

Input files

The key script is CIMS.pl, which takes two BED files as input: a list of unique CLIP tags (properly mapped to the reference genome), and the coordinates of mutations (deletions, insertions, or substitutions), given both in the reference genome and relative to the CLIP tags. It is critical to make sure that:

  • only one type of mutation is analyzed at a time.
  • the 4th column of the mutation BED file matches the name of the CLIP tag in the first BED file.
  • the coordinate of each mutation relative to the CLIP tag (from the 5' end of the Watson strand, 0-based) is correctly specified in the 5th column of the second BED file.
  • only mutations in unique CLIP tags are included.
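
Two of the requirements above can be checked mechanically before running the analysis. The following is a rough sketch, not part of the CIMS package: check_mutation_bed is a hypothetical helper that assumes both files are tab-delimited BED files without header lines, and that a valid offset lies between 0 and the tag length minus one.

```shell
# check_mutation_bed TAG_BED MUTATION_BED
# Hypothetical helper: reports mutation records whose tag name (column 4) is
# absent from the tag file, or whose offset (column 5) lies outside the tag.
# Returns non-zero if any problem is found.
check_mutation_bed() {
    awk -F'\t' '
        NR == FNR { len[$4] = $3 - $2; next }    # tag file: name -> tag length
        !($4 in len) { print "unknown tag: " $4; bad = 1; next }
        $5 < 0 || $5 >= len[$4] { print "offset out of range: " $4; bad = 1 }
        END { exit bad + 0 }
    ' "$1" "$2"
}
```

With the example files used in this document, it could be called as check_mutation_bed test.uniq.bed test.uniq.del.bed. The check does not verify that the tags are unique or that only one mutation type is present.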


You can then run a command such as:

perl /usr/local/CIMS/CIMS.pl -v -n 5 -p -FDR 0.001 -c ./cache_del  test.uniq.bed test.uniq.del.bed test.uniq.del.CIMS.txt
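
Because each mutation type is analyzed separately, the same command is typically repeated once per type. The sketch below only prints the commands rather than running them. It assumes a file naming pattern modeled on test.uniq.del.bed: the names test.uniq.ins.bed and test.uniq.sub.bed, the per-type cache directories, and the helper cims_command are hypothetical. The options are copied unchanged from the command above.

```shell
# cims_command TYPE
# Hypothetical helper: prints the CIMS.pl command for one mutation type,
# assuming the input file is named test.uniq.TYPE.bed.
cims_command() {
    mut=$1
    printf 'perl /usr/local/CIMS/CIMS.pl -v -n 5 -p -FDR 0.001 -c ./cache_%s test.uniq.bed test.uniq.%s.bed test.uniq.%s.CIMS.txt\n' \
        "$mut" "$mut" "$mut"
}

# One command per mutation type: deletions, insertions, substitutions.
for mut_type in del ins sub; do
    cims_command "$mut_type"
done
```

After reviewing the printed commands, they can be run one at a time or piped to sh.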


The output is a list of CIMS at FDR<0.001, one per line.

The first six columns of this file follow the BED format and give the coordinates and strand of each CIMS. Columns 7-10 are:

  • k: the total number of tags at the position
  • m: the number of those tags with a mutation at the position
  • FDR
  • the number of sites with m or more tags with mutations, given k tags at that position in total (the denominator used to calculate the FDR, which gives an idea of the precision of the FDR value)
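
Given this column layout, an existing output file can be filtered further without rerunning CIMS.pl. The following is a minimal sketch, assuming the output is tab-delimited with no header line; filter_cims is a hypothetical helper, not part of the CIMS package.

```shell
# filter_cims FILE MAX_FDR MIN_RATIO
# Hypothetical helper: keeps sites with FDR (column 9) below MAX_FDR and
# m/k (column 8 divided by column 7) of at least MIN_RATIO.
filter_cims() {
    awk -F'\t' -v fdr="$2" -v ratio="$3" \
        '$7 + 0 > 0 && $9 + 0 < fdr + 0 && $8 / $7 >= ratio + 0' "$1"
}
```

For example, filter_cims test.uniq.del.CIMS.txt 0.001 0.2 would keep sites with FDR below 0.001 at which at least 20% of the tags carry a mutation.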

This file can be reordered by FDR (ascending), then m (descending), then k (ascending) with the following command:

sort -k 9,9n -k 8,8nr -k 7,7n test.uniq.del.CIMS.txt > test.uniq.del.CIMS.sort.txt

Usage

CIMS.pl [options] <tag.bed> <mutation.bed> <out.txt>

Arguments:

Argument          Description
<tag.bed>         BED file of unique CLIP tags
<mutation.bed>    BED file of mutations in unique CLIP tags (see the notes under "Input files" above)
<out.txt>         output file with the list of CIMS


Options:

Option                 Description
-big                   input files are big (e.g. over 6 million lines)
-n [int]               number of iterations for permutation (default: 5)
-p                     track mutation position relative to read start
--no-sparse-correct    no sparsity correction *
-FDR [float]           threshold of FDR (default: 1)
-mkr [float]           threshold of m-over-k ratio (default: 0)

*This option should not be used in general; it is included only to reproduce our earlier analysis. We introduced the sparsity correction to eliminate an additional filtering step based on mutation frequency (i.e., the "m" value).