MCarts Documentation

From Zhang Laboratory

Revision as of 15:10, 19 September 2012


Prediction of clustered RNA-binding protein motif sites in the mammalian genome

Chaolin Zhang (1,*), Kuang-Yung Lee (2,3), Maurice S. Swanson (2), Robert B. Darnell (1,*)

1 Laboratory of Molecular Neuro-Oncology, Howard Hughes Medical Institute, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
2 Department of Molecular Genetics and Microbiology and the Center for NeuroGenetics, University of Florida, College of Medicine, Gainesville, FL 32610, USA
3 Department of Neurology, Chang Gung Memorial Hospital, Keelung, Taiwan

* Corresponding authors


Introduction

mCarts is a hidden Markov model (HMM)-based method for predicting clusters of RNA motif sites.

Many RNA-binding proteins (RBPs) recognize very short and degenerate sequences; targeting specificity is achieved by mechanisms such as synergistic binding to multiple clustered sites and modulation of site accessibility through different RNA secondary structures. mCarts integrates the number and spacing of individual motif sites with their accessibility and conservation, which substantially improves the signal-to-noise ratio. The algorithm learns and quantifies rules for these features from a large number of in vivo RBP binding sites obtained by high-throughput sequencing of RNAs isolated by cross-linking and immunoprecipitation (HITS-CLIP). We applied the algorithm to two representative RBPs, Nova and Mbnl. Despite the very low information content of individual motif elements, the algorithm made highly specific predictions, which were successfully validated experimentally.
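To make the idea of scoring clustered sites with an HMM more concrete, here is a minimal toy sketch in Python. It is not the mCarts model: it has only two states (background and cluster), looks only at whether a motif site starts at each position, and uses made-up probabilities, whereas mCarts learns its parameters from HITS-CLIP data and also uses accessibility and conservation. The function names and all default values are hypothetical.

```python
import math

def sites_to_observations(length, site_starts):
    """Turn motif start positions into a 0/1 observation per nucleotide."""
    starts = set(site_starts)
    return [1 if i in starts else 0 for i in range(length)]

def viterbi_two_state(obs, p_stay=0.95, p_site_cluster=0.3,
                      p_site_background=0.02):
    """Most likely state path: 0 = background, 1 = motif cluster.

    All probabilities are illustrative assumptions, not mCarts parameters.
    """
    if not obs:
        return []
    states = (0, 1)
    p_site = {0: p_site_background, 1: p_site_cluster}

    def log_emit(state, o):
        p = p_site[state]
        return math.log(p if o else 1.0 - p)

    def log_trans(prev, cur):
        return math.log(p_stay if prev == cur else 1.0 - p_stay)

    # Initialization with a uniform prior over the two states.
    score = [math.log(0.5) + log_emit(s, obs[0]) for s in states]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in states:
            candidates = [score[r] + log_trans(r, s) for r in states]
            best_prev = max(states, key=lambda r: candidates[r])
            new_score.append(candidates[best_prev] + log_emit(s, o))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

if __name__ == "__main__":
    # Five closely spaced sites in the middle of a 110-nt region.
    obs = sites_to_observations(110, [50, 53, 55, 58, 59])
    path = viterbi_two_state(obs)
    print("".join(str(s) for s in path))
```

In this toy setting an isolated site is usually not enough to outweigh the cost of entering and leaving the cluster state, while several closely spaced sites are, which is the intuition behind using the number and spacing of sites.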

Download

Source code:

  • czplib (Perl): a Perl library with various functions for genomic and bioinformatic analysis
  • mCarts (Perl): the core algorithm
  • PatternMatch (C/C++): a tool that searches for individual motif sites matching a consensus sequence; it supports degenerate bases and mismatches
  • RegExpMatch (C/C++): a tool that searches for individual motif sites matching a regular expression
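The two kinds of motif search mentioned in the list above can be illustrated with a short Python sketch. This is only an illustration of the techniques (consensus matching with IUPAC degenerate codes and a mismatch allowance, and regular-expression matching); it does not reproduce the command-line interface, options, or output format of PatternMatch or RegExpMatch. The function names are hypothetical, and YCAY is used purely as an example consensus.

```python
import re

# IUPAC nucleotide codes mapped to the bases they allow (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_sites(seq, consensus, max_mismatches=0):
    """Return (start, mismatches) for every window matching the consensus."""
    seq = seq.upper().replace("U", "T")
    allowed = [IUPAC[c] for c in consensus.upper()]
    k = len(allowed)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(base not in ok for base, ok in zip(window, allowed))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits

def regex_sites(seq, pattern):
    """Return start positions of all (possibly overlapping) regex matches."""
    # A zero-width lookahead lets overlapping sites be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]

if __name__ == "__main__":
    seq = "GGTCATTTCACAA"
    print(consensus_sites(seq, "YCAY"))        # exact matches to the consensus
    print(consensus_sites(seq, "YCAY", 1))     # allow one mismatch
    print(regex_sites(seq, "[CT]CA[CT]"))      # same motif as a regex
```

Positions are 0-based. Allowing mismatches quickly increases the number of hits for a short motif, which is one reason individual sites carry so little information on their own.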

Library data:

  • mm9 (15 Gb compressed / 109 Gb uncompressed)
  • hg18 (15 Gb compressed / 212 Gb uncompressed)

Installation

Get started