Splicescope Documentation
From Zhang Laboratory
Introduction
Splicescope is a tool for predicting neuronal maturation based on the splicing profile as measured by RNA-sequencing (RNA-Seq). When the RNA-Seq data of a query sample is provided, Splicescope will determine its similarity to a reference and assign the query sample to the nearest maturation stage using an algorithm we developed for this purpose. We have demonstrated that this method can accurately predict maturation stages both from tissue samples derived from various brain regions and from cultured neurons differentiated from stem cells in vitro. In addition, Splicescope can evaluate the maturation stage by considering only alternative exons regulated by specific splicing factors that are important for developmental switches at different stages.
The current version of the software supports analysis of mouse data only; we plan to extend it to other species in the near future.
More details about this work can be found in the following paper:
Weyn-Vanhentenryck, Feng, et al. Precise temporal regulation of alternative splicing during neural development. Accepted by Nature Communications
Versions
- v1.0.1 (05-22-2018)
- minor bug fix
- v1.0.0 (05-08-2018)
- The initial public release
Software installation
Prerequisites
This software is implemented with Perl and R. We have tested the software on RedHat Linux, although it is expected to work on most Unix-like systems, including Mac OS X. The Splicescope package requires the following packages to be installed:
- R (version 3.0.0 or higher).
- R packages: ggplot2, betareg, and getopt.
The Splicescope software package can accept as input either an exon splicing matrix file or junction files in BED format from any RNA-Seq mapping software (e.g., Tophat).
- To generate the exon splicing matrix, we recommend the OLego and Quantas pipeline we developed. This pipeline has been used extensively in our work and is flexible enough to handle single- or paired-end, stranded or unstranded libraries.
- To use junction bed files from other mapping software as input, install Quantas and download the mm10 annotation files. Please see http://zhanglab.c2b2.columbia.edu/index.php/Quantas for more details.
Splicescope installation and preparation
1. Install required R packages:
$R
>install.packages(c("betareg", "ggplot2", "getopt"))
2. Install Quantas and download the mm10 annotation files. Please see http://zhanglab.c2b2.columbia.edu/index.php/Quantas for more details.
3. Install the Splicescope package. Download the files from GitHub (e.g. into ~/src) and run from that directory:
$R CMD INSTALL splicescope-v1.x.x.tar.gz
This documentation assumes that the Splicescope source code is under ~/src/. If you would like to access the command-line script (splicescope.R) directly, decompress the source file:
$tar zxvf splicescope-v1.x.x.tar.gz
Usage
Run the following command to show descriptions of the arguments and of the input and output formats.
$Rscript ~/src/splicescope/splicescope.R
Arguments:
Argument | Description |
---|---|
-b, bedfile | Junction bed file |
-n, samplenames | Text file with all sample names (must be specified if using -b) |
-s, splicingmatrix | Text file with exon inclusion ratio matrix (cannot be specified if using -b) |
-o, outputfile | Output zip file (e.g. output.zip) |
-q, Quantas dir | The Quantas directory (e.g. /usr/local/src/quantas/countit/, must be specified if using -b) |
-a, Annotation dir | The annotation directory (e.g. /usr/local/src/mm10/, must be specified if using -b) |
Options:
Options | Description |
---|---|
-l, sample label | Output pdf format plot with sample label([1-On]) |
-c, cache dir | Path to write temporary file |
-v, verbose | Verbose mode |
-h, help | Print usage |
Please note that Splicescope accepts two input formats, selected with either -s (for a splicing matrix) or -b (for junction BED files). The two options cannot be used at the same time.
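The two input modes can be illustrated with a small helper that assembles the command line. This is only a sketch: the function `build_command` and its parameter names are hypothetical and not part of Splicescope; only the flags (-s, -b, -n, -q, -a, -o) come from the tables above. The helper does not run anything; it returns the argument list so that it can be inspected first.

```python
from typing import List, Optional, Sequence


def build_command(
    script: str,
    out_zip: str,
    matrix: Optional[str] = None,
    beds: Optional[Sequence[str]] = None,
    sample_names: Optional[str] = None,
    quantas_dir: Optional[str] = None,
    annotation_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an Rscript call for splicescope.R (hypothetical helper)."""
    if matrix and beds:
        # The argument table states that -s cannot be combined with -b.
        raise ValueError("use either a splicing matrix (-s) or BED files (-b), not both")
    if matrix:
        return ["Rscript", script, "-s", matrix, "-o", out_zip]
    if beds:
        # -n, -q and -a are documented as required whenever -b is used.
        required = (("-n", sample_names), ("-q", quantas_dir), ("-a", annotation_dir))
        missing = [flag for flag, value in required if not value]
        if missing:
            raise ValueError("missing options for BED input: " + ", ".join(missing))
        return [
            "Rscript", script,
            "-q", quantas_dir,
            "-a", annotation_dir,
            "-b", ",".join(beds),  # comma-separated, without spaces
            "-n", sample_names,
            "-o", out_zip,
        ]
    raise ValueError("provide either a splicing matrix or a list of BED files")
```

The returned list could be passed to `subprocess.run` once the path to splicescope.R has been confirmed for the local installation.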
Usage 1: Input is a splicing matrix generated by the OLego and Quantas pipeline
The test data in the splicing matrix format can be downloaded from http://zhanglab.c2b2.columbia.edu/data/Splicescope/samplematrix.txt.gz
$Rscript ~/src/Splicescope/splicescope.R [options] -s <in.splicingmatrix> -o <out.zip>
As an example, the first few lines of an input splicing matrix file are shown below (header line formatted for clarity):
event_id Name Sample1 Sample2 Sample3
CA-100036521-14294-117618-117680-170683[INC][40/1][DNT] 100036521//Gm16039 1 0.938 1
CA-100036521-117680-144789-144867-170683[INC][3/42][DNT] 100036521//Gm16039 0.05 0.014 0
The output is a compressed folder, which contains an html file summarizing the results.
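Before running Splicescope, it can be useful to check that a splicing matrix has the expected shape. The sketch below reads the format shown above, under two assumptions that are not stated in this documentation: that columns are tab-delimited, and that any non-numeric entry (for example NA) marks a missing value. The function name `read_splicing_matrix` is hypothetical.

```python
import csv
import math
from typing import Dict, Iterable, List, Optional, Tuple


def _ratio(text: str) -> Optional[float]:
    """Convert one cell to an inclusion ratio, or None if it is not numeric."""
    try:
        value = float(text)
    except ValueError:
        return None  # assumption: non-numeric cells mean "missing"
    if math.isnan(value):
        return None
    if not 0.0 <= value <= 1.0:
        raise ValueError("inclusion ratio outside [0, 1]: " + text)
    return value


def read_splicing_matrix(
    handle: Iterable[str],
) -> Tuple[List[str], Dict[str, Dict[str, Optional[float]]]]:
    """Return (sample names, {event_id: {sample: ratio}}) from a tab-delimited matrix."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:2] != ["event_id", "Name"]:
        raise ValueError("unexpected header: " + repr(header))
    samples = header[2:]
    matrix: Dict[str, Dict[str, Optional[float]]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError("row has %d columns, expected %d" % (len(row), len(header)))
        matrix[row[0]] = {s: _ratio(v) for s, v in zip(samples, row[2:])}
    return samples, matrix
```

For a gzipped file such as the test data, the handle could come from `gzip.open(path, "rt")`.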
Usage 2: Input is junction BED files generated by OLego or other mapping software
The test data for junction bed files can be downloaded from http://zhanglab.c2b2.columbia.edu/data/Splicescope/samplebed.zip
$Rscript splicescope.R [options] -q quantas/ -a mm10/ -b <a.bed>,<b.bed>... -n samplenames.txt -o <out.zip>
Note:
- The option -q specifies the path to the Quantas package and -a specifies the folder containing the annotation files required by Quantas (only mm10 allowed for now).
- The option -b specifies a list of input exon junction files, separated by commas (without spaces).
Example junction BED file:
chr1	3207264	3213485	JUNC00000001	1	-	3207264	3213485	255,0,0	2	53,47	0,6174
chr1	3216873	3421784	JUNC00000002	5	-	3216873	3421784	255,0,0	2	95,83	0,204828
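To make the junction format concrete, the following sketch decodes one such line using the standard 12-column BED layout, in which each junction is a feature with two blocks (the flanking exon segments) and the intron is the gap between them. The names `Junction` and `parse_junction` are hypothetical. Treating the score column as the number of supporting reads follows the TopHat convention and is an assumption here.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int  # 0-based position of the first intronic base
    intron_end: int    # 0-based, exclusive end of the intron
    strand: str
    score: int         # assumed to be the supporting read count


def parse_junction(line: str) -> Junction:
    """Decode one two-block BED12 junction line into intron coordinates."""
    fields = line.split()
    if len(fields) < 12:
        raise ValueError("expected 12 BED columns, got %d" % len(fields))
    chrom, chrom_start = fields[0], int(fields[1])
    score, strand = int(fields[4]), fields[5]
    # blockSizes and blockStarts may carry a trailing comma in some files.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    if int(fields[9]) != 2 or len(sizes) != 2 or len(starts) != 2:
        raise ValueError("a junction line should describe exactly two blocks")
    intron_start = chrom_start + starts[0] + sizes[0]  # end of the left block
    intron_end = chrom_start + starts[1]               # start of the right block
    return Junction(chrom, intron_start, intron_end, strand, score)
```

Applied to the first example line, the left block covers 53 bases from position 3207264, so the intron spans 3207317 to 3213438 in 0-based, half-open coordinates.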
- The option -n specifies a file of sample names corresponding to each exon junction BED file.
Example sample name file with two bedfiles, a.bed and b.bed, as input (columns are tab-delimited):
a	name1
b	name2
Exon junction bed files with the same sample name will be treated as replicates and merged into one.
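The value for -b and the sample name file can be generated together from one mapping, which keeps the two in sync. The sketch below is an illustration only: the function names are hypothetical, and it assumes, based on the example above, that the first column of the sample name file is the BED file name without its .bed extension.

```python
import os
from typing import Dict, List, Tuple


def prepare_bed_inputs(bed_to_sample: Dict[str, str]) -> Tuple[str, str]:
    """Return (value for -b, text of the sample name file)."""
    lines = []
    for bed, sample in bed_to_sample.items():
        if "," in bed or any(ch.isspace() for ch in bed):
            # The -b list is comma-separated without spaces.
            raise ValueError("path cannot contain commas or whitespace: " + bed)
        # Assumption: column 1 is the file name with the extension removed.
        stem = os.path.splitext(os.path.basename(bed))[0]
        lines.append(stem + "\t" + sample)
    return ",".join(bed_to_sample), "\n".join(lines) + "\n"


def replicate_groups(sample_names_text: str) -> Dict[str, List[str]]:
    """Group file stems by sample name to preview which files would be merged."""
    groups: Dict[str, List[str]] = {}
    for line in sample_names_text.splitlines():
        if not line.strip():
            continue
        stem, sample = line.split("\t")
        groups.setdefault(sample, []).append(stem)
    return groups
```

Calling `replicate_groups` on the generated text before a run shows which BED files share a sample name and would therefore be treated as replicates.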
Output formats
The output will be a zipped folder containing:
- index.html which summarizes the prediction results with visualization.
- result_prediction.txt which includes 10 columns giving the predicted maturation stage and the corresponding prediction confidence score, using both the whole set of developmentally regulated exons and RBP-specific targets as reference.
- result_pca.txt which includes the PC1 and PC2 values used in the plot for both user-defined samples and reference cortex samples.
- dataPCA.png/pdf which is the PCA plot for user-defined samples based on reference samples with or without labels.
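A quick way to confirm that a run completed is to check the output archive for the files listed above. The sketch below uses only the Python standard library. The function name `summarize_output` is hypothetical, and matching by base name is a precaution, because the name of the folder inside the archive is not specified here.

```python
import posixpath
import zipfile
from typing import Dict, List, Optional, Tuple

EXPECTED = ("index.html", "result_prediction.txt", "result_pca.txt")
PLOTS = ("dataPCA.png", "dataPCA.pdf")


def summarize_output(zip_source) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Return ({expected file: archive path or None}, [PCA plot paths])."""
    with zipfile.ZipFile(zip_source) as archive:
        # Directory entries end with "/" and are skipped.
        members = [n for n in archive.namelist() if not n.endswith("/")]
    by_base = {posixpath.basename(name): name for name in members}
    found = {name: by_base.get(name) for name in EXPECTED}
    plots = sorted(by_base[p] for p in PLOTS if p in by_base)
    return found, plots
```

Any entry mapped to `None` in the first return value indicates a file that is missing from the archive.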