Splicescope Documentation

From Zhang Laboratory


Introduction

Splicescope is a tool for predicting neuronal maturation based on the splicing profile as measured by RNA-sequencing (RNA-Seq). When the RNA-Seq data of a query sample is provided, Splicescope will determine its similarity to a reference and assign the query sample to the nearest maturation stage using an algorithm we developed for this purpose. We have demonstrated that this method can accurately predict maturation stages both from tissue samples derived from various brain regions and from cultured neurons differentiated from stem cells in vitro. In addition, Splicescope can evaluate the maturation stage by considering only alternative exons regulated by specific splicing factors that are important for developmental switches at different stages.

The current version of the software supports analysis of mouse data only; we plan to extend it to other species in the near future.

More details about this work can be found in the following paper:

Weyn-Vanhentenryck, Feng, et al. Precise temporal regulation of alternative splicing during neural development. Accepted by Nature Communications

Versions

  • v1.0.1 (05-22-2018)
    • minor bug fix
  • v1.0.0 (05-08-2018)
    • The initial public release

Software installation

Prerequisites

This software is implemented with Perl and R. We have tested the software on RedHat Linux, although it is expected to work on most Unix-like systems, including Mac OS X. The Splicescope package requires the following packages to be installed:

  • R (version 3.0.0 and higher).
  • R packages: ggplot2, betareg, and getopt.

The Splicescope software package can accept as input either an exon splicing matrix file or junction files in BED format from any RNA-Seq mapping software (e.g., Tophat).

  • To generate the exon splicing matrix, we recommend the OLego and Quantas pipeline we developed. This pipeline has been used extensively in our work and can handle single- or paired-end, stranded or unstranded libraries.
  • To use junction BED files from other mapping software as input, install Quantas and download the mm10 annotation files. Please see http://zhanglab.c2b2.columbia.edu/index.php/Quantas for more details.

Splicescope installation and preparation

1. Install required R packages:

$R
>install.packages(c("betareg", "ggplot2", "getopt"))

2. Install Quantas and download the mm10 annotation files. Please see http://zhanglab.c2b2.columbia.edu/index.php/Quantas for more details.

3. Install the Splicescope package. Download the files from GitHub (e.g. into ~/src) and run the following from that directory:

$R CMD INSTALL splicescope-v1.x.x.tar.gz

This documentation assumes the Splicescope source code is under ~/src/. To access the command-line script (splicescope.R) directly, decompress the source file:

$tar zxvf splicescope-v1.x.x.tar.gz

Usage

Run the following command to show descriptions of the arguments and of the input and output formats.

$Rscript  ~/src/splicescope/splicescope.R

Arguments:

Argument              Description
-b, bedfile           Junction BED file
-n, samplenames       Text file with all sample names (must be specified if using -b)
-s, splicingmatrix    Text file with exon inclusion ratio matrix (cannot be specified if using -b)
-o, outputfile        Output zip file (e.g. output.zip)
-q, Quantas dir       The Quantas directory (e.g. /usr/local/src/quantas/countit/, must be specified if using -b)
-a, Annotation dir    The annotation directory (e.g. /usr/local/src/mm10/, must be specified if using -b)

Options:

Option                Description
-l, sample label      Output pdf format plot with sample label ([1-On])
-c, cache dir         Path to write temporary file
-v, verbose           Verbose mode
-h, help              Print usage


Please note that Splicescope accepts two input formats, selected with either -s (for a splicing matrix) or -b (for junction BED files). The two cannot be used at the same time.
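The rule above, together with the requirements listed in the argument table, can be checked before a run is started. The following is a minimal sketch only: input_mode is a hypothetical helper that is not part of Splicescope, and it assumes the short flags are passed as separate words exactly as listed in the table.

```shell
# Hypothetical pre-flight check; NOT part of Splicescope itself.
# Prints "matrix" or "junction", or returns 1 for an invalid combination.
input_mode() {
    has_s=0; has_b=0; has_n=0; has_q=0; has_a=0
    for arg in "$@"; do
        case $arg in
            -s) has_s=1 ;;
            -b) has_b=1 ;;
            -n) has_n=1 ;;
            -q) has_q=1 ;;
            -a) has_a=1 ;;
        esac
    done
    if [ "$has_s" -eq 1 ] && [ "$has_b" -eq 1 ]; then
        # The two input formats are mutually exclusive.
        echo "error: -s and -b cannot be combined" >&2
        return 1
    elif [ "$has_s" -eq 1 ]; then
        echo matrix
    elif [ "$has_b" -eq 1 ] && [ "$has_n" -eq 1 ] &&
         [ "$has_q" -eq 1 ] && [ "$has_a" -eq 1 ]; then
        # Junction input needs sample names, Quantas and annotation paths.
        echo junction
    else
        echo "error: give -s, or -b together with -n, -q and -a" >&2
        return 1
    fi
}
```

For example, input_mode -s matrix.txt -o out.zip prints matrix, while passing both -s and -b returns a non-zero status.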


Usage 1: Input is a splicing matrix generated by the OLego and Quantas pipeline

The test data in the splicing matrix format can be downloaded from http://zhanglab.c2b2.columbia.edu/data/Splicescope/samplematrix.txt.gz

$Rscript  ~/src/splicescope/splicescope.R [options] -s <in.splicingmatrix> -o <out.zip>

For example, the first few lines of an input splicing matrix file are shown below (header line formatted for clarity):

event_id							Name			Sample1	Sample2	Sample3
CA-100036521-14294-117618-117680-170683[INC][40/1][DNT]		100036521//Gm16039	1	0.938	1
CA-100036521-117680-144789-144867-170683[INC][3/42][DNT]	100036521//Gm16039	0.05	0.014	0
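Before a matrix from another source is passed to -s, it can help to confirm that it has the shape shown above. The sketch below is an illustration, not part of Splicescope: check_matrix is a hypothetical helper, and it assumes the file is tab-delimited, that columns 3 onward hold inclusion ratios between 0 and 1, and that missing values (if any) are written as NA.

```shell
# Hypothetical helper: sanity-check a tab-delimited splicing matrix.
# Assumptions (not confirmed by the Splicescope documentation):
#   - columns 1-2 are event_id and Name
#   - columns 3+ are inclusion ratios in [0, 1], or NA for missing values
check_matrix() {
    awk -F'\t' '
        NR == 1 { ncol = NF; next }          # remember the header width
        NF != ncol {
            printf "line %d: expected %d columns, got %d\n", NR, ncol, NF
            bad = 1
            next
        }
        {
            for (i = 3; i <= NF; i++) {
                if ($i == "NA") continue
                # Accept plain or scientific non-negative numbers up to 1.
                if ($i !~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ || $i + 0 > 1) {
                    printf "line %d, column %d: bad value %s\n", NR, i, $i
                    bad = 1
                }
            }
        }
        END { exit bad + 0 }                 # non-zero status if any problem
    ' "$1"
}
```

Running check_matrix samplematrix.txt prints one line per problem found and returns a non-zero status if any line fails the checks.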


The output is a compressed folder, which contains an html file summarizing the results.

Usage 2: Input is junction BED files generated by OLego or other mapping software

The test data for junction bed files can be downloaded from http://zhanglab.c2b2.columbia.edu/data/Splicescope/samplebed.zip

$Rscript  splicescope.R [options] -q quantas/ -a mm10/ -b <a.bed>,<b.bed>... -n samplenames.txt -o <out.zip>

Note:

  • The option -q specifies the path to the Quantas package and -a specifies the folder containing the annotation files required by Quantas (only mm10 allowed for now).
  • The option -b specifies a list of input exon junction BED files, separated by commas (without spaces).

Example junction BED file:

chr1	3207264	3213485	JUNC00000001	1	-	3207264	3213485	255,0,0	2	53,47	0,6174
chr1	3216873	3421784	JUNC00000002	5	-	3216873	3421784	255,0,0	2	95,83	0,204828
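To make the columns of these records easier to read, the sketch below converts each one into intron coordinates. It assumes the TopHat-style convention, in which a junction is a 12-column BED line with two blocks (the read overhangs on either side of the intron) and the score column is the number of supporting reads. The helper name junction_introns is hypothetical and is not part of Splicescope or Quantas.

```shell
# Hypothetical helper: print intron coordinates for each junction record.
# Assumes TopHat-style 12-column BED with exactly two blocks per junction.
# Output columns: chrom, intron start (0-based), intron end (exclusive),
# strand, supporting read count.
junction_introns() {
    awk -F'\t' -v OFS='\t' '
        NF >= 12 && $10 == 2 {
            split($11, size, ",")          # blockSizes: left,right overhang
            split($12, start, ",")         # blockStarts: offsets from column 2
            intron_start = $2 + size[1]    # first base after the left block
            intron_end   = $2 + start[2]   # first base of the right block
            print $1, intron_start, intron_end, $6, $5
        }
    ' "$1"
}
```

For the first record above, the left block is 53 bases long and the right block starts at offset 6174, so the intron spans 3207317 to 3213438 on chr1.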
  • The option -n specifies a file with the sample name corresponding to each exon junction BED file.

Example sample name file with two bedfiles, a.bed and b.bed, as input (columns are tab-delimited):

a	name1
b	name2

Exon junction bed files with the same sample name will be treated as replicates and merged into one.
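With many BED files, the sample name file and the comma-separated list can be generated together so that they stay in sync. This is a sketch under one assumption taken from the example above: the first column of the sample name file is the BED file name without its .bed extension. The helper make_inputs is hypothetical and is not part of Splicescope.

```shell
# Hypothetical helper: build the sample name file and the BED file list.
# Usage: make_inputs <samplenames.txt> <file.bed>=<sample name> ...
# Assumes column 1 of the sample name file is the BED base name without
# ".bed", as in the a.bed / b.bed example; verify against your own setup.
make_inputs() {
    out=$1
    shift
    : > "$out"                             # start with an empty file
    list=""
    for pair in "$@"; do
        bed=${pair%%=*}                    # text before the first "="
        name=${pair#*=}                    # text after the first "="
        base=$(basename "$bed" .bed)
        printf '%s\t%s\n' "$base" "$name" >> "$out"
        list=${list:+$list,}$bed           # comma-join without spaces
    done
    printf '%s\n' "$list"
}
```

For example, beds=$(make_inputs samplenames.txt a.bed=name1 b.bed=name2) writes the two-line file shown above and sets beds to a.bed,b.bed, which can then be passed as -b "$beds" -n samplenames.txt. Giving two files the same sample name marks them as replicates.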

Output formats

The output will be a zipped folder containing:

  • index.html which summarizes the prediction results with visualization.
  • result_prediction.txt which includes 10 columns giving the predicted maturation stage and the corresponding prediction confidence score, using both the whole set of developmentally regulated exons and RBP-specific targets as reference.
  • result_pca.txt which includes the PC1 and PC2 values used in the plot for both user-defined samples and reference cortex samples.
  • dataPCA.png/pdf which is the PCA plot for user-defined samples based on reference samples with or without labels.