Role of Alternative Splicing in Arabidopsis Immune Response

    Steffen Heber (PI) and Paola Veronese (Co-PI)
North Carolina State University



This material is based upon work supported by the National Science Foundation under Grant No. 0951512.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

  We would like to thank NSF for their support during the past two years.


Abstract

Alternative splicing is an important mechanism of gene regulation that contributes to transcriptome and proteome diversity. Large scale EST/cDNA based studies suggest that alternative splicing affects at least 20% of the genes in Arabidopsis thaliana, and its frequency in other plant species may be even higher. However, in the vast majority of cases, we do not know where and when splice variants are expressed, or if they have a biological function. Several recent reviews suggest the importance of alternative splicing in plants as a mechanism for controlling both development and stress adaptation. The goal of this project is to further investigate this hypothesis by measuring the extent and functional significance of alternative splicing in Arabidopsis thaliana during its defense against the bacterial pathogen Pseudomonas syringae pv tomato. The investigators will employ high-throughput sequencing to query the expression of alternatively spliced genes, determine the impact of alternative splicing on protein structure and gene function, validate selected splice isoforms, and perform functional analysis of selected genes. A critical component of this project will be the development of new computational tools to detect and quantify alternative splicing in high-throughput sequencing data. So far, researchers might have avoided investigating alternative splicing due to technical challenges during data analysis; the tools developed during the course of this project will help to overcome these difficulties. The end-results of this study will contribute to a mechanistic understanding of alternative splicing, and help to conceive new strategies for stress tolerance improvement in crop plants.


Experiment

We subjected leaf tissue from Col-0 Arabidopsis seedlings to one of three treatments: 1) mock infection, 2) inoculation with virulent Pseudomonas syringae pv tomato (Pst) DC3000, and 3) inoculation with avirulent Pst DC3000 expressing the avirulence protein avrRps4. Leaves subjected to mock infection serve as a control to identify genes specifically regulated in response to pathogen infection. Arabidopsis plants can recognize avrRps4 via R proteins, which makes them resistant to infection with the avirulent Pst DC3000 strain. However, the plants are susceptible to infection with the virulent Pst DC3000 strain, which does not express avrRps4. Two biological replicates of pooled leaf tissue were harvested from each treatment at three time points: 1, 6, and 12 hours post inoculation (hpi). RNA was extracted and subjected to paired-end Illumina sequencing. The resulting reads were first aligned to the TAIR 10 transcriptome. Reads that did not produce an unspliced alignment to an existing TAIR 10 transcript were subsequently aligned to the TAIR 10 genome using spliced alignments. The results were processed to detect novel genome features (i.e., splice sites, AS events, transcripts, and genes), and to measure the expression of all genes and transcripts annotated in TAIR 10.


Results

Novel Splicing Events

In our experiment, more than 50% of the expressed genes showed evidence for transcript start or end modifications, and approximately 40% of the expressed genes showed evidence for novel alternative splicing events. Table 1 gives a breakdown by event type.


Table 1. Novel AS events.

Event type                     # novel events detected   # genes with novel events
splice junction                25,864                    10,400
intron retention               14,934                    7,755
cryptic intron                 2,508                     1,408
alternative 3'/5' splice site  14,428                    5,813
cassette exon                  507                       492
cryptic exon                   76                        73


Our data also generated more than 22,000 candidates for novel transcripts and 165 candidates for novel genes. The evidence for the novel AS events along with the candidate novel transcripts and genes proposed by Cufflinks can be displayed with our AS visualization tool.

Figure: Snapshot of our AS visualization tool (AS event viewer).


Differential Gene Expression and Alternative Splicing

To characterize the transcriptome changes that occur during the host/pathogen interaction, we assessed the transcription levels of all TAIR 10 genes using three different approaches (edgeR [1], edgeRGLM [2], and Cufflinks [3]). In general, the edgeR method produced the most conservative gene lists. On average, 85% of the genes detected by edgeR were also reported by Cufflinks, and 92% of the edgeR gene list also occurred in the edgeRGLM gene list. The other two methods produced longer gene lists, but with less agreement between each other. Table 2 shows the number of differentially expressed genes that were detected by at least one method and by all three methods, respectively.


Table 2. Differentially expressed genes.

                      1 hpi                                             6 hpi                                             12 hpi
Comparison            one method significant   all methods significant  one method significant   all methods significant  one method significant   all methods significant
avirulent / mock      198                      1                        3881                     1718                     8392                     984
virulent / mock       2481                     283                      2636                     720                      6056                     699
avirulent / virulent  399                      13                       1382                     133                      1943                     380
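The two columns reported for each time point in Table 2 correspond to a union and an intersection of the per-method gene lists. The following is a minimal sketch of that bookkeeping, not the project's actual code; the function names and the gene identifiers are hypothetical placeholders, and the example lists are made up rather than taken from our results.

```python
def summarize_calls(calls_by_method):
    """Return (# genes called by at least one method, # genes called by all methods)."""
    gene_sets = [set(genes) for genes in calls_by_method.values()]
    any_method = set.union(*gene_sets)
    all_methods = set.intersection(*gene_sets)
    return len(any_method), len(all_methods)


def fraction_recovered(reference, other):
    """Fraction of the reference gene list that also appears in the other list."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(other)) / len(reference)


# Hypothetical calls for a single comparison (placeholder IDs, not project data).
calls = {
    "edgeR": {"geneA", "geneB"},
    "edgeRGLM": {"geneA", "geneB", "geneC"},
    "Cufflinks": {"geneA", "geneD"},
}

one_method, all_three = summarize_calls(calls)
print(one_method, all_three)
print(fraction_recovered(calls["edgeR"], calls["Cufflinks"]))
```

The same `fraction_recovered` helper expresses the pairwise agreement figures quoted above (for example, the share of the edgeR list that Cufflinks also reports).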

One of the main goals of our study was to identify genes showing evidence for regulated alternative splicing associated with the plant's defense response. To achieve this goal, we assessed two additional types of differential expression: differential isoform expression (DEI) and differential alternative splicing (DAS). In the case of DEI, the goal is to identify, for each multi-isoform gene, which individual transcripts show statistically significant changes in expression between treatments. In the case of DAS, the focus is on identifying multi-isoform genes in which the relative proportions of the transcript isoforms within each sample change in response to treatment.
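The distinction between DEI and DAS can be illustrated with a small numerical sketch. The expression values, isoform names, and helper functions below are hypothetical and chosen only for illustration; the sketch compares point estimates and performs no statistical test.

```python
def isoform_fractions(expression):
    """Convert per-isoform expression values into fractions of the gene total."""
    total = sum(expression.values())
    if total == 0:
        return {iso: 0.0 for iso in expression}
    return {iso: value / total for iso, value in expression.items()}


def fraction_shift(control, treated):
    """Change in each isoform's share of the gene between two conditions."""
    f_control = isoform_fractions(control)
    f_treated = isoform_fractions(treated)
    return {iso: f_treated[iso] - f_control[iso] for iso in control}


# Gene X: both isoforms double. Each isoform changes in abundance
# (a DEI candidate), but the isoform mixture is unchanged (no DAS).
gene_x = fraction_shift({"X.1": 100, "X.2": 50}, {"X.1": 200, "X.2": 100})

# Gene Y: the gene total stays at 150, but the mixture flips (a DAS candidate).
gene_y = fraction_shift({"Y.1": 100, "Y.2": 50}, {"Y.1": 50, "Y.2": 100})

print(gene_x)
print(gene_y)
```

Gene X shows that isoform-level expression changes need not alter the mixture, while gene Y shows a mixture change with no change in total gene expression; this is why the two criteria are assessed separately.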

For each of the 5,885 multi-isoform genes in TAIR 10, we tested for differential expression at the level of individual isoforms. We compared three different approaches. The first two approaches consider only reads that align to the unique regions of each transcript. The resulting read counts are then tested for differential expression using the same edgeR and edgeRGLM frameworks that we used at the whole-gene level. In addition, we used the Cufflinks software to test for differential isoform expression. For each comparison, the classic edgeR method produced the smallest transcript list and the Cufflinks method produced the largest. Table 3 shows the number of differentially expressed isoforms that were detected by at least one method and by all methods, respectively.


Table 3. Differentially expressed isoforms.

                      1 hpi                                             6 hpi                                             12 hpi
Comparison            one method significant   all methods significant  one method significant   all methods significant  one method significant   all methods significant
avirulent / mock      54                       0                        1167                     206                      2921                     112
virulent / mock       961                      19                       901                      110                      2183                     106
avirulent / virulent  274                      1                        207                      6                        723                      35
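The idea of counting only reads that fall in transcript-specific regions can be sketched as follows. This is a simplified, position-based illustration and not the project's implementation: the 1-based inclusive exon coordinates, the requirement that a read lie entirely within a unique region, and all names are assumptions, and the sketch ignores reads that are isoform-specific only through a unique splice junction.

```python
def unique_positions(isoform_exons):
    """Map each isoform to the positions covered by that isoform only.

    isoform_exons: dict of isoform name -> list of (start, end) exon
    intervals, 1-based and inclusive (an assumed convention).
    """
    covered = {
        iso: {pos for start, end in exons for pos in range(start, end + 1)}
        for iso, exons in isoform_exons.items()
    }
    unique = {}
    for iso, positions in covered.items():
        others = set().union(*(p for o, p in covered.items() if o != iso))
        unique[iso] = positions - others
    return unique


def count_unique_reads(read_starts, read_length, unique):
    """Count reads lying entirely inside one isoform's unique positions."""
    counts = {iso: 0 for iso in unique}
    for start in read_starts:
        span = set(range(start, start + read_length))
        for iso, positions in unique.items():
            if span <= positions:
                counts[iso] += 1
    return counts


# Hypothetical gene: isoform B contains a cassette exon (13-18) that A skips.
exons = {
    "A": [(1, 10), (21, 30)],
    "B": [(1, 10), (13, 18), (21, 30)],
}
unique = unique_positions(exons)
counts = count_unique_reads([2, 13, 15, 17, 22], read_length=3, unique=unique)
print(counts)
```

In this toy gene, isoform A has no unique positions at all, which illustrates one limitation of a purely position-based definition: an exon-skipping isoform may be distinguishable only by its junction-spanning reads.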

To identify candidates for regulated alternative splicing with possible biological relevance to the defense response, we selected genes that exhibit both DEI and DAS. We computed the change in mixture percentage for each isoform across treatments and identified DAS genes as cases where the 95% confidence interval for this difference did not contain zero [4,5]. Table 4 shows the number of differentially expressed isoforms which also exhibit DAS.


Table 4. Differentially expressed isoforms with significant isoform mixture changes.

                      1 hpi                                             6 hpi                                             12 hpi
Comparison            one method significant   all methods significant  one method significant   all methods significant  one method significant   all methods significant
avirulent / mock      0                        0                        38                       12                       75                       16
virulent / mock       24                       3                        48                       18                       104                      23
avirulent / virulent  10                       0                        10                       0                        21                       2
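The DAS criterion described above (a 95% confidence interval for the change in mixture percentage that excludes zero) can be sketched with a simple normal approximation. This is an assumption made for illustration only: the interval construction used in the project follows [4,5], and the function names, percentages, and standard errors below are hypothetical.

```python
import math


def mixture_change_ci(p_control, se_control, p_treated, se_treated, z=1.96):
    """Approximate 95% CI for the change in an isoform's mixture percentage.

    Assumes independent, roughly normal estimates in both conditions
    (a simplification, not the method of references [4,5]).
    """
    diff = p_treated - p_control
    se_diff = math.sqrt(se_control ** 2 + se_treated ** 2)
    return diff - z * se_diff, diff + z * se_diff


def is_das(ci):
    """A change is flagged when the interval does not contain zero."""
    low, high = ci
    return low > 0 or high < 0


# Hypothetical isoform: 40% of the gene in mock, 60% after treatment.
strong = mixture_change_ci(40.0, 3.0, 60.0, 4.0)
# Hypothetical isoform with a smaller shift and the same uncertainty.
weak = mixture_change_ci(40.0, 3.0, 45.0, 4.0)

print(strong, is_das(strong))
print(weak, is_das(weak))
```

With these made-up inputs, the first interval lies entirely above zero and the second straddles zero, so only the first isoform would be flagged under this simplified rule.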

References

[1] Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40.

[2] Robinson MD, Smyth GK. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008 Apr;9(2):321-32.

[3] Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5.

[4] Howard BE, Heber S. Towards reliable isoform quantification using RNA-SEQ data. BMC Bioinformatics. 2010 Apr 29;11 Suppl 3:S6.

[5] Howard BE, Veronese P, Heber S. Improved RNA-Seq partitions in linear models for isoform quantification. BIBM 2011: 151-154.



Science in the Classroom

We have used our RNA-Seq data set as a starting point for student research projects in CSC 422/522: Automated Learning and Data Analysis, and CSC 530: Computational Methods for Molecular Biology.
Here is a short project description and the accompanying RNA-Seq read start data set for data mining.

Figure: RNA-Seq read starts. Black = observed count of reads at each position along the gene. Red = expected number of reads.




Publications

Software

The source code for the project is available at http://sourceforge.net/projects/iqowls/.