Role of Alternative Splicing in Arabidopsis Immune Response
Steffen Heber (PI) and Paola Veronese (Co-PI)
North Carolina State University
This material is based upon work supported by the National Science
Foundation under Grant No. 0951512. Any opinions, findings, and
conclusions or recommendations expressed in this material are those
of the author(s) and do not necessarily reflect the views of the
National Science Foundation.
We would like to thank NSF for their support during the past two
years.
Abstract
Alternative splicing is an
important mechanism of gene regulation that contributes to
transcriptome and proteome diversity. Large-scale EST/cDNA-based
studies suggest that alternative splicing affects at least 20% of
the genes in Arabidopsis thaliana, and its frequency in other
plant species may be even higher. However, in the vast majority of
cases, we do not know where and when splice variants are
expressed, or if they have a biological function. Several recent
reviews suggest the importance of alternative splicing in plants
as a mechanism for controlling both development and stress
adaptation. The goal of this project is to further investigate
this hypothesis by measuring the extent and functional
significance of alternative splicing in Arabidopsis thaliana
during its defense against the bacterial pathogen Pseudomonas
syringae pv tomato. The investigators will employ high-throughput
sequencing to query the expression of alternatively spliced genes,
determine the impact of alternative splicing on protein structure
and gene function, validate selected splice isoforms, and perform
functional analysis of selected genes. A critical component of
this project will be the development of new computational tools to
detect and quantify alternative splicing in high-throughput
sequencing data. So far, researchers might have avoided
investigating alternative splicing due to technical challenges
during data analysis; the tools developed during the course of
this project will help to overcome these difficulties. The
end results of this study will contribute to a mechanistic
understanding of alternative splicing and help to conceive new
strategies for stress tolerance improvement in crop plants.
Experiment
We subjected leaf tissue from Col-0 Arabidopsis seedlings to one of
three treatments: 1) mock infection, 2) inoculation with virulent
Pseudomonas syringae pv tomato (Pst) DC3000, and 3) inoculation with
avirulent Pst DC3000 expressing the avirulence protein avrRps4.
Leaves subjected to mock infection served as a control to identify
genes specifically regulated in response to pathogen infection.
Arabidopsis plants can recognize avrRps4 via R proteins, which makes
them resistant to infection with the avirulent Pst DC3000 strain.
However, they are susceptible to infection with the virulent Pst
DC3000 strain, which does not express avrRps4. Two biological
replicates of pooled leaf tissue were harvested from each treatment
at three time points: 1, 6, and 12 hours post inoculation (hpi). RNA
was extracted and subjected to paired-end Illumina sequencing. The
resulting reads were first aligned to the TAIR 10 transcriptome.
Reads that did not produce unspliced alignments to an existing
TAIR 10 transcript were subsequently aligned to the TAIR 10 genome
using spliced alignments. The results were processed to detect novel
genome features (i.e., splice sites, AS events, transcripts, and
genes) and to measure the expression of all genes and transcripts
stored in TAIR 10.
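The two-pass alignment strategy described above can be illustrated with a small routing step. The following is a minimal Python sketch, not the project's actual pipeline: the record layout (read id plus CIGAR string) and the names is_unspliced and split_reads are assumptions made for illustration, and the aligner used in the study is not named here. A read is kept from the transcriptome pass only if its alignment has no skipped region (no N operation in the CIGAR string); all other reads are passed on to the spliced genome alignment.

```python
import re

# Matches one CIGAR operation, e.g. "76M" or "120N".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_unspliced(cigar):
    """Return True if the CIGAR describes an alignment with no 'N' (skipped region).

    Unaligned reads (cigar of None or "*") are treated as not unspliced,
    so that they are routed to the second alignment pass.
    """
    if cigar is None or cigar == "*":
        return False
    ops = [op for _, op in CIGAR_OP.findall(cigar)]
    return bool(ops) and "N" not in ops


def split_reads(transcriptome_hits):
    """Partition reads after the transcriptome pass.

    transcriptome_hits: iterable of (read_id, cigar) pairs (hypothetical layout).
    Returns (kept, realign): read ids kept from the transcriptome pass and
    read ids to be re-aligned to the genome with a spliced aligner.
    """
    kept, realign = [], []
    for read_id, cigar in transcriptome_hits:
        if is_unspliced(cigar):
            kept.append(read_id)
        else:
            realign.append(read_id)
    return kept, realign
```

For paired-end data, a real pipeline would apply such a decision per read pair rather than per read; the sketch ignores pairing for brevity.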
Results
Novel Splicing Events
In our experiment, more than 50% of the expressed genes showed
evidence for transcript start or end modifications, and
approximately 40% of the expressed genes showed evidence for novel
alternative splicing events. Table 1 gives a breakdown.
Table 1. Novel AS events.

Event type                      # novel events detected   # genes with novel events
splice junction                 25,864                    10,400
intron retention                14,934                    7,755
cryptic intron                  2,508                     1,408
alternative 3'/5' splice site   14,428                    5,813
cassette exon                   507                       492
cryptic exon                    76                        73

Our data also yielded more than 22,000 candidates for novel
transcripts and 165 candidates for novel genes. The evidence for the
novel AS events, along with the candidate novel transcripts and genes
proposed by Cufflinks, can be displayed with our AS visualization
tool.
Snapshot of our AS visualization tool
Differential Gene Expression and Alternative Splicing
To characterize the
transcriptome changes that occur during the host/pathogen
interaction, we assessed the transcription levels of all TAIR 10
genes using three different approaches (edgeR [1], edgeRGLM [2],
and Cufflinks [3]). In general, the edgeR method
produced the most conservative gene lists. On average, 85% of the
genes detected by edgeR were also reported by Cufflinks, and 92%
of the edgeR gene list also occurred in the edgeRGLM gene list.
The other two methods produced longer gene lists that agreed less
with each other. Table 2 shows the number of differentially expressed
genes that were detected by at least one method and by all three
methods, respectively.
Table 2. Differentially expressed genes.
(one method = significant in at least one method; all methods = significant in all methods)

                       1 hpi                      6 hpi                      12 hpi
Comparison             one method   all methods   one method   all methods   one method   all methods
avirulent / mock       198          1             3881         1718          8392         984
virulent / mock        2481         283           2636         720           6056         699
avirulent / virulent   399          13            1382         133           1943         380

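The "one method" and "all methods" counts in Table 2 correspond to the union and the intersection of the per-method gene lists, and the agreement percentages quoted above are overlaps relative to the edgeR list. A minimal Python sketch of that bookkeeping follows. The gene ids are made-up placeholders and the function names are hypothetical; none of this reproduces the project's data or code.

```python
def summarize_calls(calls):
    """Count genes significant in at least one method and in all methods.

    calls: dict mapping method name -> set of gene ids called significant.
    Returns (n_any, n_all).
    """
    sets = list(calls.values())
    if not sets:
        return 0, 0
    any_method = set().union(*sets)        # significant in at least one method
    all_methods = set.intersection(*sets)  # significant in every method
    return len(any_method), len(all_methods)


def overlap_fraction(reference, other):
    """Fraction of the reference gene list that also appears in the other list."""
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


# Toy input with placeholder gene ids (not real results).
calls = {
    "edgeR": {"g1", "g2"},
    "edgeRGLM": {"g1", "g2", "g3"},
    "Cufflinks": {"g1", "g4"},
}

n_any, n_all = summarize_calls(calls)
edger_in_glm = overlap_fraction(calls["edgeR"], calls["edgeRGLM"])
edger_in_cufflinks = overlap_fraction(calls["edgeR"], calls["Cufflinks"])
```

In this toy input, four genes are called by at least one method and one gene by all three; the whole edgeR list is contained in the edgeRGLM list, and half of it in the Cufflinks list.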
One of the main goals of our
study was to identify genes showing evidence for regulated
alternative splicing associated with the plant's defense response.
To achieve this goal, we assessed two additional types of
differential expression: differential isoform expression (DEI) and
differential alternative splicing (DAS). In the case of DEI, the
goal is to identify, for each multi-isoform gene, which specific,
individual transcripts show statistically significant changes in
expression between treatments. In the case of DAS, the focus
is on identifying multi-isoform genes where the relative ratio of
the transcript isoforms changes within each sample in response to
treatment.
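The distinction between DEI and DAS can be made concrete with a toy calculation. The sketch below is only an illustration under simple assumptions: the expression values are invented, the function names are hypothetical, and no statistical testing is performed. It computes each isoform's share of its gene's total expression and the change in that share between two conditions.

```python
def isoform_fractions(expr):
    """Convert per-isoform expression values into fractions of the gene total."""
    total = sum(expr.values())
    return {iso: (value / total if total else 0.0) for iso, value in expr.items()}


def fraction_shift(control, treated):
    """Change in each isoform's share of gene expression (treated minus control)."""
    frac_control = isoform_fractions(control)
    frac_treated = isoform_fractions(treated)
    return {iso: frac_treated[iso] - frac_control[iso] for iso in control}


control = {"iso1": 100, "iso2": 100}

# Both isoforms double: each is a DEI candidate, but the mixture is unchanged.
scaled = {"iso1": 200, "iso2": 200}

# Gene total is unchanged, but the mixture shifts toward iso1: a DAS candidate.
switched = {"iso1": 150, "iso2": 50}

shift_scaled = fraction_shift(control, scaled)
shift_switched = fraction_shift(control, switched)
```

In the first case every shift is zero even though both isoforms change in absolute expression; in the second, iso1 gains 0.25 of the gene's expression and iso2 loses the same amount.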
For each of the 5,885 multi-isoform genes in TAIR 10, we tested for
differential expression at the level of individual isoforms. We
compared three different approaches. The first two approaches
consider only reads that align to the unique regions in each
transcript. The resulting read counts are then tested for
differential expression using the same edgeR and edgeRGLM frameworks
we previously used at the whole-gene level. In addition, we used the
Cufflinks software to test for differential isoform expression. For
each comparison, the classic edgeR method produced the smallest
transcript list and the Cufflinks method produced the largest
transcript list. Table 3 shows the number of differentially expressed
isoforms that were detected by at least one method and by all
methods, respectively.
Table 3. Differentially expressed isoforms.
(one method = significant in at least one method; all methods = significant in all methods)

                       1 hpi                      6 hpi                      12 hpi
Comparison             one method   all methods   one method   all methods   one method   all methods
avirulent / mock       54           0             1167         206           2921         112
virulent / mock        961          19            901          110           2183         106
avirulent / virulent   274          1             207          6             723          35

To identify candidates for
regulated alternative splicing with possible biological relevance
to the defense response, we selected genes that exhibit both DEI
and DAS. We computed the change in
mixture percentage for each isoform across treatments and
identified DAS genes as cases where the 95% confidence interval
for this difference did not contain zero [4,5]. Table 4 shows the number of
differentially expressed isoforms that also exhibit DAS.
Table 4. Differentially expressed isoforms with significant isoform mixture changes.
(one method = significant in at least one method; all methods = significant in all methods)

                       1 hpi                      6 hpi                      12 hpi
Comparison             one method   all methods   one method   all methods   one method   all methods
avirulent / mock       0            0             38           12            75           16
virulent / mock        24           3             48           18            104          23
avirulent / virulent   10           0             10           0             21           2

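The DAS criterion used above, a 95% confidence interval for the change in isoform mixture percentage that does not contain zero, can be sketched as follows. How the interval is actually derived is described in [4,5] and is not reproduced here. This Python sketch instead assumes, purely for illustration, that each condition supplies an independent estimate of the isoform fraction with a known standard error, and uses a normal approximation. The function names and input values are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def diff_ci(p_control, se_control, p_treated, se_treated, z=Z_95):
    """Normal-approximation confidence interval for p_treated - p_control.

    Assumes the two estimates are independent, so their variances add.
    """
    diff = p_treated - p_control
    half_width = z * math.sqrt(se_control ** 2 + se_treated ** 2)
    return diff - half_width, diff + half_width


def excludes_zero(interval):
    """True if the interval lies entirely above or entirely below zero."""
    low, high = interval
    return low > 0 or high < 0


# Invented example values: isoform fraction 0.50 -> 0.75 versus 0.50 -> 0.55.
large_shift = diff_ci(0.50, 0.03, 0.75, 0.04)
small_shift = diff_ci(0.50, 0.03, 0.55, 0.04)
```

With these invented inputs, the interval for the larger shift lies entirely above zero and would be flagged, while the interval for the smaller shift spans zero and would not.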
References
[1] Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor
package for differential expression analysis of digital gene
expression data. Bioinformatics. 2010 Jan 1;26(1):139-40.
[2] Robinson MD, Smyth GK. Small-sample estimation of negative
binomial dispersion, with applications to SAGE data. Biostatistics.
2008 Apr;9(2):321-32.
[3] Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van
Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and
quantification by RNA-Seq reveals unannotated transcripts and isoform
switching during cell differentiation. Nat Biotechnol. 2010
May;28(5):511-5.
[4] Howard BE, Heber S. Towards reliable isoform quantification
using RNA-SEQ data. BMC Bioinformatics. 2010 Apr 29;11 Suppl 3:S6.
[5] Howard BE, Veronese P, Heber S. Improved RNA-Seq Partitions in
Linear Models for Isoform Quantification. BIBM 2011: 151-154.
Science in the Classroom
We have used our RNA-Seq data set as a starting point for a student
research project in CSC 422/522: Automated Learning and Data
Analysis, and CSC 530: Computational Methods for Molecular Biology.
Here is a short project description and the accompanying RNA-Seq
read start data mining data set.
RNA-Seq reads. Black = observed count of reads at each position
along the gene. Red = expected number of reads.
Publications
Howard BE, Veronese P, Heber S. Improved RNA-Seq Partitions in
Linear Models for Isoform Quantification. BIBM 2011: 151-154.
Howard BE, Tan X, Veronese P, Heber S. Workshop: Using a transcript
catalog and paired-end RNA-Seq data to identify differential
alternative splicing. ICCABS 2011: 270.
Howard BE, Heber S. Towards reliable isoform quantification using
RNA-SEQ data. BMC Bioinformatics. 2010 Apr 29;11 Suppl 3:S6.