RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-Seq experiments
Yue Li1, Dorothy Yanling Zhao2, Jack Greenblatt2,3 and Zhaolei Zhang1,2,3
1Department of Computer Science, 2Department of Molecular Genetics, 3Banting and Best Department of Medical Research, University of Toronto
Noncoding RNA (ncRNA) biogenesis (Left; Mercer et al. 2009); long ncRNA partners with polycomb repressive complex 2 (PRC2) in gene regulation (Right; Margueron et al. 2011)
RIPSeeker workflow:
[Figure labels from the ncRNA and PRC2 schematics: bidirectional transcript, 3'UTR-associated transcript, Pax6; EZH1/2 (SET), JARID2, EED, SUZ12; ncRNA binding, DNA binding, PTM binding.]
[Figure: RIPSeeker workflow diagram. Labels recoverable at this point: Gapped-Alignment Object; automatic bin size selection.]
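The workflow discretizes each chromosome into bins of read counts, with the bin size chosen automatically. The poster does not state the selection criterion, so the sketch below is only an illustration: it assumes the Shimazaki-Shinomoto histogram cost, (2 * mean - variance) / width^2, and picks the candidate width with the lowest cost. The function names, the 0-based read start positions, and the candidate widths are hypothetical, and this is Python rather than the package's R code.

```python
def bin_counts(starts, chrom_len, width):
    """Count read start positions (0-based, < chrom_len) in fixed-width bins."""
    n_bins = -(-chrom_len // width)  # ceiling division
    counts = [0] * n_bins
    for s in starts:
        counts[s // width] += 1
    return counts


def shimazaki_cost(counts, width):
    """Shimazaki-Shinomoto cost: (2 * mean - biased variance) / width^2."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    return (2 * mean - var) / width ** 2


def select_bin_size(starts, chrom_len, candidates):
    """Return the candidate bin width with the lowest cost (assumed criterion)."""
    return min(
        candidates,
        key=lambda w: shimazaki_cost(bin_counts(starts, chrom_len, w), w),
    )


# Example: choose among three candidate widths for a toy 1 kb chromosome.
toy_starts = list(range(0, 1000, 10))
chosen = select_bin_size(toy_starts, 1000, [10, 50, 100])
counts = bin_counts(toy_starts, 1000, chosen)
```

The resulting per-bin counts are the kind of observation sequence that a count-based HMM would take as input.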
Inferring RIP regions, $p(\mathbf{z} \mid \mathbf{X})$, using a two-state HMM with negative binomial (NB) emission probability $p(x_i \mid z_i = k) = \mathrm{NB}(a_k, b_k)$ on automatically discretized chromosome sequences:
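A minimal Python sketch of the inference step described above, not the package's R implementation. It assumes the NB is parameterized as a gamma-Poisson mixture with shape a and rate b (mean a/b), and that the initial probabilities, transition matrix, and NB parameters are already known; parameter training is not shown. States are indexed 0 and 1 here, corresponding to k=1 and k=2, and all function names are hypothetical.

```python
import math


def nb_logpmf(x, a, b):
    """Log NB(x | a, b), assuming a gamma-Poisson form: shape a, rate b."""
    return (math.lgamma(x + a) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(b / (1.0 + b)) - x * math.log(1.0 + b))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_decode(counts, init, trans, params):
    """Forward-backward in log space: p(z_i = k | X) for every bin i, state k."""
    K, N = len(params), len(counts)
    log_t = [[math.log(p) for p in row] for row in trans]
    emit = [[nb_logpmf(x, *params[k]) for k in range(K)] for x in counts]
    fwd = [[math.log(init[k]) + emit[0][k] for k in range(K)]]
    for i in range(1, N):
        fwd.append([
            emit[i][k] + logsumexp([fwd[i - 1][j] + log_t[j][k] for j in range(K)])
            for k in range(K)
        ])
    bwd = [[0.0] * K for _ in range(N)]
    for i in range(N - 2, -1, -1):
        bwd[i] = [
            logsumexp([log_t[k][j] + emit[i + 1][j] + bwd[i + 1][j] for j in range(K)])
            for k in range(K)
        ]
    log_lik = logsumexp(fwd[-1])
    return [[math.exp(fwd[i][k] + bwd[i][k] - log_lik) for k in range(K)]
            for i in range(N)]


def viterbi(counts, init, trans, params):
    """Most probable joint state path z_1..z_N."""
    K = len(params)
    log_t = [[math.log(p) for p in row] for row in trans]
    score = [math.log(init[k]) + nb_logpmf(counts[0], *params[k]) for k in range(K)]
    back = []
    for x in counts[1:]:
        best_prev = [max(range(K), key=lambda j: score[j] + log_t[j][k])
                     for k in range(K)]
        score = [score[best_prev[k]] + log_t[best_prev[k]][k] + nb_logpmf(x, *params[k])
                 for k in range(K)]
        back.append(best_prev)
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy bin counts: a short enriched stretch inside low background.
counts = [0, 1, 0, 2, 15, 18, 20, 1, 0, 1]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
params = [(1.0, 1.0), (20.0, 1.0)]  # (a_k, b_k): background and RIP state
post = posterior_decode(counts, init, trans, params)
path = viterbi(counts, init, trans, params)
```

Posterior decoding gives a per-bin probability of the RIP state, which suits ranking and thresholding, while the Viterbi path gives a single consistent segmentation into background and RIP regions.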
[Figure: RIP-Seq library construction schematic (RT, cDNA synthesis, 5', 3' adaptor ligation, PCR, Illumina sequencing) and pilot library statistics for the WT RIP-seq, technical replicate, biological replicate, Ezh2-/- control, and IgG control libraries; bar charts comparing RIPSeeker, MACS, QuEST, HPeak, Cuffdiff, and Rulebased.]
II. Motivation:
RIP-Seq measures genome-wide protein-RNA interactions. Despite similarity shared with ChIP- and RNA-Seq, RIP-Seq presents unique properties and challenges. Currently, no statistical tool is dedicated to RIP-Seq analysis.

IV. Results: analyzing PRC2 RIP-seq dataset
Detecting known PRC2-ncRNA: Xist, Kcnq1ot1, Meg3
Gene biotype categories (numbers in parentheses as given in the legend): protein_coding (22702), none_protein_coding (23500), lincRNA (1637), pseudogene (542), retrotransposed (263), antisense (1389), retained_intron (5213), non_coding (15), sense_intronic (81)
Genomic and epigenetic feature sets (numbers in parentheses as given in the legend): bivalent_mm8liftOver2mm9 (2704), imprint (91), lincRNA_Guttman2011 (2127), PRC2_binding_sites_mm8liftOver2mm9 (1805), lincRNA_PRC2_Guttman2011 (34), oncogenes (535), tumor_suppresors (973)
[Figure labels: intergenic; K4, K36, PolII.]
Total peaks identified by comparison methods: [bar chart; methods: Cuffdiff, HPeak, QuEST, MACS, Rulebased, RIPSeeker]
Pair-wise comparison of shared peaks:
           Cuffdiff  Rulebased  HPeak  RIPSeeker  MACS  QuEST
Cuffdiff       100%         2%     4%         4%    4%     2%
Rulebased       23%       100%    28%        26%   23%    12%
HPeak           26%        20%   100%        26%   19%     3%
RIPSeeker       45%        32%    55%       100%   40%     6%
MACS            54%        41%    39%        58%  100%    12%
QuEST           81%        69%    64%        71%   71%   100%
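The pair-wise comparison of shared peaks is asymmetric, which is what one would expect if each entry is the share of one method's peaks that overlap any peak from another method. The poster does not define the overlap rule or which method supplies the denominator, so the Python sketch below makes two assumptions: peaks are (chromosome, start, end) half-open intervals that count as shared when they overlap by at least one base, and each row is normalized by the row method's own peak count. The function names and toy peak sets are hypothetical.

```python
def overlaps(p, q):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return p[0] == q[0] and p[1] < q[2] and q[1] < p[2]


def shared_fraction(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap any peak in B."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for p in peaks_a if any(overlaps(p, q) for q in peaks_b))
    return hits / len(peaks_a)


def pairwise_table(peak_sets):
    """Row method -> column method -> rounded percentage of shared peaks."""
    names = list(peak_sets)
    return {
        a: {b: round(100 * shared_fraction(peak_sets[a], peak_sets[b])) for b in names}
        for a in names
    }


# Toy example with two hypothetical peak callers.
toy_peaks = {
    "methodA": [("chr1", 0, 100), ("chr1", 200, 300)],
    "methodB": [("chr1", 50, 150)],
}
table = pairwise_table(toy_peaks)
```

With this normalization, a method that calls few peaks can share nearly all of them with a method that calls many, while the reverse entry stays small, so the two triangles of the table need not match.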
Comparison in biological contexts of various genomic and epigenetic features:
[Figure: bar charts of Feature% and Peak% per category; genic features: fiveUTR, exon, intron, coding, noncoding, threeUTR; repeat classes: SINE, Satellite, LINE, Simple repeat, LTR, Low complexity, Others.]

I. Introduction: genome-wide identification of long noncoding RNAs interacting with chromatin regulators
Ribonucleoprotein ImmunoPrecipitation (IP) followed by high-throughput Sequencing (RIP-Seq) (Zhao et al., 2010):
[Figure: RIP-Seq protocol schematic. Recoverable labels: cells (WT, Ezh2-/-), nuclear lysate, antibody incubation (α-Ezh2 or IgG), protein A beads, immunoprecipitation, RNA extraction, RT and cDNA synthesis, 5', 3' adaptor ligation, PCR, gel purification, Illumina sequencing, bioinformatic analysis, experimental validation, functional testing; Coomassie stain and Western blot panels; pilot library statistics.]
Comparison of RIP-Seq with ChIP-Seq and RNA-Seq:
[Table: columns ChIP-Seq, RNA-Seq, RIP-Seq; row contents not recoverable.]
[Figure credit: Mercer, T. R., Dinger, M. E., & Mattick, J. S. (2009). Nature Reviews Genetics, volume 10, March 2009.]

III. Methods: probabilistic inference to disambiguate multihits and derive statistical-confidence RIP regions
RIPSeeker workflow (labels recovered from the diagram):
•  RIP-Seq data: BAM/BED/SAM
•  Remove duplicate alignments
•  return unique hits only
•  flag multihits
•  Automatic binning of read counts
•  NB mixture model: parameter initialization and optimization
•  HMM posterior decoding and Viterbi prediction
•  Detect RIP regions
•  Multihits exist? (Yes/No)
•  Disambiguate multihits using PCL
•  SigTest
•  GRanges Object
•  export as wig, bed, etc.
•  (live) visualization
•  (live) annotation of enriched bins/regions
•  other R functions
[Figure: HMM diagram. Hidden variables z_1, …, z_N with states k=1 and k=2; observed variables x_1, …, x_N (bin counts); example state path 5' 11111111122222211112222111111122221111…12211111 3'.]
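The preprocessing labels in the workflow (remove duplicate alignments, return unique hits only, flag multihits) and the later multihit disambiguation can be illustrated with a small Python sketch. This is not the package's R code, and every definition is an assumption made for illustration: an alignment is a (read_id, chrom, start, strand) tuple, a duplicate is an identical record, a multihit is a read with more than one distinct alignment, and each multihit is assigned to the alignment whose locus has the highest RIP-state posterior, supplied here by a hypothetical scoring function.

```python
from collections import defaultdict


def split_hits(alignments):
    """Drop identical records, then split reads into unique hits and multihits."""
    seen = set()
    by_read = defaultdict(list)
    for aln in alignments:
        if aln in seen:  # assumed duplicate definition: identical record
            continue
        seen.add(aln)
        by_read[aln[0]].append(aln)
    unique = [alns[0] for alns in by_read.values() if len(alns) == 1]
    multihits = {rid: alns for rid, alns in by_read.items() if len(alns) > 1}
    return unique, multihits


def assign_multihits(multihits, rip_posterior):
    """Keep, for each multihit read, the alignment with the highest posterior."""
    return [max(alns, key=rip_posterior) for alns in multihits.values()]


# Toy input: r1 is duplicated, r2 maps to two loci, r3 maps once.
toy_alignments = [
    ("r1", "chr1", 100, "+"),
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 200, "+"),
    ("r2", "chr2", 50, "-"),
    ("r3", "chrX", 10, "+"),
]
unique, multihits = split_hits(toy_alignments)


def toy_posterior(aln):
    # Hypothetical stand-in for a posterior computed from unique hits only.
    return 0.9 if aln[1] == "chr2" else 0.1


resolved = assign_multihits(multihits, toy_posterior)
```

Scoring loci from the unique hits first and resolving multihits afterwards keeps ambiguous reads from influencing the model that is then used to place them.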
V. Conclusion:
RIPSeeker is a self-contained software package written in R and specifically tailored to analyze RIP-Seq data efficiently and with statistical rigor. RIPSeeker demonstrates its sensitivity by identifying the canonical PRC2- and CCNT1-associated (not shown) ncRNAs with high statistical confidence and reasonable resolution. Additionally, RIPSeeker incorporates several existing R packages to automatically annotate RIP regions via the Ensembl database, perform GO enrichment analysis, and launch the UCSC Genome Browser with putative RIP regions as custom tracks for visualization.
Because protein-associated ncRNAs remain largely unknown (unlike TFBS), it is difficult to evaluate the specificity of RIPSeeker predictions. However, the ability to prioritize candidate genes with rigorous statistical assessment allows RIPSeeker to generate valuable information from RIP-Seq data for formulating subsequent (more focused) experimental and computational strategies.