Technical note: Low input RNA-Seq

 National Genomics Infrastructure Stockholm - Technical Note
Low Input RNA-Seq
Author: Phil Ewels
[email protected]
Introduction
For a standard RNA-Seq transcriptome library, SciLifeLab NGI Stockholm currently requires at least 2.0 µg RNA. To obtain this, users will typically need somewhere in the region of 2-5 × 10⁷ primary cells (ref). In many experimental setups, it may not be possible to generate this much source material.
NGI Stockholm will accept RNA samples with less than 2.0 µg RNA, though we do not provide the same results guarantee that typically comes with transcriptome analysis. The aim of this study is to quantify the effect of using low-concentration RNA inputs in the RNA-Seq pipeline.
Experimental setup
Two biological samples were used. AM7852 is total RNA prepared from HeLa-S3, bought directly from Life Technologies (see product guide). GM12878 is RNA prepared in-house from the GM12878 cell line, ordered from the Coriell Institute.
Nine libraries were prepared with varying input concentrations using the Illumina TruSeq RNA HT kit with polyA-selection, multiplexed in one lane and sequenced on an Illumina HiSeq in High Output mode, paired-end 2 × 100 bp. Data were processed using the Tuxedo suite (TopHat, Cufflinks, Cuffdiff and cummeRbund) with the human genome assembly GRCh37. Samples clustered as expected (Fig 1).
Fig 1. Dendrogram showing sample clustering. Libraries can be seen to cluster by sample type, with close correlation between different input concentrations.
Library complexity
To inspect how different input concentrations of RNA affect library complexity (the number of unique molecules in a sample), we used preseq to extrapolate unique versus total molecules (Fig 2). Decreasing the input concentration does result in lower diversity, as expected. However, the difference between the two samples has a larger effect. This is likely due to the difference in RIN values of the input RNA.
Fig 2. Preseq complexity curves. The number of reads sequenced for each library is plotted as a point. The low input GM12878 libraries are sequenced nearly to saturation, whereas further sequencing of the AM7852 libraries would reveal more unique reads.
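
The idea of a complexity curve (unique versus total molecules) can be illustrated with a small subsampling sketch. This is not how preseq works internally: preseq extrapolates beyond the sequenced depth, whereas the sketch below only counts unique molecules within the reads already observed. The function name `complexity_curve`, the toy molecule labels and the number of steps are assumptions made for illustration, not part of the original analysis.

```python
import random


def complexity_curve(molecule_ids, steps=5, seed=0):
    """Return (total_reads, unique_molecules) pairs at increasing depths.

    molecule_ids: one entry per sequenced read, naming the molecule it came
    from. These labels are hypothetical; real pipelines usually identify
    duplicate reads from their alignment coordinates.
    """
    rng = random.Random(seed)
    reads = list(molecule_ids)
    rng.shuffle(reads)  # after shuffling, every prefix is a random subsample
    curve = []
    for i in range(1, steps + 1):
        depth = len(reads) * i // steps
        curve.append((depth, len(set(reads[:depth]))))
    return curve


# Toy libraries (made-up data): same read count, different diversity.
high_complexity = [f"mol{i}" for i in range(900)] + ["mol0"] * 100
low_complexity = [f"mol{i % 200}" for i in range(1000)]

for name, library in [("high", high_complexity), ("low", low_complexity)]:
    print(name, complexity_curve(library))
```

A curve that flattens early, as for the low-complexity toy library, corresponds to a library that has been sequenced close to saturation.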
1474-2_LowInputRNA-SeqTechnote.pdf
NGI Technical Note
Doc #1474:2, 2014-12-16
Number of observed genes
For a greater understanding of the biological impact of this difference in library complexity, we calculated the number of observed genes at different sub-sampling points within each library. This gives curves with a similar profile to the preseq plot, yet with a more tangible meaning (Fig 3). Here, the difference between the two samples is more pronounced and input concentration appears to have little effect.
Correlation between replicates
To check that replicates of the same sample yield similar counts for each transcript, we plotted a matrix of FPKM scatter plots with histograms (Fig 4).
Fig 4. Scatter plots and histograms of FPKM values for the AM7852 samples (left to right, top to bottom: 1 µg, 1 µg, 1 µg, 500 ng, 200 ng, 50 ng).
Replicates show a high degree of correlation, indicating excellent reproducibility. However, the final 50 ng input sample has a drop in the left peak of the histogram, showing a loss of information about lowly expressed genes. There is also greater variation in the scatter plots involving this sample. This suggests that for transcript-level analysis this sample may be less reliable than the others.
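
The visual comparison in Fig 4 could also be summarised as one number per pair of libraries. The sketch below computes a Pearson correlation on log-transformed FPKM values. The note does not state which coefficient or transformation, if any, was used, so the choice of Pearson, the log2 transform, the pseudocount of 1, the function name `log_fpkm_correlation` and the example values are all assumptions for illustration.

```python
import math


def log_fpkm_correlation(fpkm_a, fpkm_b, pseudocount=1.0):
    """Pearson correlation of log2(FPKM + pseudocount) for two libraries.

    fpkm_a, fpkm_b: dicts mapping a gene ID to its FPKM value.
    Only genes present in both dicts are compared.
    """
    genes = sorted(set(fpkm_a) & set(fpkm_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log2(fpkm_a[g] + pseudocount) for g in genes]
    y = [math.log2(fpkm_b[g] + pseudocount) for g in genes]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)


# Made-up FPKM values for illustration only (not data from this study).
rep_1ug = {"geneA": 120.0, "geneB": 15.0, "geneC": 0.8, "geneD": 3.2}
rep_50ng = {"geneA": 110.0, "geneB": 17.0, "geneC": 0.0, "geneD": 2.5}
print(round(log_fpkm_correlation(rep_1ug, rep_50ng), 3))
```

The pseudocount keeps genes with an FPKM of zero in the comparison, which matters here because the loss of lowly expressed genes is exactly what distinguishes the 50 ng library.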
Fig 3. Cufflinks gene observations at increasing sub-sampling levels. Number of genes and the slope of the curve are similar across input concentrations. Biological variation and sample quality have a greater impact than input concentration.
Conclusion
In summary, we conclude that sequencing of RNA samples can give reliable data down to an input concentration of 200 ng. Our results again show the importance of high-quality RNA extractions, with the RIN score of the input RNA having a far larger impact than the input concentration.