Detecting microRNA targets or siRNA off-targets using expression data

Posted: 3 December 2008 | Anton J. Enright, Group Leader, EMBL – European Bioinformatics Institute

Recently, small RNAs such as microRNAs (miRNAs) have been demonstrated to be important regulators in both plants and animals. In animals, miRNAs repress target genes through a combination of translational inhibition and mRNA destabilisation. These molecules have been implicated in a multitude of diseases, including cancer, and represent promising candidates for both diagnostics and therapeutics. While substantial progress has been made in the detection, sequencing and profiling of miRNAs, accurately delineating their targets remains difficult. Purely computational approaches hold much promise, yet they still suffer from over-prediction. In this article we describe alternative approaches that combine computational analysis with gene expression data to better detect miRNA effects and their targets. In particular we describe Sylamer1, a new tool for the detection of miRNA targets and siRNA off-target effects from expression data.

Currently there are 695 confirmed miRNAs in human (miRBase 12)2. Each miRNA is expected to have multiple targets; however, few miRNA targets have been experimentally confirmed so far, and no accurate high-throughput experimental approach yet exists for determining miRNA target binding. Purely computational approaches are promising, but while they have been shown to have high sensitivity they can suffer from over-prediction3. The key issue faced by computational approaches is that miRNAs are short (21 nt) and that the key region for binding specificity is even shorter (6-8 nt). Finding complementary binding sites in the 3’UTRs of potential target transcripts is hence daunting, as 6 nt sites complementary to any miRNA occur by chance across the entire genome at reasonably high frequencies.
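To give a feel for how often such short sites arise by chance, the following minimal sketch estimates the expected number of occurrences of one specific word in a random sequence. It assumes equal and independent base frequencies, which real 3’UTRs do not have, and the 1,000 nt UTR length is a hypothetical value chosen only for illustration.

```python
def expected_chance_matches(word_length: int, sequence_length: int) -> float:
    """Expected occurrences of one specific word in a random sequence.

    Assumes each of the four bases is equally likely at every position,
    which is a simplification of real 3'UTR composition.
    """
    # Number of start positions at which the word could occur.
    positions = max(sequence_length - word_length + 1, 0)
    return positions / 4 ** word_length


# Hypothetical 3'UTR length, used only for illustration.
UTR_LENGTH = 1000

for k in (6, 7, 8):
    print(f"{k} nt word: {expected_chance_matches(k, UTR_LENGTH):.3f} expected chance matches")
```

Under these assumptions a 6 nt word is expected about 0.24 times in a 1,000 nt sequence, so across thousands of transcripts many chance matches are to be expected for any miRNA.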

Target prediction

Some computational tools use additional filters to help decide which complementary sites are real and which are likely noise. Such filters include conservation of the site across species, potential thermodynamic energy, positional constraints within the 3’UTR and statistical models. These extra filters have indeed helped4, but still fall short of the mark. Furthermore, some of these filters (e.g. conservation) may increase specificity at the expense of sensitivity, as it has been shown that some miRNAs have target sets that are not highly conserved5. It is also possible that many binding sites predicted by such methods are feasible sites, but that the miRNA and its predicted target are never in the same place at the same time. It seems clear that extra information derived experimentally can aid the process of target discovery.

The effect of microRNAs on mRNA expression levels

Initially, it was thought that miRNAs primarily operated by translational silencing and that the action of a miRNA would only be evident at the protein level. However, an experiment by Lee Lim and others at Rosetta Inpharmatics was instrumental in providing the first evidence that the action of miRNAs could also be detected at the mRNA level6. Their work showed that miRNAs introduced into HeLa cells had strong effects on mRNA levels and that the transcripts whose expression decreased were strikingly enriched in potential seed matches to the introduced miRNA. Subsequent experiments demonstrated that miRNAs binding to their targets in 3’UTRs appear to stimulate both deadenylation and decapping, which in turn mark the target transcript for degradation5. Recent studies combining both proteomic analysis and mRNA expression following miRNA perturbation do show cases where protein levels change but mRNA levels remain relatively static7,8. However, it would seem that in most cases significant shifts were observed at both the protein and mRNA levels.

The fact that introducing or removing a miRNA from a system of interest causes measurable mRNA and protein level changes creates a new way of probing miRNA targets. In the simplest case one can imagine comparing wild-type cells to cells where a miRNA is being over-expressed. One can then compare gene-expression profiles of these two cell types working under the assumption that increased levels of the miRNA will stimulate greater repression of its target genes. The expression levels of these putative target genes would decrease significantly and be detected according to fold-change.

Such observations have previously been used to predict miRNA targets in a number of systems. In an analysis of early zebrafish development, a single miRNA (miR-430) was reinjected into mutant embryos5. Expression profiles were taken from mutant embryos and from mutant embryos that had also been injected. A large number of mRNAs showed significant expression decreases following injection of miR-430. Of a total of 30 candidate target mRNAs, 27 were validated as direct miRNA targets using GFP reporter assays. A similar study compared T-helper (Th1) cells from wild-type mice with those from ΔmiR-155 (bic) mutant mice9. In this case the genes whose mRNA expression levels increased significantly in the mutant were identified as likely miR-155 target genes, a number of which were subsequently validated using a luciferase reporter assay.

An experimental paradigm

The experimental paradigm for such studies is straightforward (see Figure 1).

Figure 1

Firstly, it is useful to profile the system of interest to determine which miRNAs are expressed or changing significantly. Secondly, a miRNA of interest can be perturbed using, for example, a knock-out or transfection of an antisense molecule that binds to the miRNA of interest and prevents it from functioning (e.g. antagomir, 2’-O-methyl or LNA). Subsequently, mRNA expression profiling or proteomic analysis is used to obtain a readout of the effect of the perturbation. Finally, computational analysis of the expression data will determine whether there is a primary effect, how significant the effect is and which candidate target genes are involved.

Establish which miRNAs are important:

  • miRNA profiling
  • new technology sequencing.

Perturb miRNAs in the system:

  • Antisense knock-down
  • Knock-out mouse model
  • Knock-in transfection of double stranded miRNA analogue
  • Over-expression vector.

Profile mRNA or protein levels:

  • Gene expression
  • Proteomics.

Analysis and target prediction:

  • Differential expression analysis (e.g. t-test)
  • Sylamer.
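The differential expression step listed above can be sketched as follows. This is a minimal illustration rather than a recommended pipeline: the gene names and log2 expression values are invented, Welch's t-statistic is used as one possible choice of test, and a real analysis would normally rely on an established microarray analysis package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(perturbed, control):
    """Welch's t-statistic for two groups of replicate measurements.

    Positive values mean higher expression in the perturbed samples.
    Each group needs at least two replicates and non-zero variance overall.
    """
    standard_error = sqrt(variance(perturbed) / len(perturbed)
                          + variance(control) / len(control))
    return (mean(perturbed) - mean(control)) / standard_error


def rank_genes(expression):
    """Return gene names ordered from most up-regulated to most down-regulated."""
    scores = {gene: welch_t(p, c) for gene, (p, c) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical log2 expression values: gene -> (perturbed replicates, control replicates).
expression = {
    "geneA": ([9.1, 9.3, 9.2], [7.0, 7.2, 7.1]),  # up in perturbed samples
    "geneB": ([5.0, 5.2, 5.1], [5.1, 5.0, 5.2]),  # unchanged
    "geneC": ([3.0, 3.1, 2.9], [4.9, 5.1, 5.0]),  # down in perturbed samples
}

ranked = rank_genes(expression)
print(ranked)
```

The resulting ranked genelist, together with the 3’UTR sequence of each gene, is the kind of input that an enrichment analysis over the whole list requires.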

Computational and Statistical Analysis

A question remains for this type of analysis: are the genes whose expression levels are changing direct targets bound by the miRNA, or are the changes indirect, secondary effects? One relatively straightforward way to answer this question is to look at the presence or absence of complementary miRNA seed matches in the genes that are changing. If these genes are real direct targets of the miRNA then one would expect them to possess seed matches to the miRNA. Analysis of the frequency and significance of such seed matches in the 3’UTRs of genes that have changed can hence allow one to determine whether the effect is significant and to identify the subset of genes most likely to be direct targets. However, questions remain about what threshold to use when selecting a genelist for such analysis. Enrichment tools such as Gene Set Enrichment Analysis (GSEA)10 have recently been shown to be useful for the analysis of over-represented terms or annotations in gene lists. Instead of using a single cutoff and thus a single genelist, GSEA uses the full list of genes, ranked according to how much they change in an experiment. This approach removes the need to impose arbitrary cutoffs, instead searching for coordinated shifts in complete pathways or gene sets of biological interest, even if many individual genes do not lie at the top of the ranked genelist10.
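As a concrete illustration of what looking for seed matches involves, the sketch below derives the word expected in a 3’UTR from a miRNA sequence and counts its occurrences. It assumes the common convention that the 6 nt seed spans positions 2 to 7 of the miRNA; the miRNA and UTR sequences are invented for illustration and do not correspond to any real molecule.

```python
# Complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_word(mirna: str, length: int = 6) -> str:
    """DNA word in a 3'UTR complementary to miRNA positions 2..(length + 1).

    The choice of position 2 as the seed start is an assumption of this sketch.
    """
    seed = mirna.upper().replace("T", "U")[1:1 + length]
    # Reverse complement the seed, then write it as DNA to match UTR sequences.
    return seed.translate(COMPLEMENT)[::-1].replace("U", "T")


def count_sites(utr: str, word: str) -> int:
    """Count occurrences of word in a UTR, allowing overlaps."""
    utr = utr.upper().replace("U", "T")
    return sum(utr.startswith(word, i) for i in range(len(utr) - len(word) + 1))


# Invented miRNA and 3'UTR sequences, for illustration only.
toy_mirna = "UACGGAUCCAGUUGCAAUCGGA"
toy_utrs = {
    "geneA": "AAATCCGTAAAGGATCCGTTT",
    "geneB": "GGGGCCCCAAAATTTT",
}

word = seed_match_word(toy_mirna)
site_counts = {gene: count_sites(utr, word) for gene, utr in toy_utrs.items()}
print(word, site_counts)
```

Counting sites in this way says nothing about significance on its own; that depends on how the counts are distributed across the ranked genelist.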

This type of analysis can be extended to the case of finding words that are complementary to seed regions of miRNAs or siRNAs in the 3’UTRs of genes whose expression has changed following a perturbation experiment11. Hence, if enrichment of such words correlates with the rankings of 3’UTRs of genes whose expression has changed during a miRNA experiment, part of the expression changes can be attributed to direct effects. This approach has been validated on numerous datasets and shows that particular miRNAs have major effects on tissue or developmental expression profiles when that miRNA is removed or reinjected. Similarly, RNA interference (RNAi) experiments can be assessed to determine whether gene-expression changes resulting from knockdown are likely due to a primary effect or to secondary, miRNA-like, off-target effects12. Although tools exist for discovering enriched word motifs in sequences, many do not deal with ranked sequences or cannot be directly applied to the problem of miRNA seed analysis. Recently we demonstrated a new method, Sylamer1, for the analysis of miRNA binding in expression data. The method is both powerful and extremely fast, making it ideal for genome-wide datasets.

Sylamer

The Sylamer algorithm1 takes a list of genes with their 3’UTRs, ranked from up-regulated to down-regulated following a miRNA or RNAi experiment. For words with perfect complementarity to the 5′ end (seed) of a miRNA or siRNA, occurrences in the 3’UTRs are counted and associated hypergeometric P-values are computed. This is performed across nested leading bins of the ranked sequences, analogous to Gene Set Enrichment Analysis. The output is used to produce an intuitive landscape plot that tracks occurrence biases for all seeds across the gene ranking. This enables verification of the hypothesis that miRNAs or siRNAs are directly affecting expression, while also identifying the fraction of genes changing due to this effect. Unlike previous approaches, the method is fast enough to allow genome-wide analysis of all known miRNA seeds in large-scale experiments. Analysis of all known miRNA seeds for a human genome-wide experiment takes less than a minute. Below we demonstrate the utility and accuracy of this type of approach on several example miRNA and siRNA datasets.
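The idea of hypergeometric testing across nested leading bins can be illustrated with the simplified sketch below. It is not the Sylamer implementation: it records only whether each gene's 3’UTR contains a given word (rather than counting words), tests enrichment only, and ignores sequence length and composition. The function names, the bin step and the toy ranking are all assumptions made for illustration.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K contain the word."""
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


def enrichment_landscape(has_word, step):
    """Enrichment P-values for nested leading bins of a ranked genelist.

    has_word: booleans in ranked order, True if the gene's 3'UTR has the word.
    Returns a list of (bin_size, p_value) pairs for bins of size step, 2*step, ...
    """
    N = len(has_word)
    K = sum(has_word)
    landscape = []
    for n in range(step, N + 1, step):
        k = sum(has_word[:n])  # genes with the word among the top n
        landscape.append((n, hypergeom_upper_tail(k, n, K, N)))
    return landscape


# Toy ranking of 20 genes in which all 5 word-containing genes rank at the top.
toy_ranking = [True] * 5 + [False] * 15
landscape = enrichment_landscape(toy_ranking, step=5)

for bin_size, p_value in landscape:
    print(bin_size, f"{p_value:.2e}")

# The bin with the smallest P-value suggests where the signal peaks.
peak_bin = min(landscape, key=lambda item: item[1])[0]
print("peak at", peak_bin, "genes")
```

Plotting the logarithm of such P-values against bin size for every word gives a curve per word, which is the general shape of the landscape plot described above; a word whose curve stands out from the others points to a direct effect.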

Examples

In order to determine the effectiveness of our approach for the detection of enriched/depleted miRNA binding signals we applied it to two published datasets. The first dataset derives from a mouse knockout model of miR-155 (bic)9. In this case gene expression data was obtained for T-helper (Th1) cells from both knockout and wild-type animals. Each gene on the array (for which a 3’UTR was available) was ranked from most up-regulated to most down-regulated according to fold-change t-statistic. Our goal is to reliably determine whether the greatest contributions to gene-expression changes are direct effects resulting from absence of miR-155 in the knockout (i.e. loss of miR-155 mediated repression). The sorted genelist and associated 3’UTR sequences were supplied to Sylamer. The resulting enrichment analysis plot (see Figure 2a) clearly shows that most words drift randomly without showing any significance.

Figure 2

A strong signal is, however, evident for 6 nt (P ≤ 1×10^-41), 7 nt (P ≤ 1×10^-36) and 8 nt (P ≤ 1×10^-25) words corresponding to the seed region of miR-155, peaking at ≈500 genes. This indicates that these most up-regulated genes are enriched in potential miR-155 binding sites and that their observed over-expression is likely due to the absence of miR-155 in the knockout sample.

In another example we take gene-expression data from maternal zygotic Dicer mutant (MZ-Dicer) zebrafish embryos5,13. Here we aim to assess the role of an early developmental miRNA by comparing mutant fish against mutant fish injected with synthetic miR-430. The mutant fish cannot produce significant quantities of functional miRNAs, as the Dicer enzyme (required for mature miRNA excision) is non-functional13. In this case the perturbation involves a miRNA being reintroduced to a system where miRNAs are not present. If miR-430 is significantly affecting gene expression, we expect the effect to be most evident in down-regulated genes (i.e. gain of miR-430 mediated repression). The resulting enrichment plots obtained using Sylamer (see Figure 2b) show that most words exhibit no significant enrichment or depletion across the genelist, with the exception of those words directly corresponding to the seed region of miR-430.

As expected, this signal is observed in the down-regulated section of the genelist (P ≤ 1×10^-26 at 6 nt). This reconfirms the hypothesis that injection of miR-430 leads to direct repression of its set of target transcripts and yields a set of genes likely to be highly enriched in real miR-430 targets and excellent candidates for further validation5.

Application to RNAi experiment expression data

RNA interference (RNAi) is an increasingly common approach to study the effect of knocking down a particular gene of interest. Frequently, gene-expression studies are undertaken after a gene has been knocked down in order to determine the effect of RNAi knock-down of the primary target on mRNA expression levels and to identify possible downstream pathways and regulatory targets. However, it has been shown that many off-target effects observed in RNA interference experiments may be due to siRNAs acting as miRNAs on unintended genes12. This can create serious issues for genome-wide screens, as designed siRNAs may be unintentionally affecting the expression of tens or even hundreds of genes. When assessing expression data from an RNAi experiment, the type of approach described above can be used to determine whether miRNA-like effects are present. Ideally one sees little or no enrichment or depletion of words complementary to the siRNA, in which case any gene-expression changes observed are most likely secondary effects following successful knockdown of the intended target gene. Conversely, if an siRNA is binding other transcripts (off-targets), we expect to observe specific enrichment of words complementary to the 5’ end of that siRNA in down-regulated genes. The size and extent of any observed enrichment may also be used to evaluate how serious this effect is. Of course, smart-pooling of multiple siRNAs to a target gene should alleviate this effect; however, this type of analysis could still be useful for validating large-scale screens.

A previous study used microarrays to measure the effects of transfecting different siRNAs into HeLa cells12. Using these data we can produce, for each transfection experiment, a genelist ranked according to fold-change starting with the most down-regulated genes (likely to be direct off-targets). In the first example (see Figure 2c) the siRNA does not seem to exhibit off-target effects as no particular sequences are enriched or depleted and expression effects observed are likely direct.

However, in the second example a significant enrichment of words matching the 5′ end of the siRNA (see Figure 2d) is observed.

It can be seen that the effect on the expression profile is due to a miRNA-like effect, since the only significant words are those that match the beginning of the siRNA. The use of Sylamer in these cases can help to identify screens which have worked as planned and to flag those screens where significant miRNA-like off-target effects are observed.

Discussion

The examples shown above illustrate the power of using enrichment analysis to detect miRNA seed sequences in genelists. Although not explicitly designed for siRNA analysis, we believe such approaches may also be useful for validating hits in large-scale siRNA screens. This approach allows one to determine rapidly whether a miRNA-like effect is observed, to quantify the extent of the effect and to isolate the likely set of genes involved. The examples shown above utilise mRNA expression profiling as a readout of the miRNA perturbation, although proteomics data could also be used as long as it can produce a ranked genelist of protein levels. It is not strictly required to directly perturb miRNAs, although this type of experiment typically gives the best results. One might also obtain reasonable results by comparing, for example, wild-type cells to cancerous cells. The Sylamer software described in this article is freely available from http://www.ebi.ac.uk/enright/sylamer.

References

  1. van Dongen, S., Abreu-Goodger, C. & Enright, A.J. Detecting microRNA binding and siRNA off-target effects from expression data. Nature Methods (2008).
  2. Griffiths-Jones, S., Saini, H.K., van Dongen, S. & Enright, A.J. miRBase: tools for microRNA genomics. Nucleic Acids Research 36, D154-158 (2008).
  3. Sethupathy, P., Megraw, M. & Hatzigeorgiou, A.G. A guide through present computational approaches for the identification of mammalian microRNA targets. Nature Methods 3, 881-886 (2006).
  4. Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105 (2007).
  5. Giraldez, A.J. et al. Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79 (2006).
  6. Lim, L.P. et al. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-773 (2005).
  7. Baek, D. et al. The impact of microRNAs on protein output. Nature 455, 64-71 (2008).
  8. Selbach, M. et al. Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63 (2008).
  9. Rodriguez, A. et al. Requirement of bic/microRNA-155 for normal immune function. Science 316, 608-611 (2007).
  10. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).
  11. Farh, K.K. et al. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821 (2005).
  12. Birmingham, A. et al. 3′ UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods 3, 199-204 (2006).
  13. Giraldez, A.J. et al. MicroRNAs regulate brain morphogenesis in zebrafish. Science 308, 833-838 (2005).