Understanding early mouse embryonic development using single-cell mRNA Sequencing

Posted: 3 July 2014

Biomedical research often relies on cell lines that can be cultured in the laboratory. Individual cells within such a cell line usually share a similar morphology. A notable exception is in vitro cultured mouse Embryonic Stem Cells (mESCs): pluripotent cells derived from the blastocyst stage of the developing mouse embryo.

Unlike many other cell lines, mESCs show heterogeneity in morphology and gene expression between the individual cells within a population. This heterogeneity makes it challenging to characterise mESCs at the molecular level, since global profiling methods generally require large numbers of cells and therefore average over the population. However, new methods and technologies allow global transcriptome profiling of individual cells. Our research uses single-cell next-generation mRNA sequencing to capture the global transcriptome of many individual cells, with the aim of identifying novel transcription factor modules that contribute to the unique pluripotent state of mESCs and to their differentiation, which serves as a model for early mouse embryonic development.

After fertilisation, a mouse oocyte develops through distinct embryonic stages into a fetus. Around three days after fertilisation (E3), the embryo forms a hollow ball, called the blastocyst, within the oviduct. Around day E4.5, the blastocyst implants into the wall of the uterus, after which it undergoes drastic changes. During mouse gastrulation (E6.5) the three germ layers (ectoderm, mesoderm and endoderm) develop. During all these stages, the fate and identity of individual cells are tightly regulated. Cell identity is not determined by the genetic information as such, since most cells within an organism have an identical genome. Rather, it is determined by which genetic information is used, at what point in time and at what rate. To gain insight into embryonic development, it is therefore essential to study the processes that occur on the genome by epigenetic profiling. Transcriptome profiling, on the other hand, provides a read-out of the epigenetic status of a cell.

Mouse Embryonic Stem Cells (mESCs) are in vitro cultured cells derived from the inner cell mass of the early blastocyst [1,2], and are therefore an accessible and elegant model system to study early embryonic development (see Figure 1). Like cells of the early embryo, mESCs are pluripotent: when mESCs are introduced into an embryo, they are able to contribute to all tissue types present in an adult mouse, including the germline. The first mouse ESC lines were isolated in 1981 [1,2]. These studies showed that mESCs require dedicated culture conditions to avoid differentiation. Currently, a widely used method to maintain pluripotency of ESCs is to supplement the culture medium with serum and the cytokine Leukemia Inhibitory Factor (LIF) [3].

In the absence of culture supplements, or in the presence of differentiation-stimulating factors, mESCs will differentiate into a wide range of morphologically different cell types representing the three germ layers, reflecting embryonic development. However, even within undifferentiated mESC populations there are clear differences in morphology between individual cells (see Figure 2). It is currently unclear whether this heterogeneity has a function. Perhaps mESCs in their undifferentiated state already represent a range of cells primed towards various lineages essential for future embryonic development [4]. As the different cells are present within one population, single-cell measurements are required to obtain more information on the molecular identity of the various cells, and to understand the function and molecular basis of the heterogeneity. Recent developments enable comprehensive global transcriptome profiling of individual cells using single-cell mRNA sequencing (mRNA-Seq). However, obtaining high-quality single-cell mRNA-Seq profiles is still challenging, because of the low number of molecules (in our case mRNA) in individual cells and the very small reaction volumes that are required. Here, we provide a detailed overview of single-cell mRNA-Seq and how we use it to study embryonic development and mESCs.

Single-cell mRNA-Sequencing (mRNA-Seq)

The main purpose of our single-cell RNA sequencing is to map transcription factor programmes and cellular dynamics within individual in vitro ESCs. Since, out of all RNAs in a cell, only the mRNA molecules encode proteins, characterising the repertoire of mRNAs is sufficient to obtain a snapshot of cellular identity. This circumvents single-cell global protein profiling, which is not yet feasible. Additionally, by sequencing the mRNA molecules we obtain information on the transcriptional activity that occurs on the genome during early embryonic development, another component contributing to pluripotency. In contrast to RNA molecules from other classes, most mRNAs are characterised by a poly(A) tail. This poly(A) tail can be used to selectively target the mRNA molecules.

In a single-cell mRNA-Seq experiment, we exhaustively determine all coding mRNA molecules in a cell by massively parallel sequencing. In addition to the relative abundance of the individual mRNA molecules, we obtain the identity of the individual nucleotides, and thereby information on alternative splicing events and single nucleotide polymorphisms (SNPs). Due to the very low amount of starting material (about 0.5 picogram per cell), single-cell mRNA-Seq is a labour-intensive process that requires accurate and precise sample handling, as described further in the next section.

Individual steps of single-cell mRNA-Seq for mouse ESCs

The individual steps of a single-cell mRNA-Seq experiment are depicted in Figure 3. On the ‘wet’-lab side, mouse ESCs are first separated into individual compartments (see Figure 3A). Subsequently, the cells are lysed to release the RNA of each single cell into solution (see Figure 3B). Most current sequencers only sequence DNA; direct sequencing of RNA has proven very challenging thus far. Therefore, the RNA is converted into complementary DNA (cDNA) by reverse transcription (RT). As explained in the previous section, multiple classes of RNA are present within a cell, such as tRNA, rRNA and mRNA, not all of which are equally informative for our studies. To target the mRNA molecules, the RT reaction is primed on the mRNA-specific poly(A) tail using oligo(dT) primers (see Figure 3C). Subsequently, the RNA:DNA hybrid molecules are converted into double-stranded cDNA, after which the cDNA is amplified by PCR to obtain sufficient material for sequencing. The method we use is optimised to perform all steps, from single-cell capture to the final amplified cDNA, serially in a single reaction chamber without intermediate product purifications. This workflow minimises loss of RNA and cDNA, so that the full complexity of the mRNA population is represented when the cDNA is sequenced.

The sequencer itself places demands on the DNA that is loaded onto the machine. The DNA has to be pure, and the individual fragments should preferably be uniform in length. Additionally, adapters have to be ligated to each individual molecule to enable capture of the cDNA fragments by the sequencer. These steps, referred to as ‘library preparation’, are not specific to mRNA-Seq but are part of most next-generation sequencing applications, such as whole-genome sequencing and ChIP-seq. When the DNA is fully processed, it can be loaded onto the sequencer (see Figure 3D). A successful sequencing run generates huge quantities of data: currently hundreds of millions of short DNA sequences (between 36 and 150 nt long, depending on the settings of the sequencer).

On the ‘dry’-lab side, these short sequences are mapped against a reference genome to determine which parts of the genome were transcribed. This enables accurate measurement of the abundance of all mRNA transcripts and their splice variants within the individual cells. Application of mRNA-Seq to single cells is relatively new, and uniform, clear-cut methods for analysing the large amounts of data are therefore still being developed (see Figure 3E). For our research, the expression profiles of all genes across hundreds of individual cells will be compared to gain insight into the dynamics of an embryonic stem cell population. Clearly, a big challenge is to design the experiments in such a way that biologically relevant conclusions can be drawn from the large datasets.

Quantification and validation of single-cell mRNA-Seq

Absolute quantification using Unique Molecular Identifiers (UMIs)

Gene expression as measured by sequencing (RNA-Seq) is generally represented by relative values such as ‘FPKM’ (sequenced Fragments Per Kilobase of transcript per Million mapped reads). These values have proven very useful for determining changes in gene expression. However, they do not provide the absolute number of mRNA molecules present in a single cell for each gene. To enable this, Unique Molecular Identifiers (UMIs) have recently been developed [6]. UMIs are short random DNA sequences of around five nucleotides that are added to each cDNA molecule, either during reverse transcription or in the subsequent PCR, as part of the primers used in these reactions. Because the probability that two identical cDNA fragments receive the same UMI is very low, the cDNA fragments in the sample are all unique after UMI addition. A large part of the bias in single-cell RNA-Seq experiments originates from the exponential amplification during PCR, which is challenging to correct for [7,8]. As the addition of UMIs makes every molecule in the original pool unique, PCR bias can be corrected after sequencing by including each specific sequence only once in the final analysis, discarding all but one of the reads with exactly the same sequence. Although very elegant, UMIs are not compatible with all single-cell mRNA-Seq methods described to date [9]. Methods that generate full-length transcript coverage in particular are challenging to quantify using UMIs, because only the 5’ and/or 3’ ends of transcripts, but not any fragments in between, will contain a UMI after PCR and shearing. For these methods, RNA spike-ins are generally used to estimate transcript abundance.
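The contrast between a relative value such as FPKM and an absolute UMI-based molecule count can be illustrated with a minimal Python sketch. It is not the pipeline used in our research: the function names, the representation of a mapped read as a (gene, UMI) pair and all numbers are hypothetical and chosen only for illustration. The sketch also shows that a UMI of five nucleotides offers 4^5 = 1024 possible sequences.

```python
from collections import defaultdict


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Relative expression: fragments per kilobase of transcript per million mapped."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def count_molecules(reads):
    """Collapse PCR duplicates: reads with the same gene and UMI count as one molecule.

    `reads` is an iterable of (gene, umi) pairs, a hypothetical simplified
    stand-in for mapped sequencing reads.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Toy reads, made up for illustration (not real data).
reads = [
    ("gene_A", "ACGTA"),
    ("gene_A", "ACGTA"),  # PCR duplicate of the read above
    ("gene_A", "TTGCA"),
    ("gene_B", "GGATC"),
    ("gene_B", "GGATC"),  # PCR duplicate
    ("gene_B", "GGATC"),  # PCR duplicate
]

molecule_counts = count_molecules(reads)

# Number of distinct UMIs available with five random nucleotides.
n_possible_umis = 4 ** 5

# Relative value for a hypothetical 2 kb transcript with 500 fragments
# in a library of 10 million mapped fragments.
example_fpkm = fpkm(500, 2000, 10_000_000)
```

In this toy example, gene_B has as many reads as gene_A but only one distinct UMI, so it is counted as a single original molecule, whereas gene_A is counted as two.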

Quantification using RNA spike-in

The number of mRNA transcript copies derived from a single gene generally ranges between one and a few hundred. All these transcripts have to be extensively processed before being measured by the sequencer. RNA spike-ins consist of RNA molecules of known quantity and sequence that are added to the single-cell suspension before reverse transcription of the mRNA. After sequencing, the spike-in RNAs can be used to generate standard curves, which make it possible to correct for stochastic reverse transcriptase priming and unequal PCR amplification of the endogenous single-cell transcripts [10]. Furthermore, RNA spike-ins can be used to estimate the total number of mRNA molecules present in the original cell [11].
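As an illustration of how such a standard curve can work, the following Python sketch fits a straight line to spike-in data on a log-log scale and inverts it to estimate the number of molecules of an endogenous transcript from its read count. The spike-in quantities, the read counts and the choice of a simple least-squares fit are assumptions made for this example; actual analyses may use different models.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical spike-ins: known number of molecules added per cell
# and the number of reads observed for each (made-up values).
spike_molecules = [1, 10, 100, 1000]
spike_reads = [5, 50, 500, 5000]

# Standard curve on a log10-log10 scale.
slope, intercept = fit_line(
    [math.log10(m) for m in spike_molecules],
    [math.log10(r) for r in spike_reads],
)


def estimate_molecules(read_count):
    """Invert the standard curve to estimate molecules from a read count."""
    return 10 ** ((math.log10(read_count) - intercept) / slope)


# Estimated number of molecules for an endogenous transcript with 250 reads.
estimated = estimate_molecules(250)
```

With these made-up values every spike-in yields five reads per molecule, so a transcript with 250 reads is estimated at about 50 molecules.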

Validation using RT-qPCR

Considering the very low amounts of biological input material and the elaborate workflow with multiple amplification steps, quality control is an essential part of single-cell mRNA-Seq. Reverse transcription quantitative PCR (RT-qPCR) is often used to determine the technical bias introduced during the sample preparation procedure. Furthermore, RT-qPCR can be applied as an independent measurement of gene expression to validate the single-cell mRNA-Seq profiles.
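One possible way to compare the two measurements, sketched here in Python under stated assumptions, is to correlate the sequencing-based expression of a set of genes with their RT-qPCR threshold cycle (Ct) values. Because a lower Ct corresponds to higher expression, good agreement shows up as a strongly negative correlation. The values below are hypothetical, and the use of a Pearson correlation is our choice for this example rather than a prescribed validation procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements for five genes in the same cell (made-up values).
seq_log2_expression = [1.0, 2.0, 3.0, 4.0, 5.0]  # from mRNA-Seq
qpcr_ct = [30.1, 28.9, 28.2, 26.8, 26.0]         # from RT-qPCR

# Strongly negative r indicates that the two methods agree.
r = pearson(seq_log2_expression, qpcr_ct)
```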

Analysis of single-cell mRNA-Seq

There are many approaches to analysing global mRNA-Seq gene expression profiles from individual cells, depending on the research question. For our research, a prominent goal is to identify common and differential gene expression programmes between individual mESCs. Principal component analysis and hierarchical clustering of all genes and their expression across individual cells will provide a comprehensive overview of the identity and similarity of the different cells present within an mESC population. This will tell us whether the different ESCs within the total population represent various stages of differentiation, as is currently thought [12]. It might also shed more light on the observation that heterogeneity within ESC populations is a requirement for maintenance of pluripotency. Investigating modules of differentially expressed genes in more detail will reveal the (combinations of) signal transduction pathways that are active in the individual cells. Deep sequencing of many individual cells also has the potential to identify new gene modules with tightly intertwined regulatory patterns that are unique to ESCs.
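The clustering part of such an analysis can be sketched in a few lines of Python. The example below groups cells by agglomerative hierarchical clustering of their expression profiles; the cell names, the three-gene profiles, the Euclidean distance and the average linkage are all assumptions chosen for illustration, and principal component analysis is not shown. Real datasets contain thousands of genes and are usually analysed with dedicated statistical software.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the cells of two clusters."""
    distances = [
        euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b
    ]
    return sum(distances) / len(distances)


def hierarchical_clustering(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], profiles
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical log-scale expression of three genes in four cells (made-up values).
profiles = {
    "cell_1": [8.0, 7.5, 0.5],
    "cell_2": [7.8, 7.9, 0.3],
    "cell_3": [1.0, 6.0, 5.5],
    "cell_4": [0.7, 6.2, 5.9],
}

clusters = hierarchical_clustering(profiles, n_clusters=2)
```

In this toy example, cells 1 and 2 end up in one cluster and cells 3 and 4 in the other, reflecting the two expression patterns built into the made-up profiles.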

By performing single-cell mRNA-Seq at multiple time points after induction of differentiation, we aim to gain more insight into the hierarchical order in which genes are activated during lineage commitment towards more differentiated cell types such as neural progenitor cells (NPCs). Identifying the master regulators that are key to the specific differentiation pathways will provide valuable information on how cell identity is acquired during the formation of the various cell types. Additionally, it will help us understand diseases associated with defective regulation of these transcription programmes, which might be valuable for the future development of medical treatments.

Outlook

Although still in its infancy, single-cell RNA sequencing is set to revolutionise present-day molecular biology, because it can capture differences in gene expression between individual cells and can provide comprehensive genome-wide RNA profiles from rare in vivo cells that cannot easily be expanded in vitro. Due to the heterogeneity in many biological samples, be it developing embryos or tumours, these high-throughput RNA-sequencing experiments are rapidly becoming the new standard for gene expression analysis. Recent breakthroughs [13,14] show the power of single-cell transcriptome analysis to gain a detailed and accurate understanding of cell identity, cell-cell interactions and cell differentiation. Our single-cell mRNA-Seq research on in vitro mouse ESCs will generate fundamental understanding of the master regulators that enable the blastocyst cells to develop into an entire organism. By studying individual cells in great detail, we will be able to unravel the regulation of gene expression within individual ESCs, as well as expression dynamics within a population of ESCs. These insights are not only relevant for mouse embryogenesis, but will also give us knowledge about the molecular development of tissues and teach us valuable lessons on gene regulation.

References

  1. Martin, G.R., Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells. Proc Natl Acad Sci U S A, 1981. 78(12): p. 7634-8.
  2. Evans, M.J. and M.H. Kaufman, Establishment in culture of pluripotential cells from mouse embryos. Nature, 1981. 292(5819): p. 154-6.
  3. Smith, A.G., et al., Inhibition of pluripotential embryonic stem cell differentiation by purified polypeptides. Nature, 1988. 336(6200): p. 688-90.
  4. Loh, K.M. and B. Lim, A precarious balance: pluripotency factors as lineage specifiers. Cell Stem Cell, 2011. 8(4): p. 363-9.
  5. Marks, H. and H.G. Stunnenberg, Transcription regulation and chromatin structure in the pluripotent ground state. Biochim. Biophys. Acta, 2014. 1839(3): p. 129-37.
  6. Kivioja, T., et al., Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods, 2012. 9(1): p. 72-4.
  7. Li, J., H. Jiang, and W.H. Wong, Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biol, 2010. 11(5): p. R50.
  8. Risso, D., et al., GC-content normalization for RNA-Seq data. BMC Bioinformatics, 2011. 12: p. 480.
  9. Wu, A.R., et al., Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods, 2014. 11(1): p. 41-6.
  10. Jiang, L., et al., Synthetic spike-in standards for RNA-seq experiments. Genome Res, 2011. 21(9): p. 1543-51.
  11. Marinov, G.K., et al., From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing. Genome Res, 2014. 24(3): p. 496-510.
  12. Tanaka, T.S., Transcriptional heterogeneity in mouse embryonic stem cells. Reprod Fertil Dev, 2009. 21(1): p. 67-75.
  13. Xue, Z., et al., Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature, 2013. 500(7464): p. 593-7.
  14. Deng, Q., et al., Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science, 2014. 343(6167): p. 193-6.

Acknowledgements

We thank the Head of the Department of Molecular Biology Professor Henk Stunnenberg for continuous support, and our colleagues of the Embryonic Stem Cell group for valuable discussions. Thanks to NWO for providing a personal grant (NWO VIDI; 864.12.007) to Hendrik Marks for this research. This article is based on a Dutch paper that we recently published in ‘Analyse’ (Nr. 3; Juni 2014), a bi-monthly journal for biomedical research (https://www.nvml.nl/35/Analyse.html).

Biography

Hendrik Marks obtained his MSc and PhD from Wageningen University (NL), followed by postdoctoral training at the University of British Columbia (Vancouver, CA) and the Radboud University (Nijmegen, NL). He is currently an Assistant Professor in epigenetics and stem cells at the Radboud Institute for Molecular Life Sciences (RIMLS, Nijmegen, NL). The aim of his research group is to unravel the regulatory mechanisms that maintain ESCs in their pluripotent state using global (single-cell) transcriptome, epigenome and proteome approaches.

René Dirks holds an MSc degree in molecular biology. He is working on the characterisation of heterogeneity within pluripotent and differentiating stem cell populations as part of his doctoral degree at the Radboud Institute for Molecular Life Sciences (RIMLS, Radboud University Nijmegen, NL).