Understanding early mouse embryonic development using single-cell mRNA Sequencing

Posted: 3 July 2014

Biomedical research often involves the use of cell lines that can be cultured in a laboratory. Individual cells within such cell lines often share a similar morphology. A remarkable exception is formed by in vitro cultured mouse Embryonic Stem Cells (mESCs), pluripotent cells derived from the blastocyst stage of the developing mouse embryo.

Unlike many other cell lines, mESCs show heterogeneity in morphology and gene expression between the individual cells within a population. This heterogeneity makes it challenging to characterise mESCs at a molecular level, since global profiling methods generally require large numbers of cells. However, new methods and technologies allow global transcriptome profiling of individual cells. By capturing the global transcriptome of many individual cells using single-cell next-generation mRNA sequencing, our research focuses on identifying novel transcription factor modules that contribute to the unique pluripotent state of mESCs and to their differentiation, which serves as a model for early mouse embryonic development.

After fertilisation, a mouse oocyte develops through distinct embryonic stages into a fetus. Around three days after fertilisation (E3), the embryo forms a hollow ball within the oviduct, called the blastocyst. At around day E4.5, the blastocyst implants into the womb, after which it undergoes drastic changes. During mouse gastrulation (E6.5) the three germ layers (ectoderm, mesoderm and endoderm) develop. During all these stages, individual cell fate and identity are tightly regulated. They are not determined by the genetic information itself, as most cells within an organism have an identical genome. Rather, cell identity is determined by which genetic information is used at a specific point in time, and at what rate. To gain insight into embryonic development, it is therefore essential to study the processes that occur on the genome by epigenetic profiling. Transcriptome profiling, on the other hand, provides a read-out of the epigenetic status of a cell.

Mouse Embryonic Stem Cells (mESCs) are in vitro cultured cells that are derived from the inner cell mass of the early blastocyst^1,2, and are therefore an accessible and elegant model system to study early embryonic development (see Figure 1). Like cells of the early embryo, mESCs are pluripotent: when mESCs are introduced into an embryo, they are able to contribute to all tissue types present in an adult mouse, including the germline. The first mouse ESC lines were isolated in 1981^1,2. These studies showed that the cells require dedicated culture conditions to avoid differentiation. Currently, a widely used method to maintain pluripotency of ESCs is to supplement the culture medium with serum and the cytokine Leukemia Inhibitory Factor (LIF)³.

In the absence of culture supplements, or in the presence of differentiation-stimulating factors, mESCs will differentiate into a wide range of morphologically different cell types representing the three germ layers, reflecting embryonic development. However, even within undifferentiated mESC populations there are clear differences in morphology between individual cells (see Figure 2). It is currently unclear whether this heterogeneity has a function. Perhaps mESCs in their undifferentiated state already represent a range of cells primed towards various lineages essential for future embryonic development⁴. As the different cells are present within one population, single-cell measurements are required to obtain more information on the molecular identity of the various cells, and to understand the function and molecular basis of the heterogeneity. Recent developments enable comprehensive global transcriptome profiling of individual cells using single-cell mRNA-sequencing (mRNA-Seq). However, obtaining high-quality single-cell mRNA-Seq profiles is still challenging, because of the low number of molecules (in our case mRNA) in individual cells and the very small reaction volumes that are required. Here, we provide a detailed overview of single-cell mRNA-Seq and how we use it to study embryonic development and mESCs.

Single-cell mRNA-Sequencing (mRNA-Seq)

The main purpose of our single-cell RNA sequencing is to map transcription factor programmes and cellular dynamics within individual in vitro ESCs. Since, out of all RNAs in a cell, only the mRNA molecules encode proteins, characterising the repertoire of mRNAs is sufficient to obtain a snapshot of cellular identity. This circumvents single-cell global protein profiling, which is not yet feasible. Additionally, by sequencing the mRNA molecules we obtain information on the transcriptional activity that occurs on the genome during early embryonic development, another component contributing to pluripotency. In contrast to RNA molecules from other classes, most mRNAs are characterised by a poly(A) tail. This poly(A) tail can be used to selectively target the mRNA molecules.

In a single-cell mRNA-Seq experiment, we exhaustively determine all coding mRNA molecules in a cell by massively parallel sequencing. In addition to the relative abundance of the individual mRNA molecules, we obtain the identity of the individual nucleotides and thereby information on alternative splicing events and single nucleotide polymorphisms (SNPs). Due to the very low amount of starting material (about 0.5 picogram per cell), single-cell mRNA-Seq is a labour-intensive process that requires accurate and precise sample handling, as further described in the next section.

Individual steps of single-cell mRNA-Seq for mouse ESCs

The individual steps of a single-cell mRNA-Seq experiment are depicted in Figure 3. On the ‘wet’-lab side, mouse ESCs are separated into individual compartments (see Figure 3A). Subsequently, the cells are lysed to release the RNA from the single cell into suspension (see Figure 3B). Most current sequencers only sequence DNA; direct sequencing of RNA has been very challenging thus far. Therefore, the RNA is converted into complementary DNA (cDNA) by reverse transcription (RT). As explained in the previous section, multiple classes of RNA are present within a cell, such as tRNA, rRNA and mRNA, not all of which are equally informative for our studies. To target the mRNA molecules, the RT reaction is primed on the mRNA-specific poly(A) tail using oligo(dT) primers (see Figure 3C). Subsequently, the RNA:DNA hybrid molecules are converted into double-stranded cDNA, after which the cDNA is amplified by PCR to obtain sufficient material for sequencing. The method we use is optimised so that all steps, from single-cell capture to the final amplified cDNA, are performed serially in a single reaction chamber without intermediate product purifications. This workflow minimises loss of RNA and/or cDNA, so that the full complexity of the mRNA population is retained for sequencing. The sequencer itself places strict demands on the DNA that is loaded: the DNA has to be pure and the individual fragments should preferably be uniform in length. Additionally, adapters have to be ligated to each individual molecule so that the cDNA fragments can be captured by the sequencer. It should be noted that these steps, referred to as ‘library preparation’, are not specific to mRNA-Seq but are part of most next-generation sequencing applications, such as whole-genome sequencing and ChIP-seq. When the DNA is fully processed it can be loaded on the sequencer (see Figure 3D). A successful sequencing run generates huge quantities of data: currently hundreds of millions of short DNA sequences (between 36 and 150 nt long, depending on the settings of the sequencer).

On the ‘dry’-lab side, these short sequences are mapped against a reference genome to determine which parts of the genome were transcribed. This enables accurate measurement of the abundance of all mRNA molecules and their splice variants within the individual cells. Application of mRNA-Seq to single cells is relatively new, and uniform, clear-cut methods for analysing the large amounts of data are therefore still being developed (see Figure 3E). For our research, the expression profiles of all genes across hundreds of individual cells will be compared to gain insight into the dynamics of an embryonic stem cell population. Clearly, a big challenge is to design the experiments such that biologically relevant conclusions can be drawn from the large datasets.

Quantification and validation of single-cell mRNA-Seq

Absolute quantification using Unique Molecular Identifiers (UMIs)

Gene expression as measured by sequencing (RNA-Seq) is generally represented by relative values such as ‘FPKM’ (sequenced Fragments Per Kilobase of transcript per Million mapped reads). These have been shown to be very valuable for determining changes in gene expression. However, they do not provide the absolute number of mRNA molecules present in a single cell for each gene. To enable this, Unique Molecular Identifiers (UMIs) have recently been developed⁶. UMIs are short random DNA sequences of around five nucleotides that are added to each cDNA molecule, either during reverse transcription or in the subsequent PCR, as part of the primers used in these reactions. Because the probability that the same UMI is added to two identical cDNA fragments is very low, the cDNA fragments in the sample are all unique after UMI addition. A large part of the bias in single-cell RNA-Seq experiments originates from the exponential amplification during PCR, which is challenging to correct for^7,8. As the addition of UMIs makes every molecule in the original pool unique, PCR biases can be corrected after sequencing by including each specific sequence only once in the final analysis and discarding all but one of the reads with exactly the same sequence. Although very elegant, UMIs are not compatible with all single-cell mRNA-Seq methods described to date⁹. Methods that generate full-length transcript coverage in particular seem to be challenging to quantify using UMIs, because only the 5’ and/or 3’ ends of transcripts, but not any in-between fragments, will contain a UMI after PCR and shearing. For these methods, RNA spike-ins are generally used to estimate transcript abundance.
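
The two measures discussed in this section can be illustrated with a minimal Python sketch. It computes an FPKM value from a fragment count and collapses reads that share the same gene, mapping position and UMI into a single molecule. The function names, the (gene, position, UMI) read layout and the toy numbers are assumptions made for illustration only and do not describe our actual pipeline; in particular, gene plus position plus UMI is used here as a stand-in for "exactly the same sequence", and sequencing errors within the UMI are ignored.

```python
from collections import defaultdict


def fpkm(fragments, transcript_length_nt, total_mapped_reads):
    """Fragments Per Kilobase of transcript per Million mapped reads."""
    kilobases = transcript_length_nt / 1_000
    millions = total_mapped_reads / 1_000_000
    return fragments / (kilobases * millions)


def count_molecules(reads):
    """Count unique (position, UMI) pairs per gene.

    `reads` is an iterable of (gene, position, umi) tuples (hypothetical
    layout). Reads with identical gene, position and UMI are treated as
    PCR duplicates of one original molecule and are counted once.
    """
    seen = defaultdict(set)
    for gene, position, umi in reads:
        seen[gene].add((position, umi))
    return {gene: len(pairs) for gene, pairs in seen.items()}


# Toy reads (made-up values, not experimental data).
reads = [
    ("Nanog", 100, "ACGTA"),
    ("Nanog", 100, "ACGTA"),   # PCR duplicate of the read above
    ("Nanog", 100, "TTGCA"),   # same position, different UMI: second molecule
    ("Pou5f1", 250, "ACGTA"),
    ("Pou5f1", 250, "ACGTA"),  # PCR duplicate
    ("Pou5f1", 250, "ACGTA"),  # PCR duplicate
]

molecules = count_molecules(reads)
possible_umis = 4 ** 5  # number of distinct UMIs of five nucleotides
example_fpkm = fpkm(500, 2_000, 10_000_000)
```

In this toy example six reads collapse to three molecules. A UMI of five nucleotides offers 4^5 = 1,024 different sequences, which is why two identical cDNA fragments rarely receive the same UMI as long as each gene contributes only a modest number of molecules.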

Quantification using RNA spike-in

The number of mRNA transcript copies derived from a single gene generally ranges between one and a few hundred. All these transcripts have to be extensively processed before being measured by the sequencer. RNA spike-ins consist of RNA molecules of known quantity and sequence that are added to the single-cell suspension before reverse transcription of the mRNA. After sequencing, the spike-in RNAs can be used to generate standard curves, which make it possible to correct for stochastic reverse transcriptase priming and unequal PCR amplification of the endogenous single-cell transcripts¹⁰. Furthermore, RNA spike-ins can be used to estimate the total number of mRNA molecules present in the original cell¹¹.
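
As a rough illustration of how such a standard curve can be used, the Python sketch below fits a straight line to log-transformed spike-in data (known number of molecules added versus reads observed) and inverts it to estimate the number of molecules behind an endogenous read count. The log-log linear model, the function names and the spike-in values are assumptions chosen for the example; they are not taken from a specific spike-in kit or from our own data.

```python
import math


def fit_standard_curve(spike_ins):
    """Least-squares fit of log10(reads) against log10(molecules added).

    `spike_ins` is a list of (molecules_added, reads_observed) pairs.
    Returns (slope, intercept) of the fitted line.
    """
    xs = [math.log10(molecules) for molecules, _ in spike_ins]
    ys = [math.log10(reads) for _, reads in spike_ins]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_molecules(reads_observed, slope, intercept):
    """Invert the standard curve to estimate molecules from a read count."""
    return 10 ** ((math.log10(reads_observed) - intercept) / slope)


# Made-up spike-in series in which every molecule yields exactly 50 reads.
spike_ins = [(1, 50), (10, 500), (100, 5_000), (1_000, 50_000)]
slope, intercept = fit_standard_curve(spike_ins)

# An endogenous transcript with 2,500 reads then corresponds to ~50 molecules.
estimated = estimate_molecules(2_500, slope, intercept)
```

Real spike-in data scatter around the fitted line, especially at low molecule numbers, so in practice the fit quality should be inspected before the curve is used for quantification.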

Validation using RT-qPCR

Considering the very low amounts of biological input material and the elaborate workflow with multiple amplification steps, quality control is an essential part of single-cell mRNA-Seq. Reverse transcription quantitative PCR (RT-qPCR) is often used to determine the technical bias introduced during sample preparation. Furthermore, RT-qPCR can be applied as an independent measurement of gene expression to validate the single-cell mRNA-Seq profiles.

Analysis of single-cell mRNA-Seq

There are many approaches to analysing global mRNA-Seq gene expression profiles from individual cells, depending on the research question. For our research, a prominent goal is to identify common and differential gene expression programmes between individual mESCs. Principal component analysis and hierarchical clustering of all genes and their expression across individual cells will provide a comprehensive overview of the identity and similarity of the different cells present within an mESC population. This will tell us whether the different ESCs within the total population represent various stages of differentiation, as is currently thought¹². It might also shed more light on the observation that heterogeneity within ESC populations is a requirement for maintenance of pluripotency. Investigating modules of differentially expressed genes in more detail will reveal which signal transduction pathways, or combinations of pathways, are active in the individual cells. Deep sequencing of many individual cells also has the potential to identify new gene modules with tightly intertwined regulatory patterns that are unique to ESCs.
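
To make the two analysis steps mentioned above concrete, the following Python sketch runs a principal component analysis and a hierarchical clustering on a small simulated cells-by-genes expression matrix. It assumes NumPy and SciPy are available; the simulated data, the group sizes, and the choice of average linkage with Euclidean distance are illustrative assumptions, not the settings used in our own analyses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Simulated log-expression matrix: 20 cells x 50 genes, two groups of 10 cells.
n_per_group, n_genes = 10, 50
group_a = rng.normal(loc=0.0, scale=0.5, size=(n_per_group, n_genes))
group_b = rng.normal(loc=0.0, scale=0.5, size=(n_per_group, n_genes))
group_b[:, :10] += 4.0  # the first 10 genes are more highly expressed in group B
expression = np.vstack([group_a, group_b])

# Principal component analysis via SVD of the gene-centred matrix.
centred = expression - expression.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
pc_scores = u * s                    # coordinates of each cell on each component
explained = s**2 / np.sum(s**2)      # fraction of variance per component

# Hierarchical clustering of cells, cut into two clusters.
tree = linkage(expression, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")
```

With this simulated input, the first principal component separates the two groups of cells and the clustering assigns each group to its own cluster. Real single-cell data are far noisier, so the number of clusters and the distance measure usually need to be explored rather than fixed in advance.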

By performing single-cell mRNA-Seq at multiple timepoints after differentiation we aim to gain more insight into the hierarchical order in which genes are activated during lineage commitment towards more differentiated cell types such as neural progenitor cells (NPCs). Identifying the master regulators key to the specific differentiation pathways will provide valuable information on cell identity acquired during the formation of the various cell types. Additionally, it will help us understand diseases associated with defective regulation of these transcription programmes, which might be valuable for the future development of medical treatments.

Outlook

Although still in its infancy, single-cell RNA-sequencing will revolutionise present-day molecular biology by capturing differences in gene expression between individual cells, and by providing comprehensive genome-wide RNA profiles from rare in vivo cells that cannot easily be amplified in vitro. Due to the heterogeneity in many biological samples, be it developing embryos or tumours, these high-throughput RNA-sequencing experiments are rapidly becoming the new standard for gene expression analysis. Recent breakthroughs^13,14 show the power of single-cell transcriptome analysis to gain a detailed and accurate understanding of cell identity, cell-cell interactions and cell differentiation. Our research using single-cell mRNA-Seq on in vitro mouse ESCs will generate a fundamental understanding of the master regulators that enable the blastocyst cells to develop into an entire organism. By studying individual cells in great detail we will be able to unravel gene expression regulation within individual ESCs, as well as expression dynamics within a population of ESCs. These insights are not only relevant for mouse embryogenesis, but will also give us knowledge about the molecular development of tissues and teach us valuable lessons on gene regulation.

References

1. Martin, G.R., Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells. Proc Natl Acad Sci U S A, 1981. 78(12): p. 7634-8.
2. Evans, M.J. and M.H. Kaufman, Establishment in culture of pluripotential cells from mouse embryos. Nature, 1981. 292(5819): p. 154-6.
3. Smith, A.G., et al., Inhibition of pluripotential embryonic stem cell differentiation by purified polypeptides. Nature, 1988. 336(6200): p. 688-90.
4. Loh, K.M. and B. Lim, A precarious balance: pluripotency factors as lineage specifiers. Cell Stem Cell, 2011. 8(4): p. 363-9.
5. Marks, H. and H.G. Stunnenberg, Transcription regulation and chromatin structure in the pluripotent ground state. Biochim Biophys Acta, 2014. 1839(3): p. 129-37.
6. Kivioja, T., et al., Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods, 2012. 9(1): p. 72-4.
7. Li, J., H. Jiang, and W.H. Wong, Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biol, 2010. 11(5): p. R50.
8. Risso, D., et al., GC-content normalization for RNA-Seq data. BMC Bioinformatics, 2011. 12: p. 480.
9. Wu, A.R., et al., Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods, 2014. 11(1): p. 41-6.
10. Jiang, L., et al., Synthetic spike-in standards for RNA-seq experiments. Genome Res, 2011. 21(9): p. 1543-51.
11. Marinov, G.K., et al., From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing. Genome Res, 2014. 24(3): p. 496-510.
12. Tanaka, T.S., Transcriptional heterogeneity in mouse embryonic stem cells. Reprod Fertil Dev, 2009. 21(1): p. 67-75.
13. Xue, Z., et al., Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature, 2013. 500(7464): p. 593-7.
14. Deng, Q., et al., Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science, 2014. 343(6167): p. 193-6.

Acknowledgements

We thank the Head of the Department of Molecular Biology Professor Henk Stunnenberg for continuous support, and our colleagues of the Embryonic Stem Cell group for valuable discussions. Thanks to NWO for providing a personal grant (NWO VIDI; 864.12.007) to Hendrik Marks for this research. This article is based on a Dutch paper that we recently published in ‘Analyse’ (Nr. 3; Juni 2014), a bi-monthly journal for biomedical research (https://www.nvml.nl/35/Analyse.html).

Biography

Hendrik Marks obtained his MSc and PhD from Wageningen University (NL), followed by postdoctoral training at the University of British Columbia (Vancouver, CA) and the Radboud University (Nijmegen, NL). He is currently an Assistant Professor in epigenetics and stem cells at the Radboud Institute of Molecular Life Sciences (RIMLS, Nijmegen, NL). The aim of his research group is to unravel the regulatory mechanisms that maintain ESCs in their pluripotent state using global (single-cell) transcriptome, epigenome and proteome approaches.

René Dirks holds an MSc degree in molecular biology. He is working on the characterisation of heterogeneity within pluripotent and differentiating stem cell populations as part of his doctoral degree at the Radboud Institute for Molecular Life Sciences (RIMLS, Radboud University Nijmegen, NL).

Issue

Issue 3 2014
