Application of next generation sequencing to preclinical cancer model profiling

Bradford, James R.

Posted: 15 December 2013 | James R. Bradford

Preclinical cancer models allow us to gain insight into therapeutic potential and mechanism of anti-cancer agents early in the drug discovery process. Whilst traditional array-based approaches have made a significant contribution to the characterisation of these models, the advent of next generation sequencing has revolutionised genomic research and is anticipated to make a huge impact on our understanding of preclinical models, leading to more targeted therapies for cancer patients. This article provides an overview of next generation sequencing in the context of cancer model profiling and evaluates the choice of technologies available and their application to both in vitro and in vivo model characterisation.

Preclinical cancer models are critical to the development of anti-cancer therapeutics and advancing our understanding of cancer biology. They are consistently used as a platform to investigate therapeutic mechanism of action and identify potential biomarkers prior to clinical trials in which similar exploration is often complicated, unethical and expensive. Key to the continued relevance of preclinical models in early drug development is the provision of more detailed information on their molecular characteristics leading to deeper understanding of disease drivers, insight into the alignment of these models to human disease segments and provision of a richer pool of potential biomarkers. For this reason, an increasing number of pharmaceutical companies, academic institutions and model providers are turning to next generation sequencing (NGS) to profile both their in vitro and in vivo preclinical cancer models. NGS (also known as massively parallel nucleotide sequencing or second generation sequencing) involves high-throughput generation of gigabases of sequence data at a relatively low cost per residue. A variety of platforms exist, but all rely on the generation of a large number of relatively short sequences known as ‘reads’ that can then be aligned to a target database, or assembled de novo into contiguous sequences. As a result, the sequencing of whole transcriptomes, exomes and genomes has now become feasible causing a shift from more focused approaches such as capillary-based (Sanger) sequencing and DNA/RNA microarrays to comprehensive genome-wide analysis.

Whilst NGS has significantly impacted molecular biology in general, it has profound implications for understanding cancer and efforts to improve diagnosis and treatment. This is because the majority of cancers are triggered by an accumulation of genomic alterations such as single nucleotide variations (SNVs), copy number changes (amplifications / deletions), and chromosomal rearrangements (inversions / translocations). The high sequence coverage (repeated sequencing of the genomic locations under study) commonly achieved by NGS makes it particularly applicable to the detection of low frequency genomic alterations prevalent in heterogeneous cancer samples, novel chromosomal rearrangements and copy number alterations at high resolution.

NGS technologies for preclinical cancer model characterisation

Several NGS approaches are available to profile cancer models and the choice of technology will depend on the type and scope of question being asked, the anticipated lifespan of the data and the budget. A common practice is to use a combination of two or more approaches, exploiting the strengths of each to capture the most information at the lowest cost. Three of the most popular NGS technologies for cancer model profiling are briefly discussed below and summarised in Table 1.

Table 1: Common options for NGS characterisation of preclinical cancer models

Information type	RNA-Seq	Whole exome	Whole genome	Targeted deep
Expression quantification	Yes	No	No	No
Coding SNVs/small indels	Maybe¹	Yes	Yes	If targeted
Gene fusions	Yes	No	Maybe³	If targeted
Copy number	No	Maybe²	Yes	No
Non-coding SNVs/small indels	No	No	Yes	If targeted
Structural variants and other rearrangements	No	No	Yes	No
Splice isoform usage pattern	Yes	No	No	No
Cost⁴	££	££	£££	£ / ££
¹ Refer to caveats in main text
² Possible if affects targeted exons and comparative data available
³ If captured by structural re-arrangement
⁴ Approximate costs per sample: £ = 200-500 GBP, ££ = 700-1200 GBP, £££ = >2000 GBP
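
When combining approaches, it can help to query Table 1 programmatically. The following is a minimal sketch that simply transcribes the table into a Python dictionary; the function name `technologies_for` is an illustrative choice, and footnoted entries such as "Maybe" and "If targeted" are reported as conditional rather than definite support.

```python
# Table 1 transcribed row by row; column order follows the table header.
TECHNOLOGIES = ("RNA-Seq", "Whole exome", "Whole genome", "Targeted deep")

TABLE_1 = {
    "Expression quantification": ("Yes", "No", "No", "No"),
    "Coding SNVs/small indels": ("Maybe", "Yes", "Yes", "If targeted"),
    "Gene fusions": ("Yes", "No", "Maybe", "If targeted"),
    "Copy number": ("No", "Maybe", "Yes", "No"),
    "Non-coding SNVs/small indels": ("No", "No", "Yes", "If targeted"),
    "Structural variants and other rearrangements": ("No", "No", "Yes", "No"),
    "Splice isoform usage pattern": ("Yes", "No", "No", "No"),
}


def technologies_for(information_type):
    """Return (definite, conditional) technologies for one information type.

    'definite' lists technologies marked "Yes"; 'conditional' lists those
    marked "Maybe" or "If targeted", which carry the caveats in the footnotes.
    """
    entries = dict(zip(TECHNOLOGIES, TABLE_1[information_type]))
    definite = [tech for tech, entry in entries.items() if entry == "Yes"]
    conditional = [
        tech for tech, entry in entries.items() if entry not in ("Yes", "No")
    ]
    return definite, conditional


definite, conditional = technologies_for("Gene fusions")
print("Definite:", definite)
print("Conditional:", conditional)
```

A lookup of this kind makes it easy to see, for example, that expression quantification and copy number cannot both be obtained from any single technology in the table, which is one reason combinations are common.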

Whole genome sequencing

Assuming sufficient coverage, whole genome sequencing provides the most complete profile of a cancer genome allowing the detection of SNVs, copy number changes and chromosome structure rearrangements in a single sequencing run. Since sequencing is not simply restricted to coding regions, whole genome sequencing allows discovery of mutations in regulatory regions such as promoters and enhancers, other non-coding regions such as microRNAs, as well as previously unexplored loci. Copy number changes can also be detected at high resolution with clear breakpoint definition, removing the need for an additional array-based copy number detection experiment. Despite these benefits, whole genome sequencing can be expensive compared to more targeted sequencing approaches due to the amount of sequencing required to achieve robust statistical confidence in aberration calls, especially for cancer genomes in which sample heterogeneity and ploidy need to be accounted for. In common with other sequencing approaches, whole genome sequencing can also suffer from potential sequence bias at GC-rich regions, and repetitive sequences are problematic due to reduced probability of achieving unique read alignment at these loci.

Targeted sequencing

Targeted sequencing offers increased sequence coverage at regions of interest at lower cost than whole genome sequencing. Most methods involve a capture step in which DNA or RNA baits hybridise to, and enrich for, specific regions of interest in the total pool of nucleic acids. These regions are then amplified to undergo massively parallel sequencing. Whilst any fraction of the genome can be targeted, including non-coding regions, the most common approaches target either a small panel of genes of specific interest (targeted deep sequencing) or the exome (whole exome sequencing).

Targeted deep sequencing provides extremely high sampling of a small fraction of the genome resulting in statistically robust aberration calls and low frequency mutation detection albeit across a limited number of regions of interest. In theory, any location can be targeted, including non-coding loci, and fusion detection is possible if the breakpoint is known. More common is the use of a standard cancer panel such as Illumina’s TruSeq Amplicon Cancer Panel, or Life Technologies’ Ion AmpliSeq Cancer Hotspot Panel, designed to cover mutational hotspots across 48 and 50 oncogenes and tumour suppressor genes respectively. If budget is limited, such panels are usually sufficient to provide a high confidence set of somatic mutation calls across an established cancer associated gene set. Recent innovations such as Agilent Technologies’ Haloplex allow custom design of larger panels comprising 200-500 genes and, whilst generally more expensive than standard panels, offer a lower cost alternative to exome sequencing for more hypothesis-led exploration.
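
The link between sampling depth and low frequency mutation detection can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not a description of any particular mutation caller: each read is treated as an independent draw that carries the variant with probability equal to the variant allele fraction, sequencing error is ignored, and the requirement of at least `min_alt_reads` supporting reads is a hypothetical calling threshold.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads=5):
    """Probability of seeing at least min_alt_reads variant reads.

    Models the number of variant-supporting reads at a position as
    Binomial(depth, allele_fraction). The threshold is an assumed
    parameter for illustration only.
    """
    # Sum P(X = k) for every k below the threshold, then take the complement.
    upper = min(min_alt_reads, depth + 1)
    p_below = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(upper)
    )
    return 1.0 - p_below


# A variant present in 5 per cent of reads, at three illustrative depths.
for depth in (30, 200, 1000):
    print(depth, round(detection_probability(depth, 0.05), 4))
```

Under these assumptions a mutation present at five per cent is unlikely to reach the threshold at 30-fold depth but is almost certain to do so at 1000-fold depth, which is the regime targeted panels are designed to reach.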

Exome sequencing offers a broader targeted approach and since exons comprise only one per cent of the genome, uses considerably less raw sequence than whole genome sequencing to achieve equivalent coverage at lower cost. It therefore represents a cost effective alternative to whole genome sequencing for mutation detection across coding regions and is proving a popular option for model characterisation. Current limitations include potential inefficiency in the targeting process that can result in missed exons although this is expected to improve as the technology matures.
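
The saving in raw sequence can be made concrete with some simple arithmetic. The sketch below assumes a human genome of roughly 3.1 gigabases, takes the one per cent exome fraction quoted above, and introduces a hypothetical `on_target_fraction` to represent capture inefficiency; the 30-fold coverage target and the 0.6 on-target value are illustrative figures, not platform specifications.

```python
def bases_required(target_size_bp, mean_coverage, on_target_fraction=1.0):
    """Raw sequenced bases needed to reach a mean coverage over a target.

    on_target_fraction is the share of sequenced bases that fall within
    the target; it is 1.0 when no capture step is involved.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_size_bp * mean_coverage / on_target_fraction


GENOME_BP = 3.1e9             # assumed approximate human genome size
EXOME_BP = GENOME_BP * 0.01   # exons as roughly one per cent of the genome

whole_genome = bases_required(GENOME_BP, mean_coverage=30)
whole_exome = bases_required(EXOME_BP, mean_coverage=30, on_target_fraction=0.6)

print(f"Whole genome: {whole_genome / 1e9:.1f} Gb of raw sequence")
print(f"Whole exome:  {whole_exome / 1e9:.2f} Gb of raw sequence")
print(f"Ratio:        {whole_genome / whole_exome:.0f}x")
```

Even with a substantial allowance for off-target reads, the exome needs far less raw sequence than the genome for the same mean coverage, which is the basis of the cost advantage described above.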

Transcriptome sequencing

Whilst whole genome and targeted sequencing approaches sequence genomic DNA, transcriptome sequencing (also known as RNA-Seq) sequences cDNA derived from RNA species such as mRNA or miRNA. In transcriptome sequencing, the set of reads generated during a sequencing run is treated as an unbiased sampling of the total nucleotide complement of the cells, making it possible to use the number of reads aligning to a given transcript as a measure of its expression level. RNA-Seq offers several technical advantages over arrays, including greater sensitivity and dynamic range and the avoidance of probe effects. In addition to RNA quantification, RNA-Seq can be used to detect expressed transcript variants including splice isoforms and gene fusions. RNA-Seq may also be used as an alternative to exome sequencing for mutation calling, but this carries a number of caveats including lack of statistical power in calls from genes expressed at low levels, missed mutation calls in genes with undetectable expression, and false positive calls resulting from reverse transcriptase errors and RNA editing. Nevertheless, RNA-Seq offers rich information content at costs becoming increasingly competitive with array platforms.
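
Because the raw number of reads aligning to a transcript depends on how deeply a sample was sequenced, read counts are commonly scaled before samples are compared. The following is a minimal sketch of one such scaling, counts per million; the gene names and counts are invented for illustration, and real analyses typically also account for transcript length and library composition.

```python
def counts_per_million(read_counts):
    """Scale raw per-gene read counts to counts per million aligned reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no aligned reads to normalise")
    return {gene: count * 1e6 / total for gene, count in read_counts.items()}


# Invented counts for three hypothetical genes in one sample.
example_counts = {"GENE_A": 1500, "GENE_B": 300, "GENE_C": 200}
cpm = counts_per_million(example_counts)

for gene, value in cpm.items():
    print(gene, value)
```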

Considerations for NGS mutation calls across preclinical cancer models

Inherent in all NGS technologies is the potential for false positive and false negative outputs. For example, sequencing errors and read misalignments can result in false positive mutation calls, whereas false negative calls can result from insufficient coverage, particularly in more heterogeneous cancer samples. Both can be ameliorated by increased sequencing depth and, with use of appropriate parameters, most mutation-calling software has some capability to distinguish sequencing errors from genuine mutations. Once a set of high confidence mutation calls has been established, a further challenge is to distinguish somatic from germline mutations. With clinical samples, the majority of germline mutations can be detected by comparing mutation calls from the tumour with those from matched normal tissue. However, this is not usually possible with preclinical models, particularly cell lines. In these cases, one option is to compare the predicted mutations against public single-nucleotide polymorphism (SNP) databases such as dbSNP¹ or the 1000 Genomes Project² to remove previously described variants that occur naturally in the human population. However, these databases are becoming increasingly populated with somatic mutations, thus some cancer-related mutations could be incorrectly discarded. An alternative approach is to consider allele frequency alongside the database searches. Germline variants have expected allele frequencies in the sample of either 50 per cent for heterozygous events or 100 per cent for homozygous events, whereas somatic mutation allele frequencies are influenced by tumour heterogeneity, ploidy and local copy number, resulting in allele frequencies anywhere between 0 and 100 per cent. In addition, by removing only common SNPs, that is those with a population minor allele frequency greater than one per cent, many germline mutations can be filtered out with minimal loss of somatic variants.

Finally, a useful method of establishing clinical relevance of a somatic mutation detected in a preclinical model is to compare with known mutations across clinical samples in databases such as The Cancer Genome Atlas³.
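
The common-SNP filter described in this section can be sketched in a few lines. This is an illustration under assumptions rather than a production pipeline: the `Variant` class, the in-memory `population_maf` lookup and the coordinates are all hypothetical, and in practice the population frequencies would be parsed from the files distributed by resources such as dbSNP or the 1000 Genomes Project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single mutation call, identified by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_common_snps(calls, population_maf, maf_threshold=0.01):
    """Drop calls that match a known SNP above the frequency threshold.

    Calls absent from the population table, or present only as rare
    variants, are retained as candidate somatic mutations.
    """
    kept = []
    for call in calls:
        maf = population_maf.get(call)
        if maf is None or maf <= maf_threshold:
            kept.append(call)
    return kept


# Invented calls and population frequencies, for illustration only.
common = Variant("chr1", 1000, "A", "G")
rare = Variant("chr2", 2000, "C", "T")
novel = Variant("chr3", 3000, "G", "A")

population_maf = {common: 0.25, rare: 0.002}
candidates = filter_common_snps([common, rare, novel], population_maf)
print(candidates)
```

Keeping variants that are rare in the population trades some residual germline contamination for a lower risk of discarding cancer-related mutations that have found their way into the SNP databases.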

Next generation sequencing of in vitro cancer cell line models

Cultured cancer cells remain the most commonly used preclinical models despite limitations such as sub-optimal modelling of the in vivo tumour microenvironment and inability to study the effects of the body on drug distribution and metabolism. Large panels of cell lines have therefore been the subject of several profiling initiatives, each providing a comprehensive collection of information at the level of RNA and DNA together with drug-sensitivity profiles across hundreds of cell lines covering a range of cancer types. Whilst these initiatives have generated extensive array-based datasets, NGS is increasingly being used to supplement this already valuable information. For example, the Cancer Cell Line Encyclopedia⁴ project used hybrid capture followed by targeted deep sequencing to detect mutations across a panel of 1651 genes in ~1000 cell lines. Whole exome sequencing data has been released across the NCI-60⁵ panel of 59 cell lines from nine different tissues and will soon be available across the Sanger cell line panel⁶. So far, mainly untreated cell lines have been characterised through the initiatives highlighted, although the number of smaller-scale studies that have used NGS to detect dynamic markers of response to compound treatment and understand therapeutic mode of action continues to grow. Examples of these can be found in public databases such as the Gene Expression Omnibus⁷ and the European Nucleotide Archive⁸.

Next generation sequencing of in vivo cancer models

In vivo models such as xenografts established from either cancer cell lines or patient-derived tumour tissue are commonly used to model response to targeted therapeutics, and the intrinsic or acquired resistance mechanisms that can limit therapeutic benefit. Both offer several advantages over cell line cultures such as more accurate modelling of the tumour microenvironment, drug metabolism and distribution. Patient-derived tumour models (or explants) provide additional benefits since these are not grown on plastic or adapted to culture conditions at any stage. As a consequence, many of the original tumour characteristics are retained, such as heterogeneity, clinical molecular signature and architecture, and as such they better represent the patient population. Furthermore, many explants can be established for disease segments not represented by cell lines. However, explant establishment is costly and often dependent upon histological type, therefore a bias exists towards high grade tumours and untreated patient samples. The genetic background of many explant models is poorly characterised and profiling efforts are hampered by samples containing a mixture of human tumour and surrounding mouse host tissue. To address the former, many explant providers are using NGS approaches to improve characterisation of their models, most commonly using targeted deep sequencing of a small panel of genes to identify driver mutations. Accurate separation of tumour and host has recently been demonstrated by RNA-Seq⁹, making feasible the study of agents that impact both the tumour and stroma in a single sequencing run without the need for specialist experimental protocols to separate human and mouse genomic material. Differentiating the effects on the tumour and its surrounding tissue is critical to the development of a clinically relevant understanding of new therapeutic activity.
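
One way such a computational separation could work in principle is to align each read to both the human and the mouse reference and assign it to the species giving the better alignment. The sketch below illustrates that idea only and should not be read as the method of the cited study; the mismatch counts are taken as given inputs from a hypothetical upstream aligner, with `None` standing for a read that failed to align to that reference.

```python
from collections import Counter


def assign_species(human_mismatches, mouse_mismatches):
    """Assign a read to a species by comparing its two alignments.

    Each argument is the number of mismatches in the read's best alignment
    to that reference, or None if the read did not align there.
    """
    if human_mismatches is None and mouse_mismatches is None:
        return "unaligned"
    if mouse_mismatches is None:
        return "human"
    if human_mismatches is None:
        return "mouse"
    if human_mismatches < mouse_mismatches:
        return "human"
    if mouse_mismatches < human_mismatches:
        return "mouse"
    # Equally good alignments cannot be attributed to either species.
    return "ambiguous"


# Invented (human, mouse) mismatch counts for five reads.
reads = [(0, 4), (3, 0), (1, 1), (None, 2), (None, None)]
tally = Counter(assign_species(h, m) for h, m in reads)
print(dict(tally))
```

Reads left as ambiguous typically come from regions conserved between the two species, so the proportion in that category gives a rough indication of how much signal is lost to the separation.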

Conclusions

The application of NGS to preclinical cancer model characterisation is still in its infancy, with many more studies and innovations anticipated. For example, in addition to the technologies described above, NGS also offers the capability to characterise methylation, histone packaging and regulatory protein binding positions. Therefore, an important goal will be the systematic integration of such a broad spectrum of data from different NGS technologies, enabling more accurate evaluation of a model’s clinical relevance through comparison with clinical samples that have undergone similar analyses. As sequencing costs continue to decrease, NGS profiling of preclinical cancer models is likely to become increasingly routine. This in turn presents a major challenge in the provision of sufficient computational infrastructure and domain knowledge to process and interpret the wealth of data, and to build a more complete understanding of preclinical models which ultimately translates into therapeutic benefit for the cancer patient.

Acknowledgments

Thank you to Hedley Carr (AstraZeneca) for permission to use the original concept for Table 1.

References

1. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29: 308-11
2. The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56-65
3. The Cancer Genome Atlas Research Network (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455: 1061-1068
4. Barretina J. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483: 603-607
5. Abaan et al. (2013) The Exomes of the NCI-60 Panel: A Genomic Resource for Cancer Biology and Systems Pharmacology. Cancer Res. 73: 4372
6. Garnett M. J. et al. (2012) Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483: 570-575
7. Barrett et al. (2013) NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41: D991-5
8. http://www.ebi.ac.uk/ena/
9. Bradford et al. (2013) RNA-Seq differentiates tumour and host mRNA expression changes induced by treatment of human tumour xenografts with the VEGFR tyrosine kinase inhibitor cediranib. PLOS One, 10.1371/journal.pone.0066003

Biography

James Bradford gained a PhD from the University of Leeds in 2001 in developing novel approaches to study protein-protein interactions. He continued in Leeds as a post-doctoral researcher shifting focus to machine learning applications in gene function prediction motivated by data generation in genomics and proteomics. James then moved to the Paterson Institute for Cancer Research, Manchester where one of his primary roles was developing and implementing Next Generation Sequencing workflows leading to publication of the first RNA-Seq/Exon array platform comparison study. Since 2011, he has been a Senior Oncology Bioinformatics Scientist at Jameast driving new target research, and preclinical model and Next Generation Sequencing informatics capability builds.
