Next Generation Sequencing: Current realities in cancer biology

Posted: 16 February 2011

The rate of progress in molecular cell biological sciences has become dramatic. This is fuelled in part by developments in technology, none more so than in the field of nucleic acid sequencing. So-called Next Generation Sequencing platforms promise to revolutionise our understanding of the importance of genetic differences on an individual basis. According to the modern personalised or stratified medicine paradigms, this will revolutionise current practices in terms of early detection, diagnosis, prognosis, treatment and even prevention. Revolutions are apt to disappoint, and drug pipelines have yet to justify such optimism. Nevertheless, molecular geneticists can already point to notable successes such as the completion of their flagship project, the human genome, in 2001, within time and within budget. What are the current realities? The field of cancer serves as an excellent test and suggests that advances are being made incrementally but rapidly.

Cancer is a genetic disease in which inheritance can play a part. The huge complexity of acquired genetic changes is, however, of major importance. Functional consequences for a subset of these changes have already been determined and efficacious new molecular therapies brought to patients as a consequence. Herceptin targeting of the HER2 receptor in breast cancer, Erbitux targeting of the EGFR receptor in colorectal cancer and Gleevec targeting of the BCR-ABL gene fusion products in chronic myelogenous leukaemia all serve as paradigms of how knowledge about underlying oncogenic changes at the genetic level can be exploited to develop targeted therapies for use on a personalised basis, dependent on the presence of the alteration within individual patients. A major remaining challenge is to understand and act on the overwhelming remainder of the complexity, which varies markedly on an individual basis and contributes to the development of resistance mechanisms.

First, it is useful to consider the progress already made using previous technologies including Sanger dideoxy sequencing, the almost exclusive method for over 30 years. Despite their limitations, the older technologies have yielded important insights into the significance of genetic changes in cancer and provide a platform on which the newer technologies can build. Sequencing studies have largely screened for mutations in exonic regions because of their presumed functional importance. Long, accurate reads allowed large PCR-amplified regions from genes of interest to be read in a high-throughput manner, using batch processes borrowed from the human genome sequencing project. Early comprehensive studies examined gene families, in particular the protein kinase superfamily because it contains many important effector molecules implicated in cancer. The ‘kinome’ of 25 primary breast cancers yielded 92 somatic mutations¹. These were distributed unevenly between patients, with almost half not yielding a mutation and over half of the mutations occurring in one individual, suggestive of a novel mutator phenotype in the latter. There was a significant excess of non-synonymous mutations, suggesting that a subset contribute to cancer development, the so-called ‘drivers’, and conversely that not all changes are relevant: the so-called ‘passengers’ owe their existence to co-occurrence with positively selected changes.

Extended studies screened complete exomes, including those of the common cancers, breast and colorectal, and the aggressive cancers, pancreatic and glioma²⁻⁴. These indicated that a few genes such as TP53 were commonly mutated in many cases and in different cancer types, whilst a larger number of genes were mutated at a low frequency. PALB2 was identified as a susceptibility gene for pancreatic cancer⁵ and IDH1 was found to be commonly mutated in glioma, where it was associated with a more favourable prognosis, further supporting the case for genetic stratification of cancers. Bioinformatics tools allowed mutations to be classified according to their likely significance based on their mutation frequency, disruptive effect and presence within conserved domains. The most convincing measures were based on evidence of clustering, especially around interface residues or active sites, as found for GALNT5 and TGM3. Extensively curated gene-based signalling pathways and networks, especially when mutations were combined with genome-wide copy number alterations and expression data, allowed wider predictions to be made regarding the importance of key cellular pathways. Simple contingency tables could be set up to test whether alterations affected particular pathways more than would be expected by chance. In pancreatic cancers, 12 core pathways were affected in at least 67 per cent of the cases, with KRAS, G1/S, Hedgehog, TGF-beta and Wnt/Notch being affected in 100 per cent of the tumours.
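
The contingency-table test mentioned above can be illustrated with a one-sided Fisher exact (hypergeometric) calculation. This is only a minimal sketch and not the method used in the cited studies: the function name, its parameters and the gene counts in the usage note are hypothetical, and real analyses would also correct for gene length, background mutation rate and multiple testing.

```python
from math import comb


def pathway_enrichment_p(total_genes, pathway_genes, mutated_genes, mutated_in_pathway):
    """One-sided Fisher exact p-value for a 2x2 table (pathway membership x mutated).

    Returns the probability of seeing at least `mutated_in_pathway` mutated
    genes inside the pathway if the mutated genes were drawn at random from
    all screened genes.
    """
    largest_overlap = min(pathway_genes, mutated_genes)
    all_draws = comb(total_genes, mutated_genes)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, mutated_genes - k)
        for k in range(mutated_in_pathway, largest_overlap + 1)
    )
    return tail / all_draws


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = pathway_enrichment_p(
        total_genes=20000, pathway_genes=100, mutated_genes=50, mutated_in_pathway=5
    )
    print(f"one-sided p-value: {p:.3g}")
```

A small p-value here only says that the overlap is larger than random sampling of genes would usually give; it does not by itself show that the pathway is functionally important.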

Despite the success of these studies, significant challenges remain for the personalised cancer medicine paradigms. Few of the alterations discovered in the previous studies have been experimentally validated ex vivo and this would be outside of the capacity of current methods. The importance of the alterations has been statistically inferred and, given the large numbers involved, false discovery rates may be high. Relatively narrow screens were performed which were poorly powered to sample the total possible variation. Typically, one million naturally occurring single base polymorphisms will differ between any two individuals. There are also extensive, natural polymorphic copy number variants which may encompass structural rearrangements including complex inversions. Somatic alterations can also include large scale gains and losses of material, plus intra- and inter-chromosomal rearrangements, of which consequent gene fusions like TMPRSS2-ERG in prostate cancer have been considered to be the most significant⁶. Tumour samples are an admixture of tumour and normal cells, placing increased demands on methods of scoring genetic variation because of the possible dilution of the tumour DNA signal by that from the normal cells. The latter can often be in the majority whilst the former can be genetically heterogeneous. Whole exome studies first passaged the tumours through mice to remove the normal cells, which is itself technically challenging. High density array platforms have made inroads towards addressing this complexity, especially through comparative genomic hybridisation (CGH), but the resolution is limited by probe density per genome and the approach provides little insight into structural rearrangements⁷,⁸.
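
The dilution effect described above can be made concrete with a little arithmetic. The sketch below assumes a method that samples individual DNA molecules, a simple two-population mixture of tumour and normal cells, and independent sampling of molecules; the function names, default copy numbers and the threshold of three supporting observations are illustrative assumptions rather than values from the studies discussed.

```python
from math import comb


def expected_variant_fraction(tumour_purity, mutant_copies=1, tumour_copies=2, normal_copies=2):
    """Expected fraction of sampled DNA molecules carrying a somatic variant.

    tumour_purity is the fraction of cells in the sample that are tumour cells.
    """
    mutant_dna = tumour_purity * mutant_copies
    total_dna = tumour_purity * tumour_copies + (1.0 - tumour_purity) * normal_copies
    return mutant_dna / total_dna


def prob_at_least(k, depth, fraction):
    """Binomial probability of at least k variant observations among `depth` molecules."""
    return sum(
        comb(depth, i) * fraction**i * (1.0 - fraction) ** (depth - i)
        for i in range(k, depth + 1)
    )


if __name__ == "__main__":
    for purity in (0.8, 0.4, 0.2):
        fraction = expected_variant_fraction(purity)
        chance = prob_at_least(3, 30, fraction)
        print(f"purity {purity:.1f}: variant fraction {fraction:.2f}, "
              f"P(>=3 of 30 observations) {chance:.2f}")
```

Under these assumptions a heterozygous mutation in a sample that is 40 per cent tumour is expected in only one in five molecules, which is why normal cell contamination weakens any signal averaged over the whole sample.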

This is where Next Generation Sequencers come in. The highest throughput machines produce relatively short reads of 50 to 100 bases per template. Their power is achieved by reading hundreds of millions of templates together. Coverage allows for each base to be read 30 to 100 times for accuracy and, since each read represents a single original molecule, polymorphisms can be distinguished with high probability by comparing their relative frequency, even in admixtures. Moreover, each template can be read from both ends, allowing structural rearrangements to be detected by mapping opposite ends of individual molecules back to obviously different parts of the genome. When the opposite ends of high numbers of short paired end reads map to different parts of the genome, this can only arise if those parts of the genome have become contiguous through genomic rearrangements. Initial studies mapped the genomic locations of structural alterations. Complete coverage of the genome is not required to detect the rearrangements since they are inferred from the non-contiguous ends. The rearrangements turned out to be extensive but quite variable within cancer types, some tumours having few such changes and others many. In breast cancer, most were intrachromosomal rearrangements but there was little evidence for recurrent gene fusions⁹. Counting the reads per genomic region also allowed copy number changes to be assessed more quantitatively and accurately than by CGH. Copy number changes were often associated with rearrangements leading to amplicons¹⁰. Extended exon sequencing studies by Next Generation Sequencers require robust methods of target enrichment. These have proven to be problematic, with few techniques achieving a satisfactory balance between adequately representing all of the intended target regions at depth and minimising off-target representation. Most successful methods rely on target capture by hybridisation to synthetic oligonucleotide probe sequences¹¹.
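
The paired end reasoning above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the pipeline used in the cited studies: the `ReadPair` record, the 1000 base insert size limit, the bin size and the minimum support of two pairs are all hypothetical choices, and real callers also consider read orientation and mapping quality.

```python
from collections import Counter
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Mapped positions of the two ends of one sequenced template."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(pair, max_insert=1000):
    """True if the ends map to different chromosomes or implausibly far apart."""
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > max_insert


def candidate_rearrangements(pairs, bin_size=1000, min_support=2, max_insert=1000):
    """Count discordant pairs linking the same two genomic bins.

    Junctions supported by at least `min_support` independent pairs are
    returned as {(bin_a, bin_b): supporting_pair_count}.
    """
    support = Counter()
    for pair in pairs:
        if not is_discordant(pair, max_insert):
            continue
        end_a = (pair.chrom1, pair.pos1 // bin_size)
        end_b = (pair.chrom2, pair.pos2 // bin_size)
        # Sort the two ends so that A-B and B-A count as the same junction.
        support[tuple(sorted((end_a, end_b)))] += 1
    return {junction: n for junction, n in support.items() if n >= min_support}


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 100, "chr1", 400),      # concordant
        ReadPair("chr1", 5200, "chr8", 90100),   # links chr1 and chr8
        ReadPair("chr8", 90500, "chr1", 5900),   # same junction, opposite order
        ReadPair("chr2", 10, "chr2", 50000),     # discordant but unsupported
    ]
    print(candidate_rearrangements(pairs))
```

Requiring several independent pairs per junction reflects the point made in the text that rearrangements are inferred from high numbers of reads whose ends map to different places, so a single stray pair is ignored.
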

Provided that adequate bioinformatic resource is available, it is technically easier to sequence whole genomes, but this clearly limits the number of cases possible per study. The output of the current leading platforms produced by ABI / Life Sciences and Illumina has, however, increased nearly 1000 fold in less than two years to approximately 500 billion bases per machine per run. Each of the highest capacity machines has the capability to accurately compare between five and 10 human genomes simultaneously. This has spurred efforts to fully characterise cancer genomes.
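
As a rough check on these throughput figures, the number of genomes per run follows from dividing the run output by the bases needed per genome. The genome size of about 3.1 billion bases and the coverage values below are assumptions made for illustration, and the function name is hypothetical.

```python
GENOME_SIZE_BASES = 3.1e9  # assumed approximate size of one human genome
BASES_PER_RUN = 500e9      # run output quoted in the text


def genomes_per_run(coverage, bases_per_run=BASES_PER_RUN, genome_size=GENOME_SIZE_BASES):
    """Number of whole genomes one run can cover at the given average depth."""
    return bases_per_run / (coverage * genome_size)


if __name__ == "__main__":
    for coverage in (15, 30, 100):
        print(f"{coverage}x coverage: about {genomes_per_run(coverage):.1f} genomes per run")
```

Under these assumptions, 30-fold coverage gives a little over five genomes per run, in line with the lower figure quoted, while ten genomes per run would imply roughly half that depth per genome.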

The International Cancer Genome Consortium is an international initiative created to characterise genetic variation across 50 cancer types of importance across the globe and their subtypes. Its strategy is based on the premise that, given enough data, the significance of mutations can be inferred from their frequency. Aiming to detect mutations occurring at a frequency of three per cent or more, it will require the genomes of 500 cancers per type to be sequenced, or 25,000 in total¹², needing 100,000 billion accurate bases determined from over 300 million, billion raw bases. In practice, the heterogeneity of specific types cannot be known and subtypes may have differing mutation spectra, further impacting on the final numbers needed. For example, Sorlie and colleagues identified six different breast cancer types based on global gene expression analysis¹³. This classification has been broadly accepted, in part because it associates with outcome data and also because it is consistent with accepted histopathological markers including triple negative, basal, ER HER2 and ESR1 positive types. Similarly, chronic lymphocytic leukaemias vary according to the status of their rearranged immunoglobulin heavy chain variable (VH) genes.
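
The sample numbers quoted above can be explored with a simple binomial calculation. The sketch assumes that tumours are independent samples and that a mutation is present in a fixed fraction of tumours of a given type; the function name and the choice of thresholds are illustrative assumptions, not part of the consortium's published power analysis.

```python
from math import comb

CANCER_TYPES = 50        # figure quoted in the text
TUMOURS_PER_TYPE = 500   # figure quoted in the text


def prob_seen_at_least(k, n_tumours, frequency):
    """P(at least k of n tumours carry a mutation present at `frequency`)."""
    fewer_than_k = sum(
        comb(n_tumours, i) * frequency**i * (1.0 - frequency) ** (n_tumours - i)
        for i in range(k)
    )
    return 1.0 - fewer_than_k


if __name__ == "__main__":
    print("total genomes:", CANCER_TYPES * TUMOURS_PER_TYPE)
    print("expected carriers of a 3% mutation:", 0.03 * TUMOURS_PER_TYPE)
    for k in (5, 10, 15):
        p = prob_seen_at_least(k, TUMOURS_PER_TYPE, 0.03)
        print(f"P(seen in at least {k} of {TUMOURS_PER_TYPE} tumours) = {p:.3f}")
```

With 500 tumours, a mutation present in three per cent of cases is expected about 15 times, which gives frequency-based inference something to work with; if a cancer type splits into several subtypes, the effective sample size per subtype falls and so does this count.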

Complete sequencing of a lung cancer and a melanoma has provided new insights into the extent and nature of acquired changes. In the lung cancer, over 22,000 somatic changes were found and in the melanoma, there were over 33,000¹⁴,¹⁵. These reflected the prevailing mutagens, being largely tobacco related in the case of the lung cancer, whilst two thirds were UV light related in the melanoma. This again brings into question the nature of ‘drivers’ and ‘passengers’. A significant proportion of the changes could each have a small, undetectable effect that would be hard to manipulate therapeutically. Alternatively, there may be a few important changes in a background of ‘passengers’. Further sequencing will help to distinguish these possibilities. An important message can be drawn from the melanoma study. Conventional sequencing had identified mutation in BRAF as an important ‘driver’ in 66 per cent of melanomas¹⁶. Drugs that antagonise the mutationally activated BRAF have been developed and are achieving unprecedented responses for this notoriously resistant tumour type in early stage trials¹⁷. Despite the seemingly overwhelming background of genetic alterations in melanoma, there may be far fewer, large effect changes that can be targeted. Melanoma appears consistent with the oncogene addiction hypothesis, which asserts that each cancer is dependent on one or a few oncogenic changes.

Next Generation Sequencing is unlikely to be restricted to screening for functional changes. High read depths open the possibility of extremely sensitive detection, enabling applications such as detection of early stage cancer or residual disease in resection margins. Response or relapse could similarly be monitored in bodily fluids like blood or ascites. Pharmacogenomic studies may be similarly transformed. Single Nucleotide Polymorphism derived haplotypes currently used in association studies to compare genotypes with outcome are commonly monitored via assay platforms based on detection by hybridisation or mass spectrometry. These can be costly and unwieldy but, most importantly, are based on early estimates of human genetic variation. The pilot phase of the ongoing 1000 Genomes Project, in which Next Generation Sequencing is being used to read whole genomes for 1000 people, reports whole genomes for 179 individuals and exon sequences for 697 individuals representing seven populations¹⁸. Fifteen million more polymorphisms have been catalogued, more than doubling the existing collection. Rare variants with a large impact may be more important to the population as a whole than common variants having a small impact. Sequencing may prove to be more effective in association studies than current approaches. Clinical studies will require faster turnaround times than current systems achieve. In this respect, new platforms like the Ion Torrent are a welcome development. Its silicon chip based detection currently has lower throughput than the leading platforms and a higher cost per base. It does, however, offer simple operation and a small footprint with a two hour turnaround time, a clear advantage for real time clinical decisions like patient stratification.

References

1. Stephens, P., et al., A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat Genet, 2005. 37(6): p. 590-592
2. Wood, L.D., et al., The Genomic Landscapes of Human Breast and Colorectal Cancers. Science, 2007. 318(5853): p. 1108-1113
3. Jones, S., et al., Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science, 2008. 321(5897): p. 1801-1806
4. Parsons, D.W., et al., An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science, 2008. 321(5897): p. 1807-1812
5. Jones, S., et al., Exomic Sequencing Identifies PALB2 as a Pancreatic Cancer Susceptibility Gene. Science, 2009. 324(5924): p. 217
6. Tomlins, S.A., et al., Recurrent Fusion of TMPRSS2 and ETS Transcription Factor Genes in Prostate Cancer. Science, 2005. 310(5748): p. 644-648
7. Beroukhim, R., et al., The landscape of somatic copy-number alteration across human cancers. Nature. 463(7283): p. 899-905
8. Bignell, G.R., et al., Signatures of mutation and selection in the cancer genome. Nature. 463(7283): p. 893-898
9. Stephens, P.J., et al., Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature, 2009. 462(7276): p. 1005-1010
10. Campbell, P.J., et al., Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet, 2008. 40(6): p. 722-729
11. Mamanova, L., et al., Target-enrichment strategies for next-generation sequencing. Nat Meth. 7(2): p. 111-118
12. International network of cancer genome projects. Nature. 464(7291): p. 993-998
13. Sorlie, T., et al., Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(19): p. 10869-10874
14. Pleasance, E.D., et al., A comprehensive catalogue of somatic mutations from a human cancer genome. Nature, 2009. advance online publication
15. Pleasance, E.D., et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature, 2009. advance online publication
16. Davies, H., et al., Mutations of the BRAF gene in human cancer. Nature, 2002. 417(6892): p. 949-954
17. Flaherty, K.T., et al., Inhibition of Mutated, Activated BRAF in Metastatic Melanoma. New England Journal of Medicine. 363(9): p. 809-819
18. A map of human genome variation from population-scale sequencing. Nature. 467(7319): p. 1061-1073

About the Author

Professor D. Ross Sibson directs the CCR, Applied Cancer Biology Labs in the CR-UK Cancer Centre at the University of Liverpool. He has a long-standing public/private sector interest in high-throughput gene-based analysis (Unilever Research, Amersham International, MRC – biological manager UK-HGMP, Glaxo). He is the inventor of multiple sequencing-related patents and the former Director of Research at InteraSeq. He is active in Next Generation Sequencing and is currently concerned with how systems analysis can inform the exploitation of cancer biomarkers.

Issue

Issue 1 2011
