Proteomics in the pharmaceutical industry: is the analytical challenge just too large?

Posted: 23 January 2008 | Thierry Rabilloud, PhD, Head of proteomics group, Biophysics and Biochemistry of Integrated Systems, iRTSV/LBBSI, UMR CNRS 5092

Large-scale techniques, such as combinatorial chemistry, high-throughput screening and the various “omics” techniques, are now widely used in the pharmaceutical and diagnostics industry alongside the more classical, targeted approaches. Among these large-scale techniques, proteomics is one for which there seems to be a widening gap between what is expected and what has been delivered to date, which has led many to question the position and usefulness of proteomics in this industry.

The goal of this paper is to recall the strengths and weaknesses of proteomics, and to put them in perspective, so that more realistic expectations can be put forward. This can also lead to a more efficient rationale in implementing successful strategies in which proteomics will fully deliver.

The pharmaceutical and diagnostics industry is engaged in a never-ending race toward new molecules acting on new targets and/or new markers allowing better diagnosis, prognosis or risk assessment. In this race, where time is a lot of money, heavily parallel screening is becoming increasingly important, as shown by the examples of high-throughput screening and combinatorial synthesis. Within this frame of thinking, it is no wonder that proteomics, which can be defined as the large-scale screening of proteins, has attracted deep interest from the pharmaceutical industry in the last few years. However, more and more voices seem to be stating that proteomics has not delivered what it promised, in other words that there is not enough bang for the buck.

As is often the case, this discrepancy between achievements and expectations results from excessive expectations rather than from really poor achievements. In this sense, it is necessary to have a good appraisal of what current proteomics can and cannot deliver. This is crucial for placing proteomics correctly in the discovery pipeline, and thus for deciding how many bucks the likely bang deserves.

Proteomics is no more than analytical chemistry of proteins on a large scale, but every word of this sentence plays its role in understanding what proteomics can achieve. Large scale also means robustness, in the sense that a few analytical conditions (for example, solvent) should achieve the maximal coverage of the analyte range. This is certainly true for genomics and transcriptomics, in the sense that nucleic acids are always highly soluble in water, whatever they code for. This is certainly not true in the protein world, where some proteins are very soluble in water, some require a high ionic strength medium to be soluble and others require both a lipid and a water environment to be fully structured and active. Needless to say, there is no universal solvent able to dissolve each and every protein. This clearly means that comprehensiveness is simply out of the reach of proteomics, which marks a clear difference from other “omics” techniques.

It can be argued that this is true for proteins, but that the situation in terms of solubility is much better for peptides, so that the most efficient proteomics techniques do not rely on the separation and analysis of complete proteins, but on the separation and analysis of peptides arising from directed proteolysis of the proteins to be studied [1,2]. While this is certainly true as far as solubility is concerned, even peptides form a world complex enough that no limited set of conditions compatible with a high sensitivity analysis can dissolve them all.

This high sensitivity analysis is required because many proteins in cells or biological fluids are present at very low concentrations, while some proteins are present at fairly high concentrations. Of course, the expression range, that is, the quantitative difference between the rarest and the most abundant protein in the biological sample of interest, depends greatly on that sample. This dynamic range covers 4 orders of magnitude in E. coli, 5 in yeast [3], 6 in a typical mammalian cell and up to 12 orders of magnitude in a complex biological fluid such as plasma [4].
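The dynamic range discussed here is simply the base-10 logarithm of the ratio between the most abundant and the rarest protein in a sample. A minimal Python sketch of that calculation follows; the function name and the copy numbers are illustrative assumptions, not measured values from any of the cited studies.

```python
import math


def dynamic_range_orders(abundances):
    """Return log10(max / min) for a collection of positive abundances."""
    values = list(abundances)
    if not values or min(values) <= 0:
        raise ValueError("abundances must be non-empty and strictly positive")
    return math.log10(max(values) / min(values))


# Hypothetical copy numbers per cell, chosen only to illustrate the arithmetic.
example_copies = [50, 2_000, 150_000, 50_000_000]
print(f"{dynamic_range_orders(example_copies):.1f} orders of magnitude")
```

With these made-up numbers, the rarest and the most abundant species differ by a factor of one million, which corresponds to the six orders of magnitude quoted for a typical mammalian cell.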

Behind the dynamic range challenge lies the complexity challenge, that is, the number of analytes to be taken into account. While the genomic world is known to be finite, and of lesser complexity than expected (for example, 25,000 genes in humans as opposed to the more than 100,000 expected in the 1990s), the complexity of the proteomics world is much less well known. The only thing we are sure of is that this complexity is much higher than that of genomics. For example, post-translational modifications are known to modulate protein function and/or localisation, sometimes heavily, so that various forms arising from the same translation product should be considered and analysed as different.

Here again, the situation varies greatly between different organisms. While simple prokaryotes probably show fewer than 3,000 different proteins [5,6], 10,000 to 20,000 proteins are expected for a typical mammalian cell [7], and complex biological fluids probably contain a fraction of a million different protein forms.

When all these biological figures are taken together and set against the analytical power of current proteomic technologies, it becomes obvious that the coverage reached by proteomics differs widely depending on the system studied. While the proteomics study of prokaryotes (for example, bacteria of clinical interest) is not comprehensive but reaches decent coverage (probably up to 70-80%), much less than half of the expected proteome is covered in mammalian cells. The situation is much worse for fluids, where the enormous dynamic range makes many strategies completely inefficient [8]. When 12 orders of magnitude are at play, removing 99% of the protein mass only brings this down to 10 orders of magnitude, which is still far beyond the power of any analytical technique.
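The depletion arithmetic above can be checked with a few lines of Python. This is only a sketch under a simplifying assumption that is not stated in the text: depletion is modelled as lowering the abundance ceiling by a factor of 1 / (1 - fraction removed) while leaving the rarest proteins untouched. The function names and the target range of 4 orders are hypothetical and serve only as an illustration.

```python
import math


def orders_after_depletion(initial_orders, fraction_removed):
    """Dynamic range (in orders of magnitude) left after depletion.

    Simplifying assumption: removing a fraction of the dominant protein
    mass lowers the abundance ceiling by 1 / (1 - fraction_removed) and
    does not affect the rarest proteins.
    """
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction_removed must be in [0, 1)")
    return initial_orders + math.log10(1 - fraction_removed)


def depletion_needed(initial_orders, target_orders):
    """Fraction of the dominant mass to remove to reach target_orders."""
    return 1 - 10 ** (target_orders - initial_orders)


plasma_orders = 12  # figure quoted in the text for plasma

# Removing 99% of the protein mass, as in the text.
print(round(orders_after_depletion(plasma_orders, 0.99), 2))

# Hypothetical target of 4 orders of magnitude, used only for illustration.
print(depletion_needed(plasma_orders, 4))
```

Under this model each additional "nine" of depletion buys only one order of magnitude, which is why negative selection alone cannot close a gap of this size.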

Furthermore, in most applications, a mere list of proteins is far from sufficient. In most cases, we need to be able to compare quantitative levels between different situations, for example to compare a pathogenic bacterial strain to a non-pathogenic one, a disease state to a control one, or a treated animal to a control one. This need for quantitation further complicates the analyses and decreases their performance.

Because of this acute lack of comprehensiveness, proteomics cannot be used as other large-scale screening processes are used in the pharmaceutical and diagnostics industry. Moreover, this also means that the possible uses will change depending on the system of interest, and it is therefore useful to sort the possible applications of proteomics according to the scientific area considered.

In the drug discovery process, the incomplete nature of the proteomics analysis can pose major problems depending on the system of interest. While proteomics can perform very well on simple systems, for example bacteria [9], its inability to analyse minor proteins will be a major problem for finding biological targets of interest in more complex systems, typically human cells. As a matter of fact, in order to reach such minor proteins, extensive enrichment must be carried out. This means in turn that strong research hypotheses have to be made, so that the proteomics experiment loses its general screening dimension and thus most of its interest.

In the biomarker discovery field, for example for diagnosis or prognosis purposes, it must be stressed that the most obvious strategies, for example comparing plasma or serum of diseased people versus healthy ones, with or without depletion of the most abundant serum proteins, will not give rise to any valuable biomarker. These strategies have been widely used and have not yielded more than inflammation markers, which, although not disease-specific, are within the reach of the analytical techniques we have at hand.

Keeping this in mind, it is worth making some retrospective analysis in order to decide on better strategies. Most of the protein markers we use today in diagnosis are proteins released in the bloodstream by tissue leakage, usually at low levels [4]. Although it can be anticipated that mass spectrometry techniques will be able to detect them in the future [10], nowadays these proteins are detected by immunological procedures or by their enzymatic activities, when they have one. However, it must be kept in mind that almost all measurable enzymatic activities have probably been tested as putative biomarkers, and that only a few are useful in practice. Failure as a biomarker can arise from multiple factors, including, but not limited to, protein stability, interindividual variance, too indirect a correlation with the disease, ubiquitous localisation and so on. There is no reason why this tremendous attrition rate should not also apply to protein markers found by other means, including proteomics, but this is, in some sense, the name of the game in biomarker discovery.

Thus, because of this attrition rate and because of the anticipated low abundance of interesting biomarkers, efficient strategies must be devised for the discovery of putative biomarkers. The most obvious one, although of limited scope, is to use a fluid less complex than plasma, such as urine or cerebrospinal fluid (CSF). It must be stressed that finding a protein in CSF that is indicative of Creutzfeldt-Jakob disease is one of the few success stories of proteomics in the biomarker field [11], although this marker is not perfectly reliable [12]. This strategy is therefore valid in those cases where such a fluid is relevant.

If biomarkers are to be found in the bloodstream, which is the general case, then two strategies can be put into place.

The first strategy is clearly to adopt the viewpoint that biomarkers come from tissue leakage. In this case, the aim is to find a protein that is specifically released by diseased cells of the organ of interest, compared to normal cells of this organ and to normal cells of other organs. In some instances, the initial search is carried out on a complete cellular extract in a cellular model of the disease of interest, and when such a differentially expressed protein has been found, its presence is searched for in serum and checked for correlation with the disease of interest. A good example of this strategy has been published recently [13]. However, it can be anticipated that such strategies would benefit from being carried out on “secretomes” (that is, proteins released by cells into the surrounding medium) rather than on complete cellular extracts, as these secretomes show different protein profiles [14].

The second strategy is to keep the analysis on serum/plasma, but to try to reach low abundance components. As this cannot be carried out in a general way, subclasses of proteins must be selected for analysis: in other words, the inefficient negative selection (depletion) is replaced by a positive selection process. A good example of this strategy is the recently published work focused on glycoproteins [15], but other means of selecting proteins on a structural feature, for example binding of a ligand such as ATP, can be investigated and should be able to provide the required depth of analysis, provided that the major plasma proteins are not selected by this means.

Proteomics can also be applied in the field of toxicology. It must be kept in mind that proteomics was successful in this field quite early on [16]. While toxicological studies are very labour-intensive, finding indirect markers is much less of a problem than in drug discovery, so that the poor comprehensiveness of proteomics is a much less acute problem. Of course, the strong limitations described for biomarkers apply when the toxicological study is made on a complex biological fluid. However, the reduced dynamic range observed in target cells or tissues makes the proteomics setup rather attractive when carried out on this type of material [17,18]. Unfortunately, the labour-intensiveness of proteomics in toxicology has considerably limited its use, although this is likely to be one of the areas where proteomics should deliver quite interesting results.

Finally, proteomics can be applied in more exotic fields, such as the control of bioproduction [19], or in the quality control of products made by bioproduction. In the latter case, the use of “erasing” or equalising strategies, which allow a better analysis of trace contaminants, can prove very powerful [20].

In conclusion, proteomics can certainly deliver valuable results within the framework of the needs of the pharmaceutical and diagnostics industry. However, the current limitations of the analytical tool must be carefully taken into account to avoid disillusionment.


  1. Schirle M, Heurtier MA, Kuster B. Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics. 2003 2:1297-1305.
  2. Wolters DA, Washburn MP, Yates JR 3rd. An automated multidimensional protein identification technology for shotgun proteomics. Anal Chem. 2001 73:5683-5690.
  3. Lu P, Vogel C, Wang R, Yao X, Marcotte EM. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol. 2007 25:117-124.
  4. Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics. 2002 1:845-867.
  5. Wilkins MR, Sanchez JC, Williams KL, Hochstrasser DF. Current challenges and future applications for protein maps and post-translational vector maps in proteome projects. Electrophoresis. 1996 17:830-838.
  6. Tonella L, Hoogland C, Binz PA, Appel RD, Hochstrasser DF, Sanchez JC. New perspectives in the Escherichia coli proteome investigation. Proteomics. 2001 1:409-423.
  7. Duncan R, McConkey EH. How many proteins are there in a typical mammalian cell? Clin Chem. 1982 28:749-755.
  8. Lescuyer P, Hochstrasser D, Rabilloud T. How shall we use the proteomics toolbox for biomarker discovery? J Proteome Res. 2007 6:3371-3376.
  9. Chao CC, Chelius D, Zhang T, Mutumanje E, Ching WM. Insight into the virulence of Rickettsia prowazekii by proteomic analysis and comparison with an avirulent strain. Biochim Biophys Acta. 2007 1774:373-381.
  10. Anderson L, Hunter CL. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteomics. 2006 5:573-588.
  11. Harrington MG, Merril CR, Asher DM, Gajdusek DC. Abnormal proteins in the cerebrospinal fluid of patients with Creutzfeldt-Jakob disease. N Engl J Med. 1986 315:279-283.
  12. Burkhard PR, Sanchez JC, Landis T, Hochstrasser DF. CSF detection of the 14-3-3 protein in unselected patients with dementia. Neurology. 2001 56:1528-1533.
  13. Roessler M, Rollinger W, Mantovani-Endl L, Hagmann ML, Palme S, Berndt P, Engel AM, Pfeffer M, Karl J, Bodenmüller H, Rüschoff J, Henkel T, Rohr G, Rossol S, Rösch W, Langen H, Zolg W, Tacke M. Identification of PSME3 as a novel serum tumor marker for colorectal cancer by combining two-dimensional polyacrylamide gel electrophoresis with a strictly mass spectrometry-based approach for data analysis. Mol Cell Proteomics. 2006 5:2092-2101.
  14. Chevallet M, Diemer H, Van Dorsselaer A, Villiers C, Rabilloud T. Toward a better analysis of secreted proteins: the example of the myeloid cells secretome. Proteomics. 2007 7:1757-1770.
  15. Zhang H, Liu AY, Loriaux P, Wollscheid B, Zhou Y, Watts JD, Aebersold R. Mass spectrometric detection of tissue proteins in plasma. Mol Cell Proteomics. 2007 6:64-71.
  16. Aicher L, Meier G, Norcross AJ, Jakubowski J, Varela MC, Cordier A, Steiner S. Decrease in kidney calbindin-D 28kDa as a possible mechanism mediating cyclosporine A- and FK-506-induced calciuria and tubular mineralization. Biochem Pharmacol. 1997 53:723-731.
  17. Witzmann FA, Richardson MR. Two-dimensional gels for toxicological drug discovery applications. Expert Opin Drug Metab Toxicol. 2006 2:103-111.
  18. Merrick BA, Bruno ME. Genomic and proteomic profiling for biomarkers and signature profiles of toxicity. Curr Opin Mol Ther. 2004 6:600-607.
  19. Wang YX, Yuan YJ. Direct proteomic mapping of Streptomyces luteogriseus strain 103 and cnn1 and insights into antibiotic biosynthesis. J Proteome Res. 2005 4:1999-2006.
  20. Fortis F, Guerrier L, Areces LB, Antonioli P, Hayes T, Carrick K, Hammond D, Boschetti E, Righetti PG. A new approach for the detection and identification of protein impurities using combinatorial solid phase ligand libraries. J Proteome Res. 2006 5:2577-2585.

Thierry Rabilloud, PhD

Dr Rabilloud’s interest and expertise focus both on the development and improvement of proteomics techniques, especially protein separation methods, and on their application to cell biology problems, mainly cell differentiation in the hematopoietic system. Dr Rabilloud started to work on the large-scale analysis of proteins in the 1980s, long before the word ‘proteomics’ was coined, so that he is a privileged witness of this field.

He was also the inaugural president of the French Electrophoresis and Proteomics Society (in 2001) and one of the organisers of the first meeting of the Human Proteomics Organisation (HUPO) in 2002. He is currently a member of the HUPO international council and the representative for France in the European Proteomics Association.