A network of innovation from Canada
Posted: 7 March 2005
The Réseau Protéomique de Montréal Proteomic Network (RPMPN) was created in 2000 through funding from Genome Canada, Genome Québec and the Canada Foundation for Innovation. For the past five years, the RPMPN has been involved in the Cell Map Project, which involves cell biologists from the Université de Montréal and McGill University. The goal is to achieve the most exhaustive documentation (identification, localisation and function) of the proteins expressed in mammalian organelles challenged with hormones such as insulin and EGF, along with their respective controls.
Ultimately, this proteomic mapping will be applied to organelles from disease states such as cancer and diabetes. Since the project started, more than four terabytes of data have been collected on organelles such as the endoplasmic reticulum (rough and smooth), Golgi apparatus, phagosome, endosome, clathrin-coated vesicles and plasma membrane.
The most widely used mass spectrometers in the proteomic field are ion trap, TOF and QTOF instruments, with several ionization sources (electrospray, nanospray, MALDI). Another instrument recently emerging in the proteomics field is the Ion Cyclotron Resonance (ICR) mass spectrometer, also referred to in the literature as the Fourier Transform Mass Spectrometer (FT-MS). Of all mass spectrometry technologies, ICR gives the highest resolving power, along with the highest sensitivity and dynamic range. From a proteomic point of view, the advantage of these three features is obvious. Achieving the highest possible mass accuracy would limit the number of candidate peptides within an in silico digested database. It has been suggested in the literature that with sufficient mass accuracy, proteins could be identified by Peptide Mass Fingerprint (PMF) alone.
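The effect of mass accuracy on the number of database candidates can be illustrated with a small sketch. It is not the RPMPN search software: the digestion rule (cleavage after K or R, except before P, with no missed cleavages), the toy sequence and the function names are assumptions chosen for illustration, and the residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(sequence, min_length=6):
    """In silico digest: cleave after K or R unless the next residue is P.

    Simplifying assumption: no missed cleavages are generated.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            if i + 1 - start >= min_length:
                peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def candidates(observed_mass, peptides, tolerance_ppm):
    """Return the peptides whose mass lies within the ppm tolerance."""
    window = observed_mass * tolerance_ppm / 1e6
    return [p for p in peptides if abs(peptide_mass(p) - observed_mass) <= window]


# Hypothetical toy sequence giving two peptides that differ only by K versus Q.
database = tryptic_peptides("SAMKPLERSAMQPLER")
observed = peptide_mass("SAMKPLER")
for ppm in (100, 5):
    print(ppm, "ppm:", len(candidates(observed, database, ppm)), "candidate(s)")
```

The two toy peptides differ by about 0.036 Da, so a wide tolerance cannot separate them while a tight one can. A real database contains many such near-isobaric peptides, which is why tighter mass accuracy shortens the candidate list.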
If PMF alone were sufficient, MS/MS spectra of peptides would no longer be needed for protein identification. This remains to be proven for unknown samples in a high-throughput environment such as ours, and we have several research projects under way that address this issue. Nevertheless, the ICR application in proteomics is evolving rapidly and seems to be following a path similar to that of the ion trap and QTOF technologies several years ago. For years, ICR has been used with direct infusion only. Coupling it to nano-LC should expand its analytical possibilities, as exemplified by the pioneering work of Richard Smith.
Our first proteomic analysis was carried out eight years ago on a triple quadrupole mass spectrometer by direct infusion using gold-plated glass capillary emitters. Everything was operated manually, from the trypsin digestion to the MS/MS spectrum interpretation. It was time-consuming and far from efficient from a production point of view. We rapidly found that the peptide composition of the submitted samples was far too complex to be thoroughly analysed in this fashion. The next logical step was to add an on-line chromatographic inlet to simplify the peptide mixture sprayed into the atmospheric source.
Fortunately, nano-LC coupled to QTOF mass spectrometry became available about that time, with operating software allowing the automatic analysis of hundreds of peptides. Robotic digesters also became available, allowing trypsin digestion of up to 192 samples per day. Our first LC-QTOF system was equipped with a short (5 mm) C18 guard column, which was used as both the desalter and the analytical column. We were using a ballistic acetonitrile gradient (5% to 95% in seven minutes), running at a flow rate of 1 µL per minute. This represented a marked improvement over direct infusion, but the sheer complexity of the Cell Map samples was still too great for the unambiguous identification of as many proteins as possible. Clearly, nanobore columns running at flow rates of nanolitres per minute were needed.
A new nanoflow technology had just emerged and was available commercially. The PicoFrit (New Objective) system had all the characteristics that we were looking for: an internal diameter of 75 µm and 10 cm of stationary phase. The most important feature is that the end of the stationary phase coincides with the emitter. As a result, it can be positioned near the mass spectrometer sampling cone, and the peptides are sprayed almost instantly as they elute from the column. With post-column dead volume being nonexistent, the chromatographic peak shape is optimal. From a preventive maintenance point of view, the absence of fittings between the column and the mass spectrometer is a tremendous advantage. A major issue for a production laboratory is column longevity, because it is linked to operation downtime. In our laboratory, PicoFrit columns last an average of three months (around 1400 injections), which means that the column cost per sample is US$0.25. Based on this value, it is easy to understand why we never even considered making our own columns. Eventually, the 15 µm ID spraying tip becomes damaged through erosion and a higher capillary voltage is needed in order to maintain an optimal spray.
Quality and quantity
To achieve robustness in a proteomics operation such as ours, an efficient quality control (QC) system is imperative for working successfully on a 24/7 basis. The objective is to have a system that monitors the performance of all operations, from sample submission to protein identification. To that end, the following process was put in place. Before starting the analysis of any 96-well tray, several QC tests are systematically performed in order to verify not only instrument sensitivity, but also the general performance of the entire production line.
Using the LC-QTOF as an example, the first test is the sensitivity test. A quantity of 250 femtomoles of (Glu)Fibrinopeptide B (Sigma no F3261), from a solution of 50 femtomoles per µL, is injected and data are acquired with the same mass spectrometry acquisition settings as for a real sample. The base peak intensity of the peptide MS trace is noted, along with the retention time of the chromatographic peak. Then the MS/MS spectrum is examined and the m/z values of the y series fragments are compared to the theoretical values to check the calibration. All these data points are recorded for each instrument in its respective control chart.
The second test involves an in-gel digest of Bovine Serum Albumin (BSA), which has been prepared in house on our robotic tryptic digestion system. As soon as the run is acquired and the data file is transferred to the server, a text file version of the MS/MS spectra is created and sent to our local copy of Mascot (Matrix Science, UK) for database search. The number of peptides found and the percentage of sequence coverage are transferred to the control charts. As soon as any value from either QC test falls more than two standard deviations (the warning lines) above or below the average value calculated from all the previous runs, the operator starts an investigation to determine which part of the analytical process is responsible for the variation.
The instrument is taken off line until the cause has been resolved and the instrument has returned to its standard performance level. Analyses are resumed only if all QC requirements are met. Since we began using this QC system, we have successfully detected the onset of several minor problems and corrected them before they could impair the quality of the results. Creating this QC system proved to be well worth the resource investment. We know that our instrumentation is running according to our parameters at all times and we also have a historical record of each instrument’s performance.
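The warning-line rule described above can be sketched in a few lines. This is a minimal illustration, not the laboratory's actual software: the metric names, the numerical values and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def warning_limits(history, n_sd=2.0):
    """Return (low, high) warning lines: mean of previous runs +/- n_sd SD.

    Assumption: the sample standard deviation is used.
    """
    centre = mean(history)
    spread = stdev(history)
    return centre - n_sd * spread, centre + n_sd * spread


def failed_metrics(control_charts, new_run, n_sd=2.0):
    """List the QC metrics of a new run that fall outside the warning lines."""
    failures = []
    for metric, value in new_run.items():
        low, high = warning_limits(control_charts[metric], n_sd)
        if not low <= value <= high:
            failures.append(metric)
    return failures


# Hypothetical control charts for one instrument (values are invented).
charts = {
    "retention_time_min": [12.1, 12.3, 11.9, 12.0, 12.2],
    "bsa_peptides_found": [31, 29, 33, 30, 32],
}
run = {"retention_time_min": 12.6, "bsa_peptides_found": 30}

problems = failed_metrics(charts, run)
if problems:
    print("Take instrument off line; investigate:", problems)
else:
    print("QC passed; analysis can proceed")
```

Keeping one chart per metric and per instrument mirrors the description above, and the same history lists double as the historical performance record.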
Making the most of data
Over the past ten years, tremendous improvements have been made that permit the study of highly complex proteomic samples in an automatic and robust fashion. At this moment, for most laboratories around the world, the major challenge is no longer at the sample preparation or the mass spectrometric level, but at the data processing level. This challenge has been successfully addressed by the Bioinformatics team of the RPMPN. An important contribution of the Bioinformatics team was the creation of the CellMapBase (CMB) platform to integrate, control, process and maintain all of the features related to our proteomic pipeline. This database is a central repository of data that includes protocols, mass spectrometric data, peaklists, projects, gel pictures, metadata and public protein database search results. Scientists can access all these CMB features via the Internet through a single user login.
A CMB module is in development that will interface with an in-house Laboratory Information Management System (LIMS). This module will provide information about the progress of all samples, from the moment they are submitted until analysis is completed. Bar code readers will be used and the status of every tray will be accessible at any time to the management team in contact with the scientists.
Once a tray is completed on a mass spectrometer, a copy of the data is automatically transferred to the 40 terabyte server. As soon as this task is done, software created by the Bioinformatics team (TOMAS, Toolbox for Mass Spectrometry Data Analysis) initiates the data processing by creating a list of precursor ions with their MS/MS fragments. TOMAS uses Mascot Distiller (Matrix Science) to perform this peak detection and to de-isotope and remove noise from the spectra. Once the peaklist files are created, TOMAS automatically sends them to Mascot (Matrix Science) for database search on a multi-cluster system.
At each step, QC tests are performed by TOMAS to ensure that all data files have been peaklisted and searched with Mascot. Reports of successful and unsuccessful processes are automatically sent by email to the laboratory managers and to the mass spectrometrist responsible for the tray analysis.
Because peptides might not be unique, they can be linked to more than one protein sequence, resulting in a redundant list of proteins in which many peptides point to multiple proteins. Eliminating this redundancy and sorting the proteins by hand is very tedious and error-prone. The Bioinformatics team has created a clustering algorithm within the CMB that partitions the protein hits into clusters, where each cluster shares a unique subset of peptides. Comparison of different experiments (time courses, disease states) within CMB is now greatly facilitated for the scientists. Within a few days they are able to compile a list of proteins that would otherwise require several months of work.
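The CMB clustering algorithm itself is not described here, so the following is only one plausible reading of the idea: protein hits that share at least one peptide, directly or through a chain of other hits, are grouped together, so that no peptide is shared between two clusters. The function name, the input format and the example accessions are hypothetical.

```python
def cluster_proteins(hits):
    """Group protein hits that are linked by shared peptides.

    hits maps a protein identifier to the set of peptides matched to it.
    Returns a sorted list of clusters, each a sorted list of protein ids.
    Sketch only: a union-find over proteins, joined through common peptides.
    """
    parent = {protein: protein for protein in hits}

    def find(protein):
        # Follow parent links to the cluster representative (path halving).
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    first_owner = {}  # peptide -> first protein seen carrying it
    for protein, peptides in hits.items():
        for peptide in peptides:
            if peptide in first_owner:
                parent[find(protein)] = find(first_owner[peptide])
            else:
                first_owner[peptide] = protein

    clusters = {}
    for protein in hits:
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical search result: P1, P2 and P4 are linked through shared peptides.
example_hits = {
    "P1": {"pepA", "pepB"},
    "P2": {"pepB", "pepC"},
    "P3": {"pepD"},
    "P4": {"pepC"},
}
for cluster in cluster_proteins(example_hits):
    print(cluster)
```

Grouping by connected components is a deliberately simple choice. A production system would likely also rank the proteins within each cluster, for example by the number of peptides that support them.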
A platform for everyone
The success of the RPMPN is due to its multidisciplinary approach to solving scientific problems. Efficient integration of cell biologists, mass spectrometrists and bioinformaticians has resulted in a robust, flexible platform that can adapt rapidly to new problems and find solutions that provide useful data to the scientists. Finally, as a result of successful efforts from the RPMPN, this proteomics platform has been made available to the general scientific community through a fee-for-service approach.