Mass spectrometry has been widely used in biology since the late 1950s. However, it really came into its own in the late 1980s, once methods were developed to allow the analysis of large intact molecules (bigger than 1,000 daltons). Two soft ionization techniques, electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), led to a huge jump in popularity, as did the development of much more compact mass spectrometers (bench-top rather than filling a whole laboratory).

Mass spectrometry depends on the ability to turn the analyte of interest into individual, intact, charged molecules in the gas phase. These are released into a low-pressure region (a vacuum of 10^-3 to 10^-10 torr) where they can be manipulated by electrostatic and/or magnetic fields and separated. The fields act only on charged molecules; neutral molecules cannot be manipulated and are lost from the system.

Here the spectrum of a small molecule, caffeine, shows essentially one main peak and a few peaks at higher mass that are much less intense.

Isotope distributions in nature:
Carbon: C12 (98.9%), C13 (1.1%), C14 (trace)
Hydrogen: H1 (99.98%), deuterium (0.015%), tritium (trace)
Oxygen: O16 (99.8%), O17 (0.04%), O18 (0.2%)
Sulphur: S32 (95.0%), S33 (0.8%), S34 (4.2%)

These atoms are common in biological systems. Since the heavy isotopes are rare, they have no significant effect on the spectra of most small molecules. Once one starts to look at biological molecules like peptides, the mass changes can be significant. For example, insulin has a mass of nearly 6,000 Da, and thus a shift of about 1% contributed by each of carbon, oxygen, hydrogen etc. can spread the mass from the lightest molecule (all C12, H1, O16 etc.) towards heavier molecules containing C13, deuterium, O18 etc., over a range of 20-30 mass units.

Some elements, however, have almost equally distributed isotopes. Bromine is one example (Br79 50.7% and Br81 49.3%), which gives rise to spectra in which all bromine-containing peaks appear as doublets.

The effect of the isotope distribution on the shape of the spectrum (sometimes called the isotope envelope) becomes much more pronounced when analysing larger biomolecules. Here the various masses are shown for the peptide hormone glucagon.
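The growth of the isotope envelope with molecular size can be illustrated with a short calculation. The sketch below is a simplification that considers carbon only and uses the 1.1% C13 abundance quoted above; the function name and the 150-carbon "peptide" are hypothetical choices made for illustration, while caffeine does contain 8 carbon atoms.

```python
from math import comb

P_C13 = 0.011  # natural abundance of C13, as quoted in the notes


def carbon_isotope_envelope(n_carbons, max_heavy=10, p_heavy=P_C13):
    """Binomial probabilities of finding 0, 1, 2, ... C13 atoms
    in a molecule with n_carbons carbon atoms (carbon only)."""
    top = min(max_heavy, n_carbons)
    return [
        comb(n_carbons, k) * p_heavy**k * (1 - p_heavy) ** (n_carbons - k)
        for k in range(top + 1)
    ]


# Caffeine (8 carbons) versus a hypothetical peptide with 150 carbons.
for n in (8, 150):
    envelope = carbon_isotope_envelope(n, max_heavy=4)
    print(n, [round(p, 3) for p in envelope])
```

For the small molecule nearly all of the intensity sits in the all-C12 peak, whereas for the larger molecule the peak containing one C13 is already more intense than the all-C12 peak.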

Here is a spectrum of pure substance P, a peptide. Since a mass spectrometer always measures m/z, the mass-to-charge ratio, two main peaks are found: the doubly charged and the singly charged ions.

In order to find out what you are looking at, i.e. whether an ion is singly, doubly or more highly charged, one looks at the details of the isotope distribution. Isotopes are always one mass unit apart, so if the peaks are one unit apart the ion is singly charged: the mass difference (1 mass unit) divided by the charge (1+) is 1.

If the peaks are 0.5 mass units apart, the ion is doubly charged. Remember that the instrument measures m/z: the mass difference between isotopes is 1 unit, and if the charge is 2 then the observed spacing is 1/2 = 0.5.
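The rule of thumb above (peak spacing = 1/charge) can be written as a small helper. This is a minimal sketch, assuming positive-mode ions that carry one proton per charge; the function names and the peak lists are illustrative values, not measured data.

```python
PROTON_MASS = 1.00728  # Da, mass added per charge (assumes protonated ions)


def charge_from_isotope_spacing(mz_peaks):
    """Estimate the charge from adjacent isotope peaks (ascending m/z)."""
    if len(mz_peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(mz_peaks, mz_peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


def neutral_mass(mz, charge):
    """Convert an observed m/z back to the mass of the uncharged molecule."""
    return (mz - PROTON_MASS) * charge


# Illustrative isotope clusters for a peptide of roughly 1347 Da.
singly = [1347.7, 1348.7, 1349.7]
doubly = [674.4, 674.9, 675.4]
for cluster in (singly, doubly):
    z = charge_from_isotope_spacing(cluster)
    print(z, round(neutral_mass(cluster[0], z), 1))
```

Both clusters lead back to approximately the same neutral mass, which is how the two peaks in a spectrum can be recognised as the same molecule in different charge states.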

An electric field accelerates the ions to a high speed. After this, they are directed into a magnetic field, which applies a force to each ion perpendicular to the plane defined by the particle's direction of travel and the magnetic field lines. This force deflects the ions (makes them curve instead of travelling in a straight line) to varying degrees depending on their mass-to-charge ratio. By Newton's second law of motion, the acceleration of a particle is inversely proportional to its mass, so for the same charge the magnetic field deflects lighter ions more than heavier ions. The detector measures the deflection of each resulting ion beam. From this measurement, the mass-to-charge ratios of all the ions produced in the source can be determined.
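The dependence of the deflection on mass-to-charge ratio can be made concrete with the standard textbook relations for an idealised magnetic sector: an ion of charge q accelerated through a voltage V gains kinetic energy qV, and in a field B it then follows a circle of radius r = m·v / (q·B). The sketch below assumes exactly this idealised geometry; the voltage and field strength are hypothetical values.

```python
from math import sqrt

DALTON_KG = 1.66053907e-27           # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per unit charge


def deflection_radius_m(mass_da, charge, accel_voltage_v, field_t):
    """Radius (metres) of the circular path of an ion in a uniform
    magnetic field after acceleration through accel_voltage_v."""
    m = mass_da * DALTON_KG
    q = charge * ELEMENTARY_CHARGE_C
    speed = sqrt(2 * q * accel_voltage_v / m)  # from q*V = 1/2 * m * v^2
    return m * speed / (q * field_t)           # from q*v*B = m*v^2 / r


# Hypothetical settings: 5 kV acceleration, 0.5 tesla field.
for mass in (100, 400):
    print(mass, deflection_radius_m(mass, 1, 5000.0, 0.5))
```

A smaller radius means a stronger deflection, so the lighter ion curves more tightly. Doubling both mass and charge leaves the radius unchanged, which is why the instrument reports m/z rather than mass.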

All of these mass spectrometers have many things in common. Each possesses an ion source that produces ions, an analyzer that sorts them in some way by their masses, and a detector that measures the relative intensities of the different masses. The underlying principle of all mass spectrometers is that the paths of gas-phase ions in electric and magnetic fields depend on their mass-to-charge ratios, and the analyzer uses this to distinguish the ions from one another.

The simplest type of mass spectrometer involves a single mass separation stage. The ions that are passed from the source into the mass analyzer give a simple readout of the intact molecular ions (assuming the ionization method is soft enough).

As interest grew in analyzing the structure of molecules, more complex mass spectrometers were developed with two mass separation stages. The first stage allows the selection of a single molecular species by creating a narrow mass window that filters away all other species. The isolated molecule can then be broken into smaller components by a variety of techniques, and the resultant fragment ions can be analysed in the second mass separation stage. This is called tandem mass spectrometry, or MS/MS, since it was originally carried out using two mass spectrometers joined together in tandem.

Fragmentation of gas-phase ions is essential to tandem mass spectrometry and occurs between the different stages of mass analysis. Many methods are used to fragment the ions, and these can result in different types of fragmentation and thus different information about the structure and composition of the molecule. There are a number of different tandem MS experiments, each with its own applications and each offering its own information. An instrument equipped for tandem MS can still be used to run single-stage MS experiments.

Tandem MS can be done in either space or time. Tandem MS in space involves the physical separation of the instrument components (QqQ or QTOF); tandem MS in time involves the use of an ion trap.

Post-source fragmentation is what is most often used in a tandem mass spectrometry experiment. Energy can be added to the ions, which are usually already vibrationally excited, through post-source collisions with neutral atoms or molecules, the absorption of radiation, or the transfer or capture of an electron by a multiply charged ion.

Collision-induced dissociation (CID), also called collisionally activated dissociation (CAD), is a mechanism for fragmenting molecular ions in the gas phase. The molecular ions are usually accelerated by an electrical potential to high kinetic energy in the vacuum of the mass spectrometer and then allowed to collide with neutral gas molecules (often helium, nitrogen or argon). In the collision some of the kinetic energy is converted into internal energy, which results in bond breakage and the fragmentation of the molecular ion into smaller fragments. These fragment ions can then be analyzed by a mass spectrometer.

In peptide analysis, CID cleaves randomly along the peptide backbone, producing b and y ions (see later section).
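The b and y ions mentioned above can be predicted from a peptide sequence. The sketch below uses the usual convention that a b ion contains the first i residues plus one proton, and a y ion contains the last i residues plus water plus one proton; it assumes singly charged fragments and unmodified residues. The function name and the test sequence PEPTIDE are illustrative choices.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_y_ions(peptide):
    """Return two dicts {i: m/z} for the singly charged b_i and y_i ions."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(residues)
    b = {i: sum(residues[:i]) + PROTON for i in range(1, n)}
    y = {i: sum(residues[-i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


b_ions, y_ions = b_y_ions("PEPTIDE")
for i in sorted(b_ions):
    print(f"b{i} {b_ions[i]:.4f}   y{i} {y_ions[i]:.4f}")
```

Each b ion has a complementary y ion: for a peptide of n residues, b_i and y_(n-i) together account for the whole molecule, which is a useful consistency check when reading a fragment spectrum.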

Two types of MS/MS experiment can be carried out, depending on the type of instrument being used. The first approach developed was tandem in space, in which the parent molecule of interest is fragmented in one part of the instrument before being moved to a second part for the analysis of the daughter (fragment) ions.

The alternative to tandem in space is the type of experiment carried out in an ion trap: tandem in time. Here the isolation of the parent molecule and the analysis of the daughter ions produced by fragmentation occur in the same part of the instrument. The processes are merely separated in time: the parent isolation occurs first, then the fragmentation, and finally the daughter analysis, all in the same part of the trap.

Genomics began with the goal of sequencing entire genomes. To accomplish this task, two different sequencing approaches were developed. These methods can be thought of in the following way. Imagine that you have the complete works of an author, written in a language that you studied in school but never became fluent in. Moreover, the books are in such bad shape that if you open them, they disintegrate. You have two alternatives. You can remove one page at a time, preserve it and decipher it. Or you can open all the books at once and then pick up the fragments of paper and use the words on them to figure out how they fit together.

The page-by-page approach to sequencing the human genome was used by the public genome-sequencing consortium. This group first figured out how all the pages fit together and then deciphered all the words on each page. Finally, it assembled the pages back together to produce the whole genome. The advantage of this approach is that it is very precise. The disadvantage is that it takes a long time.

The biotechnology company Celera used the other method, called whole genome shotgun sequencing, in its competing effort to sequence the human genome. This method is equivalent to figuring out what is written on all the fragments of paper from all of the volumes and then figuring out how they piece together. Doing this effectively requires starting with several copies of each volume so that overlaps among the fragments can be found. The number of original copies is referred to as coverage. Producing a high-quality sequence by this method usually requires eight- to tenfold coverage. The disadvantage of this method is that you rarely get the whole sequence to line up. The advantage is that the portion of the sequence that does line up is acquired much more rapidly than via the page-by-page method.

Proteins can be identified in simple mixtures by digesting them with an enzyme and then measuring the masses of the peptides formed. The set of masses is called the peptide fingerprint. A database is built containing all the proteins encoded in the species' genome, and for each protein the masses of all the peptides that a given enzyme would produce are calculated. Thus each protein has a theoretical peptide fingerprint. The experimental fingerprint is then compared to all the theoretical fingerprints and the best match is calculated. This should be the correct identity of the unknown protein.

Fingerprints are generated by using specific proteases. These cut after known amino acids, and hence one can predict theoretically which peptides will be formed. Trypsin is the most commonly used protease in proteomics studies, since it cuts after arginine and lysine and generates peptides that are on average around 12 amino acids long. This is ideal for ESI MS/MS analysis.
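Because the cleavage rule is so simple, the theoretical peptides can be generated with a few lines of code. The following is a minimal sketch that applies only the rule stated above (cut after every arginine or lysine); real digestion software typically also handles exceptions and missed cleavages, which are deliberately left out here. The function name and the example sequence are hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein sequence after every lysine (K) or arginine (R)."""
    peptides = []
    start = 0
    for position, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:position + 1])
            start = position + 1
    if start < len(protein):
        # Whatever remains after the last cleavage site is the C-terminal peptide.
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence fragment used only to illustrate the rule.
print(tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"))
```

Joining the returned peptides in order reproduces the original sequence, so no residues are lost or duplicated by the digest.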

A specific enzyme, here trypsin, is used to cut the protein into peptides. Trypsin cuts after arginine (R) and lysine (K), and the masses of the resulting peptides can then be calculated. The experimentally determined masses are searched against the theoretical masses in the fingerprint database to find the best match between the two sets of masses. The result of the search is returned as a list of matches ranked according to the probability that the match did not occur at random.
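The comparison of an experimental fingerprint with the theoretical ones can be sketched as follows. Note the simplification: search engines rank matches by a probability-based score, whereas this sketch simply counts how many experimental masses fall within a tolerance of a theoretical mass. All names, masses and the tolerance are hypothetical.

```python
def count_matches(experimental, theoretical, tol=0.2):
    """Number of experimental masses within tol Da of any theoretical mass."""
    return sum(
        any(abs(exp - theo) <= tol for theo in theoretical)
        for exp in experimental
    )


def rank_proteins(experimental, database, tol=0.2):
    """Rank proteins by the number of matching peptide masses (best first)."""
    scores = {
        name: count_matches(experimental, masses, tol)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint database: protein name -> theoretical peptide masses.
database = {
    "protein_A": [842.5, 1045.6, 1479.8, 2211.1],
    "protein_B": [842.5, 990.4, 1300.7],
}
measured = [842.51, 1045.55, 2211.0]
print(rank_proteins(measured, database))
```

A simple count favours large proteins, which produce many peptides and therefore more chance matches; this is one reason real search engines convert the comparison into a probability.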

Here the output of a database search using the popular Mascot program is shown. A graph is included to aid visualization. The green box marks the region of scores that fall below the significance threshold (usually p < 0.05), where a hit could be a random match. The red bars outside the box indicate proteins that are likely hits.

The first time a peptide match to a query (one spectrum) appears in the report, it is shown in bold face. Whenever the top-ranking peptide match appears, it is shown in red. This means that protein hits with peptide matches that are both bold and red are the most likely assignments. These hits represent the highest-scoring protein that contains one or more top-ranking peptide matches.

The concept of shotgun proteomics is shown above. Instead of separating the proteins, the entire cell extract is digested with proteases and the resulting complex peptide mixture is then separated. The peptides are eluted from the final separation method, usually reversed-phase chromatography, directly into the mass spectrometer, where they are automatically subjected to MS/MS analysis. The peptides are identified in a similar way to how proteins are identified. At any given moment perhaps 10 peptides are entering the mass spectrometer. The instrument automatically picks the most intense one, isolates it (throwing away the other 9 peptides) and then smashes it into pieces. The mass of the peptide is used to search the database to find all peptides with the same mass. The fragmentation spectra of all these peptides are then calculated and compared to the experimental fragments observed. The best-matching peptide sequence is then selected.

Here an automatic RP-HPLC-MS/MS run is shown. The mass spectrometer first accumulates a normal MS scan and finds the 10 most intense peaks. It uses a mass window of around 10 to avoid picking all the isotopes in an intense peak envelope. The mass spectrometer then sequentially performs MS/MS on each of the ten peaks and returns to MS mode. The ten peaks are placed in an exclusion list, which tells the mass spectrometer to ignore these masses for the next 5 minutes to ensure that they have fully eluted and are not repeatedly analysed. The next ten most intense peaks are then determined and scheduled for MS/MS.
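The selection logic described above (most intense peaks first, a mass window to skip isotopes, and a timed exclusion list) can be sketched as a single function. This is an illustration of the idea rather than any vendor's acquisition software; the function and parameter names are hypothetical, and the "window of around 10" is interpreted here as a total width, i.e. plus or minus 5.

```python
def pick_precursors(peaks, exclusion, now, top_n=10, window=10.0,
                    exclusion_time=300.0):
    """Choose up to top_n precursor m/z values for MS/MS.

    peaks:     list of (mz, intensity) from the current MS scan
    exclusion: dict {mz: expiry_time_in_seconds}, updated in place
    now:       current run time in seconds
    """
    # Forget exclusion entries whose time has run out.
    for mz in [m for m, expiry in exclusion.items() if expiry <= now]:
        del exclusion[mz]

    selected = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        near_selected = any(abs(mz - s) < window / 2 for s in selected)
        near_excluded = any(abs(mz - e) < window / 2 for e in exclusion)
        if not near_selected and not near_excluded:
            selected.append(mz)

    # Newly fragmented masses are ignored for the next exclusion_time seconds.
    for mz in selected:
        exclusion[mz] = now + exclusion_time
    return selected


exclusion_list = {}
scan = [(500.0, 100.0), (500.5, 80.0), (600.0, 50.0), (700.0, 10.0)]
print(pick_precursors(scan, exclusion_list, now=0.0, top_n=2))
print(pick_precursors(scan, exclusion_list, now=60.0, top_n=2))
```

In the second call the two most intense species are still on the exclusion list, so the instrument moves on to a weaker peak that would otherwise never have been fragmented.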

The experimental data are generated by the automatic accumulation of MS/MS spectra of tryptic peptides from the multi-dimensional peptide separation. This produces a list of intact peptide masses, each with a list of its fragment masses. In a manner analogous to protein fingerprinting, a theoretical in silico list is generated of the masses of all the tryptic peptides predicted for a specific genome, together with their predicted fragment ions. In a first pass, the 1,000 theoretical peptides whose intact masses best match are selected for each experimental parent mass. Then a cross-correlation analysis is carried out between the experimental MS/MS spectrum and every theoretical spectrum. The cross-correlation indicates which is the best-matching spectrum, and again the probability that the match did not occur at random is calculated.
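The two-pass search described above can be outlined in code. The sketch first filters candidate peptides by intact mass and then scores each surviving candidate against the experimental fragments. The scoring here is a deliberately simplified stand-in (shared 1 m/z bins, normalised), not the cross-correlation used by real search engines; all names, masses and tolerances are hypothetical.

```python
from math import sqrt


def binned(fragment_mzs, bin_width=1.0):
    """Reduce fragment m/z values to a set of occupied bins."""
    return {int(mz / bin_width) for mz in fragment_mzs}


def similarity(experimental, theoretical, bin_width=1.0):
    """Fraction of shared bins, normalised so identical spectra score 1."""
    exp_bins = binned(experimental, bin_width)
    theo_bins = binned(theoretical, bin_width)
    if not exp_bins or not theo_bins:
        return 0.0
    return len(exp_bins & theo_bins) / sqrt(len(exp_bins) * len(theo_bins))


def best_match(parent_mass, fragments, candidates, mass_tol=1.0):
    """candidates: list of (sequence, intact_mass, theoretical_fragments).
    Returns (score, sequence) for the best candidate, or None."""
    scored = [
        (similarity(fragments, theo_fragments), sequence)
        for sequence, intact_mass, theo_fragments in candidates
        if abs(intact_mass - parent_mass) <= mass_tol  # first pass: intact mass
    ]
    return max(scored) if scored else None


# Hypothetical candidates with made-up masses.
candidates = [
    ("PEPA", 1000.2, [100.1, 200.2, 300.3]),
    ("PEPB", 1000.4, [100.1, 250.0, 350.0]),
    ("FAR", 1500.0, [100.1, 200.2, 300.3]),
]
print(best_match(1000.3, [100.1, 200.2, 300.3], candidates))
```

The third candidate has identical fragments but is never scored, because its intact mass fails the first-pass filter; this is what keeps the number of expensive spectrum comparisons manageable.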

This shows the MASCOT output for such a search. The green area shows insignificant matches and the red boxes indicate significant protein identifications.

Nominal values: importantly, the mass axis is a discrete space with a unit of 1 m/z.

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.

BLAST searches for high-scoring sequence alignments between the query sequence and sequences in the database using a heuristic approach that approximates the Smith-Waterman algorithm. The exhaustive Smith-Waterman approach is too slow for searching large genomic databases such as GenBank. The BLAST algorithm therefore uses a heuristic that is less accurate than Smith-Waterman but over 50 times faster. This combination of speed and relatively good accuracy is the key technical innovation of the BLAST programs.

The BLAST algorithm can be conceptually divided into three stages. In the first stage, BLAST searches for exact matches of a small fixed length W between the query and sequences in the database. For example, given the sequences AGTTAC and ACTTAG and a word length W = 3, BLAST would identify the matching substring TTA that is common to both sequences. These exact matches are known as seeds. By default, W = 11 is used for nucleotide seeds. In the second stage, BLAST tries to extend the match in both directions, starting at the seed. The ungapped alignment process extends the initial seed match of length W in each direction in an attempt to boost the alignment score. If a high-scoring ungapped alignment is found, the database sequence passes on to the third stage.
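The first two stages can be illustrated on the AGTTAC / ACTTAG example from the text. The sketch below is not the BLAST implementation: seeding is done with a plain dictionary of words, and the extension step simply continues while characters match exactly, whereas BLAST extends using alignment scores. The function names are hypothetical.

```python
def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos, word) for every exact match of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        word = subject[j:j + w]
        for i in index.get(word, []):
            seeds.append((i, j, word))
    return seeds


def extend_ungapped(query, subject, i, j, w):
    """Extend a seed left and right while the characters are identical."""
    left = 0
    while (i - left - 1 >= 0 and j - left - 1 >= 0
           and query[i - left - 1] == subject[j - left - 1]):
        left += 1
    right = 0
    while (i + w + right < len(query) and j + w + right < len(subject)
           and query[i + w + right] == subject[j + w + right]):
        right += 1
    return query[i - left:i + w + right]


print(find_seeds("AGTTAC", "ACTTAG", 3))  # [(2, 2, 'TTA')]
print(extend_ungapped("GATTACA", "CATTACC", 2, 2, 3))  # ATTAC
```

Indexing the words of the query once and then scanning each database sequence is what makes the seeding stage fast: only sequences that share at least one word with the query are examined further.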
