Matthiesen R. - Bioinformatics Methods in Clinical Research (Methods in Molecular Biology) :: Electronic Library of the Board of Trustees of the MSU Faculty of Mechanics and Mathematics

This volume discusses the latest developments in clinical proteomics and describes in detail the algorithms used in publicly available software tools. It is intended as a proteomics-bioinformatics resource that gives readers the opportunity to understand the details of these algorithms. It is not meant to be a pure bioinformatics text filled with complex equations; it also describes the underlying biology and the experimental methods, although detailed experimental protocols are only referenced. The pros and cons of the various experimental methods in relation to data analysis are reviewed as well. In other words, the intention is to connect theory and practice. In some cases, practical examples showing results from the software tools are given. The book is divided into five sections:

Section I: A detailed discussion of the algorithms implemented in software tools for assigning MS and MS/MS spectra to peptides and proteins. The most popular tools for searching mass spectrometry data are currently commercial ones such as Mascot and SEQUEST. These tools provide the most basic analysis of mass spectrometry data; however, with the publicly available tools one can often take the analysis further. Mass spectrometry data also comes in more flavors than, for example, microarray data, where the gene IDs are given directly by the array software. The problem is that mass spectrometry data acquisition methods differ depending on the specific task and produce slightly different data types. For example, samples can be enriched for specific modifications, and mass spectrometer settings can be optimized for a specific modification. Developing specific algorithms for each type of spectral data is a major task that deserves more attention. The publicly available software tools are already somewhat specialized for particular tasks, and each has pros and cons in specific cases. Understanding the details of, and the philosophy behind, the algorithms helps in deciding which tool is best for a specific search task.
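As a rough illustration of what assigning a spectrum to a peptide involves, the sketch below shows only the first filtering step that database search approaches commonly share: comparing an observed neutral precursor mass with the theoretical monoisotopic masses of candidate peptides. It is not the algorithm of any tool named above. The function names (`peptide_mass`, `match_candidates`), the 10 ppm default tolerance, and the restriction to unmodified peptides are assumptions made for this example.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added for the free N- and C-termini


def peptide_mass(sequence):
    """Theoretical neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def match_candidates(observed_mass, candidates, tol_ppm=10.0):
    """Return the candidates whose mass lies within tol_ppm of observed_mass.

    Hypothetical helper: the tolerance and the plain list of candidate
    sequences are assumptions, not taken from any specific search engine.
    """
    matches = []
    for seq in candidates:
        theoretical = peptide_mass(seq)
        error_ppm = abs(observed_mass - theoretical) / theoretical * 1e6
        if error_ppm <= tol_ppm:
            matches.append(seq)
    return matches


if __name__ == "__main__":
    print(match_candidates(132.0535, ["GG", "GA", "AA"]))
```

Real search tools go well beyond this step: they typically score the fragment ions of each surviving candidate against the MS/MS spectrum and account for modifications, which is where the tools differ most.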

Section II: Starts with an overview of the different quantitative proteomics strategies, discussing the pros and cons of the different labeling strategies, labeling versus label-free approaches, model systems (cell or tissue types), and mass spectrometers in relation to quantitative proteomics. The contributions that follow describe the quantitative algorithms used in publicly available tools. Algorithms for label-free quantitation by LC-MS intensity profiling, for stable isotope labeling combined with MS, and for quantitation from 2D gels are covered.

Section III: Is titled "Finding biomarkers in MS data". The word biomarker has different meanings depending on the context in which it is used. Here it is used in a clinical context and should be interpreted as "a substance whose specific detection level indicates a particular cellular or clinical state". In theory, one can easily imagine cases where a single biomarker has several detection-level intervals that indicate different states. An even more complex example would be a set of biomarkers whose corresponding detection-level intervals are used together to classify a specific cellular or clinical state. In complex cases, more elaborate models based on machine learning are essential. This section therefore starts with a gentle introduction to machine learning, followed by examples of useful algorithms and methods for classification based on mass spectrometry spectra.
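The idea of a set of biomarkers with detection-level intervals can be sketched as a simple rule-based classifier. This is only an illustration of the definition given above, not a method taken from the book. The marker names, state names, interval bounds, and the `classify` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interval:
    low: float   # inclusive lower bound of the detection level
    high: float  # exclusive upper bound of the detection level

    def contains(self, level: float) -> bool:
        return self.low <= level < self.high


# Hypothetical rules: each state requires every listed biomarker to fall
# inside its interval. All names and numbers are invented for illustration.
RULES: Dict[str, Dict[str, Interval]] = {
    "state_A": {"marker_1": Interval(0.0, 10.0), "marker_2": Interval(5.0, 20.0)},
    "state_B": {"marker_1": Interval(10.0, 50.0), "marker_2": Interval(0.0, 5.0)},
}


def classify(levels: Dict[str, float],
             rules: Dict[str, Dict[str, Interval]] = RULES) -> List[str]:
    """Return every state whose intervals all contain the measured levels."""
    matches = []
    for state, intervals in rules.items():
        if all(name in levels and interval.contains(levels[name])
               for name, interval in intervals.items()):
            matches.append(state)
    return matches


if __name__ == "__main__":
    print(classify({"marker_1": 3.0, "marker_2": 7.5}))
```

Hand-written interval rules like these stop scaling once many markers interact or the boundaries have to be learned from data, which is the situation where the machine learning models mentioned above become necessary.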

Section IV: The problem of storing proteomics mass spectrometry data has still not been solved to a satisfactory degree. Parsing the data into databases is problematic in some cases. Other problems are also evident, such as database schemas that do not fully capture the results of mass spectrometry experiments. However, the most common types of data and results can be stored in current proteomics databases.

Section V: Systems biology is the study of the interactions between the components of a biological system. It is a broad field involving data storage, controlled vocabularies, data mining, interaction studies, data correlation, and modeling of biochemical pathways. The input data comes from various omics fields such as genomics, transcriptomics, proteomics, interactomics, and metabolomics. Note that metabolomics can be further divided into subcategories such as peptidomics, glycomics, and lipidomics. The discussion in this section is restricted to systems biology in relation to proteomics. It describes how to relate proteomics results to results obtained in other omics fields and how to automatically obtain functional annotation of the identified proteins.