Raychaudhuri S. - Computational text analysis for functional genomics and bioinformatics
Title: Computational text analysis for functional genomics and bioinformatics
Author: Raychaudhuri S.
Abstract: This book brings together the two disparate worlds of computational text analysis and biology and presents some of the latest methods and applications to proteomics, sequence analysis and gene expression data. Modern genomics generates large and comprehensive data sets, but their interpretation requires an understanding of a vast number of genes, their complex functions, and interactions. Keeping up with the literature on a single gene is a challenge in itself; for thousands of genes it is simply impossible.
Here, Soumya Raychaudhuri presents the techniques and algorithms needed to access and utilize the vast scientific text, i.e. methods that automatically "read" the literature on all the genes. Including background chapters on the necessary biology, statistics and genomics, in addition to practical examples of interpreting many different types of modern experiments, this book is ideal for students and researchers in computational biology, bioinformatics, genomics, statistics and computer science.
Language:
Category: Biology
Subject index status: index with page numbers is ready
Year of publication: 2006
Number of pages: 312
Added to catalog: 11.12.2007
Subject index
3’ and 5’ untranslated regions 23
Aach, J. and Church, G.M. 67
Abbreviations, use in gene name recognition 228 237—42 238 239 243
Abstract co-occurrences 254—59 256
Abstract co-occurrences, number, prediction of likelihood of interaction 254—59 256 257 258
Accession number (AC), SWISS-PROT 109
Accuracy 36 212
Adenosine 18 19
Affine gap penalty function 43
Affinity precipitation 248
Agglomerative hierarchical clustering 71—2
Alanine 25
Alberts, B., Bray, D. et al. 17
Algorithms, measurement of performance 35—7
Aligned sequences 42
Alignment algorithms 42—4
Alignment, dynamic programming 44—7
Alizadeh, A.A., Eisen, M.B. et al. 62 63 68 78
Alpha helices 25
Alpha helices, hydrogen bonding pl 2.3
Altman, R.B. and Raychaudhuri, S. 67 86
Altschul, S.F., Gish, W. et al. 48
Altschul, S.F., Madden, T.L. et al. 115
Ambiguity of gene names 229 232
Amino acid sequences, probabilities 30
Amino acids 24—5 25
Amino acids, emission probabilities 59
Amino acids, genetic code 23
Amino acids, secondary structure prediction 56—7
Amino acids, structure 24
Amino acids, substitutions 41 42—3
Amino acids, synthesis 21—2
Amino acids, transition probabilities 59—60
Anchoring enzymes 64
Andrade, M.A. and Valencia, A. 112 113 173
Ank root 237
Annotated genes see also Functional vocabularies
Annotated genes, use of maximum entropy classifier 221—4
Annotated genes, uses 196—7
Annotation quality, GO 187
Annotation quality, relationship to NDPG sensitivity 179
Appearance of words, use in name recognition 228 232—3 234 241—3 242
Arabidopsis thaliana, GO annotated genes 13
Arbeitman, M.N., Furlong, E.E. et al. 98 193
Arginine 25
Arrays, gene expression profiling 26—7 63 pl
Arrays, noise sources 125
Article indices 227
Ashburner, M., Ball, C.A. et al. 7 90 148 152 196
Asparagine 25
Aspartic acid 25
Average linkage clustering 181 191
Bachrach, C.A. and Charen, T. 213 224
Backward algorithm 60
Bailly, V. et al. 149
Bait proteins 248
Ball, C.A., Awad, I.A. et al. 126
Ball, C.A., Dolinski, K. et al. 212
Base pairing 19 20
Base pairing in RNA 21
Baum-Welch algorithm 60—1
Bayes' theorem 30 255
Behr, M.A., Wilson, M.A. et al. 63
Ben-Hur, A., Elisseeff, A. et al. 67
Best article score (BAS) 160—2
Best article score (BAS), precision-recall plot 160
Beta sheets 25 26
Beta-GAL 248
Bias in study areas 5 6
Binary vectors, comparison metrics 87
Binding proteins 141
Binomial distribution 31 32 33
Bioinformatics 1 2
Biological function 26—7
Biological function codes 195
Biological function databases 7
Biological function querying 101—4
Biological process terms, Gene Ontology 12 198
Biological similarity, relationship to textual similarity 97—9
BioMed Central 3 9
Biomolecular Interaction Network Database (BIND) 263
Blake, J.A., Richardson, J.E. et al. 9 184 229
Blaschke, C., Andrade, M.A. et al. 7 260—1 265
BLAST (Basic Local Alignment Search Tool) 39 48 83 107
BLAST, comparison of breathless protein with other proteins 97 98
Boeckmann, B., Bairoch, A. et al. 3 109
Breathless 228
Breathless, abbreviations 237—38
Breathless, gene literature study 96—9
Breathless, SWISS-PROT record 109 pl
Breathless, synonyms 229 231
Breitkreutz, B.J., Stark, C. et al. 250
Brill, E. 234
Brown, P.O. and Botstein, D. 1
Caenorhabditis elegans, assembly of functional groups 185—9
Candida albicans, GO annotated genes 13
Caenorhabditis elegans, GO annotated genes 13
Caenorhabditis elegans, literature index 185 186
Caenorhabditis elegans, sensitivity of NDPG 187
Calculation of mean 35
Candidate gene identification 8
Carbohydrate metabolism genes 150
Catlett, M.G. and Forsburg, S.L. 151
CCAAT promoter 50
Cellular compartment terms, Gene Ontology 198
Central dogma of molecular biology 18 pl
Centred correlation metric 181
Chang, J.T., Raychaudhuri, S. et al. 8 107 117 118
Chang, J.T., Schutze, H. et al. 233 235 238—40
Chang, J.T., Schutze, H. et al., unified gene name finding algorithm 240—3
Chaperones 24
Chaussabel, D. and Sher, A. 95
Chee, M., Yang, R. et al. 63
Chen, J.J., Wu, R. et al. 63
Cherry, J.M., Adler, C. et al. 9 155 174 181 184 212 229
Chi-square testing, feature selection 210—12 211 216 218
Chips, sources of noise 125
Cho, R.J., Campbell, M.J. et al. 63
Chu, S. and Herskowitz, I. 78
Chu, S., DeRisi, J.L. et al. 78
Classification methods 66 74—9
Classification of documents, inconsistencies 218
ClustalW algorithm 48 49
Cluster boundary optimization 178—84 192—3
Cluster identification 192—3
Cluster software 86 181
Clustering algorithms 66—72 172
Clustering algorithms, k-means clustering pl 2.8
Clustering, hierarchical 178—84
Clustering, NDPG scoring 173—8
Clustering, use in organizing sequence hits 114
Co-occurring gene names 249—50
Co-occurring gene names, assessment of efficacy 250—4
Co-occurring gene names, interaction verbs 260—1
Co-occurring gene names, number, prediction of likelihood of interaction 254—59
Coded messages, information theory 33—4
Codons 21—2
Codons, genetic code 23
Coherence of gene groups 147. See also Functional coherence of gene groups
Coin tossing, hidden Markov models 55—6
Coin tossing, probabilities 28 29
Collection frequency 85—6
Comments field (CC), SWISS-PROT 109
Comprehensive Yeast Genome Database (CYGD) 169
Concordance see Overlap clusters
Conditional probability 28—9
Conditional probability, Bayes’ theorem 30
Conditions, in expression analysis 65
Confidence scores of maximum entropy classifier 220—21
Consensus sequences 50
Conserved substitutions 41
Context, use in recognition of gene names 228 235—7 242
Continuous probability distribution functions 31 32 33
Core terms, in name finding algorithm 233 234
Correlation coefficient 67
Corruption studies, gene groups 166—7
Cosine metric 87
Cosine metric, comparison of breathless with other genes 96—7
Cosine metric, comparison of gene expression profiles 98
Cosine metric, neighborhood expression information scoring 130—1 203
Covariance matrices, linear discriminant analysis 77 pl
Covariance matrices, principal component analysis 73
Craven, M. and Kumlien, J. 7
Credibility, genomics literature 4
Cross-referencing, assessment of functional coherence of gene groups 152
Cysteine 25
Cytochrome P450 genes, appearance 232—3
Cytosine 18 19
Danio rerio, GO annotated genes 13
Data analysis 65—6
Data analysis, clustering algorithms 66—72
Data analysis, dimensional reduction 72—4
Data interpretation 66 68 74 77 pl pl
Data interpretation problems 1—2
Data, statistical parameters 34—5
Database building 5 7
Database of Interacting Proteins (DIP) 7 262
Databases 3—4 7 9—11
Databases, Biomolecular Interaction Network Database (BIND) 263
Databases, Comprehensive Yeast Genome Database (CYGD) 169
Databases, electronic text 9
Databases, GenBank database, growth 37
Databases, GENES database 201
Databases, PATHWAYS database 201
Databases, SCOP database 117—18
Databases, Stanford Microarray Database (SMD) 126
Dendrograms, hierarchical clustering 71 178
Deoxyribonucleic acid see DNA
Deoxyribonucleotides 18 19
Deoxyribose 18 19
DeRisi, J.L., Iyer, V.R. et al. 66 78
Dice coefficient 87
Dictionary strategy, gene name identification 228—2 240 251
Dictyostelium discoideum, GO annotated genes 13
Dimensional reduction 66 67 72—4
Dimensional reduction, feature selection 88—90
Dimensional reduction, latent semantic indexing 92—4
Dimensional reduction, weighting words 90—1
Dirichlet priors 159
Discrete probability distribution functions 31 32 33
Discriminant line, linear discriminant analysis 76
Distance metrics, clustering algorithms 67
Distribution functions see Probability distribution functions (pdfs)
Distributions of words, WDD 157—60
Divergence value, WDD 15
Diversity, genomics literature 5 141 150 195
DNA (deoxyribonucleic acid) 18—20
DNA (deoxyribonucleic acid), binding by proteins 25 26
DNA (deoxyribonucleic acid), Sanger dideoxy sequencing method 39 pl
DNA (deoxyribonucleic acid), transcription 21 22 245 247
DNA polymerase 18
DNA polymerase, use in Sanger dideoxy sequencing method 39
DNA-dependent ATPase genes, yeast 148—50 149
Document classification see Text classification
Document frequency 85 88 89 91
Document gene indices 95. See also Databases
Document similarity assessment 83—4
Document similarity assessment, comparison metrics 86—7
Document similarity assessment, word values 88
Document vectors 84—6 85
Document vectors, latent semantic indexing 92—3
Document vectors, vocabulary building 88—90
Document vectors, weighting words 90—1
Donaldson, I., Martin, J. et al. 7 263
Dossenbach, C., Roch, S. et al. 96
Dot plots 41—2
Drosophila melanogaster, assembly of functional groups 185—9
Drosophila melanogaster, breathless gene literature search 96—9
Drosophila melanogaster, breathless gene literature search, BLAST hits pl 5.1
Drosophila melanogaster, breathless gene literature search, BLAST hits, keywords 112 113
Drosophila melanogaster, gene name detection 232
Drosophila melanogaster, genome size 18
Drosophila melanogaster, GO annotated genes 13
Drosophila melanogaster, keyword queries 101—4 103 104
Drosophila melanogaster, latent semantic indexing 94
Drosophila melanogaster, literature 183
Drosophila melanogaster, literature index 185 186
Drosophila melanogaster, literature, document frequencies of words 88 89
Drosophila melanogaster, sensitivity of NDPG 187
Durbin, R., Eddy, S. et al. 40
Dwight, S.S., Harris, M.A. et al. 187
Dynamic programming 44—7 83
Dynamic programming score matrix 45
Dynamic programming, forward algorithm 59
Dynamic programming, multiple alignment 49
Dynamic programming, tracing back 47
Dynamic programming, use in gene name recognition 238—40
Dynamic programming, Viterbi algorithm 57—9 58
Edman degradation of proteins 39—40
Eisen, M.B., Spellman, P.T. et al. 67 70 78 86 168—9 172 174 180
Electronic publishers 2—3
Electronic text resources 9
Emission probabilities, amino acids 59
Empirical distribution, article scores 164
Enhancers 23—4
Entrez Gene 11
Entropy models 206. See also Maximum entropy modeling
Entropy of a distribution 34
Enzyme Commission (EC) classification scheme 200 201
Enzymes 24
Epstein Barr virus, genome size 18
Error sources, gene expression analysis 125
Escherichia coli, genome size 18
Eskin, E. and Agichtein, E. 107 120 121
Euclidean metric 67 87
Events, conditional probability 28—9
Events, independence 29—30
Events, probability 27—8
Evidence codes 188 189 198 199—200
Exons 21 22
Exponential distributions 32
Exponential distributions, expression value of words 142—3 pl
Exponential distributions, maximum entropy probability distribution 208
Extend step, gene name recognition algorithm 243
Faculty of 1000 4
False negatives 36
False positives 36
False positives in single gene expression series 124
False positives in single gene expression series, recognition 135 137—8
Fbgn0023184 192
Fbgn0029196 192
Fbgn0034603 (glycogenin) 192
Feature selection 88—90
Feature selection, text classification algorithms 210—12
Feature terms in name finding algorithm 233 234
Features, in expression analysis 65
Features, in maximum entropy classification 206
Feng, Z.P. 120
Fields, S. and Song, O. 141
Filtering, gene name detection 232 241
Fly functional clusters 193 pl
Fly gene expression data set, hierarchical pruning 189—2
FlyBase 9 11 88 95 109 184 190
FlyBase, lists of synonyms 229 230
FlyBase, standardized names 228
Forward algorithm 59
Fractional reference (fr) parameter, WDD 158
Fractional references for documents, best article score system 160—1
Frequency of words see Document frequency