Clarke C.L.A., Cormack G.V. — Information Retrieval: Implementing and Evaluating Search Engines
Title: Information Retrieval: Implementing and Evaluating Search Engines
Authors: Clarke C.L.A., Cormack G.V.
Abstract: Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus, a multi-user open-source information retrieval system developed by one of the authors and available online, provides model implementations and a basis for student work.
The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems implementation perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. Additionally, professionals in computer science, computer engineering, and software engineering will find Information Retrieval a valuable reference.
After an introduction to the basics of information retrieval, the text covers three major topic areas — indexing, retrieval, and evaluation — in self-contained parts. The final part of the book draws on and extends the general material in the earlier parts, treating specific application areas, including parallel search engines, link analysis, crawling, and information retrieval over collections of XML documents. End-of-chapter references point to further reading; end-of-chapter exercises range from pencil and paper problems to substantial programming projects.
Language:
Category: Technology
Subject index status: index with page numbers is complete
Year of publication: 2010
Number of pages: 632
Added to catalog: 18.06.2014
Subject index
48
48
48
34
34
48
51 271
α-nDCG 459
code 193—194 209
-value 191
code 193 209 214
233—235 239
236—237
240—242 249
160 169
229 243
229 233
code 194—195
48
64
64
49
33
49
33
397—399
Abandonment 540
Abstract data type 33 48 160 577
Accumulator 145 493
Accumulator pruning 147—151 493
Accuracy 322 335
Active learning 337
Active Server Pages 510
Adaptive algorithm 38 62 65 304
Adaptive coding 177
Adaptive compression 190
AdaRank-MAP 399
Adhoc task 24
ADT see "Abstract data type"
Advanced search 160
Adversarial IR 507
Alternative hypothesis 427
Anchor text 277 507 513 536—537 555
Antony and Cleopatra 14
AOL query log 480
Apache 27
Arabic 98
Arithmetic coding 186—189 189 223
Arithmetic mean 44 68 409
ASCII 13 91
ASP 510
Assessment 8 67 73 411 441—446
Average precision 71 408
Average response time 75
B-tree 113
Background language model 290
Backlinks 534
Bag of words 60 145 151 158
Bagging 376 385—387
Basque 94
batch updates 229—231
Bayes' rule see "Bayes' theorem"
Bayes' theorem 260 333
Bengali 97
Bernoulli trials 418
Bias 413
Bigram 19 96 115
Binary document format 11
Binary Independence Model 261—263
Binary interval 187
Binary search 39 107 111 133 217 220
Binomial distribution 418 491
Bit buffering 208
Blind feedback see "Pseudo-relevance feedback"
Blocking 186 216
Bloom filter 131
BM1 273
BM11 273
BM15 273
BM25 73 138—139 258 272 296 301 312 354
BM25F 258 277—279 539
body 193
Bonferroni correction 428
Bookstein, Abraham 267
Boolean retrieval 52 63—66 137 573
Boosting 376 387—388
Bootstrap 424
Bootstrap aggregation 386
Bosak, Jon 11 29
Bose-Einstein statistics 300
Bpref 461
Branch prediction 199 208 595
Bray, Tim 169
Browser extensions 526
Buckley, Chris 71
Burrows-Wheeler compression 191
Burst trie 133
Byte-aligned coding 205—206
bzip2 191
Cache 41 479—484 544
Cache hierarchy 481
Cache line 594
Cache policy 482—483
Candidate phrase 39
Candidate solution 63
Carnegie Mellon University 27
Case normalization 84
Categorization 4 310 320 376
Central limit theorem 431 439
CGI 510
Chaining 107 121 122
Chinese 95—96 98
CJK languages 95—96
Classification 312 331—366
Classification, binary 331
Classification, decision trees 360 364—366
Classification, feature engineering 338—339
Classification, kernel methods 357
Classification, linear classifier 349—353
Classification, multicategory 388—394
Classification, perceptron algorithm 352
Classification, probabilistic 339—349
Classification, Rocchio's method 354
Classification, SVM 353
CLEF 98
Cleverdon, Cyril 460
Clickthrough curve 540
Clickthrough inversion 540
ClueWeb09 collection 25
Clustering 4 215 224 337
Code 177
Code tree 179 181 183
Codepoint (Unicode) 91 95 97
Combiner 502
Common Gateway Interface 510
Compression model 19 177—180 188—190 196 202 361—363
Confidence interval 416—426 432 470
Confidence level 416
Contingency table 332
Continuation flag 205
Cosine similarity 56 72
CPU cache 109 124 594
Cranfield paradigm 460
Cranfield tests 460
Crawler trap 557
Cross-entropy 361 373
Cross-validation 384—385 399
Cutting, Doug 27 504
Cutts, Matt 553
Damping factor (PageRank) 518
DDS see "Deadline-driven scheduling"
De Morgan's laws 65
Deadline-driven scheduling 478
Decision tree 360 364—366
DECO algorithm 539
Decoder 175
Decoding performance 204—209 223
Decompounding 94 98
Deep Web 511
Degrees of freedom 423
Delta code see "δ code"
Density function see "Probability density function"
Desktop search 3 251
DFR see "Divergence from randomness"
DICTIONARY 33 106—110
Dictionary compression 216—222
Dictionary group 217
Dictionary interleaving 114—118 221—222
Dictionary operations 106
Dictionary, hash-based 107
Dictionary, sort-based 107
Dictionary-as-a-string approach 108
diff 252
Digital library 4
Dijkstra, Edsger 86
Dimensionality reduction 338 356
Dirichlet smoothing 291 295
Disk seek 111 145 233 238 493
distribution 417
Distribution, binomial 418 491
Distribution, empirical 419
Distribution, exponential 473
Distribution, Gaussian 417
Distribution, geometric 192 196 210
Distribution, normal 417
Distribution, Poisson 267—268 473 548
Distribution, Student's t-distribution 423
Distribution, Zipfian see "Zipf's law"
Divergence from randomness 287 298—302
Diversity 455—460 537
DMC 190 361
Docid index 49
document 7—8 45
Document format, binary 11
Document format, HTML 9
Document format, Microsoft Office 13
Document format, OOXML 13
Document format, PDF 11
Document format, PostScript 11
Document format, raw text 13
Document format, SGML 11
Document format, XML 11 565
Document length normalization 78 139 271—273 295 296 299 301
Document Map 105 214
Document partitioning 490—493 496
Document reordering 214—216
Document structure, logical 11
Document structure, physical 11
Document type declaration 570
Document Type Definition see "DTD"
Document, element 7 564
Document, update 7
Dot product 55
Double index 86
DTD 568—570
Duplicate Web page 549
Dutch 94 95 98
Dwell time 541
Dynamic Markov compression see "DMC"
Dynamic page 510
Dynamic programming 492
Dynamic rank 517 535
EBCDIC 91
Effect size 426
Effectiveness 8 67 538 584
Efficiency 8 75 468
Eigenvalue 529
Eigenvector 529
Eliteness 267 299 301
Empirical distribution 419
Encoder 175
English 95
Enterprise search 4 511
Entropy 180 223 360
Entropy of English text 190 191
Entropy, relative 296
Ergodic 529
Euclidean distance 56
Exhaustivity 8 584
Exponential search see "Galloping search"
Extensible Markup Language see "XML"
F-measure 68 371
False negative 332
False positive 332
Fault tolerance 472 496—498
FCFS see "First-come first-served"
Feature engineering 338—339
Feedback see "Pseudo-relevance feedback"
File system search 3
Filtering 4 310 313 320
Filtering, spam 325 342
Finite-context model 178 190
Finnish 94 95 97
FIRE 98
First-come first-served 473 474 478
First-order language model 19
Fisher, Ronald Aylmer 414 427
Fixed-effect model 440
Fixed-point iteration 519
Flash 13
Flat index see "Schema-independent index"
FLWOR expression 574
Follow matrix 523
Forward index 131
Fragmentation 233 242
Fragmentation, internal 108 123 124 483
French 94 95
Frequency index 49
Front coding 219—221
Function word 89
Fusion 376 377—381
Galloping search 42—44 62 65 111 246
Gamma code see "γ code"
Garbage collection 245—250
Gaussian distribution 417
GC-list 160—162
Generalizability 415
Generalized concordance list see "GC-list"
Generative model 286 289
Geometric mean 44 409
Geometric mean average precision see "GMAP"
Geometric partitioning 240—242
German 95 98
GMAP 410 422
Goldilocks 75
Golomb code 196—200 209
Gosset, William Sealy 423
GOV2 collection 25
GPU 504
Graded relevance see "Relevance"
Gradient descent 348
Granularity 112
GROUPING 124
gzip 191
Hadoop 504
Hamlet 87 90 290 508
Harmonic mean 68
Harmonic mean, weighted 68
Hash table 107
Hathaway, Anne 51