Clarke C.L.A., Cormack G.V. — Information Retrieval: Implementing and Evaluating Search Engines
Title: Information Retrieval: Implementing and Evaluating Search Engines
Authors: Clarke C.L.A., Cormack G.V.
Abstract: Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus, a multi-user open-source information retrieval system developed by one of the authors and available online, provides model implementations and a basis for student work.
The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems implementation perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. Additionally, professionals in computer science, computer engineering, and software engineering will find Information Retrieval a valuable reference.
After an introduction to the basics of information retrieval, the text covers three major topic areas (indexing, retrieval, and evaluation) in self-contained parts. The final part of the book draws on and extends the general material in the earlier parts, treating specific application areas, including parallel search engines, link analysis, crawling, and information retrieval over collections of XML documents. End-of-chapter references point to further reading; end-of-chapter exercises range from pencil-and-paper problems to substantial programming projects.
Language: English
Category: Technology
Subject index status: Complete, with page numbers
Year of publication: 2010
Number of pages: 632
Added to catalog: 18.06.2014
Subject index
Heap 128 141 184
Hidden Markov model 306
Hidden Web 511
HITS algorithm 532—534 554
Holdout validation 383
Holistic twig joins 585
Home page finding 539
Host crowding 493
HTML 9 525 567
HTML anchor 277 536
HTML body 277
HTML header 277
Huffman code 181—185 189 200
Huffman code, canonical 184—185 199 201
Huffman code, length-limited 185 201 209
Hungarian 94
Hybrid index maintenance 238—239
Hyperlinks 9
HyperText Markup Language see "HTML"
Hypothesis test 427—429
IDF see "Inverse document frequency"
IE see "Information extraction"
Impact ordering 153 494
Implicit user feedback 526 535 540 555
In-degree 509
Incremental crawling 547
Independence assumption 261
Index block size 116
Index construction 118—131
Index construction, in-memory 119—125
Index construction, merge-based 127—131 229
Index construction, sort-based 125—127
Index construction, two-pass 123
Index partition 127 228 240 471 488
Index pruning 153—160 495
Index types 46—51
Index updates, distributed 490
Index updates, incremental 231—242
Index updates, non-incremental 243—251
Indexable Web 511
Indexing time 105
Indri 27—28
INEX 565 579
INEX, CAS task 579
INEX, CO task 579
infAP 449—450
Inference network model 280
Inferred average precision see "infAP"
Information extraction 5
Information gain 366
Information need 5
Informational query 514
Inner product 55
Insert-at-back heuristic 121
Inter-query parallelism 488
Interactive search and judging 443
Interpolative coding 202—204 213 223
Intra-query parallelism 489 494
Intranet 511
Invalidation list 243—244
Inverse document frequency 57 264 581
Inverted index 33
Inverted index, docid 49
Inverted index, frequency 49
Inverted index, positional 49
Inverted index, schema-dependent 48
Inverted index, schema-independent 33 48 49
Irish 94
Italian 94 95
Japanese 95 98
JavaScript 13
Jelinek–Mercer smoothing 291 295
Jump vector (PageRank) 523
Kendall's τ 445
Kendall's notation 474
Kendall, David 474
Kendall, Maurice 445
Kernel trick 358
KL divergence see "Kullback–Leibler divergence"
Korean 95
Kullback–Leibler divergence 156 286 296 527
Lam 328
Landmark-diff 252
Language model 17—23
Language modeling 258 286 287—298
Laplace's law of succession 301
Laplace, Pierre-Simon 298
Latency 8 470
Latent Semantic Analysis 78
Lazy evaluation 244
Learning, incremental 337
Learning, on-line 337
Learning, semi-supervised 336
Learning, supervised 336
Learning, transductive 336
Learning, unsupervised 337
Learning to rank 312 376 394—400
Legal search 46
Lemma 87
Lemmatization 87
Length normalization see "Document length normalization"
LETOR 399
Lexeme 87
LFU 482
Lightweight structure 160—168
Likelihood ratio 333 341
Linear classifier 349
Link analysis 517—534 554
Link function 356
Linked list 122
Linked list, unrolled 123 124 130
List compression, batched 196
List compression, global 195
List compression, local 195 210
ListNet 399
Little's Law 475 476
Little, John 475
LLRUN 200—201 209 212 253
LLRUN-k 202
LOG 422
Log-odds 260
Logical document structure 11
Logistic regression 346 383 389
Logistic regression, gradient descent 348
Logistic regression, multicategory 392
logit 260 422
Logit average 328
Long tail 480 513
Lookup table 208
Lovins stemmer 97
LRU 482
LSA 78
Lucene 27
m-cover 303
M/M/1 queueing model 475—477
Macbeth 9 33 290 508 567 577
Machine learning 312 336
Macro-average 322
MAP 71—74 137 409 444 447 584
MapReduce 498—503
Markov chain 23 529
Markov chain, aperiodic 529
Markov chain, continuous 475
Markov chain, irreducible 529
Markov chain, periodic 529
Markov model 21—23 362
Maximal marginal relevance 461 493
Maximum likelihood 17 289 297
MaxScore 143—145 491
Mean average precision see "MAP"
Mean reciprocal rank see "MRR"
Mean, arithmetic 44 68 409
Mean, geometric 44 409
Mean, harmonic 68
Mean, weighted harmonic 68
Merge operation, cascaded 126 129
Merge operation, multiway 126 128 241
Meta-analysis 415 439—441
Metalanguage 11
Metasearch 380
Micro-average 322
Microsoft Office 13
Monty Python's Flying Circus 78
Morphology 86
Move-to-front heuristic 121
Move-to-front pooling 444
MRR 322 409 539
Multicategory classification 388—394
Multicategory ranking 388—394
N 48
N-gram 92—93 95 96
Naive Bayes 334
Named page finding 538
Navigational query 513 539
nDCG 451—453 538
Near-duplicate Web page 549
New Oxford English Dictionary 160 169
NEXI 564 572—573
next 33
nextDoc 49
NIST 23
No Merge 232 233
Nonparametric code 192—195 216
Normal distribution 417
Normalized Discounted Cumulative Gain see "nDCG"
Novelty 455—460 537 549
NTCIR 98
Nugget 459
Null hypothesis 427
Obama, Barack 441 515
OCR 97 see "Optical Character Recognition"
Odds 333
Odds ratio 333
ODP see "Open Directory Project"
Offset 48
Okapi BM25 see "BM25"
Okapi BM25F see "BM25F"
Omega code see "ω code"
On-line indexing see "Index updates"
Open Directory Project 526 547
Open source 27
Open Source IR Systems 27—28
Optical Character Recognition 4 85
Order-preserving 260
Orthography 94
Out-degree 509
Overfitting 338 349
Overlap 580
p-value 426
Package-Merge 185
PageRank 105 517—532 554
PageRank, focused 526
PageRank, personalized 526
PageRank, topic-oriented 526
Parametric code 195—201 216
Passage retrieval 302—305
Path expressions 571
PCA 554
PDF 11
Pearson, Karl 427
Per-term index 112 133
Perceptron algorithm 352—353 357
Perron–Frobenius theorem 530
Phrase search 35—39 111
Physical document structure 11
Pike, Rob 97
Pinyin 96
Pivoted document length normalization 78
Poisson distribution 267—268 473 548
Poisson, Simeon Denis 268
Polish 94
Pooling (TREC) 73—75 411 441 443—448
Popper, Karl Raimund 427
Population 414
Porter stemmer 87
Porter, Martin 87 95
Portuguese 94
Position tree see "Suffix tree"
Positional index 49
Postings list 33 110—114 161
PostScript 11
Power 406 434—438
Power method 530
PPM 190
Pre-allocation factor 123
Pre-allocation, proportional 123 236
Preamble 184 186 212 223
Precision 67—68 318 328 407
Precision at k documents 69 408
Precision of measurement 413
Precision, interpolated 70
Prefix query 106 110 113 133
Prefix-free 178
prev 33
prevDoc 49
PRF see "Pseudo-relevance feedback"
Principal component analysis 554
Prior odds 334
Probabilistic model 258
Probability density function 341 417 473
Probability density function, cumulative 417
Probability distribution see "Distribution"
Probability Ranking Principle 8 259 287
Proper binary tree 179
Prosecutor's fallacy 332
Proximity ranking see "Term proximity"
PRP see "Probability Ranking Principle"
Pseudo-frequencies 279
Pseudo-relevance feedback 131 156 275—277 469
qrels 24 411 441 443
Query 6
Query abandonment 540
Query arrival rate 473
Query drift 277
Query execution plan 244
Query expansion 273 297
Query log 98 472 480 513
Query processing, document-at-a-time 139—145
Query processing, term-at-a-time 145—151 493
Query processing, top-k 142—145
Query reformulation 540
Query term frequency 271
Query time 105
Question answering 5 302 457
Queue discipline 474 478
Queueing theory 472—477
Random access 35 111 116 196 216
Random error 413
Range encoding 223
Rank effectiveness 454
Rank-biased precision 461
Rank-equivalent 260
Rank-preserving 260
RankBoost 399
RankEff 454
RankSVM 399
realloc 123 124
Recall 67—68 88 138 318 328 407
Recall-precision curve 70
Receptionist 490