Clarke C.L.A., Cormack G.V. - Information Retrieval: Implementing and Evaluating Search Engines :: Electronic Library of the Board of Trustees of the MSU Faculty of Mechanics and Mathematics

Title: Information Retrieval: Implementing and Evaluating Search Engines

Authors: Clarke C.L.A., Cormack G.V.

Abstract:

Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus, a multi-user open-source information retrieval system developed by one of the authors and available online, provides model implementations and a basis for student work.
The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems implementation perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. Additionally, professionals in computer science, computer engineering, and software engineering will find Information Retrieval a valuable reference.
After an introduction to the basics of information retrieval, the text covers three major topic areas — indexing, retrieval, and evaluation — in self-contained parts. The final part of the book draws on and extends the general material in the earlier parts, treating specific application areas, including parallel search engines, link analysis, crawling, and information retrieval over collections of XML documents. End-of-chapter references point to further reading; end-of-chapter exercises range from pencil and paper problems to substantial programming projects.

Language:

Category: Technology/

Subject index status: index with page numbers is ready


Year of publication: 2010

Number of pages: 632

Added to catalog: 18.06.2014


Subject index

Heap      128 141 184
Hidden Markov model      306
Hidden Web      511
HITS algorithm      532—534 554
Holdout validation      383
Holistic twig joins      585
Home page finding      539
Host crowding      493
HTML      9 525 567
HTML anchor      277 536
HTML body      277
HTML header      277
Huffman code      181—185 189 200
Huffman code, canonical      184—185 199 201
Huffman code, length-limited      185 201 209
Hungarian      94
Hybrid index maintenance      238—239
Hyperlinks      9
HyperText Markup Language      see "HTML"
Hypothesis test      427—429
IDF      see "Inverse document frequency"
IE      see "Information extraction"
Impact ordering      153 494
Implicit user feedback      526 535 540 555
In-degree      509
Incremental crawling      547
Independence assumption      261
Index block size      116
Index construction      118—131
Index construction, in-memory      119—125
Index construction, merge-based      127—131 229
Index construction, sort-based      125—127
Index construction, two-pass      123
Index partition      127 228 240 471 488
Index pruning      153—160 495
Index types      46—51
Index updates, distributed      490
Index updates, incremental      231—242
Index updates, non-incremental      243—251
Indexable Web      511
Indexing time      105
Indri      27—28
INEX      565 579
INEX, CAS task      579
INEX, CO task      579
infAP      449—450
Inference network model      280
Inferred average precision      see "infAP"
Information extraction      5
Information gain      366
Information need      5
Informational query      514
Inner product      55
Insert-at-back heuristic      121
Inter-query parallelism      488
Interactive search and judging      443
Interpolative coding      202—204 213 223
Intra-query parallelism      489 494
Intranet      511
Invalidation list      243—244
Inverse document frequency      57 264 581
Inverted index      33
Inverted index, docid      49
Inverted index, frequency      49
Inverted index, positional      49
Inverted index, schema-dependent      48
Inverted index, schema-independent      33 48 49
Irish      94
Italian      94 95
Japanese      95 98
JavaScript      13
Jelinek-Mercer smoothing      291 295
Jump vector (PageRank)      523
Kendall's $\tau$       445
Kendall's notation      474
Kendall, David      474
Kendall, Maurice      445
Kernel trick      358
KL divergence      see "Kullback-Leibler divergence"
Korean      95
Kullback-Leibler divergence      156 286 296 527
Lam      328
Landmark-diff      252
Language model      17—23
Language modeling      258 286 287—298
Laplace's law of succession      301
Laplace, Pierre-Simon      298
Latency      8 470
Latent Semantic Analysis      78
Lazy evaluation      244
Learning, on-line      337
Learning, semi-supervised      336
Learning, supervised      336
Learning, transductive      336
Learning, unsupervised      337
Learning to rank      312 376 394—400
Learning, incremental      337
Legal search      46
Lemma      87
Lemmatization      87
Length normalization      see "Document length normalization"
LETOR      399
Lexeme      87
LFU      482
Lightweight structure      160—168
Likelihood ratio      333 341
Linear classifier      349
Link analysis      517—534 554
Link function      356
Linked list      122
Linked list, unrolled      123 124 130
List compression, batched      196
List compression, global      195
List compression, local      195 210
ListNet      399
Little's Law      475 476
Little, John      475
LLRUN      200—201 209 212 253
LLRUN-k      202
LOG      422
Log-odds      260
Logical document structure      11
Logistic regression      346 383 389
Logistic regression, gradient descent      348
Logistic regression, multicategory      392
logit      260 422
Logit average      328
Long tail      480 513
Lookup table      208
Lovins stemmer      97
LRU      482
LSA      78
Lucene      27
m-cover      303
M/M/1 queueing model      475—477
Macbeth      9 33 290 508 567 577
Machine learning      312 336
Macro-average      322
MAP      71—74 137 409 444 447 584
MapReduce      498—503
Markov chain      23 529
Markov chain, aperiodic      529
Markov chain, continuous      475
Markov chain, irreducible      529
Markov chain, periodic      529
Markov model      21—23 362
Maximal marginal relevance      461 493
Maximum likelihood      17 289 297
MaxScore      143—145 491
Mean average precision      see "MAP"
Mean reciprocal rank      see "MRR"
Mean, arithmetic      44 68 409
Mean, geometric      44 409
Mean, harmonic      68
Mean, weighted harmonic      68
Merge operation, cascaded      126 129
Merge operation, multiway      126 128 241
Meta-analysis      415 439—441
Metalanguage      11
Metasearch      380
Micro-average      322
Microsoft Office      13
Monty Python's Flying Circus      78
Morphology      86
Move-to-front heuristic      121
Move-to-front pooling      444
MRR      322 409 539
Multicategory classification      388—394
Multicategory ranking      388—394
N      48
N-gram      92—93 95 96
Naive Bayes      334
Named page finding      538
Navigational query      513 539
nDCG      451—453 538
Near-duplicate Web page      549
New Oxford English Dictionary      160 169
NEXI      564 572—573
next      33
nextDoc      49
NIST      23
No Merge      232 233
Nonparametric code      192—195 216
Normal distribution      417
Normalized Discounted Cumulative Gain      see "nDCG"
Novelty      455—460 537 549
NTCIR      98
Nugget      459
Null hypothesis      427
Obama, Barack      441 515
OCR      97 see
Odds      333
Odds ratio      333
ODP      see "Open Directory Project"
Offset      48
Okapi BM25      see "BM25"
Okapi BM25F      see "BM25F"
Omega code      see "$\omega$ code"
On-line indexing      see "Index updates"
Open Directory Project      526 547
Open source      27
Open Source IR Systems      27—28
Optical Character Recognition      4 85
Order-preserving      260
Orthography      94
Out-degree      509
Overfitting      338 349
Overlap      580
p-value      426
Package-Merge      185
PageRank      105 517—532 554
PageRank, focused      526
PageRank, personalized      526
PageRank, topic-oriented      526
Parametric code      195—201 216
Passage retrieval      302—305
Path expressions      571
PCA      554
PDF      11
Pearson, Karl      427
Per-term index      112 133
Perceptron algorithm      352—353 357
Perron-Frobenius theorem      530
Phrase search      35—39 111
Physical document structure      11
Pike, Rob      97
Pinyin      96
Pivoted document length normalization      78
Poisson distribution      267—268 473 548
Poisson, Simeon Denis      268
Polish      94
Pooling (TREC)      73—75 411 441 443—448
Popper, Karl Raimund      427
Population      414
Porter stemmer      87
Porter, Martin      87 95
Portuguese      94
Position tree      see "Suffix tree"
Positional index      49
Postings list      33 110—114 161
PostScript      11
Power      406 434—438
Power method      530
PPM      190
Pre-allocation factor      123
Pre-allocation, proportional      123 236
Preamble      184 186 212 223
Precision      67—68 318 328 407
Precision at k documents      69 408
Precision of measurement      413
Precision, interpolated      70
Prefix query      106 110 113 133
Prefix-free      178
prev      33
prevDoc      49
PRF      see "Pseudo-relevance feedback"
Principal component analysis      554
Prior odds      334
Probabilistic model      258
Probability density function      341 417 473
Probability density function, cumulative      417
Probability distribution      see "Distribution"
Probability Ranking Principle      8 259 287
Proper binary tree      179
Prosecutor's fallacy      332
Proximity ranking      see "Term proximity"
PRP      see "Probability Ranking Principle"
Pseudo-frequencies      279
Pseudo-relevance feedback      131 156 275—277 469
qrels      24 411 441 443
Query      6
Query abandonment      540
Query arrival rate      473
Query drift      277
Query execution plan      244
Query expansion      273 297
Query log      98 472 480 513
Query processing, document-at-a-time      139—145
Query processing, term-at-a-time      145—151 493
Query processing, top-k      142—145
Query reformulation      540
Query term frequency      271
Query time      105
Question answering      5 302 457
Queue discipline      474 478
Queueing theory      472—477
Random access      35 111 116 196 216
Random error      413
Range encoding      223
Rank effectiveness      454
Rank-biased precision      461
Rank-equivalent      260
Rank-preserving      260
RankBoost      399
RankEff      454
RankSVM      399
realloc      123 124
Recall      67—68 88 138 318 328 407
Recall-precision curve      70
Receptionist      490
