Witten I.H., Moffat A., Bell T.C. — Managing Gigabytes: Compressing and Indexing Documents and Images
Title: Managing Gigabytes: Compressing and Indexing Documents and Images
Authors: Witten I.H., Moffat A., Bell T.C.
Abstract: In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading: an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.
Language:
Category: Technology
Subject index status: index with page numbers is complete
Edition: Second edition
Year of publication: 1999
Number of pages: 519
Added to catalog: 15.11.2009
Subject index
Portable bitmap format 416 418 467
Portable graymap format 416 467
Posting file see "Inverted file"
PPM 22 61—65 100—101
ppm program 91 92 102
ppm program for bilevel images 273
ppm program, compression results 407
ppm program, initial model 407—409
ppm program, memory 22 99 410
ppm program, random decoding 392
ppm program, speed 97 391
ppm program, synchronization 407—409
PPM, comparison to block-sorting method 69
PPM, exclusions 62
PPM, method A 62
PPM, method C 63
PPM, method D 63
PPM, method X 63
PPM, performance 22
PPM, PPM* 65 101
Precision 153 154 188—191 194 215
Prediction by Partial Matching see "PPM"
Prefix code 31 100
Priming of model 311 391 407—409
Priority queue see "Heap"
Probabilistic ranking 216—218 222
Probability estimation 24—25
Probability estimation in JBIG 287—288
Progressive transmission 266—268 281 303—309
Progressive transmission in JBIG 286 309
Progressive transmission in JPEG 298 302 309
Progressive transmission of textual images 313
Progressive transmission, error modeling 307
Project Gutenberg 434—435 453 473
Projection profile 361—367 386 388
Pyramid coding 303 304
Q-coder 59
Queen Elizabeth I 432
Query 153—221 see "Ranked
Query in mg 423—428
Query term see "Term"
Query, distributed 218—221
Query, fuzzy Boolean 222
Query, interactive 214—218
Query, partially specified term 170—173
Quicksort 231 234 261
Random access see "Synchronization"
Random graph 167—168 see
Ranked query 153 154 180—187 see "Relevance"
Ranked query, coordinate matching 155 181
Ranked query, inner product measure 181—185
Ranked query, iterative 214—216
Ranked query, normalization 185 186 201 203—205
Ranked query, optimal 214
Ranked query, probabilistic 216—218 222
Ranked query, pruning strategy 208
Ranked query, statistical methods 155
Ranked query, vector space methods 155 185—187
READ code 270—274
Recall 153 154 188—191 194
Recall-precision curve 191 194
Reconstructed text 314 316 341
Regions, classification of 385—388
Relative term frequency 183—184
Relevance 153 180 189
Relevance, evaluation 215
Relevance, feedback 214—216 222
Relevance, judgments 191
Representative terms 104 153 444
Residual image 318 341—343
Resolution 269 418
Resolution, reduction 282—286
Response (to query) 153
Retrieval effectiveness 188—198 210 215 see "Precision" "Relevance"
Rice coding 222 247 293—294
Rotated lexicon 172—173 222
Rotation operation 358
Run-length coding for Huffman coding 50
Run-length coding for length limited Huffman coding 404
Run-length coding for LZ coding 76
SAKDC see "Swiss Army Knife Data Compression"
sc-i 296
Scanning 265 344—349 355 437
Scanning, upside down 367
Screening library templates 332—333 336
Search engines 194—197 220 438—440
Segmentation 320—325 355 372—384 388
Segmentation of text strings 378—382
Segmentation, bottom-up 374
Segmentation, mark-based 376—378
Segmentation, recursive X-Y cut 376
Segmentation, run-length smoothing 374
Segmentation, top-down 376
Segmentation, using document grammar 383—384
Selection problem 210 222
Self-entropy 277 280
Self-expanding data 78
Self-synchronizing code 87—90 101
Semi-adaptive model see "Semistatic model"
Semi-static model 27—28 79 391—394
Shakespeare, William 2 14 19 92 256 435 458
Shannon, Claude 25 99 116
Shear transformation 358
SIGIR conference 221
Signature file 130—139 151 226 see
Signature file for TREC 139
Signature file, advantages of 135 144—145
Signature file, comparison with other methods 143—145
Signature file, compression of 141
Signature file, disadvantages of 137—138 143—144 260
Signature file, dynamic collections 260
Signature file, effect of stemming 147
Signature file, false-match checking 132
Signature file, negated terms 132—133
Signature file, size of 134—139
Similarity measure 154 180 see
Skew 355
Skipping 176—178 207—208 423—428
Skipping, multiple indexes 220
Skipping, overhead of 425 428
Skipping, performance 222 423 429
Sorting 222 see
Sorting for inversion 231—245
Sorting of accumulators 210—213
Sorting of marks 325
Sorting, Burrows-Wheeler transform 65
Sorting, external merge sort 231
Source coding theorem 25 116
Spamming for World Wide Web searching 195
Spectral selection 302 303
Speed see "Time"
Spiders for the World Wide Web 194
Standard see "CCITT fax standard" "JBIG" "JPEG" "MPEG"
Static collections 226
Static collections and mg 451
Static model 27 116—119 274
Statistical compression 23
Stemming 106 146—147 151 see
Stemming in mg 463
Stemming in NZDL 470
Stemming in query 174
Stemming, effect on index size 147 427
Stop list see "Stop words"
Stop words 13 147—150 427
Stop words, construction of index 106
Stop words, disadvantages of 148—149
Stop words, effect on index size 149
Stop words, TREC 194
String matching see "Pattern matching"
Strong, J. 20
Successive approximation 302 303
Swiss Army Knife Data Compression (SAKDC) 65
Symbol numbers and offsets 339—340
Symbol numbers and offsets, compression of 341 349
Symbolwise models 23 61—74
Synchronization 85—90 176—178 391
Synchronization for arithmetic coding 86—87
Synchronization for fax coding 269
Synchronization for inverted list 176—179
Synchronization for ppm 407—409
Synchronization, cost of 85—86 395
t-gaps 236 240
t-gaps, cost of 236
Taxonomy, executable 65
Template matching 325—337
Template matching, combined size-independent strategy 330 334 353
Template matching, combined symbol matching 353
Template matching, corner alignment 418
Template matching, compression-based 330—332 334 338
Template matching, exclusive-OR 334
Template matching, global 326—328
Template matching, local 329—330
Template matching, pattern matching and substitution 334
Template matching, performance comparison 333—337
Template matching, weighted AND-NOT 334 353
Template matching, weighted exclusive-OR 327 334
Term 104 153 444
Term, partially specified see "Pattern matching"
Text compression 21—102 390—415
Text compression in mg 394 395
Text compression, compression rate 91—95 98 406—407
Text compression, dynamic collections 256 412—415
Text compression, memory 99 410—412
Text compression, performance 90—99 406—415
Text compression, speed 95—97 390 409—410
Text compression, to 13 bytes 436
Textual image 7 15 263 311—353
Textual image compression 314—351 355 415—419 see
Textual image compression in mg 417 466
Textual image compression, performance 343
Textual image compression, system considerations 349
Texture analysis 385
Thesaural substitution 106 216
TidBITS 473
Time for bilevel image compression 416
Time for Boolean queries 174 423—425 429
Time for compression 409 428
Time for compression-based template matching 332
Time for decoding inverted file 425
Time for decompression 409
Time for flood-fill 323
Time for Hough transform 366
Time for Huffman code construction 49
Time for inversion 420
Time for library pruning 339
Time for mark-based segmentation 378
Time for ranked queries 207—213 423 425—429
Time for skew detection 369
Time for symbol matching 319
Time for template matching 326
Training set 216
Transform for progressive image transmission 303
Transform, Burrows-Wheeler 65
Transform, discrete cosine 298
Transform, Hough 358
TREC collection 108 150 261
TREC collection, accumulators 207
TREC collection, address bits 394
TREC collection, approximate weights 205
TREC collection, compression leakage 236 249
TREC collection, compression performance 98
TREC collection, decoding memory 410—412
TREC collection, dynamic inverted file 259
TREC collection, false match 136
TREC collection, final compressed size 422 453 462
TREC collection, frequent words 147—148
TREC collection, Huffman coding 404
TREC collection, inversion 225 254 419
TREC collection, lexicon 161 172
TREC collection, minimal perfect hash function 168
TREC collection, paged version 423 426
TREC collection, ranked queries 155 192—194
TREC collection, rotated lexicon 173
TREC collection, skipping 178 207
TREC collection, stemming 147
TREC collection, stop words 149
TREC collection, time to build database 420
TREC collection, within-document frequencies 200
TREC project 108 150 192—194 389
Trésor de la Langue Française 151
Trie 80—81
Trigram indexing see "n-gram indexing"
Trinity College, Dublin see "Library"
Two-level context compression 277—279 347 415
Two-level context compression, parameters 417
Unary coding 117 199 240 248
Update cache 259
Upside-down text 366 367
Ussher, Archbishop 432
Vector quantization 303
Veronica 438
Virtual memory 225 230
Visually impaired readers 477
Vocabulary see "Lexicon"
WebCrawler 439
Weight see also "Approximate weights"
Weight, document 187 201 260
Weight, document-term 183—185 260
Weight, term 155 183—184 209 260
Wells, H.G. 435 449
White space 339
Wildcard 155 170—173
William of Occam 446
Window see "Compression dictionary-based"
Within-document frequencies 182 198—201 208 236 421
Word-based model 72—74 392 406
Word-based model, reducing memory of 410
Word-level index 112—114
Words, clustering 122
Words, parsing 73
Wordsworth, William 1 19
World brain 435 449
World encyclopedia 435
World Wide Web 434 443
World Wide Web, adult sites 196
World Wide Web, advertising 196—197 221
World Wide Web, dead links 220
World Wide Web, distributed retrieval 218
World Wide Web, effectiveness of search engines 197
World Wide Web, GIF images 290
World Wide Web, images 264
World Wide Web, NZ Digital Library 469
World Wide Web, PNG images 290
World Wide Web, progressive image display 303
World Wide Web, ranking queries 214
World Wide Web, search engines 5 194—197 220 222 438—440
Wright, E.V. 27 102
X-Y tree 376
Young, R. 19
Zero-frequency problem 28—29 62 100 256 277 413
Zipf's law 183 386
Zipf, George 183
Ziv-Lempel coding 20 22 23 75—84 254 391 see "LZ78" "gzip" "compress"
Ziv-Lempel coding in GIF and PNG 290
Ziv-Lempel coding, speed of 98
Ziv, Jacob 22 75