Witten I.H., Gori M., Numerico T. — Web Dragons: Inside the Myths of Search Engine Technology
Abrahamson, David 58
Act of selection 229—230
Advanced Research Projects Agency (ARPA) 15
Advertising, click fraud 167
Advertising, content-targeted 167
Advertising, cost per action 166
Advertising, Google revenue 202
Advertising, pay-per-click model 166
Advertising, return on investment 166
Advertising, search engine 165—167
Aerial Board of Control (ABC) 10
AJAX 143 237
AJAX, defined 128
AJAX, implications 129
AJAX, technology 129
Alexandrian Library 32—33
Alexandrian principle 35—37
Aliweb 133—134
AltaVista 134—135
Amazon 52—53
AND queries 118
Angelini, Giovanni 174
Anonymity 227
Anonymous ftp 132
Anti-spam war 169—174. See also spam
Anti-spam war, secrecy and 172
Anti-spam war, tactics and strategy 173—174
Anti-spam war, weapons 170—171
Anti-virus software 155
AOL 135 136
Apple’s Macintosh 17—18
Archie 133
Archives, access 181
Archives, copyright and 199—201
Archives, location 181
Archives, power 180
Archives, rules 180
Archives, violence of 179—181
Archives, web 181
Ask web search service 122 136
Association for Computing Machinery 208
Associative leaps 12
Attacks, targeted network 153
Attacks, viral 154—155
Attacks, vulnerability to 153—154
Authority control 44
Authors, authority control 44
Authors, digital documents 43
Authors, library catalog 43—44
Authors, spelling variants 44
Avatars 74
Bender, Todd 143
Berne Convention 194 195 197
Berners-Lee, Tim 19 26—27 62 131 133 143 177
Bibliometrics 123—124
Bibliometrics, defined 123
Bibliometrics, impact factor 123
Bibliometrics, influence weight 123
Binary searching 109
Blogs 76
Blogs, metadata tags 128
Blogs, searching 128
Bombs 170
Book of Kells 37—38
Books. See also libraries
Books, ancient 38
Books, beauty of 37—41
Books, digitizing 48—57
Books, electronic 38 39 40
Books, interactions with 40
Books, random access 41
Books, retirement 37
Boosting 157—161
Boosting, defined 156
Boosting, development 162
Boosting, link 158—161
Boosting, techniques 156
Boosting, term 157—158
Borges, Jorge Luis 3—4 20 26
Bow tie architecture 91—94 99
Brin, Sergey 118 135
British National Library 34 35 39
Britney Spears 138—139 140
Broken links 66—67 152
Bubbles 167—168
Bush, Vannevar 12—13
Calculus ratiocinator 21
Case folding 108
Catalogs 41—42
Catalogs, contents 42
Catalogs, library 42—46
Censorship 191—193
Censorship, Chinese 192
Censorship, justifiable 192—193
CERN 19 133
Chatbots 74—75
Chatbots, defined 74
Chatbots, Eliza 75
Chatbots, personalized as avatars 75
Chinese search engine 192
Clarke, Charles 101
Clarke, Mary 101 105 108 142
Classification codes, defined 45
Classification codes, Dewey Decimal Classification 46
Classification codes, library catalog 45—46
Classification codes, Library of Congress Classification 46
Click fraud 167
Clickstream 205 218—219
Clickthrough rate 166
Cloaking 161 164
Codex 41
Collaborative environments 75—76
Collaborative environments, blogs 76
Collaborative environments, wikis 75—76
Collaborative filtering 231
Collections 230
Commercialization 202—203
Communities 94—95 219—223
Communities, defined 94 222—223
Communities, degree of membership 94
Communities, desire to belong to 212
Communities, discovering 121—123
Communities, distributed global 213
Communities, extreme example 95
Communities, information access mediation 230
Communities, locating 94
Communities, metadata 230—232
Communities, naturally occurring 122
Communities, organization 212
Communities, perspective 213
Communities, perspective, searching within 221—222
Communities, visibility determination 222
Computers, as communication tools 14—15
Computers, concordance entries 103
Computers, personal file spaces 234—235
Computers, remote 150
Computers, web and 233—238
Concordance 101—103. See also full-text indexes
Concordance, Bible 142
Concordance, computer entry illustration 103
Concordance, defined 102
Concordance, earliest 142
Concordance, entry illustration 102
Concordance, generation 103
Concordance, Greek 143
Concordance, Shakespeare 142
Connection matrix 115
Content-targeted advertising 167
Control, information 177—209 240
Control, mechanisms 178
Control, privacy and 187
Cookies, defined 73
Cookies, on user's computer 189
Cookies, uses 73—74
Copyleft 198
Copyright 49 178—179
Copyright law 193—195
Copyright on the web 198—199
Copyright owner 194 198
Copyright Term Extension Act 52 195
Copyright, complexity 200
Copyright, digital material and 179
Copyright, distribution right 194
Copyright, duration 195—196
Copyright, expiration 195 196
Copyright, fair use 195
Copyright, legal liability and 199
Copyright, other rights 194
Copyright, peer-to-peer systems and 224
Copyright, public domain and 193—201
Copyright, relinquishing 197—198
Copyright, renewing 196
Copyright, reproduction right 194
Copyright, traditional publishing and 178
Copyright, web searching/archiving and 199—201
Copyright, WIPO Treaty and 201
Corporate continent 93
Cosine similarity measure 107
Crawling 70—71 148—149
Crawling, hazards 71
Crawling, permission 149
Crawling, process 70
Crawling, strategies 70 80
Crawling, defined 70
Crawling, uniformity and 84
Cybernetics 9—10 15
Deep web 96—97
Deep web, control 97
Deep web, defined 96
Deep web, digital libraries 97
Deep web, information discovery 96
Deep web, size estimate 99
Defense Advanced Research Projects Agency (DARPA) 131
Denning, Dorothy 175
Derrida, Jacques 180—181
Dewey Decimal Classification 46
Digital documents, authors 43
Digital libraries 30 232—233
Digital libraries, as deep web example 97
Digital libraries, content licensing 57
Digital libraries, metadata 232
Digital libraries, software systems for building 233
Digital libraries, threshold 32
Digital Millennium Copyright Act (DMCA) 56 201
Digital rights management 56
Digital rights management, defined 56
Digital rights management, scholarly publishing 57
Digitization 48—57
Digitization, Amazon 52—53
Digitization, commercially successful books 52—53
Digitization, Google Book Search 53—54
Digitization, Internet Archive 51—52
Digitization, Million Book project 50—51
Digitization, new publishing models 55—57
Digitization, Open Content Alliance 55
Digitization, Project Gutenberg 49—50
Disk storage growth 83
Distribution right 194
Diversity 203—204
Documents, approval/disapproval 126
Documents, cosine similarity measure 107
Documents, feature values 124
Documents, low-quality 146
Documents, relevance estimation 107
Documents, relevant, number retrieved 110
Dogpile 137
Doorways 161 164
Dublin Core 46—47 58
Dublin Core, augmentation 47
Dublin Core, defined 46
Dublin Core, metadata standard 47
Dublin Core, uniformity 47
Duke August 33
Dynamic web pages 73
Eco, Umberto 146 175
Economic bubbles 168
Economic issues 165
Electronic books, consumer adoption 56
Electronic books, designers 39
Electronic books, parameters 40
Electronic books, potential threat 56
Electronic books, reading experience 39
Electronic books, sales models 56
Electronic books, three-dimensional 39—40
Electronic books, views 40
Elmer Social Science Dictionary 99
Engelbart, Doug 17 26
Ernandes, Marco 174
ESP game 77
European search engine 203
Evolutionary models 90—91
Excite 136
ExpertRank 136
Extensible Markup Language (XML) 69 78—79
Extensible Markup Language (XML), defined 78
Extensible Markup Language (XML), documents 79
Extensible Markup Language (XML), expressive mechanisms 72
Extensible Markup Language (XML), message representation 78
Extensible Markup Language (XML), schemas 79
Extensible Markup Language (XML), tags 79
Fair use 194
Farewell to Alexandria 58
File Transfer Protocol (FTP) 131 132
Filespaces, personal 234—235
Folksonomy 231
Foucault, Michel 26
Free Culture (Lessig) 197 208
French National Library 35 36
Freshness, page 84
Friend-to-friend protocols 228
Full-text indexes 104—105. See also concordance; indexes
Full-text indexes, beginning 134
Full-text indexes, building 110
Full-text indexes, contents 103
Full-text indexes, creation illustration 104
Full-text indexes, defined 103
Full-text indexes, order-of-relevance results 110
Full-text indexes, pointers list 105
Full-text indexes, results evaluation 110—111
Full-text indexes, search engines 105
Full-text indexes, space 105
Full-text indexes, word order 104
Future Libraries: Dreams, Madness, and Reality 58
Gaussian distribution 88 89 153
Gaussian distribution, in daily phenomena 89
Gaussian distribution, defined 88
Gaussian distribution, illustrated 89
Gaussian distribution, prediction of links 99
Generalized Markup Language (GML) 98
GNU Free Documentation License 198
GNU General Public License 198
Goldfarb, Charles 98
Google, appearance 135
Google, beginnings 135
Google, Book Search 53—54
Google, clickthrough rate 166
Google, evaluation criteria 166
Google, founders 135
Google, global office and 237
