Hastie T., Tibshirani R., Friedman J. — The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Title: The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Authors: Hastie T., Tibshirani R., Friedman J.
Abstract: During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics.
Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry.
The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.
Year of publication: 2002
Number of pages: 533
Jones, M. 144 148 155 368 514
Jooste, P. 100 520
Jordaan, P. 100 520
Jordan, M. 296 516 517
K-medoid clustering 468—472
K-means clustering 412 461—465
K-nearest neighbor classifiers 415
Kabalin, J. 3 47 521
Karhunen-Loeve transformation (principal components) 62—63 66 485—491
Karush-Kuhn-Tucker conditions 110 374
Kaski, S. 485 503 517
Kaufman, L. 469 480 503 517 518
Kearns, M. 517
Kelly, C. 477 517
Kennard, R.W. 60 75 516
Kent, J. 75 111 495 504 518
Kerkyacharian, G. 331 511
Kernel density classification 184
Kernel density estimation 182—189
Kernel function 183
Kernel methods 182—189
Kittler, J.V. 432 511
Knight, K. 255 521
Knot 117 283
Kohonen, T. 414 433 485 503 517
Kooperberg, C. 289 521
Kotze, J. 100 520
Kressel, Ulrich 517
Kriging 147
Krogh, A. 367 516
Kruskal-Shephard scaling 502
Kullback-Leibler distance 497
Lagrange multipliers 256
Lagus, K. 485 503 517
Laird, N. 255 400 511
Laplacian distribution 72
Lasso 64—65 69—72 330—331
Lawson, C. 75 517
Le Cun, Y. 362 363 365 366 368 517 520
Learning 1
Learning rate 354
Learning Vector Quantization 414
Least squares 11 32
Leave-one-out cross-validation 215
Leblanc, M. 255 517
Lee, W. 343 520
Left singular vectors 487
LeNet 363
Li, K.-C. 432 512
Life, ultimate meaning of 534
Likelihood function 229 237
Lin, H. 293 518
Lin, Y. 382 406 522
Linear basis expansion 115—124
Linear combination splits 273
Linear discriminant function 84—94
Linear methods for classification 79—114
Linear methods for regression 41—78
Linear models and least squares 11
Linear regression of an indicator matrix 81
Linear separability 105
Linear smoother 129
Link function 258
Little, R. 293 518
Littman, M. 504 510
Lloyd, S.P. 433 503 518
Loader, C. 183 190 518
Local likelihood 179
Local methods in high dimensions 22—27
Local minima 359
Local polynomial regression 171
Local regression 168 174
Localization in time and in frequency 149
Loess (local regression) 168 174
Log-odds ratio (logit) 96
Logistic (sigmoid) function 352
Logistic regression 95—104 261
Logit (log-odds ratio) 96
Loss function 18 21 193—195 308
Loss matrix 272
Lossless compression 467
Lossy compression 467
LVQ see Learning Vector Quantization
Macnaughton Smith, P. 518
MacQueen, J. 433 503 518
Madigan, D. 222 255 518
Mahalanobis distance 392
Majority vote 249 299
Mannila, H. 442 443 503 509
MAP (maximum a posteriori) estimate 234
Mardia, K.V. 75 111 495 504 518
Margin 110 372
Market basket analysis 440 451
Markov chain Monte Carlo (MCMC) methods 243
MARS see Multivariate adaptive regression splines
MART see Multiple additive regression trees
Massart, D. 469 518
Maximum Likelihood Estimation 32 225
Maximum likelihood inference 229
McCulloch, C.E. 293 518
McCulloch, W.S. 367 518
McLachlan, Geoffrey J. 11 518
MCMC see Markov Chain Monte Carlo Methods
McNeal, J. 3 47 521
MDL see Minimum description length
Mean squared error 24 247
Memory-based method 415
Metropolis-Hastings algorithm 245
Michie, D. 89 390 422 518
Minimum description length (MDL) 208
Misclassification error 17 271
Missing data 240 293—294
Missing predictor values 293—294
Mixing proportions 189
Mixture discriminant analysis 399—405
Mixture modeling 188—189 236—240 399—405
Mixture of experts 290—292
Mixtures and the EM algorithm 236—240
Mockett, L.G. 518
Mode seekers 459
Model averaging and stacking 250
Model combination 251
Model complexity 194—195
Model selection 195—196 203—204
Monte Carlo method 217 447
Morgan, James N. 296 518
Mother wavelet 152
Multinomial distribution 98
Mulier, F. 39 211 510
Multi-dimensional splines 138
Multi-edit algorithm 432
Multi-layer perceptron 358 362
Multi-resolution analysis 152
Multidimensional scaling 502—503
Multiple additive regression trees (MART) 322
Multiple minima 253 359
Multiple outcome shrinkage and selection 73
Multiple outputs 54 73 81—84
Multiple regression from simple univariate regression 50
Multivariate adaptive regression splines (MARS) 283—289
Multivariate nonparametric regression 395
Munro, S. 355 519
Murray, W. 75 519
Myles, J.P. 429 519
Nadaraya-Watson estimate 166
Naive Bayes classifier 86 184—185
Natural cubic splines 120—121
Neal, R. 255 519
Nearest neighbor methods 415—436
Network diagram 351
Neural networks 347—370
Newton's method (Newton-Raphson procedure) 98—99
Nonparametric logistic regression 261—265
Normal (Gaussian) distribution 17 31
Normal equations 12
Nowlan, S. 296 516
Numerical optimization 319 353—354
Object dissimilarity 457—458
Oja, E. 496 497 498 504 516
Olshen, R. 219 270 272 296 331 405 510
Online algorithm 355
Optimal scoring 395—397 401—402
Optimal separating hyperplanes 108—110
Optimism of the training error rate 200—202
Ordered categorical (ordinal) predictor 10 456
Orthogonal predictors 51
Overfitting 194 200—203 324
Paatero, A. 485 503 517
Pace, R. Kelley 335 519
Palmer, R.G. 367 516
Parametric bootstrap 228
Parker, David 367 519
Partial dependence plots 333—334
Partial least squares 66—68
Parzen window 182
Pasting 279
Patient rule induction method (PRIM) 279—282 451—452
Peeling 279
Penalization see regularization
Penalized discriminant analysis 397—398
Penalized polynomial regression 147
Penalized regression 34 59—65 147
Penalty matrix 128 163
Perceptron 350—370
Picard, D. 331 511
Piecewise polynomials and splines 36 119
Pitts, W. 367 518
Plastria, F. 469 518
Platt, J. 405 519
Poggio, T. 144 148 155 368 406 512 514
Pontil, M. 144 155 406 512
Posterior distribution 232
Posterior probability 206—207 232
Prediction accuracy 290
Prediction error 18
Predictive distribution 232
PRIM see patient rule induction method
Principal components 62—63 66—67 485—491
Principal components regression 66—67
Principal curves and surfaces 491—493
Principal points 491
Prior distribution 232—235
Projection pursuit 347—349 500
Projection pursuit regression 347—349
Prototype classifier 411—415
Prototype methods 411—415
Proximity matrices 455
Pruning 270
QR decomposition 53
Quadratic approximations and inference 102
Quadratic discriminant function 86 89
Quinlan, R. 273 296 519
Radial basis function (RBF) network 350
Radial basis functions 186—187 240 351
Raftery, A.E. 222 255 518
Ramsay, J. 155 519
Rao score test 103
Rao, C.R. 406 519
Rayleigh quotient 94
Receiver operating characteristic (ROC) curve 277—278
Reduced-rank linear discriminant analysis 91
Redwine, E. 3 47 521
Regression 11—13 41—78 174—178
Regression spline 120
Regularization 34 144—149
Regularized discriminant analysis 90—91
Representer of evaluation 145
Reproducing kernel Hilbert space 144—149
Reproducing property 145
Responsibilities 238—240
Rice, J. 477 517
Ridge regression 59—64
Ripley, B.D. 39 108 111 13 270 359 367 368 406 420 432 433 519
Risk factor 100
Rissanen, Jorma 222 519
Robbins, H. 355 519
Robust fitting 308—310
Roosen, C. 519
Rosenblatt's perceptron learning algorithm 107
Rosenblatt, F. 80 106 367 520
Rousseauw, J. 100 520
Rousseeuw, P. 469 480 503 517
Rubin, D. 255 293 400 511 514 518
Rug plot 265
Rumelhart, D. 367 520
Saarela, A. 485 503 517
Salojärvi, J. 485 503 517
Sammon mapping 502
Scaling of the inputs 358
Schapire, R. 299 340 341 343 513 520
Schnitzler, C. 295 515
Schroeder, A. 514
Schwarz's criterion 206—207
Schwarz, G. 206 222 520
Score equations 98 229
Scott, D. 190 520
Seber, G.A.F. 75 520
Sejnowski, T. 504 509
Self-consistency property 491—492
Self-organizing map (SOM) 480—484
Sensitivity of a test 277—278
Separating hyperplanes 108 371—373
Shao, J. 222 520
Shape averaging 434
Short, R.S. 429 520
Shrinkage methods 59—66
Shustek, L.J. 513
Shyu, M. 333 509
Sigmoid 352
Silverman, B. 155 157 190 295 296 514 515 519 520
Silvey, S.D. 254 520
Simard, P. 432 515 520
Similarity measure see dissimilarity measure
Singer, Y. 343 520
Single index model 348
Singular value decomposition (SVD) 487—488
Singular values 487
Skin of the orange example 384—385
Slate, E.H. 293 518
Sliced inverse regression 432
Smith, A. 255 514
Smoother 115—134 165—173
Smoother matrix 129
Smoothing parameter 37 134—136 172—173
Smoothing spline 127—133
Soft clustering 463
Softmax function 351
SOM see self-organizing map
Sonquist, John A. 296 518
Sparseness 149
Specificity of a test 277—278
Spector, P. 222 510
Spiegelhalter, D. 255 518 520
Spline, additive 259—260
Spline, cubic 127—128
Spline, cubic smoothing 127—128
Spline, interaction 382
Spline, regression 120
Spline, smoothing 127—133
Spline, thin plate 140