Powell W.B. — Approximate dynamic programming: Solving the curses of dimensionality
Title: Approximate dynamic programming: Solving the curses of dimensionality
Author: Powell W.B.
Abstract: A complete and accessible introduction to the real-world applications of approximate dynamic programming
With the growing levels of sophistication in modern-day operations, it is vital for practitioners to understand how to approach, model, and solve complex industrial problems. Approximate Dynamic Programming is a result of the author's decades of experience working in large industrial settings to develop practical and high-quality solutions to problems that involve making decisions in the presence of uncertainty. This groundbreaking book uniquely integrates four distinct disciplines (Markov decision processes, mathematical programming, simulation, and statistics) to demonstrate how to successfully model and solve a wide range of real-life problems using the techniques of approximate dynamic programming (ADP). The reader is introduced to the three curses of dimensionality that impact complex problems and is also shown how the post-decision state variable allows for the use of classical algorithmic strategies from operations research to treat complex stochastic optimization problems.
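To make the post-decision idea concrete: writing Bellman's equation around the post-decision state moves the expectation outside the maximization, so the inner problem becomes a deterministic optimization that classical math-programming tools can attack, while the expectation is approximated by sampling. The following is a sketch in generic ADP notation, not quoted from the book (here S^{M,W} and S^{M,x} denote the exogenous and decision parts of the transition function, C the contribution, and gamma the discount factor):

```latex
V_t^x(S_t^x) =
  \mathbb{E}\left[ \max_{x_{t+1}}
    \Big( C_{t+1}(S_{t+1}, x_{t+1})
      + \gamma \, V_{t+1}^x\big(S^{M,x}(S_{t+1}, x_{t+1})\big) \Big)
  \,\middle|\, S_t^x \right],
\qquad S_{t+1} = S^{M,W}(S_t^x, W_{t+1}).
```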
Designed as an introduction and assuming no prior training in dynamic programming of any form, Approximate Dynamic Programming contains dozens of algorithms that are intended to serve as a starting point in the design of practical solutions for real problems. The book provides detailed coverage of implementation challenges including: modeling complex sequential decision processes under uncertainty, identifying robust policies, designing and estimating value function approximations, choosing effective stepsize rules, and resolving convergence issues.
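Of the implementation challenges just listed, stepsize rules are the easiest to show in code. Below is a minimal sketch of two deterministic rules that also appear in the subject index further down (the generalized harmonic rule and McClain's formula), together with the exponential-smoothing update they drive; the function names and default constants are illustrative, not the book's:

```python
def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic rule: alpha_n = a / (a + n - 1)."""
    return a / (a + n - 1)

def mcclain_stepsize(alpha_prev, target=0.05):
    """McClain's formula: declines roughly like 1/n, then levels off at `target`."""
    return alpha_prev / (1.0 + alpha_prev - target)

def smooth(v_bar, v_hat, alpha):
    """Exponential smoothing: blend a new observation into the running estimate."""
    return (1.0 - alpha) * v_bar + alpha * v_hat
```

In a typical ADP loop, the chosen rule supplies alpha at iteration n, and smooth() folds the sampled value v_hat into the current estimate v_bar.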
With a focus on modeling and algorithms in conjunction with the language of mainstream operations research, artificial intelligence, and control theory, Approximate Dynamic Programming:
Models complex, high-dimensional problems in a natural and practical way, drawing on years of industrial projects
Introduces and emphasizes the power of estimating a value function around the post-decision state, allowing solution algorithms to be broken down into three fundamental steps: classical simulation, classical optimization, and classical statistics
Presents a thorough discussion of recursive estimation, including fundamental theory and a number of issues that arise in the development of practical algorithms (a minimal sketch follows this list)
Offers a variety of methods for approximating dynamic programs that have appeared in previous literature, but that have never been presented in the coherent format of a book
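The recursive estimation highlighted in the list above can be illustrated with recursive least squares for a value function that is linear in a set of basis functions. Here is a minimal sketch of the stationary-data update, assuming NumPy; the class name and initialization constant are mine, not the book's (the nonstationary variant discussed in the book additionally discounts old observations):

```python
import numpy as np

class RecursiveLeastSquares:
    """RLS for V(s) ~ theta . phi(s): each new observation updates the
    coefficients without re-solving the full regression."""

    def __init__(self, n_features, b0=1.0):
        self.theta = np.zeros(n_features)   # regression coefficients
        self.B = b0 * np.eye(n_features)    # scaled inverse of sum(phi phi^T)

    def update(self, phi, v_hat):
        """Fold in one observation: feature vector phi, observed value v_hat.
        The Sherman-Morrison identity keeps the update inversion-free."""
        gamma = 1.0 + phi @ self.B @ phi    # scalar normalizer
        error = v_hat - self.theta @ phi    # prediction error
        self.theta = self.theta + (self.B @ phi) * (error / gamma)
        self.B = self.B - np.outer(self.B @ phi, phi @ self.B) / gamma
```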
Motivated by examples from modern-day operations research, Approximate Dynamic Programming is an accessible introduction to dynamic modeling and is also a valuable guide for the development of high-quality solutions to problems that exist in operations research and engineering. The clear and precise presentation of the material makes this an appropriate text for advanced undergraduate and beginning graduate courses, while also serving as a reference for researchers and practitioners. A companion Web site is available for readers, which includes additional exercises, solutions to exercises, and data sets to reinforce the book's main concepts.
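Putting the pieces together, a single iteration of forward (single-pass) ADP around the post-decision state has exactly the three-step shape described in the bullets above: simulate forward, solve the inner deterministic optimization, and smooth the result into the value function approximation. Below is a minimal sketch with a lookup-table approximation; the `model` interface used here (initial_state, actions, contribution, post_state, sample_next) is hypothetical, standing in for whatever problem class you implement:

```python
from collections import defaultdict

def adp_single_pass(model, T, n_iters, alpha=0.1, gamma=1.0):
    """Single-pass ADP: learn a lookup-table value function approximation
    defined over post-decision states."""
    v = defaultdict(float)                  # VFA over post-decision states
    for _ in range(n_iters):
        s = model.initial_state()           # pre-decision state at t = 0
        prev_sx = None
        for t in range(T):
            # Classical optimization: deterministic inner max over decisions.
            best_x, v_hat = None, float("-inf")
            for x in model.actions(s):
                val = model.contribution(s, x) + gamma * v[model.post_state(s, x)]
                if val > v_hat:
                    v_hat, best_x = val, x
            # Classical statistics: smooth v_hat into the previous
            # post-decision state, whose value v_hat estimates.
            if prev_sx is not None:
                v[prev_sx] = (1 - alpha) * v[prev_sx] + alpha * v_hat
            prev_sx = model.post_state(s, best_x)
            # Classical simulation: sample exogenous information, step forward.
            s = model.sample_next(prev_sx)
    return v
```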
Language: English
Category: Computer science
Subject index status: ready, with page numbers
Year of publication: 2007
Number of pages: 487
Added to catalog: 23.12.2013
Subject index
Q-learning 276
Q-learning, infinite horizon 310
A* algorithm 114
Actions 147
Actor-critic 285
Affine function 53
Aggregation 226
Aggregation, modeling 229
Aggregation, multiple levels 233
Algorithm, Q-learning, finite horizon 278
Algorithm, Q-learning, infinite horizon 310
Algorithm, ADP for asset acquisition 389
Algorithm, ADP for infinite horizon 304
Algorithm, ADP for policy iteration 305
Algorithm, ADP using post-decision state 105
Algorithm, ADP with exact expectation 97
Algorithm, ADP with pre-decision state 276
Algorithm, approximate expectation 98
Algorithm, approximate hybrid value/policy iteration 284
Algorithm, approximate policy iteration with VFA 282
Algorithm, asynchronous dynamic programming 114
Algorithm, backward dynamic programming 54
Algorithm, bias-adjusted Kalman filter stepsize 204
Algorithm, CUPPS algorithm 373
Algorithm, double-pass ADP 273
Algorithm, Gauss-Seidel variation 58
Algorithm, generic ADP 110
Algorithm, hybrid value/policy iteration 63
Algorithm, infinite horizon generic ADP 305
Algorithm, policy iteration 62
Algorithm, real-time dynamic programming 115
Algorithm, relative value iteration 58
Algorithm, roll-out policy 293
Algorithm, SHAPE algorithm 362
Algorithm, shortest path 18
Algorithm, single-pass ADP 272
Algorithm, SPAR 355
Algorithm, stochastic decomposition 372
Algorithm, synchronous ADP 291
Algorithm, synchronous dynamic programming 114
Algorithm, temporal-difference learning for infinite horizon 309
Algorithm, tree-search 292
Algorithm, value iteration 57
Aliasing 235
American option 239
Apparent convergence 210
Asset acquisition 28-29
Asset acquisition, ADP algorithm 389
Asset acquisition, lagged 30
Asset acquisition, variations 391
Asset pricing 26
Asynchronous dynamic programming 114
Attribute transition function 164
Backpropagation through time 273
Backward dynamic programming 54
Bandit problem 37
Bandit problems 332
Basis functions 127 237 362
Basis functions, approximate linear programming 311
Basis functions, geometric view 244
Basis functions, Longstaff and Schwartz 241
Basis functions, neural network 253
Basis functions, recursive time-series 251
Basis functions, tic-tac-toe 243
Batch process 257
Batch replenishment 31 65
Bellman error 98
Bellman functional equation 4
Bellman Hamilton-Jacobi 3
Bellman optimality equation 4
Bellman Recurrence Equation 4
Bellman’s equation 3 28 48
Bellman’s equation, deterministic 49
Bellman’s equation, expectation form 49
Bellman’s equation, operator form 53
Bellman’s equation, standard form 49
Bellman’s equation, vector form 51
Benders’ decomposition 370
Benders’ decomposition, CUPPS algorithm 373
Benders’ decomposition, stochastic decomposition 372
Bias 195
Bias, due to value iteration 286
Bias, statistical error in max operator 287
Blood management, ADP algorithm 397
Blood management, model 393
Boltzmann exploration 328
Budgeting problem, continuous 21
Budgeting problem, discrete 19
Contribution function 40 166
Controls 147
Cost function 166
CUPPS algorithm 372
Curses of dimensionality 92
Curses of dimensionality, action space 5
Curses of dimensionality, outcome space 5
Curses of dimensionality, state space 5
Cutting planes 365
Decision node 23
Decision tree 23
Decisions 147
Double-pass algorithm 273
Dynamic assignment problem 34
Error measures 315
Exogenous information 29 40 151
Exogenous information, lagged 155
Exogenous information, outcomes 153
Exogenous information, scenarios 153
Experimental issues, convergence 295
Experimental issues, starting 294
Exploitation 327
Exploration 326-327
Exploration vs. exploitation 116 323
Exponential smoothing 99
Factored representation of a state 146
Finite horizon, for infinite horizon models 317
Flat representation of a state 146
Forward dynamic programming 93
Gambling problem 25
Gittins exploration 344
Gittins indices 332
Gittins indices, basic theory 334
Gittins indices, foundations 332
Gittins indices, normally distributed rewards 336
Gradients 352
Greedy strategy 95
Infinite horizon 55 304
Infinite horizon, Q-learning 310
Infinite horizon, finite-horizon approximations 317
Infinite horizon, policy iteration 305
Infinite horizon, temporal-difference learning 307
Infinite horizon, value iteration 305
Information acquisition 36
Information acquisition, illustration 330
Initialization 112
Interval estimation 337
Knowledge gradient 339
L-shaped decomposition 372
Lagged information 155
Lattice 67
Learning rate 181
Learning rate schedules 183
Learning strategies, Boltzmann exploration 328
Learning strategies, epsilon-greedy exploration 329
Learning strategies, exploitation 327
Learning strategies, exploration 326
Learning strategies, Gittins exploration 344
Learning strategies, Gittins indices 332
Learning strategies, interval estimation 337
Learning strategies, knowledge gradient 339
Learning strategies, mixed 327
Learning strategies, upper confidence bound 338
Leveling algorithm 355
Linear filter 99
Linear operator 53
Linear programming method, approximate 311
Linear programming method, exact 64
Linear regression 238
Linear regression, Longstaff and Schwartz 239
Linear regression, recursive estimation, derivation 263
Linear regression, recursive estimation, multiple observations 250
Linear regression, recursive estimation, time-series 251
Linear regression, recursive least squares, nonstationary data 249
Linear regression, recursive least squares, stationary data 248
Linear regression, recursive methods 246
Linear regression, stochastic gradient algorithm 247
Longstaff and Schwartz 239
Lookup-table 99
Markov decision processes 47
Max operator 53
Measure-theoretic view of information 170
Min operator 53
Model, contribution function 166
Model, decisions 147
Model, elements of a dynamic program, contribution function 130
Model, elements of a dynamic program, decision variable 130
Model, elements of a dynamic program, exogenous-information 130
Model, elements of a dynamic program, objective function 130
Model, elements of a dynamic program, state 130
Model, elements of a dynamic program, transition function 130
Model, policies 149
Model, transition function 159
Model-free dynamic programming 118
Modeling dynamic programs 40
Models, contribution function 119
Models, elements of a dynamic program 130
Models, elements of a dynamic program, state variable 130
Models, exogenous information 119
Models, resources 135
Models, resources, multiple 137
Models, resources, single discrete 136
Models, state 139
Models, time 132
Models, transition function 118
Monotone policies 64-65
Monte Carlo sampling 100
Myopic policy 150
Neural networks 253
Nomadic trucker 137
Nomadic trucker, learning 323
Objective function 40 48 169
On-line applications 317
Optimality equation 48
Optimality equations, post-decision state 104
Optimality equations, proof 70
Outcome node 23
Outcomes 153
Partially observable states 145
Policies 149 159
Policies, randomized 151
Policy iteration 62 282
Policy iteration, hybrid 63
Policy iteration, infinite horizon 305
Policy iteration, with look-up table representation 282
Policy iteration, with myopic rules 283
Policy iteration, with neural networks 283
Policy iteration, with regression function 283
Policy iteration, with value function approximation 282
Post-decision state 142
Post-decision state, optimality equations 104
Post-decision state, perspective 107
Proof of optimality 81
Randomized policies 80 151
Real-time dynamic programming 114
Reinforcement learning 119
Resource allocation, asset acquisition 388
Resource allocation, blood management 392
Resource allocation, fleet management 416
Resource allocation, general model 404
Resource allocation, portfolio optimization 401
Resource allocation, trucking application 421
Resources, multiple 137
Resources, nomadic trucker 137
Resources, single discrete 136
Reward function 166
RTDP 114
Sample path 95
Scenarios 153
SHAPE algorithm 359
SHAPE algorithm, proof of convergence 377
Sherman-Morrison 264
Shortest path, deterministic 2 18
Shortest path, information collecting 39
Shortest path, stochastic 24
Single-pass algorithm 272
Smoothing factor 181
SPAR algorithm, projection operation 382
SPAR, projection 357
SPAR, weighted projection 359
State sampling, all states 290
State sampling, roll-out heuristic 293
State sampling, tree search 291
State variable, definition 139
State, alias 235
State, asset acquisition I 28
State, asset acquisition II 30
State, asset pricing 27
State, bandit problem 38
State, budgeting problem 20
State, definition 40 139
State, dynamic assignment problem 36
State, factored 146
State, flat 146
State, gambling problem 25
State, partially observable 145
State, post-decision 101 103 142
State, pre-decision 103
State, sampling strategies 114
State, shortest path 19
State, state of knowledge 38
State, transformer replacement 33
States of a system, hyperstate 141
States of a system, information state 141
States of a system, resource state 141
States of a system, single resource 141
Stepsize 99 181
Stepsize rule 183
Stepsize, apparent convergence 210
Stepsize, bias and variance 195
Stepsize, bias-adjusted Kalman filter 204
Stepsize, convergence conditions 184
Stepsize, deterministic 183 187
Stepsize, deterministic, constant 186
Stepsize, deterministic, harmonic 187
Stepsize, deterministic, McClain 188
Stepsize, deterministic, polynomial learning rate 187
Stepsize, deterministic, search-then-converge 189
Stepsize, infinite horizon 313
Stepsize, infinite horizon, bounds 314
Stepsize, optimal, nonstationary I 200
Stepsize, optimal, nonstationary II 201
Stepsize, optimal, stationary 198
Stepsize, stochastic 190
Stepsize, stochastic, Belgacem’s rule 194
Stepsize, stochastic, convergence conditions 191
Stepsize, stochastic, Gaivoronski’s rule 193
Stepsize, stochastic, Godfrey’s rule 194
Stepsize, stochastic, Kesten’s rule 192
Stepsize, stochastic, Mirozahmedov’s rule 193
Stepsize, stochastic, stochastic gradient adaptive stepsize 193