Главная    Ex Libris    Книги    Журналы    Статьи    Серии    Каталог    Wanted    Загрузка    ХудЛит    Справка    Поиск по индексам    Поиск    Форум   
blank
Авторизация

       
blank
Поиск по указателям

blank
blank
blank
Красота
blank
Powell W.B. — Approximate dynamic programming: Solving the curses of dimensionality
Powell W.B. — Approximate dynamic programming: Solving the curses of dimensionality



Обсудите книгу на научном форуме



Нашли опечатку?
Выделите ее мышкой и нажмите Ctrl+Enter


Название: Approximate dynamic programming: Solving the curses of dimensionality

Автор: Powell W.B.

Аннотация:

A complete and accessible introduction to the real-world applications of approximate dynamic programming

With the growing levels of sophistication in modern-day operations, it is vital for practitioners to understand how to approach, model, and solve complex industrial problems. Approximate Dynamic Programming is a result of the author's decades of experience working in large industrial settings to develop practical and high-quality solutions to problems that involve making decisions in the presence of uncertainty. This groundbreaking book uniquely integrates four distinct disciplines—Markov design processes, mathematical programming, simulation, and statistics—to demonstrate how to successfully model and solve a wide range of real-life problems using the techniques of approximate dynamic programming (ADP). The reader is introduced to the three curses of dimensionality that impact complex problems and is also shown how the post-decision state variable allows for the use of classical algorithmic strategies from operations research to treat complex stochastic optimization problems.

Designed as an introduction and assuming no prior training in dynamic programming of any form, Approximate Dynamic Programming contains dozens of algorithms that are intended to serve as a starting point in the design of practical solutions for real problems. The book provides detailed coverage of implementation challenges including: modeling complex sequential decision processes under uncertainty, identifying robust policies, designing and estimating value function approximations, choosing effective stepsize rules, and resolving convergence issues.

With a focus on modeling and algorithms in conjunction with the language of mainstream operations research, artificial intelligence, and control theory, Approximate Dynamic Programming:

  • Models complex, high-dimensional problems in a natural and practical way, which draws on years of industrial projects

  • Introduces and emphasizes the power of estimating a value function around the post-decision state, allowing solution algorithms to be broken down into three fundamental steps: classical simulation, classical optimization, and classical statistics

  • Presents a thorough discussion of recursive estimation, including fundamental theory and a number of issues that arise in the development of practical algorithms

  • Offers a variety of methods for approximating dynamic programs that have appeared in previous literature, but that have never been presented in the coherent format of a book

Motivated by examples from modern-day operations research, Approximate Dynamic Programming is an accessible introduction to dynamic modeling and is also a valuable guide for the development of high-quality solutions to problems that exist in operations research and engineering. The clear and precise presentation of the material makes this an appropriate text for advanced undergraduate and beginning graduate courses, while also serving as a reference for researchers and practitioners. A companion Web site is available for readers, which includes additional exercises, solutions to exercises, and data sets to reinforce the book's main concepts.



Язык: en

Рубрика: Computer science/

Статус предметного указателя: Готов указатель с номерами страниц

ed2k: ed2k stats

Год издания: 2007

Количество страниц: 487

Добавлена в каталог: 23.12.2013

Операции: Положить на полку | Скопировать ссылку для форума | Скопировать ID
blank
Предметный указатель
$Q$-leaming      276
$Q$-leaming, infinite horizon      310
A* algorithm      114
Actions      147
Actor-critic      285
Affine function      53
Aggregation      226
Aggregation, modeling      229
Aggregation, multiple levels      233
Algorithm, $Q$-leaming, finite horizon      278
Algorithm, $Q$-leaming, infinite-horizon      310
Algorithm, ADP for asset acquisition      389
Algorithm, ADP for infinite horizon      304
Algorithm, ADP for policy iteration      305
Algorithm, ADP using post-decision state      105
Algorithm, ADP with exact expectation      97
Algorithm, ADP with pre-decision state      276
Algorithm, approximate expectation      98
Algorithm, approximate hybrid value/policy iteration      284
Algorithm, approximate policy iteration with VFA      282
Algorithm, asynchronous dynamic programming      114
Algorithm, backward dynamic programming      54
Algorithm, bias-adjusted Kalman filter stepsize      204
Algorithm, CUPPS algorithm      373
Algorithm, double-pass ADP      273
Algorithm, Gauss — Seidel variation      58
Algorithm, generic ADP      110
Algorithm, hybrid value/policy iteration      63
Algorithm, infinite horizon generic ADP      305
Algorithm, policy iteration      62
Algorithm, real-time dynamic programming      115
Algorithm, relative value iteration      58
Algorithm, roll-out policy      293
Algorithm, SHAPE algorithm      362
Algorithm, shortest path      18
Algorithm, single-pass ADP      272
Algorithm, SPAR      355
Algorithm, stochastic decomposition      372
Algorithm, synchronous ADP      291
Algorithm, synchronous dynamic programming      114
Algorithm, temporal-difference learning for infinite horizon      309
Algorithm, tree-search      292
Algorithm, value iteration      57
Aliasing      235
American option      239
Apparent convergence      210
Asset acquisition      28-29
Asset acquisition, ADP algorithm      389
Asset acquisition, lagged      30
Asset acquisition, variations      391
Asset pricing      26
Asynchronous dynamic programming      114
Attribute transition function      164
Backpropagation through time      273
Backward dynamic programming      54
Bandit problem      37
Bandit problems      332
Basis functions      127 237 362
Basis functions, approximate linear programming      311
Basis functions, geometric view      244
Basis functions, Longstaff and Schwartz      241
Basis functions, neural network      253
Basis functions, recursive time-series      251
Basis functions, tic-tac-toe      243
Batch process      257
Batch replenishment      31 65
Bellman error      98
Bellman functional equation      4
Bellman Hamilton — Jacobi      3
Bellman optimality equation      4
Bellman Recurrence Equation      4
Bellman’s equation      3 28 48
Bellman’s equation, deterministic      49
Bellman’s equation, expectation form      49
Bellman’s equation, operator form      53
Bellman’s equation, standard form      49
Bellman’s equation, vector form      51
Benders’ decomposition      370
Benders’ decomposition, CUPPS algorithm      373
Benders’ decomposition, stochastic decomposition      372
Bias      195
Bias, due to value iteration      286
Bias, statistical error in max operator      287
Blood management, ADP algorithm      397
Blood management, model      393
Boltzmann exploration      328
Budgeting problem, continuous      21
Budgeting problem, discrete      19
Contribution function      40 166
Controls      147
Cost function      166
CUPPS algorithm      372
Curses of dimensionality      92
Curses of dimensionality, action space      5
Curses of dimensionality, outcome space      5
Curses of dimensionality, state space      5
Cutting planes      365
Decision node      23
Decision tree      23
Decisions      147
Double-pass algorithm      273
Dynamic assignment problem      34
Error measures      315
Exogenous information      29 40 151
Exogenous information, lagged      155
Exogenous information, outcomes      153
Exogenous information, scenarios      153
Experimental issues, convergence      295
Experimental issues, starting      294
Exploitation      327
Exploration      326-327
Exploration vs. exploitation      116 323
Exponential smoothing      99
Factored representation of a state      146
Finite horizon, for infinite horizon models      317
Flat representation of a state      146
Forward dynamic programming      93
Gambling problem      25
Gittins exploration      344
Gittins indices      332
Gittins indices, basic theory      334
Gittins indices, foundations      332
Gittins indices, normally distributed rewards      336
Gradients      352
Greedy strategy      95
Infinite horizon      55 304
Infinite horizon, $Q$-leaming      310
Infinite horizon, finite-horizon approximations      317
Infinite horizon, policy iteration      305
Infinite horizon, temporal-difference learning      307
Infinite horizon, value iteration      305
Information acquisition      36
Information acquisition, illustration      330
Initialization      112
Interval estimation      337
Knowledge gradient      339
L-shaped decomposition      372
Lagged information      155
Lattice      67
Learning rate      181
Learning rate schedules      183
Learning strategies, Boltzmann exploration      328
Learning strategies, epsilon-greedy exploration      329
Learning strategies, exploitation      327
Learning strategies, exploration      326
Learning strategies, Gittins exploration      344
Learning strategies, Gittins indices      332
Learning strategies, interval estimation      337
Learning strategies, knowledge gradient      339
Learning strategies, mixed      327
Learning strategies, upper confidence bound      338
Leveling algorithm      355
Linear filter      99
Linear operator      53
Linear programming method, approximate      311
Linear programming method, exact      64
Linear regression      238
Linear regression, Longstaff and Schwartz      239
Linear regression, recursive estimation, derivation      263
Linear regression, recursive estimation, multiple observations      250
Linear regression, recursive estimation, time-series      251
Linear regression, recursive least squares, nonstationary data      249
Linear regression, recursive least squares, stationary data      248
Linear regression, recursive methods      246
Linear regression, stochastic gradient algorithm      247
Longstaff and Schwartz      239
Lookup-table      99
Markov decision processes      47
Max operator      53
Measure-theoretic view of information      170
Min operator      53
Model, contribution function      166
Model, decisions      147
Model, elements of a dynamic program, contribution function      130
Model, elements of a dynamic program, decision variable      130
Model, elements of a dynamic program, exogenous-information      130
Model, elements of a dynamic program, objective function      130
Model, elements of a dynamic program, state      130
Model, elements of a dynamic program, transition function      130
Model, policies      149
Model, transition function      159
Model-free dynamic programming      118
Modeling dynamic programs      40
Models, contribution function      119
Models, elements of a dynamic program      130
Models, elements of a dynamic program, state variable      130
Models, exogenous information      119
Models, resources      135
Models, resources, multiple      137
Models, resources, single discrete      136
Models, state      139
Models, time      132
Models, transition function      118
Monotone policies      64-65
Monte — Carlo sampling      100
Myopic policy      150
Neural networks      253
Nomadic trucker      137
Nomadic trucker, learning      323
Objective function      40 48 169
On-line applications      317
Optimality equation      48
Optimality equations, post-decision state      104
Optimality equations, proof      70
Outcome node      23
Outcomes      153
Partially observable states      145
policies      149 159
Policies, randomized      151
Policy iteration      62 282
Policy iteration, hybrid      63
Policy iteration, infinite horizon      305
Policy iteration, with look-up table representation      282
Policy iteration, with myopic rules      283
Policy iteration, with neural networks      283
Policy iteration, with regression function      283
Policy iteration, with value function approximation      282
Post-decision state      142
Post-decision state, optimality equations      104
Post-decision state, perspective      107
Proof of optimality      81
Randomized policies      80 151
Real-time dynamic programming      114
Reinforcement learning      119
Resource allocation, asset acquisition      388
Resource allocation, blood management      392
Resource allocation, fleet management      416
Resource allocation, general model      404
Resource allocation, portfolio optimization      401
Resource allocation, trucking application      421
Resources, multiple      137
Resources, nomadic trucker      137
Resources, single discrete      136
Reward function      166
RTDP      114
Sample path      95
Scenarios      153
SHAPE algorithm      359
SHAPE algorithm, proof of convergence      377
Sherman — Morrison      264
Shortest path, deterministic      2 18
Shortest path, information collecting      39
Shortest path, stochastic      24
Single-pass algorithm      272
Smoothing factor      181
SPAR algorithm, projection operation      382
SPAR, projection      357
SPAR, weighted projection      359
State sampling, all states      290
State sampling, roll-out heuristic      293
State sampling, tree search      291
State variable, definition      139
State, alias      235
State, asset acquisition      1 28
State, asset acquisition If      30
State, asset pricing      27
State, bandit problem      38
State, budgeting problem      20
state, definition      40 139
State, dynamic assignment problem      36
State, factored      146
State, flat      146
State, gambling problem      25
State, partially observable      145
State, post-decision      101 103 142
State, pre-decision      103
State, sampling strategies      114
State, shortest path      19
State, state of knowledge      38
State, transformer replacement      33
States of a system, hyperstate      141
States of a system, information state      141
States of a system, resource state      141
States of a system, single resource      141
stepsize      99 181
Stepsize rule      183
Stepsize, apparent convergence      210
Stepsize, bias and variance      195
Stepsize, bias-adjusted Kalman filter      204
Stepsize, convergence conditions      184
Stepsize, deterministic      183
Stepsize, deterministic, $1/n$      187
Stepsize, deterministic, constant      186
Stepsize, deterministic, harmonic      187
Stepsize, deterministic, McClain      188
Stepsize, deterministic, polynomial learning rate      187
Stepsize, deterministic, search-then-converge      189
Stepsize, infinite horizon      313
Stepsize, infinite horizon, bounds      314
Stepsize, optimal, nonstationary I      200
Stepsize, optimal, nonstationary II      201
Stepsize, optimal, stationary      198
Stepsize, stochastic      190
Stepsize, stochastic, Belgacem’s rule      194
Stepsize, stochastic, convergence conditions      191
Stepsize, stochastic, Gaivoronski’s rule      193
Stepsize, stochastic, Godfrey’s rule      194
Stepsize, stochastic, Kesten’s rule      192
Stepsize, stochastic, Mirozahmedov’s rule      193
Stepsize, stochastic, stochastic gradient adaptive stepsize      193
1 2
blank
Реклама
blank
blank
HR
@Mail.ru
       © Электронная библиотека попечительского совета мехмата МГУ, 2004-2020
Электронная библиотека мехмата МГУ | Valid HTML 4.01! | Valid CSS! О проекте