Bertsekas D., Tsitsiklis J. — Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3)



Title: Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3)

Authors: Bertsekas D., Tsitsiklis J.

Annotation:

This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control. Neuro-dynamic programming uses neural network approximations to overcome the "curse of dimensionality" and the "curse of modeling" that have been the bottlenecks to the practical application of dynamic programming and stochastic control to complex problems. The methodology allows systems to learn about their behavior through simulation, and to improve their performance through iterative reinforcement. This book provides the first systematic presentation of the science and the art behind this exciting and far-reaching methodology. The book develops a comprehensive analysis of neuro-dynamic programming algorithms, and guides the reader to their successful application through case studies from complex problem areas.
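The annotation describes systems that learn cost-to-go estimates through simulation and improve them by iterative reinforcement. As a minimal, hedged illustration of that update (tabular Q-learning on a hypothetical toy chain MDP; this is not code from the book, whose setting replaces the lookup table with a neural-network approximation of the Q-factors):

```python
import random

# Illustrative sketch only: tabular Q-learning on a tiny deterministic
# chain MDP with states 0..3, where state 3 is terminal and entering it
# yields reward 1.0. All names here are assumptions for the example.

N_STATES = 4
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
TERMINAL = 3

def step(state, action):
    """Deterministic transition; reward 1.0 on entering the terminal state."""
    nxt = min(state + 1, TERMINAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == TERMINAL else 0.0)

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            # Epsilon-greedy exploration over the simulated system.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[s][act])
            s2, r = step(s, a)
            # Q-learning update: move Q(s,a) toward the one-step Bellman target.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

if __name__ == "__main__":
    Q = q_learning()
    # The learned greedy policy moves right toward the terminal state,
    # and Q(2, right) approaches the immediate terminal reward.
    print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(TERMINAL)])
```

The same stochastic-approximation update underlies the book's simulation-based methods; the "neuro" part enters when Q is parameterized by a trained architecture rather than stored exactly.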


Language: en

Category: Computer science

Subject index status: index with page numbers is complete


Year of publication: 1996

Number of pages: 504

Added to catalog: 01.09.2014

Subject index
Action network      262 273 316 374 380
Actor      36 224
Advantages      339 382
Aggregation of states      68 341 382
Approximation in policy space      9 261
Architectures      5 60
Architectures, linear      61
Architectures, local-global      71 88
Architectures, nonlinear      62
Architectures, power      257
Average cost problems      15 386
Backgammon      8 452
Backpropagation      65 374 375
Basis functions      61
Batch gradient method      108
Batching      113
Bellman error methods      364 408
Bellman's equation      2 16 22 40 389 411
Bellman's equation, reduced      49 51 52 368
Boundedness in stochastic approximation      159 178
Cellular systems      53
Chain rule      465
Channel allocation      53 448
Chattering      320 369
Checkers      380
Chess      4
Combinatorial optimization      54 56 58 74 440 456
Compact representations      4 180
Condition number      100 115
Consistent estimator      182
Continuous-state problems      305 370
Continuously differentiable functions      464
Contraction      154
Contraction, weighted Euclidean norm      353
Contraction, weighted maximum norm      23 155
Control space complexity      268
Convergence, $\lambda$-policy iteration      45
Convergence, approximate value iteration      344 354 364
Convergence, asynchronous policy iteration      33
Convergence, asynchronous value iteration      27 240
Convergence, constant stepsize gradient methods      94
Convergence, diminishing stepsize gradient methods      96
Convergence, extended Kalman filter      125
Convergence, geometric      100
Convergence, gradient methods with errors      121
Convergence, incremental gradient methods      115 123 143 145
Convergence, issues      93
Convergence, linear      100
Convergence, martingale      149
Convergence, off-line TD methods      208
Convergence, on-line TD methods      219
Convergence, policy iteration      30 41
Convergence, Q-learning      247 337 361 402
Convergence, rate of heavy ball method      105
Convergence, rate of steepest descent      99
Convergence, simulation-based value iteration      240
Convergence, stochastic approximation      154
Convergence, stochastic gradient methods      142
Convergence, stochastic pseudogradient methods      141 148
Convergence, sublinear      100
Convergence, supermartingale      148
Convergence, synchronous optimistic TD(1)      232
Convergence, TD methods for discounted problems      222
Convergence, TD($\lambda$)      199
Convergence, TD($\lambda$) with linear architectures      294 308
Convergence, TD-based policy iteration      45
Convex sets and functions      80 465
Cost-to-go      2 13
Cost-to-go, approximate      3 256
Cost-to-go, reduced      49 51
Critic      36 224 262 380
Data block      81 108
Decomposition of linear least squares problems      87
Decomposition, state space      28 71 331
Descent direction      89 140
Differential cost      390
Discount factor      12
Discounted problems      15 37
Discounted problems, conversion to stochastic shortest paths      39
Discounted problems, error bounds      262 275
Discounted problems, optimistic TD($\lambda$)      313
Discounted problems, Q-learning      250
Discounted problems, TD methods      204 222 290 294
Discrete optimization      54 56 74
Dual heuristic DP      380
Dynamic programming operator      19 37
Eigenvalues      461
Elevator dispatching      456
Eligibility coefficients      202 209 295
Error bounds, approximate policy iteration      275 381
Error bounds, approximate value iteration      332 349 357
Error bounds, Bellman error methods      369
Error bounds, greedy policies      262
Error bounds, optimistic policy iteration      318 328
Error bounds, Q-learning      361
Error bounds, TD($\lambda$)      302 312
Every-visit method      187 189 194 197 202 251
Exploration      238 251 317
Exponential forgetting      85
Extended NDP solution      444
Fading factor      85 126
Features      6 66
Features and aggregation      68
Features and heuristic policies      72 441
Features, feature iteration      67 380 427
Finite horizon problems      12 13 17 329
First passage time      47 473
First-visit method      189 194 197 202 251
Fixed point      134
Football      426
Games      408 453
Gauss — Newton method      90 124 126 130
Global minimum      78
Gradient      464
Gradient matrix      465
Gradient methods      89
Gradient methods with errors      105 121 142
Gradient methods, initialization      115
Hamilton — Jacobi equation      371
Heavy ball method      104
Heuristic DP      380
Heuristic policies      72 433
Hidden layer      62
Ill-conditioning      100 103
Imperfect state information      69
Incremental gradient methods      108 130 143 145 178 275 335 365 367 372
Incremental gradient methods with constant stepsize      113
Incremental gradient methods, convergence      115 123 143 145
Infinite horizon problems      12 14
Initialization      115
Invariant distribution      473
Iterative resampling      193
Jacobian      465
Job-shop scheduling      456
Kalman filter      83 128 252 275 316 430
Kalman filter, extended      124 130
Least squares      77
Levenberg — Marquardt method      91
Limit cycling      112
Limit inferior      463
Limit point      463
Limit superior      463
Linear least squares      81 128
Linear least squares, incremental gradient methods      117
Linear programming      36 375 383 398
Linearly (in)dependent vectors      459
Lipschitz continuity      95
Local minimum      78
Local search      56
Logistic function      62
Lookup table representations      4 180
Lyapunov function      139
Maintenance problem      51 58 268 440
Markov noise      173
Martingales      149
Maximum norm      39
Mean value theorem      464
Momentum term      104 111
Monotone mappings      158
Monotonicity Lemma      21 38
Monte Carlo simulation      181
Monte Carlo simulation and policy iteration      270
Monte Carlo simulation and temporal differences      193
Multilayer perceptrons      62 129 429 454
Multilayer perceptrons, approximation capability      66
Multistage lookahead      30 41 264 266
Neural networks      6 62 76 103 129
Newton's method      89 91 101
Norm      459
Norm, Euclidean      459
Norm, maximum      39 459
Norm, weighted maximum      23 155 460
Norm, weighted quadratic      296
ODE approach      171 178 183 402
Optimal stopping      358
Optimality conditions      78
Parameter estimation      77
Parking      422
Partition, greedy      227 322
Partitioning      70 427
Policy      2 12
Policy evaluation      29 186
Policy evaluation, approximate      271 284 366 371
Policy improvement      29 192
Policy iteration      29 41
Policy iteration, $\lambda$-policy iteration      42 58 315 437
Policy iteration, approximate      269
Policy iteration, asynchronous      32 41 58
Policy iteration, average cost problems      397 405
Policy iteration, convergence      30
Policy iteration, games      412 414
Policy iteration, modified      32 41 58
Policy iteration, multistage lookahead      30
Policy iteration, optimistic      224 252 312 427 430 437 448 454
Policy iteration, partially optimistic      231 317
Policy iteration, Q-learning      338
Policy iteration, simulation-based      192
Policy iteration, temporal difference-based      41 437
Policy, greedy      226 238 259
Policy, improper      18
Policy, proper      18
Policy, stationary      13
Position evaluators      4 453
Positive definite matrices      462
Potential function      139
Projection      304
Projection in stochastic approximation      160
Pseudo-contraction, Euclidean norm      146
Pseudo-contraction, weighted maximum norm      155
Pseudogradient      140 178
Q-factors      4 192 245 256 260 261 271 367
Q-learning      180 245 253 337 352 358 380 399 415
Quadratic minimization      80 99
Quasi-Newton methods      90
Rank      460
Real-time dynamic programming      253
Recurrent class      472
Regular NDP solution      444
Reinforcement learning      8 10
Robbins — Monro      133 195 196
Rollout policy      266 455
Sarsa      339
Scaling, diagonal      101
Scaling, input      114
Selective lookahead      265
Semidefinite matrices      462
Sequential games      412 453
Sigmoidal function      62
Singular value decomposition      82 128 430 447
Soft partitioning      72
Stationary point      79
Steady-state distribution      473
Steepest descent method      89 99
Steepest descent method with momentum      104 111
Steepest descent method, diagonally scaled      101
Stepsize      92
Stepsize, diminishing      92 111 129
Stepsize, incremental gradient methods      111 120
Stepsize, randomness      137
Stepsize, stochastic approximation      135
Stepsize, TD methods      218
Stochastic approximation      133
Stochastic gradient method      142 365
Stochastic matrix      471
Stochastic programming      441
Stochastic shortest paths      15 17
Stopping time      202
Strict minimum      78
Strong Law of Large Numbers      182
Sufficient statistics      69
Supermartingale convergence      148
Supervised learning      453
TD($\lambda$)      195 429 438 454
TD($\lambda$), approximate policy evaluation      284
TD($\lambda$), choice of $\lambda$      200
TD($\lambda$), convergence      199
TD($\lambda$), discounted problems      204 290 294
TD($\lambda$), divergence      291
TD($\lambda$), every-visit      197 202
TD($\lambda$), first-visit      197 202
TD($\lambda$), games      417
TD($\lambda$), least squares      252
TD($\lambda$), off-line      198 208 286
TD($\lambda$), on-line      198 204 219 224 252 286
TD($\lambda$), optimistic      225 313 314 454
TD($\lambda$), replace      252
TD($\lambda$), restart      202 252
TD($\lambda$), stepsize selection      218
TD($\lambda$), synchronous      232
TD(0)      196 229 231 287 306 325 336 337 351 369 448
TD(1)      196 232 284 285 289 325 327 328
Temporal differences      41 180 193 381
Temporal differences, general temporal difference methods      201
Temporal differences, least squares      252
Temporal differences, Monte Carlo simulation      193
Termination state      15 17
Tetris      50 58 435
Training      6 60 76
Transient states      472
Two-pass methods      147 178
Two-sample methods      366 368 382
Unbiased estimator      181
Value iteration      25 42
Value iteration with state aggregation      341
Value iteration, approximate      329 341 353 362
Value iteration, asynchronous      26 58 396
Value iteration, average cost problems      391
Value iteration, contracting      393 401
Value iteration, Gauss — Seidel      26 28
Value iteration, incremental      335
Value iteration, relative      392 399
Value iteration, simulation-based      237
Wald's identity      185
Weighted maximum norm      23 155
Weighted maximum norm, contractions      23
Weights, neural network      64
© Electronic library of the Board of Trustees of the Faculty of Mechanics and Mathematics, Moscow State University, 2004-2019