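The code is taken from ShangtongZhang's chapter08/maze.py
Working through the maze problem gives a better understanding of Sections 8.1-8.4 ^_^
Dyna-Q: Section 8.2
Dyna-Q+: Section 8.3
Prioritized Sweeping: Section 8.4
Import modules
import numpy as np
Define the Maze class, which implements the methods through which the algorithms interact with the environment
# A wrapper class for a maze, containing all the information about the maze.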
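To make the environment interaction concrete, here is a minimal sketch of what such a maze wrapper might look like. The class name `MazeSketch`, its constructor arguments, and the reward convention (1 on reaching the goal, 0 otherwise) are assumptions for illustration, not the original class.

```python
class MazeSketch:
    """Hypothetical grid maze: states are (row, col) tuples."""

    # action index -> (row delta, column delta): up, down, left, right
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, height, width, start, goal, obstacles=()):
        self.height = height
        self.width = width
        self.start = tuple(start)
        self.goal = tuple(goal)
        self.obstacles = {tuple(o) for o in obstacles}

    def step(self, state, action):
        """Return (next_state, reward) for taking `action` in `state`."""
        d_row, d_col = self.ACTIONS[action]
        # clip the move so the agent stays inside the grid
        row = min(max(state[0] + d_row, 0), self.height - 1)
        col = min(max(state[1] + d_col, 0), self.width - 1)
        next_state = (row, col)
        # moving into an obstacle leaves the agent where it was
        if next_state in self.obstacles:
            next_state = tuple(state)
        # assumed reward convention: 1 at the goal, 0 everywhere else
        reward = 1.0 if next_state == self.goal else 0.0
        return next_state, reward
```

For example, `MazeSketch(3, 3, start=(2, 0), goal=(0, 2), obstacles=[(1, 0)]).step((2, 0), 0)` leaves the agent at `(2, 0)` because the cell above is blocked.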
Parameter class for the Dyna algorithms
# a wrapper class for parameters of dyna algorithms
Choose an action according to an epsilon-greedy policy
# choose an action based on epsilon-greedy algorithm
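A minimal sketch of epsilon-greedy selection, assuming the action values for the current state are available as a 1-D NumPy array. The function name `choose_action_sketch` and the explicit `rng` argument are illustrative choices, not the original signature.

```python
import numpy as np

def choose_action_sketch(q_values, epsilon, rng):
    """Pick an action index from a 1-D array of action values."""
    if rng.random() < epsilon:
        # explore: uniform random action
        return int(rng.integers(len(q_values)))
    # exploit: break ties among the maximal actions at random
    best = np.flatnonzero(q_values == np.max(q_values))
    return int(rng.choice(best))
```

Breaking ties at random matters early on, when all values are still zero: always taking the first maximal action would bias exploration toward one direction.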
Build a model class using the plain model-learning approach, which is also how Dyna-Q builds its model
# Trivial model for planning in Dyna-Q
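A sketch of such a table-lookup model, assuming a deterministic environment so that each (state, action) pair maps to a single remembered outcome. The class name `TrivialModelSketch` and the `feed`/`sample` method names are assumptions for illustration.

```python
import random

class TrivialModelSketch:
    """Remembers the last observed outcome of every (state, action) pair."""

    def __init__(self, seed=0):
        self.model = {}                 # state -> {action: (next_state, reward)}
        self.rng = random.Random(seed)

    def feed(self, state, action, next_state, reward):
        # overwrite any earlier outcome: the environment is assumed deterministic
        self.model.setdefault(tuple(state), {})[action] = (tuple(next_state), reward)

    def sample(self):
        # pick a previously visited state, then an action previously taken there
        state = self.rng.choice(list(self.model.keys()))
        action = self.rng.choice(list(self.model[state].keys()))
        next_state, reward = self.model[state][action]
        return state, action, next_state, reward
```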
Build a model class using the time-based approach
# Time-based model for planning in Dyna-Q+
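The idea behind Dyna-Q+ is to add an exploration bonus of κ√τ to the modeled reward, where τ is the number of time steps since the pair was last tried in the real environment. The sketch below assumes a deterministic model and treats untried actions as self-loops with zero reward, as Section 8.3 describes. The class name `TimeModelSketch`, its attributes, and the default value of `kappa` are illustrative assumptions.

```python
import math
import random

class TimeModelSketch:
    """Model that also remembers when each (state, action) pair was last tried."""

    def __init__(self, n_actions, kappa=1e-4, seed=0):
        self.n_actions = n_actions
        self.kappa = kappa              # weight of the exploration bonus
        self.time = 0                   # number of real transitions seen so far
        self.model = {}                 # state -> {action: (next_state, reward, last_time)}
        self.rng = random.Random(seed)

    def feed(self, state, action, next_state, reward):
        self.time += 1
        state = tuple(state)
        if state not in self.model:
            # untried actions are modeled as self-loops with zero reward
            self.model[state] = {a: (state, 0.0, 1) for a in range(self.n_actions)}
        self.model[state][action] = (tuple(next_state), reward, self.time)

    def sample(self):
        state = self.rng.choice(list(self.model))
        action = self.rng.choice(list(self.model[state]))
        next_state, reward, last_time = self.model[state][action]
        # bonus grows with the time elapsed since the pair was last tried
        bonus = self.kappa * math.sqrt(self.time - last_time)
        return state, action, next_state, reward + bonus
```

Pairs that have not been visited for a long time look increasingly attractive during planning, which pushes the agent to re-test them and notice changes in the maze.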
Build a priority queue class for the prioritized sweeping method
# Used by the prioritized update algorithm in Section 8.4, Prioritized Sweeping
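One possible way to build such a queue is on top of `heapq`, with lazy deletion so that an item's priority can be replaced. This is a sketch, not the original class: the name `PriorityQueueSketch` and its methods are assumptions. Priorities are negated because `heapq` is a min-heap and prioritized sweeping needs the largest priority first.

```python
import heapq
import itertools

class PriorityQueueSketch:
    """Max-priority queue whose items can be re-inserted with a new priority."""

    REMOVED = object()                  # marker for entries that were superseded

    def __init__(self):
        self.heap = []
        self.entries = {}               # item -> its live heap entry
        self.counter = itertools.count()  # tie-breaker, keeps insertion order

    def add_item(self, item, priority):
        if item in self.entries:
            # mark the old entry as dead instead of searching the heap for it
            self.entries.pop(item)[-1] = self.REMOVED
        entry = [-priority, next(self.counter), item]
        self.entries[item] = entry
        heapq.heappush(self.heap, entry)

    def pop_item(self):
        while self.heap:
            neg_priority, _, item = heapq.heappop(self.heap)
            if item is not self.REMOVED:
                del self.entries[item]
                return item, -neg_priority
        raise KeyError("pop from an empty priority queue")

    def empty(self):
        return not self.entries
```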
Build a model class based on prioritized sweeping
# Model containing a priority queue for Prioritized Sweeping
Run one episode with Dyna-Q and update the value function
# play for an episode for Dyna-Q algorithm
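Each real step in Dyna-Q does three things: a direct Q-learning update, a model update, and a number of planning updates on transitions replayed from the model. Here is a minimal sketch of one such step, assuming integer states, a `(n_states, n_actions)` NumPy array for Q, and a plain dict as the deterministic model. The function names are hypothetical, and the sketch samples uniformly over remembered (state, action) pairs, which is a simplification.

```python
import numpy as np

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One-step tabular Q-learning update, applied in place."""
    target = reward + gamma * np.max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])

def dyna_q_step(q, model, transition, planning_steps, alpha, gamma, rng):
    """Direct RL on the real transition, then planning on remembered ones."""
    state, action, next_state, reward = transition
    # (1) learn from real experience
    q_update(q, state, action, reward, next_state, alpha, gamma)
    # (2) update the deterministic model
    model[(state, action)] = (next_state, reward)
    # (3) planning: replay randomly chosen remembered transitions
    keys = list(model)
    for _ in range(planning_steps):
        s, a = keys[rng.integers(len(keys))]
        s2, r = model[(s, a)]
        q_update(q, s, a, r, s2, alpha, gamma)
```

With `planning_steps=0` this reduces to ordinary one-step Q-learning, which is the baseline in the planning-step comparison.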
Run one episode of updates with prioritized sweeping
# play for an episode for prioritized sweeping algorithm
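The planning phase of prioritized sweeping can be sketched as follows: back up the pair with the largest absolute TD error first, then queue its predecessors if their own error exceeds a threshold θ. The sketch assumes integer states, a deterministic model stored as a dict, and a `predecessors` dict mapping each state to the (state, action) pairs that lead into it. All names are hypothetical, and unlike a full implementation it allows stale duplicate entries in the heap.

```python
import heapq
import numpy as np

def prioritized_sweeping_plan(q, model, predecessors, start_pair,
                              planning_steps, alpha, gamma, theta):
    """Run up to `planning_steps` backups, largest |TD error| first."""

    def priority(s, a):
        s2, r = model[(s, a)]
        return abs(r + gamma * np.max(q[s2]) - q[s, a])

    heap = []
    p = priority(*start_pair)
    if p > theta:
        # heapq is a min-heap, so store the negated priority
        heapq.heappush(heap, (-p, start_pair))

    backups = 0
    while heap and backups < planning_steps:
        _, (s, a) = heapq.heappop(heap)
        s2, r = model[(s, a)]
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
        backups += 1
        # the value of s changed, so pairs leading into s may now be stale
        for ps, pa in predecessors.get(s, ()):
            p = priority(ps, pa)
            if p > theta:
                heapq.heappush(heap, (-p, (ps, pa)))
    return backups
```

On a three-state chain 0 → 1 → 2 with reward 1 on the last transition, starting from the pair `(1, 0)` propagates the value back to state 0 in just two backups, whereas uniform sampling would waste updates on pairs whose values cannot change yet.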
Vary the number of planning steps and compare the resulting Dyna-Q variants (average number of steps to reach the goal)
# Figure 8.2, DynaMaze, use 10 runs instead of 30 runs
100%|██████████| 10/10 [00:55<00:00, 5.00s/it]
[18.06 17.26 15.9 ]
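Change the position of the maze obstacles and compute the corresponding cumulative reward
# wrapper function for changing maze
Change the position of the maze obstacles and compare the performance of Dyna-Q and Dyna-Q+
# Figure 8.5, BlockingMaze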
100%|██████████| 20/20 [01:16<00:00, 3.79s/it]
Add a shorter usable path to the maze and compare how the two algorithms adapt
# Figure 8.6, ShortcutMaze
100%|██████████| 5/5 [02:32<00:00, 30.57s/it]
Check whether the current Q is already optimal
# Check whether state-action values are already optimal
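One simple way to run such a check is to follow the greedy policy from the start state and see whether it reaches the goal within a step budget. The sketch below assumes that approach. The function name, the `step_fn` callback, and the `max_steps` budget are illustrative assumptions, not the original check.

```python
import numpy as np

def greedy_path_is_short(q, start, goal, step_fn, max_steps):
    """Follow the greedy policy; True if the goal is reached within max_steps."""
    state, steps = start, 0
    while state != goal:
        action = int(np.argmax(q[state]))   # greedy action, first maximum on ties
        state = step_fn(state, action)
        steps += 1
        if steps > max_steps:
            # too long, or the greedy policy is stuck in a loop
            return False
    return True
```

Setting `max_steps` to the known shortest-path length, possibly with some slack, turns this into a stopping criterion for measuring how many updates an algorithm needs.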
Compare the performance of Dyna-Q and Prioritized Sweeping
# Example 8.4, mazes with different resolution
run 0, Prioritized Sweeping, maze size 54
run 0, Prioritized Sweeping, maze size 216
run 0, Prioritized Sweeping, maze size 486
run 0, Prioritized Sweeping, maze size 864
run 0, Prioritized Sweeping, maze size 1350
run 0, Dyna-Q, maze size 54
run 0, Dyna-Q, maze size 216
run 0, Dyna-Q, maze size 486
run 0, Dyna-Q, maze size 864
run 0, Dyna-Q, maze size 1350
run 1, Prioritized Sweeping, maze size 54
run 1, Prioritized Sweeping, maze size 216
run 1, Prioritized Sweeping, maze size 486
run 1, Prioritized Sweeping, maze size 864
run 1, Prioritized Sweeping, maze size 1350
run 1, Dyna-Q, maze size 54
run 1, Dyna-Q, maze size 216
run 1, Dyna-Q, maze size 486
run 1, Dyna-Q, maze size 864
run 1, Dyna-Q, maze size 1350
run 2, Prioritized Sweeping, maze size 54
run 2, Prioritized Sweeping, maze size 216
run 2, Prioritized Sweeping, maze size 486
run 2, Prioritized Sweeping, maze size 864
run 2, Prioritized Sweeping, maze size 1350
run 2, Dyna-Q, maze size 54
run 2, Dyna-Q, maze size 216
run 2, Dyna-Q, maze size 486
run 2, Dyna-Q, maze size 864
run 2, Dyna-Q, maze size 1350
run 3, Prioritized Sweeping, maze size 54
run 3, Prioritized Sweeping, maze size 216
run 3, Prioritized Sweeping, maze size 486
run 3, Prioritized Sweeping, maze size 864
run 3, Prioritized Sweeping, maze size 1350
run 3, Dyna-Q, maze size 54
run 3, Dyna-Q, maze size 216
run 3, Dyna-Q, maze size 486
run 3, Dyna-Q, maze size 864
run 3, Dyna-Q, maze size 1350
run 4, Prioritized Sweeping, maze size 54
run 4, Prioritized Sweeping, maze size 216
run 4, Prioritized Sweeping, maze size 486
run 4, Prioritized Sweeping, maze size 864
run 4, Prioritized Sweeping, maze size 1350
run 4, Dyna-Q, maze size 54
run 4, Dyna-Q, maze size 216
run 4, Dyna-Q, maze size 486
run 4, Dyna-Q, maze size 864
run 4, Dyna-Q, maze size 1350