Code referenced from ShangtongZhang's chapter02/ten_armed_testbed.py.
We build a 10-armed testbed to simulate the bandit algorithms covered in Chapter 2.
1. Import modules
   `import matplotlib`
2. Create the testbed class, implementing the basic action-selection and value-update methods
   `class Bandit:`
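A minimal sketch of what such a class might contain, assuming epsilon-greedy action selection and incremental sample-average value updates; the attribute names and defaults below are my assumptions, not the actual code from the repo:

```python
import numpy as np

class Bandit:
    """Minimal 10-armed testbed agent (a sketch, not the full repo class)."""
    def __init__(self, k=10, epsilon=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.k = k
        self.epsilon = epsilon
        self.q_true = self.rng.normal(0, 1, k)   # true action values q*(a) ~ N(0, 1)
        self.q_est = np.zeros(k)                 # sample-average estimates Q(a)
        self.counts = np.zeros(k, dtype=int)     # times each arm was pulled

    def act(self):
        # epsilon-greedy: explore with probability epsilon, else pick the best estimate
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.k))
        return int(np.argmax(self.q_est))

    def step(self, action):
        # reward ~ N(q*(a), 1); incremental sample-average update of Q(a)
        reward = self.rng.normal(self.q_true[action], 1)
        self.counts[action] += 1
        self.q_est[action] += (reward - self.q_est[action]) / self.counts[action]
        return reward
```

The incremental update `Q += (R - Q) / N` is equivalent to keeping the running mean of all rewards for that arm, without storing them.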
3. Train the bandits
   `def simulate(runs, time, bandits):`
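The training loop can be sketched as follows; the bandit logic is inlined here so the function is self-contained. The `runs`/`time` parameters follow the stub above, and everything else (defaults, the epsilon-greedy policy) is an assumption:

```python
import numpy as np

def simulate(runs=200, time=500, epsilon=0.1, k=10, seed=0):
    """Average reward and fraction-of-optimal-action curves over many
    independent runs (a sketch of the training loop, not the repo code)."""
    rng = np.random.default_rng(seed)
    rewards = np.zeros((runs, time))
    optimal = np.zeros((runs, time))
    for run in range(runs):
        q_true = rng.normal(0, 1, k)       # fresh testbed each run
        best = int(np.argmax(q_true))
        q_est = np.zeros(k)
        counts = np.zeros(k)
        for t in range(time):
            if rng.random() < epsilon:
                a = int(rng.integers(k))   # explore
            else:
                a = int(np.argmax(q_est))  # exploit
            r = rng.normal(q_true[a], 1)
            counts[a] += 1
            q_est[a] += (r - q_est[a]) / counts[a]
            rewards[run, t] = r
            optimal[run, t] = (a == best)
    # average over runs to get the smooth curves shown in the figures
    return rewards.mean(axis=0), optimal.mean(axis=0)
```

Averaging over many independent runs is what turns each noisy single-run trajectory into the smooth learning curves in the book's figures.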
4. Display the results (as line plots)
   1. Value and reward distributions of the 10-armed testbed
      `def figure_2_1():`
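Figure 2.1 visualizes how each arm's rewards scatter around its true value. A sketch of just the data behind that plot (the function name is hypothetical; the real `figure_2_1` renders samples like these as a violin plot with matplotlib):

```python
import numpy as np

def reward_distributions(k=10, samples=2000, seed=0):
    """Data behind Figure 2.1: each arm's rewards are N(q*(a), 1),
    where the true value q*(a) is drawn once from N(0, 1)."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0, 1, k)
    # one row of reward samples per arm
    data = np.stack([rng.normal(q, 1, samples) for q in q_true])
    return q_true, data
```

Each violin in the figure is centered on its arm's q*(a) with unit spread, which is why no fixed strategy can avoid reward noise.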
   2. Different epsilons
      `def figure_2_2(runs=2000, time=1000):`
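A condensed way to reproduce the epsilon comparison: run the same loop once per epsilon and compare the reward curves. The run counts here are scaled down from the book's 2000 runs, and all other details are assumptions:

```python
import numpy as np

def run_epsilon(epsilon, runs=200, time=500, k=10, seed=0):
    """Average-reward curve for one epsilon (a condensed sketch)."""
    rng = np.random.default_rng(seed)
    avg = np.zeros(time)
    for _ in range(runs):
        q_true = rng.normal(0, 1, k)
        q_est = np.zeros(k)
        n = np.zeros(k)
        for t in range(time):
            a = int(rng.integers(k)) if rng.random() < epsilon else int(np.argmax(q_est))
            r = rng.normal(q_true[a], 1)
            n[a] += 1
            q_est[a] += (r - q_est[a]) / n[a]
            avg[t] += r
    return avg / runs

# pure greedy (epsilon = 0) plateaus lower than epsilon = 0.1 in the long run
curves = {eps: run_epsilon(eps) for eps in (0.0, 0.1)}
```

The book sweeps epsilon over 0, 0.01, and 0.1; greedy locks onto an early arm and plateaus, while a little exploration keeps improving the estimates.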
   3. Initial value = 5 vs. initial value = 0
      `def figure_2_3(runs=2000, time=1000):`
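The optimistic-initial-values comparison hinges on a constant step size plus inflated initial estimates, which force early exploration even for a pure greedy policy. A sketch (alpha = 0.1 follows the book's setup; run counts and names are my assumptions):

```python
import numpy as np

def run_initial(q0, epsilon, runs=200, time=500, k=10, alpha=0.1, seed=1):
    """Fraction of optimal actions with constant step size alpha
    and initial estimate q0 on every arm (a sketch)."""
    rng = np.random.default_rng(seed)
    opt = np.zeros(time)
    for _ in range(runs):
        q_true = rng.normal(0, 1, k)
        best = int(np.argmax(q_true))
        q_est = np.full(k, float(q0))   # optimistic if q0 > 0
        for t in range(time):
            a = int(rng.integers(k)) if rng.random() < epsilon else int(np.argmax(q_est))
            r = rng.normal(q_true[a], 1)
            q_est[a] += alpha * (r - q_est[a])   # constant step size
            opt[t] += (a == best)
    return opt / runs

optimistic = run_initial(q0=5, epsilon=0)    # greedy, but q0=5 drives exploration
realistic = run_initial(q0=0, epsilon=0.1)   # realistic estimates need epsilon
```

Every pull drags the inflated estimate down toward its true value, so the greedy agent is "disappointed" into trying each arm before settling.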
   4. UCB vs. epsilon-greedy
      `def figure_2_4(runs=2000, time=1000):`
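The heart of UCB is the selection rule Q(a) + c·sqrt(ln t / N(a)): the bonus term favors rarely tried arms. A sketch with c = 2 as in the book (the function names are mine):

```python
import numpy as np

def ucb_action(q_est, counts, t, c=2.0):
    """UCB selection: argmax of Q(a) + c * sqrt(ln t / N(a))."""
    if np.any(counts == 0):
        return int(np.argmin(counts))   # try every arm once before using the bound
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(q_est + bonus))

def run_ucb(runs=100, time=500, k=10, c=2.0, seed=2):
    """Average-reward curve for UCB (a sketch, not the repo code)."""
    rng = np.random.default_rng(seed)
    avg = np.zeros(time)
    for _ in range(runs):
        q_true = rng.normal(0, 1, k)
        q_est = np.zeros(k)
        n = np.zeros(k)
        for t in range(1, time + 1):
            a = ucb_action(q_est, n, t, c)
            r = rng.normal(q_true[a], 1)
            n[a] += 1
            q_est[a] += (r - q_est[a]) / n[a]
            avg[t - 1] += r
    return avg / runs
```

Unlike epsilon-greedy, which explores uniformly at random, UCB directs its exploration toward arms whose estimates are still uncertain.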
   5. Gradient bandit (softmax preferences): with baseline vs. without baseline
      `def figure_2_5(runs=2000, time=1000):`
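The "softmax" method here is the gradient bandit algorithm: preferences H(a) are turned into a softmax policy and updated by H ← H + α(R − baseline)(1ₐ − π). The N(4, 1) true values mirror the book's setup for this figure, where the shifted rewards make the baseline matter; the rest is a sketch under assumed parameters:

```python
import numpy as np

def run_gradient(alpha=0.1, use_baseline=True, runs=100, time=500, k=10, seed=3):
    """Fraction of optimal actions for the gradient bandit (a sketch)."""
    rng = np.random.default_rng(seed)
    opt = np.zeros(time)
    for _ in range(runs):
        q_true = rng.normal(4, 1, k)     # shifted rewards: baseline matters
        best = int(np.argmax(q_true))
        h = np.zeros(k)                  # action preferences H(a)
        avg_r = 0.0                      # running mean reward, used as baseline
        for t in range(1, time + 1):
            e = np.exp(h - h.max())      # numerically stable softmax
            pi = e / e.sum()
            a = int(rng.choice(k, p=pi))
            r = rng.normal(q_true[a], 1)
            avg_r += (r - avg_r) / t
            base = avg_r if use_baseline else 0.0
            onehot = np.zeros(k)
            onehot[a] = 1.0
            h += alpha * (r - base) * (onehot - pi)   # preference update
            opt[t - 1] += (a == best)
    return opt / runs

with_b = run_gradient(use_baseline=True)
without_b = run_gradient(use_baseline=False)
```

Without the baseline, every reward near 4 pushes the chosen arm's preference up regardless of whether it was a good pick, which is why the no-baseline curves lag badly in the figure.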
   6. epsilon-greedy vs. softmax vs. UCB vs. optimistic initial values
      `def figure_2_6(runs=2000, time=1000):`
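The final comparison can be condensed into one helper that scores a (policy, parameter) pair by its average reward over a run, in the spirit of the book's parameter-study figure. All names, defaults, and the 0.1 step size for the optimistic agent are my assumptions:

```python
import numpy as np

def average_reward(policy, param, runs=50, time=500, k=10, seed=4):
    """Mean reward per step for one (policy, parameter) pair (a sketch).
    policy is one of "eps-greedy", "optimistic", "ucb", "gradient";
    param is epsilon, Q1, c, or alpha respectively."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(runs):
        q_true = rng.normal(0, 1, k)
        q_est = np.full(k, float(param)) if policy == "optimistic" else np.zeros(k)
        h = np.zeros(k)                      # gradient-bandit preferences
        n = np.zeros(k)
        avg_r = 0.0                          # running mean reward (baseline)
        pi = np.full(k, 1.0 / k)
        for t in range(1, time + 1):
            if policy == "eps-greedy":
                a = int(rng.integers(k)) if rng.random() < param else int(np.argmax(q_est))
            elif policy == "optimistic":     # pure greedy on optimistic estimates
                a = int(np.argmax(q_est))
            elif policy == "ucb":
                if np.any(n == 0):
                    a = int(np.argmin(n))    # try every arm once first
                else:
                    a = int(np.argmax(q_est + param * np.sqrt(np.log(t) / n)))
            else:                            # "gradient"
                e = np.exp(h - h.max())
                pi = e / e.sum()
                a = int(rng.choice(k, p=pi))
            r = rng.normal(q_true[a], 1)
            n[a] += 1
            avg_r += (r - avg_r) / t
            if policy == "optimistic":
                q_est[a] += 0.1 * (r - q_est[a])   # constant step size
            else:
                q_est[a] += (r - q_est[a]) / n[a]  # sample average
            if policy == "gradient":
                onehot = np.zeros(k)
                onehot[a] = 1.0
                h += param * (r - avg_r) * (onehot - pi)
            total += r
    return total / (runs * time)
```

Sweeping `param` over a log-scaled range for each policy and plotting the resulting scores reproduces the shape of the book's parameter-study curves.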