Chapter09 square_wave

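Based on ShangtongZhang's code, chapter09/square_wave.py.

This notebook builds features with coarse coding and compares how different parameters affect the quality of the approximated function.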

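Problem description

This is Example 9.3 from the book, Coarseness of Coarse Coding:

Coarse coding is used to build an approximator for a square wave: random samples of the square wave serve as the targets U_t, and parameters such as the spacing and size of the intervals are varied to compare how they affect the generalization of coarse-coded features.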
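The update that the code in this notebook applies to each sample (x, U_t) can be sketched in the book's notation as follows. The symbols I_i (the i-th feature interval) and n(x) (the number of intervals containing x) are introduced here for illustration only; the step size is divided by n(x) so that the estimate at x moves a fraction α of the way toward the target.

```latex
\hat{v}(x, \mathbf{w}) = \sum_{i \,:\, x \in I_i} w_i,
\qquad
w_i \leftarrow w_i + \frac{\alpha}{n(x)} \bigl[ U_t - \hat{v}(x, \mathbf{w}) \bigr]
\quad \text{for every } i \text{ with } x \in I_i .
```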

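The feature construction methods covered in Section 9.5 are not specific to reinforcement learning; they apply equally to function fitting, that is, to regression problems. After all, value function approximation is itself built on the idea of supervised learning, so it is worth keeping that underlying nature in mind.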

Import modules

import numpy as np
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
from tqdm import tqdm

Define the interval class

# wrapper class for an interval
# readability is more important than efficiency, so I won't use many tricks
class Interval:
    # [@left, @right)
    def __init__(self, left, right):
        self.left = left
        self.right = right

    # whether a point is in this interval
    def contain(self, x):
        return self.left <= x < self.right

    # length of this interval
    def size(self):
        return self.right - self.left


# domain of the square wave, [0, 2)
DOMAIN = Interval(0.0, 2.0)
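A quick check of the half-open convention [left, right) may help here, since it decides which features are active at a boundary point. The class is repeated so the snippet runs on its own; the variable name domain is only for this illustration.

```python
class Interval:
    """Half-open interval [left, right), mirroring the class above."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def contain(self, x):
        return self.left <= x < self.right

    def size(self):
        return self.right - self.left


domain = Interval(0.0, 2.0)
print(domain.contain(0.0))  # True: the left endpoint is included
print(domain.contain(2.0))  # False: the right endpoint is excluded
print(domain.size())        # 2.0
```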

Define the square wave to be estimated and the random sampling function

# square wave function: 1 on the open interval (0.5, 1.5), 0 elsewhere
def square_wave(x):
    if 0.5 < x < 1.5:
        return 1
    return 0

# get @n samples randomly from the square wave
# returns a list of @n [x, y] pairs
def sample(n):
    samples = []
    for i in range(0, n):
        x = np.random.uniform(DOMAIN.left, DOMAIN.right)
        y = square_wave(x)
        samples.append([x, y])
    return samples
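For larger sample counts, the same sampling can be sketched in vectorized form. This is an alternative for illustration, not the original author's code: the names square_wave_vec and sample_vec and the seed parameter are assumptions, and NumPy's Generator API is used, so the random stream differs from np.random.uniform above.

```python
import numpy as np


def square_wave_vec(xs):
    # 1 strictly inside (0.5, 1.5), 0 elsewhere, matching square_wave above
    return ((xs > 0.5) & (xs < 1.5)).astype(int)


def sample_vec(n, left=0.0, right=2.0, seed=None):
    # Draw n points uniformly from [left, right) and pair each with its target
    rng = np.random.default_rng(seed)
    xs = rng.uniform(left, right, size=n)
    return np.column_stack((xs, square_wave_vec(xs)))


samples = sample_vec(5, seed=0)
print(samples.shape)  # (5, 2)
```

Passing a fixed seed makes a run reproducible, which is convenient when comparing feature widths on identical data.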

Define the coarse coding class, which builds the approximator and its update rule

# wrapper class for value function
class ValueFunction:
    # @feature_width: width of each feature interval
    # @domain: domain of this function, an instance of Interval
    # @alpha: basic step size for one update
    # @num_of_features: number of feature intervals, must be greater than 1
    def __init__(self, feature_width, domain=DOMAIN, alpha=0.2, num_of_features=50):
        self.feature_width = feature_width
        self.num_of_features = num_of_features
        self.features = []
        self.alpha = alpha
        self.domain = domain

        # the features are intervals (Interval instances) placed over the domain
        # there are many ways to place those feature windows,
        # following is just one possible way
        # num_of_features must be greater than 1, otherwise this divides by zero
        step = (domain.size() - feature_width) / (num_of_features - 1)
        left = domain.left
        for i in range(0, num_of_features - 1):
            self.features.append(Interval(left, left + feature_width))
            left += step
        self.features.append(Interval(left, domain.right))

        # initialize weight for each feature
        self.weights = np.zeros(num_of_features)

    # for point @x, return the indices of corresponding feature windows
    def get_active_features(self, x):
        active_features = []
        for i in range(0, len(self.features)):
            if self.features[i].contain(x):
                active_features.append(i)
        return active_features

    # estimate the value for point @x
    def value(self, x):
        active_features = self.get_active_features(x)
        # sum of the weights of all active features
        return np.sum(self.weights[active_features])

    # update weights given sample of point @x
    # @delta: y - value(x); the step size is not yet included in delta here,
    # unlike the 1000-state example (it feels as if the two were written by different authors)
    def update(self, delta, x):
        active_features = self.get_active_features(x)
        delta *= self.alpha / len(active_features)
        for index in active_features:
            self.weights[index] += delta
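To see how the feature width controls generalization, the sketch below counts how many features are active at a single point for each width. It follows the same spacing rule as ValueFunction but computes the left edges by multiplication rather than repeated addition, so edge cases at interval boundaries may differ slightly. The helper names feature_lefts and active_mask are hypothetical.

```python
import numpy as np


def feature_lefts(feature_width, left=0.0, right=2.0, num_of_features=50):
    # Evenly spaced left edges, same spacing rule as ValueFunction.__init__
    step = (right - left - feature_width) / (num_of_features - 1)
    return left + step * np.arange(num_of_features)


def active_mask(x, lefts, feature_width):
    # Binary feature vector: 1 where lefts[i] <= x < lefts[i] + feature_width
    return ((lefts <= x) & (x < lefts + feature_width)).astype(int)


counts = {}
for width in (0.2, 0.4, 1.0):
    lefts = feature_lefts(width)
    phi = active_mask(1.0, lefts, width)
    counts[width] = int(phi.sum())
    print(width, counts[width])
```

With the binary vector phi, the estimate at a point is simply the dot product of the weights and phi. Wider features mean more active features per point, so each update is shared across a larger neighborhood.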

Train and plot to compare performance under different parameters; only the number of samples and the feature width are varied here

# train @value_function with a set of samples @samples
def approximate(samples, value_function):
    for x, y in samples:
        delta = y - value_function.value(x)
        value_function.update(delta, x)

# Figure 9.8
def figure_9_8():
    num_of_samples = [10, 40, 160, 640, 2560, 10240]
    feature_widths = [0.2, 0.4, 1.0]
    plt.figure(figsize=(30, 20))
    axis_x = np.arange(DOMAIN.left, DOMAIN.right, 0.02)
    for index, num_of_sample in enumerate(num_of_samples):
        print(num_of_sample, 'samples')
        samples = sample(num_of_sample)
        value_functions = [ValueFunction(feature_width) for feature_width in feature_widths]
        plt.subplot(2, 3, index + 1)
        plt.title('%d samples' % (num_of_sample))
        for value_function in value_functions:
            approximate(samples, value_function)
            values = [value_function.value(x) for x in axis_x]
            plt.plot(axis_x, values, label='feature width %.01f' % (value_function.feature_width))
        plt.legend()

    plt.savefig('./figure_9_8.png')
    plt.show()

figure_9_8()
10 samples
40 samples
160 samples
640 samples
2560 samples
10240 samples

[Figure 9.8: output of figure_9_8(), saved as figure_9_8.png]

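The feature width clearly has a large effect on the learned curve, particularly when samples are few. A large width corresponds to broad features, which generalize over a wide range and give a smoother curve; a small width corresponds to narrow features, which generalize over a narrow range and give a more jagged curve. The asymptotic result is affected little overall, but the generalization at individual states is affected considerably.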
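One way to put numbers on this observation is to measure the root-mean-square error against the true square wave on the plotting grid. This is a sketch under assumptions: the RMSE metric, the helper name train_and_score, and the fixed seed are not part of the original code, and the feature edges are computed by multiplication rather than accumulation.

```python
import numpy as np


def train_and_score(feature_width, n_samples, alpha=0.2, n_features=50, seed=0):
    # Hypothetical helper: train a coarse-coded approximator, return grid RMSE
    rng = np.random.default_rng(seed)
    step = (2.0 - feature_width) / (n_features - 1)
    lefts = step * np.arange(n_features)
    weights = np.zeros(n_features)

    def active(x):
        # Indices of the features whose interval contains x
        return np.flatnonzero((lefts <= x) & (x < lefts + feature_width))

    def target(x):
        return 1.0 if 0.5 < x < 1.5 else 0.0

    for x in rng.uniform(0.0, 2.0, size=n_samples):
        idx = active(x)
        if idx.size == 0:  # guard against rounding at the right edge
            continue
        delta = target(x) - weights[idx].sum()
        weights[idx] += alpha / idx.size * delta

    grid = np.arange(0.0, 2.0, 0.02)
    errors = [weights[active(x)].sum() - target(x) for x in grid]
    return float(np.sqrt(np.mean(np.square(errors))))


for n in (10, 160, 10240):
    scores = [train_and_score(w, n) for w in (0.2, 0.4, 1.0)]
    print(n, ['%.3f' % s for s in scores])
```

Because the seed is fixed, every width is trained on the same samples for a given sample count, as in figure_9_8().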