Chapter09 square_wave

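Based on ShangtongZhang's code, chapter09/square_wave.py.

Features are constructed with coarse coding, and different parameter settings are compared by their effect on the quality of the approximating function.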

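Problem description

This is Example 9.3 in the book, Coarseness of Coarse Coding:

Coarse coding is used to build a function approximator for a square wave. Random samples of the square wave serve as the targets U_t, and parameters such as the spacing and size of the intervals are varied to compare how they affect the generalization of coarse-coded features.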

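The feature construction methods covered in Section 9.5 are not specific to reinforcement learning. They apply equally to function fitting, that is, to regression problems. Value function approximation is built on the idea of supervised learning in the first place, so it helps to keep that underlying nature in mind.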
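
To make the regression point concrete, here is a minimal sketch that fits the same kind of coarse-coded features with ordinary least squares instead of incremental updates. The helper `coarse_features` and its parameters are hypothetical names introduced for illustration, and the evenly spaced window placement via `np.linspace` is an assumption that only approximates the placement used later in this notebook.

```python
import numpy as np

def coarse_features(x, num_features=50, width=0.4, lo=0.0, hi=2.0):
    """Binary coarse-coding features: column j is 1 where x falls in window j.

    Hypothetical helper; windows are [left, left + width) with evenly spaced lefts.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    lefts = np.linspace(lo, hi - width, num_features)
    return ((x[:, None] >= lefts) & (x[:, None] < lefts + width)).astype(float)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2.0, size=2000)
y_train = ((x_train > 0.5) & (x_train < 1.5)).astype(float)  # square wave targets

# Ordinary least squares on the features: a plain supervised regression.
w, *_ = np.linalg.lstsq(coarse_features(x_train), y_train, rcond=None)

# Predictions at a few query points.
x_test = np.array([0.25, 1.0, 1.75])
pred = coarse_features(x_test) @ w
```

Nothing here refers to states, rewards, or returns: the features only define a linear model, and any regression procedure can fit its weights.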

Import modules

import numpy as np
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
from tqdm import tqdm

Define the interval class

# wrapper class for an interval
# readability is more important than efficiency, so I won't use many tricks
class Interval:
    # [@left, @right)
    def __init__(self, left, right):
        self.left = left
        self.right = right

    # whether a point is in this interval
    def contain(self, x):
        return self.left <= x < self.right

    # length of this interval
    def size(self):
        return self.right - self.left


# domain of the square wave, [0, 2)
DOMAIN = Interval(0.0, 2.0)

Define the square wave to be estimated and the random sampling function

# square wave function
def square_wave(x):
    if 0.5 < x < 1.5:
        return 1
    return 0

# get @n samples randomly from the square wave
# returns a list of n [x, y] pairs
def sample(n):
    samples = []
    for i in range(0, n):
        x = np.random.uniform(DOMAIN.left, DOMAIN.right)
        y = square_wave(x)
        samples.append([x, y])
    return samples
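
For larger sample counts the loop above can be replaced by array operations. The following is a sketch of one possible vectorized variant, not part of the original code: `sample_vectorized` and its `rng` parameter are hypothetical names, and it returns two arrays rather than a list of pairs.

```python
import numpy as np

def sample_vectorized(n, left=0.0, right=2.0, rng=None):
    """Draw n points uniformly from [left, right) and evaluate the square wave."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(left, right, size=n)
    # same strict inequalities as square_wave(): 1 inside (0.5, 1.5), else 0
    y = ((x > 0.5) & (x < 1.5)).astype(int)
    return x, y

x, y = sample_vectorized(5, rng=np.random.default_rng(42))
```

Passing `zip(x, y)` to a training loop that iterates over `(x, y)` pairs gives the same interface as the list returned by `sample`.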

Define the coarse coding class that builds the approximating function and its update method

# wrapper class for value function
class ValueFunction:
    # @domain: domain of this function, an instance of Interval
    # @alpha: basic step size for one update
    def __init__(self, feature_width, domain=DOMAIN, alpha=0.2, num_of_features=50):
        self.feature_width = feature_width
        self.num_of_features = num_of_features
        self.features = []
        self.alpha = alpha
        self.domain = domain
        
        # there are many ways to place the feature windows (the Intervals);
        # the following is just one possible way
        # num_of_features must be greater than 1
        step = (domain.size() - feature_width) / (num_of_features - 1)
        left = domain.left
        for i in range(0, num_of_features - 1):
            self.features.append(Interval(left, left + feature_width))
            left += step
        self.features.append(Interval(left, domain.right))

        # initialize weight for each feature
        self.weights = np.zeros(num_of_features)

    # for point @x, return the indices of corresponding feature windows
    def get_active_features(self, x):
        active_features = []
        for i in range(0, len(self.features)):
            if self.features[i].contain(x):
                active_features.append(i)
        return active_features

    # estimate the value for point @x
    def value(self, x):
        active_features = self.get_active_features(x)
        # sum of the weights of all active features
        return np.sum(self.weights[active_features])

    # update weights given sample of point @x
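    # @delta: y - value(x); it does not include the step size, which is applied here
    # (the 1000-state example handles this differently, so the two feel like the work of different authors)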
    def update(self, delta, x):
        active_features = self.get_active_features(x)
        delta *= self.alpha / len(active_features)
        for index in active_features:
            self.weights[index] += delta
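
Dividing the step size by the number of active features means that one update moves the estimate at x by exactly alpha times the error, however many windows overlap there. The sketch below checks this on a simplified stand-in for the class above. The names `lefts` and `active` are hypothetical, and placing the windows with `np.linspace` is an assumption that approximates the loop in `__init__`.

```python
import numpy as np

num_features, width, alpha = 50, 0.2, 0.2
lefts = np.linspace(0.0, 2.0 - width, num_features)  # left edges of the windows
weights = np.zeros(num_features)

def active(x):
    """Indices of the windows [left, left + width) that contain x."""
    return np.flatnonzero((lefts <= x) & (x < lefts + width))

x, target = 1.0, 1.0
idx = active(x)
before = weights[idx].sum()                 # estimate before the update
delta = target - before                     # error, without the step size
weights[idx] += alpha * delta / len(idx)    # same per-weight step as update()
after = weights[idx].sum()                  # estimate after the update

# each active weight gets an equal share, so the shares add up to alpha * delta
print(after - before, alpha * delta)
```

Without the division, wider features (more active windows per point) would take proportionally larger effective steps, which would make the comparison across feature widths unfair.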

Train and plot the results to compare different parameter settings. Only the number of samples and the feature width are varied here.

# train @value_function with a set of samples @samples
def approximate(samples, value_function):
    for x, y in samples:
        delta = y - value_function.value(x)
        value_function.update(delta, x)

# Figure 9.8
def figure_9_8():
    num_of_samples = [10, 40, 160, 640, 2560, 10240]
    feature_widths = [0.2, 0.4, 1.0]
    plt.figure(figsize=(30, 20))
    axis_x = np.arange(DOMAIN.left, DOMAIN.right, 0.02)
    for index, num_of_sample in enumerate(num_of_samples):
        print(num_of_sample, 'samples')
        samples = sample(num_of_sample)
        value_functions = [ValueFunction(feature_width) for feature_width in feature_widths]
        plt.subplot(2, 3, index + 1)
        plt.title('%d samples' % (num_of_sample))
        for value_function in value_functions:
            approximate(samples, value_function)
            values = [value_function.value(x) for x in axis_x]
            plt.plot(axis_x, values, label='feature width %.01f' % (value_function.feature_width))
        plt.legend()

    plt.savefig('./figure_9_8.png')
    plt.show()
    
figure_9_8()

10 samples
40 samples
160 samples
640 samples
2560 samples
10240 samples

[Figure 9.8: figure_9_8.png]

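The feature width clearly has a large effect on the training result. A large width corresponds to broad features, which generalize over a wide range and give a flatter curve. A small width corresponds to narrow features, which generalize over a narrow range and give a bumpier curve. The overall asymptotic result is affected only slightly, but generalization at individual states is affected considerably.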
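
The difference in generalization range can be seen directly by training on a single sample and checking which points change. This is a rough sketch under stated assumptions: `one_update_footprint` is a hypothetical helper, the windows are placed with `np.linspace` rather than with the notebook's loop, and all weights start at zero.

```python
import numpy as np

def one_update_footprint(width, num_features=50, alpha=0.2, x0=1.0, target=1.0):
    """Apply one update at (x0, target) from zero weights and return the
    grid points whose estimate changed."""
    lefts = np.linspace(0.0, 2.0 - width, num_features)
    weights = np.zeros(num_features)
    hit = (lefts <= x0) & (x0 < lefts + width)   # windows containing x0
    # the initial estimate is zero, so the error equals the target
    weights[hit] += alpha * target / hit.sum()
    grid = np.arange(0.0, 2.0, 0.01)
    inside = (grid[:, None] >= lefts) & (grid[:, None] < lefts + width)
    values = inside.astype(float) @ weights
    return grid[values > 0]

for width in (0.2, 0.4, 1.0):
    changed = one_update_footprint(width)
    print(width, changed.min(), changed.max())
```

The changed region always lies within one feature width on either side of the training point, so broader features spread each sample over more of the domain.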