Boosting Methods
The AdaBoost Algorithm
Data Information: a binary-classification training dataset \(T\) \[ \begin{aligned} T &= \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\} \\ x_i &\in \mathcal{X} = \mathbb{R}^n, \quad y_i \in \mathcal{Y} = \{+1, -1\}, \quad i = 1, 2, \ldots, N \end{aligned} \]
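Algorithm Process
1. Initialize the weight distribution over the training data: \[ D_1 = (w_{11}, w_{12}, \dots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \dots, N \]
2. For each round \(m = 1, 2, \dots, M\):
- Learn a base classifier \(G_m(x)\) from the training data weighted by the distribution \(D_m\).
- Compute the classification error rate of \(G_m\) on the weighted training data: \[ e_m = \sum_{i=1}^{N} P(G_m(x_i) \neq y_i) = \sum_{i=1}^{N} w_{mi} I(G_m(x_i) \neq y_i) \]
- Compute the coefficient \(\alpha_m\) of \(G_m\): \[ \alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m} \]
- Update the weight distribution: \[ D_{m+1} = (w_{m+1,1}, \dots, w_{m+1, N}) \] \[ w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i)) \] Here \(Z_m\) is the normalization factor that makes \(D_{m+1}\) sum to 1: \[ Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i)) \] Since \(y_i G_m(x_i)\) is \(+1\) for a correct prediction and \(-1\) for a misclassification, each weight is scaled by \(e^{-\alpha_m}\) or \(e^{\alpha_m}\) respectively before normalization.
3. Construct the linear combination of the base classifiers: \[ f(x) = \sum_{m=1}^{M} \alpha_m G_m(x) \]
- Obtain the final classifier: \[ G(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right) \]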
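The AdaBoost steps can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. Several things here are assumptions that do not come from the notes: the base classifier is a one-dimensional threshold stump, the names `stump_predict`, `fit_stump`, `adaboost`, and `predict` are hypothetical, the ten-point dataset is a toy example, and \(\operatorname{sign}(0)\) is arbitrarily mapped to \(+1\).

```python
import math


def stump_predict(x, threshold, polarity):
    """Assumed base classifier G_m: +polarity if x < threshold, else -polarity."""
    return polarity if x < threshold else -polarity


def fit_stump(xs, ys, weights):
    """Search thresholds and polarities for the smallest weighted error e_m."""
    points = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (+1, -1):
            # e_m = sum_i w_mi * I(G_m(x_i) != y_i)
            err = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha_m, threshold, polarity), one entry per round."""
    n = len(xs)
    weights = [1.0 / n] * n  # D_1: uniform weights w_1i = 1/N
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        # Numerical guard (an addition of this sketch): keep the log finite.
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        # w_{m+1,i} = w_mi * exp(-alpha_m * y_i * G_m(x_i)) / Z_m
        unnormalized = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        z = sum(unnormalized)  # normalization factor Z_m
        weights = [u / z for u in unnormalized]
        ensemble.append((alpha, threshold, polarity))
    return ensemble


def predict(ensemble, x):
    """G(x) = sign(sum_m alpha_m * G_m(x)); sign(0) is treated as +1 here."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (illustrative only): no single stump separates it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
for alpha, threshold, polarity in ensemble:
    print(f"alpha={alpha:.4f} threshold={threshold} polarity={polarity:+d}")
accuracy = sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
print("training accuracy:", accuracy)
```

The stump search is exhaustive, which is fine for a handful of points but scales poorly. In practice the base learner is usually supplied by a library, and only the reweighting loop needs to follow the formulas above.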
Boosting Tree Algorithm for Regression
Algorithm Process Input: training dataset \(T = \{(x_1, y_1),(x_2, y_2), \cdots, (x_N, y_N)\}\); Output: boosting tree \(f_M(x)\). (1) Initialize \(f_0(x) = 0\). (2) For \(m = 1, 2, \cdots, M\).