Decision Tree
Background Knowledge
- sample group $D$: the training set, with $p_k$ the proportion of class $k$ samples in $D$, $k = 1, 2, \ldots, K$.
- attribute $a$: a discrete attribute with $V$ possible values, $a = \{a^1, a^2, \ldots, a^V\}$; splitting $D$ on $a$ yields the subsets $D^v = \{x \in D \mid a(x) = a^v\}$.
- information entropy: measures the impurity of $D$,
  $$\mathrm{Ent}(D) = -\sum_{k=1}^{K} p_k \log_2 p_k$$
- information gain: the reduction in entropy obtained by splitting $D$ on attribute $a$,
  $$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$$
- gain ratio: normalizes information gain by the intrinsic value of $a$, counteracting the bias toward many-valued attributes,
  $$\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)}, \qquad \mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|}\log_2\frac{|D^v|}{|D|}$$
- Gini index: the probability that two samples drawn at random from $D$ belong to different classes,
  $$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2$$
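For intuition, a two-class set split evenly between the classes has the maximum entropy of one bit, while a pure set has zero entropy:

$$\mathrm{Ent}(D) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1, \qquad \mathrm{Ent}(D_{\text{pure}}) = -1 \cdot \log_2 1 = 0$$

Entropy thus ranges over $[0, \log_2 K]$, and a good split drives the weighted subset entropies, and hence the second term of $\mathrm{Gain}(D, a)$, toward zero.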
Example 1
Table 1: Loan Application Sample Dataset
This table presents the features of 15 applicants (Age, Employed, Owns House, Credit Status) and their loan application results (Class). This dataset is commonly used for training and analyzing decision tree algorithms.
ID | Age | Employed | Owns House | Credit Status | Class |
---|---|---|---|---|---|
1 | Young | No | No | Fair | No |
2 | Young | No | No | Good | No |
3 | Young | Yes | No | Good | Yes |
4 | Young | Yes | Yes | Fair | Yes |
5 | Young | No | No | Fair | No |
6 | Middle-aged | No | No | Fair | No |
7 | Middle-aged | No | No | Good | No |
8 | Middle-aged | Yes | Yes | Good | Yes |
9 | Middle-aged | No | Yes | Excellent | Yes |
10 | Middle-aged | No | Yes | Excellent | Yes |
11 | Senior | No | Yes | Excellent | Yes |
12 | Senior | No | Yes | Good | Yes |
13 | Senior | Yes | No | Good | Yes |
14 | Senior | Yes | No | Excellent | Yes |
15 | Senior | No | No | Fair | No |
ID3
Firstly, calculate the information entropy of $D$. Among the 15 samples, 9 are classified 'Yes' and 6 are classified 'No':

$$\mathrm{Ent}(D) = -\frac{9}{15}\log_2\frac{9}{15} - \frac{6}{15}\log_2\frac{6}{15} \approx 0.971$$

Then, calculate the information gain of each attribute over $D$:

$$\mathrm{Gain}(D, \text{Age}) \approx 0.083, \quad \mathrm{Gain}(D, \text{Employed}) \approx 0.324, \quad \mathrm{Gain}(D, \text{Owns House}) \approx 0.420, \quad \mathrm{Gain}(D, \text{Credit Status}) \approx 0.363$$
Since attribute 'Owns House' has the highest information gain, it is selected as the optimal feature. Splitting $D$ on 'Owns House' yields two subsets: $D_1$ (Owns House = Yes) and $D_2$ (Owns House = No). All labels in $D_1$ are 'Yes', so $D_1$ becomes a leaf node labeled 'Yes'; $D_2$, however, still needs to be split.
```mermaid
graph TD
    A[Owns House] -- yes --> B["yes, D1"]
    A -- no --> C["D2"]
```
Next, compute the information gain of each remaining attribute on $D_2$ (3 'Yes' and 6 'No' samples, so $\mathrm{Ent}(D_2) \approx 0.918$):

$$\mathrm{Gain}(D_2, \text{Age}) \approx 0.251, \quad \mathrm{Gain}(D_2, \text{Employed}) \approx 0.918, \quad \mathrm{Gain}(D_2, \text{Credit Status}) \approx 0.474$$
Since attribute 'Employed' has the highest information gain on $D_2$, it is selected as the optimal feature. Both resulting subsets are pure (Employed = Yes samples are all 'Yes', Employed = No samples are all 'No'), so we obtain the final tree.
```mermaid
graph TD
    A[Owns House] -- yes --> B["yes"]
    A -- no --> C["Employed"]
    C -- yes --> D["yes"]
    C -- no --> E["no"]
```
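To make the procedure concrete, here is a minimal, self-contained sketch of ID3 in plain Python, run on the dataset of Table 1. The names (`DATA`, `ATTRS`, `entropy`, `info_gain`, `id3`) are illustrative choices of mine, not from any library.

```python
from collections import Counter
from math import log2

# The loan dataset from Table 1: (Age, Employed, Owns House, Credit Status, Class)
DATA = [
    ("Young", "No", "No", "Fair", "No"),
    ("Young", "No", "No", "Good", "No"),
    ("Young", "Yes", "No", "Good", "Yes"),
    ("Young", "Yes", "Yes", "Fair", "Yes"),
    ("Young", "No", "No", "Fair", "No"),
    ("Middle-aged", "No", "No", "Fair", "No"),
    ("Middle-aged", "No", "No", "Good", "No"),
    ("Middle-aged", "Yes", "Yes", "Good", "Yes"),
    ("Middle-aged", "No", "Yes", "Excellent", "Yes"),
    ("Middle-aged", "No", "Yes", "Excellent", "Yes"),
    ("Senior", "No", "Yes", "Excellent", "Yes"),
    ("Senior", "No", "Yes", "Good", "Yes"),
    ("Senior", "Yes", "No", "Good", "Yes"),
    ("Senior", "Yes", "No", "Excellent", "Yes"),
    ("Senior", "No", "No", "Fair", "No"),
]
ATTRS = ["Age", "Employed", "Owns House", "Credit Status"]

def entropy(rows):
    """Ent(D) = -sum p_k log2 p_k over the class labels (last column)."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(rows, attr_idx):
    """Gain(D, a) = Ent(D) - sum |D^v|/|D| * Ent(D^v)."""
    n = len(rows)
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attr_idx], []).append(r)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder

def id3(rows, attr_indices):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attr_indices:               # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attr_indices, key=lambda i: info_gain(rows, i))
    rest = [i for i in attr_indices if i != best]
    subsets = {}
    for r in rows:
        subsets.setdefault(r[best], []).append(r)
    return {ATTRS[best]: {v: id3(s, rest) for v, s in subsets.items()}}

print(id3(DATA, list(range(len(ATTRS)))))
```

Running it prints the nested-dict tree `{'Owns House': {'No': {'Employed': {'No': 'No', 'Yes': 'Yes'}}, 'Yes': 'Yes'}}`, matching the diagram above.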
Pruning
Definition: loss function
Suppose a decision tree $T$ has $|T|$ leaf nodes; each leaf node $t$ contains $N_t$ samples, of which $N_{tk}$ belong to class $k$, $k = 1, 2, \ldots, K$.

loss function

$$C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|, \qquad \alpha \geq 0$$

empirical entropy

$$H_t(T) = -\sum_{k=1}^{K} \frac{N_{tk}}{N_t} \log_2 \frac{N_{tk}}{N_t}$$

combination

Writing $C(T) = \sum_{t=1}^{|T|} N_t H_t(T)$ for the first term, the loss function becomes $C_\alpha(T) = C(T) + \alpha |T|$.

purpose

$C(T)$ measures how well the tree fits the training data, while $\alpha |T|$ penalizes model complexity; pruning chooses the subtree that minimizes $C_\alpha(T)$ for the given $\alpha$.
algorithm

We can define a DP table over subtrees and use the idea of dynamic programming to find the optimal pruning; a sketch in Python follows the list.

- input: the generated tree $T$, parameter $\alpha$, and training data $X = \{(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_N, y_N)\}$
- output: the pruned tree $T_\alpha$
- Step 1: for each leaf node $t$ of $T$, calculate the empirical entropy $H_t(T)$
- Step 2: recursively backtrack from the leaf nodes upwards; for each internal node, compare the loss $C_\alpha(T_B)$ of the tree before pruning it with the loss $C_\alpha(T_A)$ after collapsing it into a single leaf, and prune whenever $C_\alpha(T_A) \leq C_\alpha(T_B)$
- Step 3: return to Step 2 until no further pruning is possible, then output $T_\alpha$
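Below is a minimal sketch of Step 2 under stated assumptions: a hypothetical `Node` class of my own that stores per-node class counts (so $N_t$ and $H_t$ are computable at every node); it is not any particular library's API.

```python
from math import log2

class Node:
    """A tree node: a leaf if it has no children; every node keeps the
    class counts of the training samples that reach it."""
    def __init__(self, counts, children=None):
        self.counts = counts            # dict: class label -> sample count
        self.children = children or {}  # attribute value -> Node

    def is_leaf(self):
        return not self.children

def leaf_cost(counts):
    """N_t * H_t(T) for one leaf: empirical entropy weighted by leaf size."""
    n = sum(counts.values())
    return -sum(c * log2(c / n) for c in counts.values() if c > 0)

def cost(node):
    """Return (sum of N_t H_t over leaves, number of leaves) for a subtree."""
    if node.is_leaf():
        return leaf_cost(node.counts), 1
    total, leaves = 0.0, 0
    for child in node.children.values():
        c, l = cost(child)
        total, leaves = total + c, leaves + l
    return total, leaves

def prune(node, alpha):
    """Bottom-up pruning: collapse an internal node into a leaf whenever
    C_alpha(after) <= C_alpha(before) for its subtree."""
    if node.is_leaf():
        return node
    node.children = {v: prune(c, alpha) for v, c in node.children.items()}
    fit, leaves = cost(node)
    before = fit + alpha * leaves
    after = leaf_cost(node.counts) + alpha   # subtree replaced by one leaf
    if after <= before:
        node.children = {}                   # prune: become a leaf
    return node

# Example: a stump over 10 samples whose children barely separate them.
root = Node({"yes": 6, "no": 4}, {
    "left": Node({"yes": 5, "no": 3}),
    "right": Node({"yes": 1, "no": 1}),
})
prune(root, alpha=2.0)
print(root.is_leaf())  # True: for this alpha the split isn't worth two leaves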
CART
Regression Tree
A regression tree corresponds to a partition of the input space together with an output value on each partition unit.

Suppose the input space has been partitioned into $M$ units $R_1, R_2, \ldots, R_M$, and each unit $R_m$ carries a fixed output value $c_m$; the regression tree model can then be defined as follows.

regression tree model

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)$$

Mean Squared Error

With the squared error $\sum_{\vec{x}_i \in R_m} (y_i - f(\vec{x}_i))^2$ as the loss on each unit, the optimal output value is the mean of the responses falling in that unit:

$$\hat{c}_m = \operatorname{ave}(y_i \mid \vec{x}_i \in R_m)$$
splitting algorithm

- input: training data $D$
- output: regression tree $f(x)$
- Step 1: scan the splitting variable $j$ and split point $s$, choosing the pair that solves
  $$\min_{j,\,s}\left[\min_{c_1}\sum_{\vec{x}_i \in R_1(j,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{\vec{x}_i \in R_2(j,s)}(y_i - c_2)^2\right]$$
  where $R_1(j, s) = \{x \mid x^{(j)} \leq s\}$ and $R_2(j, s) = \{x \mid x^{(j)} > s\}$
- Step 2: compute the output value $\hat{c}_m = \operatorname{ave}(y_i \mid \vec{x}_i \in R_m)$ on each new region
- Step 3: repeat Steps 1 and 2 on each region until a stopping condition is met, then output $f(x) = \sum_{m=1}^{M} \hat{c}_m I(x \in R_m)$ (a sketch of this search follows the list)
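As an illustration, here is a small sketch of this least-squares split search in Python with NumPy. `best_split`, `build`, and `predict` are illustrative names of mine, and the stopping rule (max depth, minimum samples, purity) is an assumption, not from the original.

```python
import numpy as np

def best_split(X, y):
    """Scan every (feature j, threshold s) pair and return the one minimizing
    the summed squared error of R1 (x_j <= s) and R2 (x_j > s)."""
    best_j, best_s, best_loss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:   # candidate split points
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            # min over c of sum (y - c)^2 is attained at c = mean(y)
            loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if loss < best_loss:
                best_j, best_s, best_loss = j, s, loss
    return best_j, best_s

def build(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursive partitioning; a leaf stores c_m = ave(y_i | x_i in R_m)."""
    if depth >= max_depth or len(y) < min_samples or np.all(y == y[0]):
        return float(y.mean())
    j, s = best_split(X, y)
    if j is None:                            # no valid split remains
        return float(y.mean())
    mask = X[:, j] <= s
    return {"j": j, "s": float(s),
            "left": build(X[mask], y[mask], depth + 1, max_depth, min_samples),
            "right": build(X[~mask], y[~mask], depth + 1, max_depth, min_samples)}

def predict(tree, x):
    """Route x down the tree until a leaf value is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["j"]] <= tree["s"] else tree["right"]
    return tree

# Toy usage: fit a step function of a single feature.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] < 5, 1.0, 3.0)
tree = build(X, y)
print(predict(tree, np.array([2.0])), predict(tree, np.array([7.0])))  # 1.0 3.0
```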
Classification Tree
A classification tree selects splits with the Gini index. CART grows binary trees: testing whether feature $A$ takes value $a$ divides $D$ into $D_1$ and $D_2$, and the Gini index of that split is

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

The feature and split point with the smallest Gini index are chosen, and the tree is grown recursively as in ID3; a short sketch follows.
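To tie this back to Table 1, here is a brief sketch that scores every candidate binary split of the loan dataset by Gini index; it reuses the illustrative `DATA` and `ATTRS` from the ID3 sketch above.

```python
from collections import Counter

def gini(rows):
    """Gini(D) = 1 - sum_k p_k^2 over the class labels (last column)."""
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in Counter(r[-1] for r in rows).values())

def gini_index(rows, attr_idx, value):
    """Gini(D, A) for the binary split D1 = {A == value}, D2 = {A != value}."""
    d1 = [r for r in rows if r[attr_idx] == value]
    d2 = [r for r in rows if r[attr_idx] != value]
    n = len(rows)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

# Score every (feature, value) binary split of the loan data; smallest wins.
for i, name in enumerate(ATTRS):
    for v in sorted({r[i] for r in DATA}):
        print(f"Gini(D, {name} = {v}) = {gini_index(DATA, i, v):.3f}")
```

Running it, 'Owns House' yields the smallest Gini index (about 0.27), so CART also chooses 'Owns House' as the first split on this dataset, consistent with the ID3 result above.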