Decision Tree

Background Knowledge

Sample set $D$: the training set; let $p_k$ be the proportion of samples in $D$ belonging to class $k$ ($k = 1, 2, \ldots, K$).

Attribute $a$: a discrete attribute used to split the samples.

Information entropy measures the impurity of $D$ (the smaller, the purer):

$$\mathrm{Ent}(D) = -\sum_{k=1}^{K} p_k \log_2 p_k$$

Information gain: suppose attribute $a$ has $V$ possible values,

$$a = \{a^1, a^2, \ldots, a^V\}$$

and let $D^v$ be the subset of samples in $D$ that take value $a^v$ on attribute $a$:

$$D^v = \{x \in D \mid a(x) = a^v\}$$

The information gain of splitting $D$ on $a$ is the resulting reduction in entropy:

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$$

Gain ratio normalizes the information gain by the intrinsic value of $a$, which counteracts the bias of raw information gain toward attributes with many values:

$$\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)}, \qquad \mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$$

Gini index: the Gini value of $D$ is the probability that two samples drawn at random from $D$ belong to different classes,

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2$$

and the Gini index of attribute $a$ is

$$\mathrm{Gini\_index}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Gini}(D^v)$$
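These quantities are straightforward to compute directly. Below is a minimal Python sketch; the helper names `entropy` and `information_gain` are mine, not from the original post:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy Ent(D) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(D, a): entropy reduction from splitting on attribute `attr`."""
    n = len(labels)
    # Group labels by the value each row takes on `attr` (the D^v subsets).
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys)
                                 for ys in subsets.values())
```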

Example 1

Table 1: Loan Application Sample Dataset
This table presents the features of 15 applicants (Age, Employed, Owns House, Credit Status) and their loan application results (Class). This dataset is commonly used for training and analyzing decision tree algorithms.

| ID | Age | Employed | Owns House | Credit Status | Class |
|----|-------------|-----|-----|-----------|-----|
| 1  | Young       | No  | No  | Fair      | No  |
| 2  | Young       | No  | No  | Good      | No  |
| 3  | Young       | Yes | No  | Good      | Yes |
| 4  | Young       | Yes | Yes | Fair      | Yes |
| 5  | Young       | No  | No  | Fair      | No  |
| 6  | Middle-aged | No  | No  | Fair      | No  |
| 7  | Middle-aged | No  | No  | Good      | No  |
| 8  | Middle-aged | Yes | Yes | Good      | Yes |
| 9  | Middle-aged | No  | Yes | Excellent | Yes |
| 10 | Middle-aged | No  | Yes | Excellent | Yes |
| 11 | Senior      | No  | Yes | Excellent | Yes |
| 12 | Senior      | No  | Yes | Good      | Yes |
| 13 | Senior      | Yes | No  | Good      | Yes |
| 14 | Senior      | Yes | No  | Excellent | Yes |
| 15 | Senior      | No  | No  | Fair      | No  |

ID3

Firstly, compute the empirical entropy of $D$. Among the 15 samples there are 9 with class 'Yes' and 6 with class 'No':

$$H(D) = -\frac{9}{15}\log_2\frac{9}{15} - \frac{6}{15}\log_2\frac{6}{15} = 0.971$$

Then compute the information gain of each attribute. For example, the 6 house owners are all approved (entropy $0$), while the 9 non-owners contain 3 approvals (entropy $0.918$), so

$$g(D, \text{Owns House}) = 0.971 - \left[\frac{6}{15} \times 0 + \frac{9}{15} \times 0.918\right] = 0.420$$

Computing the rest the same way:

$$g(D, \text{Age}) = 0.083, \quad g(D, \text{Employed}) = 0.324, \quad g(D, \text{Credit Status}) = 0.363$$
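These numbers can be reproduced with the `information_gain` helper from the background sketch above; the dataset literal is transcribed from Table 1:

```python
data = [  # (Age, Employed, Owns House, Credit Status, Class), from Table 1
    ("Young", "No", "No", "Fair", "No"),
    ("Young", "No", "No", "Good", "No"),
    ("Young", "Yes", "No", "Good", "Yes"),
    ("Young", "Yes", "Yes", "Fair", "Yes"),
    ("Young", "No", "No", "Fair", "No"),
    ("Middle-aged", "No", "No", "Fair", "No"),
    ("Middle-aged", "No", "No", "Good", "No"),
    ("Middle-aged", "Yes", "Yes", "Good", "Yes"),
    ("Middle-aged", "No", "Yes", "Excellent", "Yes"),
    ("Middle-aged", "No", "Yes", "Excellent", "Yes"),
    ("Senior", "No", "Yes", "Excellent", "Yes"),
    ("Senior", "No", "Yes", "Good", "Yes"),
    ("Senior", "Yes", "No", "Good", "Yes"),
    ("Senior", "Yes", "No", "Excellent", "Yes"),
    ("Senior", "No", "No", "Fair", "No"),
]
rows = [r[:4] for r in data]
labels = [r[4] for r in data]
for i, name in enumerate(["Age", "Employed", "Owns House", "Credit Status"]):
    print(f"{name}: {information_gain(rows, labels, i):.3f}")
# Age: 0.083, Employed: 0.324, Owns House: 0.420, Credit Status: 0.363
```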

Since the attribute 'Owns House' has the highest information gain, it is selected as the optimal feature. Splitting $D$ on 'Owns House' yields two subsets, $D_1$ (owns a house) and $D_2$ (does not). All labels in $D_1$ are 'Yes', so $D_1$ becomes a leaf node with label 'Yes'. $D_2$, however, still needs to be divided.

```mermaid
graph TD
    A[Owns House] -- yes --> B["yes, D1"]
    A[Owns House] -- no --> C["D2"]
```
Figure 1: The first split, on 'Owns House'

Next, repeat the procedure on $D_2$ (9 samples: 3 'Yes' and 6 'No', so $H(D_2) = 0.918$). The information gains of the remaining attributes are

$$g(D_2, \text{Age}) = 0.251, \quad g(D_2, \text{Employed}) = 0.918, \quad g(D_2, \text{Credit Status}) = 0.474$$

Since the attribute 'Employed' has the highest information gain, it is selected as the optimal feature. Both resulting subsets are pure (every employed applicant in $D_2$ is approved, every unemployed one is rejected), so we obtain the final tree.

```mermaid
graph TD
    A[Owns House] -- yes --> B["yes"]
    A[Owns House] -- no --> C["Employed"]
    C["Employed"] -- yes --> D["yes"]
    C["Employed"] -- no --> E["no"]
```
Figure 2: The final decision tree
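The finished tree is small enough to encode directly. A minimal sketch; the nested-dict representation and `predict` helper are my own, not from the post:

```python
# Final tree as a nested dict: attribute -> {attribute value: subtree or label}.
tree = {"Owns House": {"Yes": "Yes",
                       "No": {"Employed": {"Yes": "Yes", "No": "No"}}}}

def predict(tree, sample):
    """Walk down the tree until a leaf label (a plain string) is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[sample[attr]]
    return tree

print(predict(tree, {"Owns House": "No", "Employed": "Yes"}))  # -> "Yes"
```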

Pruning

Definition: loss function

Suppose a decision tree $T$ has $|T|$ leaf nodes. Each leaf node $t$ contains $N_t$ samples, of which $N_{tk}$ belong to class $k$, $k = 1, 2, \ldots, K$.

The loss function is defined as

$$C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|$$

where the empirical entropy of leaf $t$ is

$$H_t(T) = -\sum_{k=1}^{K} \frac{N_{tk}}{N_t} \log \frac{N_{tk}}{N_t}$$

Writing $C(T) = \sum_{t=1}^{|T|} N_t H_t(T)$ for the first term, the combination becomes

$$C_\alpha(T) = C(T) + \alpha |T|$$

The purpose: $C(T)$ measures how well the tree fits the training data, $|T|$ measures model complexity, and $\alpha \geq 0$ trades the two off. Pruning selects the subtree that minimizes $C_\alpha(T)$: a larger $\alpha$ favors a simpler tree, a smaller $\alpha$ a closer fit.

Algorithm

Because the loss $C_\alpha(T)$ decomposes over subtrees, we can use the idea of dynamic programming to find the optimal pruning, working upward from the leaves (a sketch follows the algorithm below).

  • Input: a decision tree built on data $X = \{(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_N, y_N) \}$, and a parameter $\alpha$
  • Output: the pruned subtree $Tree_\alpha$
  1. For each node $t$ of $T$, calculate the empirical entropy $H_t(T)$.
  2. Recursively backtrack from the leaf nodes of the tree upwards: at each internal node, compare the loss $C_\alpha(T_B)$ of keeping its subtree with the loss $C_\alpha(T_A)$ of pruning the subtree back into a single leaf; if $C_\alpha(T_A) \leq C_\alpha(T_B)$, prune.
  3. Repeat step 2 until no pruning reduces the loss, then return $Tree_\alpha$.
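A minimal sketch of this bottom-up pruning, assuming a small `Node` class of my own design (not from the original post); the log base is a free choice here:

```python
from math import log2

class Node:
    def __init__(self, counts, children=None):
        self.children = children or {}  # attribute value -> Node; empty = leaf
        self.counts = counts            # class counts of samples reaching this node

def leaf_cost(counts):
    """N_t * H_t(T): weighted empirical entropy if this node were a leaf."""
    n = sum(counts.values())
    return -sum(c * log2(c / n) for c in counts.values() if c > 0)

def prune(node, alpha):
    """Return (node, cost, n_leaves) minimizing C_alpha over prune decisions."""
    if not node.children:
        return node, leaf_cost(node.counts), 1
    cost, leaves = 0.0, 0
    for v, child in list(node.children.items()):
        node.children[v], c, l = prune(child, alpha)
        cost, leaves = cost + c, leaves + l
    keep = cost + alpha * leaves                # C_alpha(T_B): keep the subtree
    collapse = leaf_cost(node.counts) + alpha   # C_alpha(T_A): prune to one leaf
    if collapse <= keep:
        node.children = {}                      # prune: node becomes a leaf
        return node, collapse, 1
    return node, keep, leaves
```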

CART

Regression Tree

A regression tree corresponds to a partition of the input space into units, together with a fixed output value on each unit.

Suppose the input space is partitioned into $M$ units $R_1, R_2, \ldots, R_M$, and each unit $R_m$ has an output value $c_m$. The regression tree model can then be written as

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)$$

Under the mean-squared-error criterion $\sum_{x_i \in R_m} (y_i - f(x_i))^2$, the best value on each unit is the average of the responses that fall into it:

$$\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)$$

Splitting algorithm (a sketch of the split search follows this list):

  • Input: training data $D$
  • Output: regression tree $f(x)$
  1. Choose a splitting variable $x^{(j)}$ and a split point $s$, which define the regions $R_1(j, s) = \{x \mid x^{(j)} \leq s\}$ and $R_2(j, s) = \{x \mid x^{(j)} > s\}$; pick the pair $(j, s)$ that minimizes

$$\min_{j, s}\left[\min_{c_1}\sum_{x_i \in R_1(j, s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j, s)}(y_i - c_2)^2\right]$$

  2. Set $\hat{c}_1$ and $\hat{c}_2$ to the mean response in each region, then repeat steps 1 and 2 on the two resulting regions until a stopping condition is met.
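A minimal sketch of the split search, assuming one-dimensional inputs and an exhaustive scan over candidate split points (all names are my own):

```python
def best_split(xs, ys):
    """Find the split point s minimizing the total squared error of the
    two half-regions {x <= s} and {x > s} for 1-D inputs."""
    def sse(vals):
        if not vals:
            return 0.0
        c = sum(vals) / len(vals)       # optimal constant on a region = mean
        return sum((v - c) ** 2 for v in vals)

    best_s, best_err = None, float("inf")
    for s in sorted(set(xs))[:-1]:      # candidate split points
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        err = sse(left) + sse(right)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

# Example: y jumps between x = 3 and x = 4, so the best split is s = 3.
print(best_split([1, 2, 3, 4, 5, 6], [5.1, 4.9, 5.0, 9.8, 10.2, 10.0]))
```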

Classification Tree

For classification, CART uses the Gini index as the splitting criterion. Since CART builds binary trees, a feature $A$ and a value $a$ split $D$ into $D_1 = \{(x, y) \in D \mid A(x) = a\}$ and $D_2 = D \setminus D_1$, and the Gini index of the split is

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

The (feature, value) pair with the smallest $\mathrm{Gini}(D, A)$ is chosen as the split.
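A minimal sketch of the Gini-based binary split criterion (function names are my own):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(rows, labels, attr, value):
    """Gini(D, A): weighted Gini of the binary split A == value vs A != value."""
    d1 = [y for r, y in zip(rows, labels) if r[attr] == value]
    d2 = [y for r, y in zip(rows, labels) if r[attr] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
```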

