# Decision Tree

## Representation

## Learning

Denote the set of features by $\mathcal{F}$ and the set of classes by $\mathcal{C}$. Starting with the full training data of size $N$, we grow a tree by splitting the data at each node into two parts according to the values of a selected feature, until no more splits can be generated (or until some regularization technique stops the growth).

Conceptually, we want to split the data at a node so that the class labels are separated more cleanly in the two child nodes. Technically, we choose the most informative feature at each node: the one whose split yields the biggest reduction in uncertainty.

The math below defines the measure of uncertainty and specifies exactly what we mean by an informative feature.

Given the training data $D_m$ of size $N_m$ at node $m$, the frequency of samples in class $k$ is

$$p_{mk} = \frac{1}{N_m} \sum_{(x, y) \in D_m} \mathbf{1}(y = k),$$

where $(x, y)$ is a sample in $D_m$ and $y$ is its label.

There are two common measures of uncertainty (or impurity):

$$\text{Gini index:}\quad H(D_m) = \sum_{k} p_{mk}\,(1 - p_{mk}) = 1 - \sum_{k} p_{mk}^2,$$

$$\text{Entropy:}\quad H(D_m) = -\sum_{k} p_{mk} \log p_{mk}.$$

To understand these measures, consider a binary problem with classes 0 and 1, and plot the Gini score and the entropy against $p_{m1}$.

It is easy to see that both measures reach their maximum when $p_{m0} = p_{m1} = 0.5$, which agrees with our intuition: a node with a 50/50 mix of the two classes is the most uncertain.
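
A minimal Python/NumPy sketch of these two measures (the helper names `class_frequencies`, `gini`, and `entropy` are just illustrative): it computes $p_{mk}$ from the labels at a node and confirms that a 50/50 mix of two classes maximizes both impurities.

```python
import numpy as np

def class_frequencies(y):
    """Frequencies p_mk of each class among the labels y at a node."""
    counts = np.unique(y, return_counts=True)[1]
    return counts / counts.sum()

def gini(y):
    """Gini index: 1 - sum_k p_mk^2."""
    p = class_frequencies(y)
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Entropy (in bits): -sum_k p_mk * log2(p_mk)."""
    p = class_frequencies(y)
    return -np.sum(p * np.log2(p))

print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]))  # 0.5 and 1.0: maxima for two classes
print(gini([0, 0, 0, 1]), entropy([0, 0, 0, 1]))  # 0.375 and ~0.811: a purer node
```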

Given feature $f$, the reduction in uncertainty achieved by splitting $D_m$ on $f$ is

$$\Delta(D_m, f) = H(D_m) - \frac{N_m^{\text{left}}}{N_m} H(D_m^{\text{left}}) - \frac{N_m^{\text{right}}}{N_m} H(D_m^{\text{right}}),$$

where $D_m^{\text{left}}$ and $D_m^{\text{right}}$ are the two subsets of $D_m$ produced by the split, with sizes $N_m^{\text{left}}$ and $N_m^{\text{right}}$.

Split with the feature that reduces uncertainty the most:

$$f^* = \arg\max_{f} \Delta(D_m, f).$$
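
A minimal sketch of this greedy search at a single node, using the Gini index and midpoint thresholds for numeric features (the `best_split` helper is illustrative; a full learner would recurse on the two children):

```python
import numpy as np

def gini(y):
    """Gini impurity of a 1-D label array."""
    p = np.unique(y, return_counts=True)[1] / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Search all (feature, threshold) pairs for the split of a single node
    that reduces the Gini impurity the most."""
    n, parent = len(y), gini(y)
    best = (None, None, 0.0)                      # (feature, threshold, reduction)
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between sorted values
            left = X[:, f] <= t
            right = ~left
            # uncertainty reduction = H(parent) - weighted H(children)
            reduction = (parent
                         - left.sum() / n * gini(y[left])
                         - right.sum() / n * gini(y[right]))
            if reduction > best[2]:
                best = (f, t, reduction)
    return best

X = np.array([[2.0, 0.0], [3.0, 1.0], [10.0, 0.0], [11.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # splits on feature 0 at threshold 6.5, reduction 0.5
```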

To avoid overfitting, we may tune the following hyperparameters to regularize decision tree learning (a short sklearn sketch follows the list).

- the maximum depth of the tree
- the minimum number of samples required to split a node
- the minimum number of samples required to be at a leaf node
- the maximum number of leaf nodes
- the minimum decrease in uncertainty required for learning to continue
- the minimum uncertainty required for splitting a node
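
Most of these knobs map directly onto constructor arguments of scikit-learn's `DecisionTreeClassifier`; a minimal sketch with arbitrary parameter values:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # minimum number of samples required to split a node
    min_samples_leaf=5,          # minimum number of samples required at a leaf node
    max_leaf_nodes=8,            # maximum number of leaf nodes
    min_impurity_decrease=0.01,  # minimum decrease in uncertainty required to split
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())  # the learned tree respects the limits above
```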

## Inference

Given a new sample $x$, move down the tree by following the appropriate branch at each split. You eventually end up in a leaf node, which tells you which class the sample belongs to.

## Software

- `criterion`: the measure of uncertainty (gini or entropy)
- `max_depth`: the maximum depth of the tree (None if no restriction)
- `min_samples_split`: the minimum number of samples required to create a split
- `min_samples_leaf`: the minimum number of samples required at a leaf node
- `class_weight`: weights associated with the classes
  - uniform: every class receives the same weight.
  - balanced: class weights are inversely proportional to class frequencies.
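
As an illustration of how these options map onto the underlying scikit-learn estimator (a sketch against `DecisionTreeClassifier`, not the software's own interface):

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",      # measure of uncertainty: "gini" or "entropy"
    class_weight="balanced",  # weights inversely proportional to class frequencies;
                              # class_weight=None corresponds to "uniform" above
)
clf.fit([[0, 0], [1, 1], [2, 2], [3, 3]], [0, 0, 1, 1])
print(clf.predict([[2.5, 2.5]]))  # the new sample is routed to a leaf -> [1]
```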

Check out the documentation listed below to view the attributes that are available in sklearn but not exposed to the user in the software.

## Further reading

  1. sklearn tutorial on Decision Trees.
  2. sklearn DecisionTreeClassifier documentation.