


Structurally, it resembles a tree: each internal node tests a condition, each branch represents a possible outcome of that test, and each leaf holds a final decision. In machine learning, a decision tree is an algorithm used for one of two tasks: regression or classification. Regression deals with continuous outputs such as housing prices, while classification deals with discrete values where the output is categorical. In a visual representation, the branches represent data splits and the leaves are the outcomes.
To explain how a decision tree works, imagine deciding whether to go for a walk: you would weigh factors such as temperature, wind, and rain. At the heart of a decision tree is the process of repeatedly splitting the dataset to help make decisions.
We start from the root node, which represents the complete dataset; this is where decision-making begins.
The key to constructing a decision tree lies in selecting the attributes on which the data is split. Each split aims to divide the data into subsets of increasing purity, making it easier to classify or predict an outcome. An attribute is chosen based on its ability to maximize information gain for that split.
Information gain measures how much impurity is removed when a given attribute splits the data. It is computed by comparing the entropy of the data before and after the split. In our scenario, "Is it raining?" is a pivotal attribute when deciding whether to go for a walk. If splitting on this attribute decreases entropy enough, it yields a high information gain and is therefore a suitable attribute to split the data on.
When the data is split on an attribute, the generated branches represent the different outcomes. For example, the question "Is it raining?" would be answered yes or no, and a separate branch carrying the corresponding subset of the dataset would be created for each answer.
This process of attribute selection and splitting repeats for each branch, creating a hierarchical structure until a stopping criterion is met. Common stopping criteria include reaching a maximum depth, producing a pure node (all instances belong to the same class), or falling below a minimum number of instances in a node.
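The recursive build loop described above can be sketched in plain Python. This is a simplified illustration, not a production implementation; the toy "go for a walk?" dataset and all names in it are made up for this example.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attributes, depth=0, max_depth=3, min_samples=2):
    # Stopping criteria: pure node, maximum depth, too few samples, or no attributes left.
    if len(set(labels)) == 1 or depth >= max_depth or len(rows) < min_samples or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def gain(attr):
        # Information gain of splitting on attr.
        total = entropy(labels)
        for v in set(r[attr] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[attr] == v]
            total -= len(sub) / len(rows) * entropy(sub)
        return total

    best = max(attributes, key=gain)  # attribute with the highest information gain
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[best][v] = build_tree(sub_rows, sub_labels,
                                   [a for a in attributes if a != best],
                                   depth + 1, max_depth, min_samples)
    return tree

# Toy "go for a walk?" dataset.
rows = [{"raining": "yes", "windy": "no"},
        {"raining": "yes", "windy": "yes"},
        {"raining": "no",  "windy": "no"},
        {"raining": "no",  "windy": "yes"}]
labels = ["stay", "stay", "walk", "walk"]
tree = build_tree(rows, labels, ["raining", "windy"])
print(tree)  # splits on "raining" first, since it perfectly separates the labels
```

The resulting nested dictionary mirrors the tree structure: the root tests "raining", and each branch ends in a majority-class leaf.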

The splitting criterion for a decision tree determines how to create subsets of a given dataset while maximizing the homogeneity of the data within each subset. The measure of impurity varies with the type of problem.
Classification: for classification problems, splits are evaluated with Gini impurity (the probability of misclassifying a random element), information gain (the reduction of entropy after a dataset is split on an attribute), or entropy (the amount of information needed to classify a member of the dataset).
Regression: for regression tasks, mean squared error (MSE) or variance minimization is used, where the goal is to create splits that reduce the difference between actual and predicted values.
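As a sketch of the regression criterion, the variance reduction of a single candidate split can be computed directly. The prices and the split mask below are made-up illustrative values.

```python
def variance(values):
    """Mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(values, split_mask):
    """How much a candidate split reduces variance, weighting each subset by its size."""
    left = [v for v, m in zip(values, split_mask) if m]
    right = [v for v, m in zip(values, split_mask) if not m]
    n = len(values)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(values) - weighted

prices = [100, 110, 300, 320]          # hypothetical house prices
mask = [True, True, False, False]      # hypothetical split, e.g. "area below some threshold"
print(variance_reduction(prices, mask))
```

A good regression split groups similar target values together, so the weighted variance after the split is far smaller than the variance of the whole node.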

Entropy is calculated as:

Entropy(S) = − Σ pᵢ log₂(pᵢ), summed over the classes i = 1, …, c

where:
S is the dataset.
c is the number of classes.
pᵢ is the proportion of instances in class i.
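The entropy calculation can be sketched in plain Python; the label lists below are made-up illustrations.

```python
from math import log2

def entropy(labels):
    """Entropy(S): -sum of p_i * log2(p_i) over the classes present in S."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

# A perfectly mixed two-class node has maximum entropy (1 bit);
# a pure node (all one class) has zero entropy.
print(entropy(["walk", "walk", "stay", "stay"]))  # → 1.0
```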
Information gain is calculated as:

Gain(S, A) = Entropy(S) − Σ (|Sᵥ| / |S|) · Entropy(Sᵥ), summed over the values v of attribute A

where:
S is the dataset.
A is the attribute.
v ranges over the values of attribute A.
Sᵥ is the subset of S where attribute A has value v.
Example: if we're classifying fruits and asking "Is it red?" reduces entropy by half, that's a good question with high information gain.
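Returning to the walk scenario, the gain formula can be sketched directly in Python. The tiny dataset is a made-up illustration in which "raining" perfectly separates the labels, so the split removes all entropy.

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def information_gain(data, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(row[attribute] for row in data):
        subset = [lbl for row, lbl in zip(data, labels) if row[attribute] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical dataset: "raining" perfectly predicts the decision.
data = [{"raining": "yes"}, {"raining": "yes"}, {"raining": "no"}, {"raining": "no"}]
labels = ["stay", "stay", "walk", "walk"]
print(information_gain(data, labels, "raining"))  # → 1.0, the maximum possible here
```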
Gini impurity is calculated as:

Gini(S) = 1 − Σ pᵢ², summed over the classes i = 1, …, c

where:
S is the dataset.
c is the number of classes.
pᵢ is the proportion of instances in class i.
Example: if asking "Is it an apple?" results in most answers being yes, that's low Gini impurity.
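Gini impurity is a one-liner in Python; the fruit lists below are made-up illustrations.

```python
def gini(labels):
    """Gini(S) = 1 - sum(p_i^2): probability of misclassifying a random element."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["apple"] * 9 + ["pear"]))      # ≈ 0.18: mostly apples, low impurity
print(gini(["apple"] * 5 + ["pear"] * 5))  # 0.5: evenly mixed, maximum for two classes
```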
The word pruning may ring a bell if you are familiar with gardening: it refers to selectively cutting back individual branches to improve the structure of a tree. In decision trees, pruning is a method that helps avoid overfitting, which occurs when a model fits its training data so closely that it also learns noise and outliers, hurting its performance on new data. Pruning works the same way in machine learning: it removes unwanted or unnecessary branches and simplifies the tree. There are two types of pruning:
Pre-Pruning: Stopping the tree-building process before it becomes too complex.
Post-Pruning: Allowing the tree-building process to complete and then pruning (removing) the branches that contribute to overfitting.
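As a sketch using scikit-learn, pre-pruning can be expressed through constructor limits, while post-pruning is available as minimal cost-complexity pruning via the ccp_alpha parameter. The dataset choice and parameter values here are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Pre-pruning: constrain the tree while it is being grown.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X_train, y_train)

# Post-pruning: grow the full tree, then compare it against a tree where
# cost-complexity pruning (ccp_alpha > 0) removes branches whose gain
# does not justify their complexity.
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=42).fit(X_train, y_train)

print(full.tree_.node_count, post.tree_.node_count)  # the pruned tree is smaller
```

Larger ccp_alpha values prune more aggressively; in practice it is tuned with cross-validation.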

To implement decision trees in Python, we use two well-known implementations, one for each problem type.
For classification tasks, “DecisionTreeClassifier” from the “sklearn.tree” module of the scikit-learn library is a popular implementation. Here is a basic example of how to use it in Python:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions
predictions = clf.predict(X_test)

# Evaluate the classifier
accuracy = clf.score(X_test, y_test)
print(f'Accuracy: {accuracy}')
```

In the example above, criterion='gini' selects Gini impurity as the splitting criterion, and max_depth=3 limits the depth of the tree to prevent overfitting.
For regression tasks, “DecisionTreeRegressor” is also part of the “sklearn.tree” module and works similarly to “DecisionTreeClassifier”, but is used to predict continuous values instead. Here is an example Python implementation:

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load a dataset
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a DecisionTreeRegressor
regressor = DecisionTreeRegressor(max_depth=3)

# Train the regressor
regressor.fit(X_train, y_train)

# Make predictions
predictions = regressor.predict(X_test)

# Evaluate the regressor
mse = mean_squared_error(y_test, predictions)
print(f'Mean Squared Error: {mse}')
```
Decision trees are versatile and can be applied to a wide range of real-life problems. Here are some common use cases:
Banking: Decision trees can be used to predict loan defaults based on customer data, helping banks to manage risk.
Healthcare: They can assist in diagnosing diseases by analyzing symptoms and patient history, aiding doctors in making informed decisions.
Marketing: Decision trees can segment customers and predict their purchasing behavior, enabling targeted marketing campaigns.
E-commerce: They can be used for product recommendations by predicting what items a user might be interested in based on their browsing and purchase history.
Fraud Detection: Decision trees can identify patterns that indicate fraudulent transactions in financial data.
Weather Forecasting: They can predict weather patterns by analyzing historical data and various meteorological factors.
Stock Market Analysis: Based on historical data, decision trees can help predict stock prices and market trends.



