
Decision Tree

Discover the power of decision trees in machine learning for classification, regression, and real-world applications such as healthcare and finance.

A Decision Tree is a versatile and widely used machine learning (ML) algorithm that falls under the category of supervised learning. It uses a tree-like structure to model decisions and their possible consequences, similar to a flowchart. Each internal node represents a test on an attribute (or feature), each branch represents the outcome of the test, and each leaf node represents a class label (in classification tasks) or a continuous value (in regression tasks). Due to their intuitive structure, decision trees are known for being relatively easy to understand and interpret, making them valuable for explainable AI (XAI).

How Decision Trees Work

The core idea is to split the dataset into smaller and smaller subsets based on the values of input features, creating a tree structure. The process starts at the root node, which represents the entire dataset. At each node, the algorithm selects the feature and threshold that best split the data, increasing the purity or homogeneity of the resulting subsets with respect to the target variable. Common criteria for choosing the best split include Gini impurity and information gain (based on entropy), both of which measure the disorder or randomness in a set. This splitting continues recursively until a stopping criterion is met, such as reaching a maximum depth, having a minimum number of samples in a node, or producing pure leaf nodes (nodes containing samples of only one class).

To make a prediction for a new data point, the algorithm traverses the tree from the root down to a leaf node according to the outcomes of the feature tests; the prediction is the majority class (classification) or the average value (regression) stored in that leaf. Careful data preprocessing and feature engineering can significantly impact a decision tree's performance.
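The split-selection step described above can be sketched in a few lines of pure Python. This is an illustrative toy (one numeric feature, exhaustive threshold search), not a production implementation:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Pick the threshold on a single numeric feature that minimizes
    the weighted Gini impurity of the two resulting subsets."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best

xs = [1, 2, 3, 10, 11, 12]           # one numeric feature
ys = ["a", "a", "a", "b", "b", "b"]  # perfectly separable at x <= 3
print(best_split(xs, ys))            # -> (3, 0.0): a pure split
```

A real implementation repeats this search over every feature at every node; the feature/threshold pair with the lowest weighted impurity becomes the node's test.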

Types of Decision Trees

Decision trees can be broadly categorized into two main types:

  • Classification Trees: Used when the target variable is categorical (e.g., predicting 'spam' or 'not spam'). The leaf nodes represent class labels.
  • Regression Trees: Used when the target variable is continuous (e.g., predicting house prices). The leaf nodes represent a predicted numerical value, often the average of the target values of the training samples that reach that leaf.
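The difference between the two leaf types comes down to how a leaf turns its training samples into a prediction. A minimal sketch (the function names here are made up for illustration):

```python
from collections import Counter
from statistics import mean

def classify_leaf(labels):
    """Classification tree leaf: predict the majority class."""
    return Counter(labels).most_common(1)[0][0]

def regress_leaf(values):
    """Regression tree leaf: predict the mean of the training targets."""
    return mean(values)

print(classify_leaf(["spam", "spam", "ham"]))     # -> spam
print(regress_leaf([250_000, 310_000, 290_000]))  # mean of the house prices
```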

Advantages and Disadvantages

Decision trees offer several benefits:

  • Interpretability: Their graphical structure makes them easy to visualize and understand.
  • Minimal Data Preparation: They often require less data cleaning than many other algorithms; for example, they do not require feature scaling or normalization.
  • Handles Non-linear Data: They can capture non-linear relationships between features and the target variable.
  • Feature Importance: They inherently provide a measure of feature importance based on how early or often a feature is used for splitting.

However, they also have drawbacks:

  • Overfitting: Decision trees can easily become too complex and capture noise in the training data, leading to poor generalization on unseen test data. Techniques like pruning or setting constraints on tree growth help mitigate overfitting.
  • Instability: Small variations in the data can result in a completely different tree being generated.
  • Bias: Trees can be biased towards features with more levels or dominant classes if the dataset is imbalanced.
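Pre-pruning, one of the overfitting mitigations mentioned above, can be illustrated with a toy recursive builder that stops growing at a depth limit. The midpoint split is a deliberate simplification standing in for a proper Gini or information-gain search:

```python
from collections import Counter

def build_tree(xs, ys, depth=0, max_depth=2):
    """Grow a one-feature decision tree, stopping at max_depth (pre-pruning).
    Returns either a class label (leaf) or (threshold, left, right)."""
    if depth >= max_depth or len(set(ys)) == 1:
        return Counter(ys).most_common(1)[0][0]  # majority-class leaf
    # Midpoint split of the feature range: a crude stand-in for an
    # impurity-based split search, to keep the sketch short.
    t = (min(xs) + max(xs)) / 2
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    if not left or not right:
        return Counter(ys).most_common(1)[0][0]
    return (t,
            build_tree(*zip(*left), depth + 1, max_depth),
            build_tree(*zip(*right), depth + 1, max_depth))

# A shallow tree ignores the single noisy label instead of memorizing it:
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = ["a", "a", "b", "a", "b", "b", "b", "b"]  # one noisy 'b' at x=3
print(build_tree(xs, ys, max_depth=1))  # -> (7.0, 'a', 'b')
```

With `max_depth=1` the noisy point is absorbed into a majority-class leaf; an unconstrained tree would keep splitting until it memorized it, which is exactly the overfitting behavior pruning guards against.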

Real-World Applications

Decision trees are employed in various domains:

  1. Medical Diagnosis: Assisting doctors by creating models that suggest diagnoses based on patient symptoms and test results. For example, a tree could guide diagnosis by asking questions about symptoms sequentially (AI in healthcare applications).
  2. Customer Churn Prediction: Businesses use decision trees to identify customers likely to stop using their service based on usage patterns, demographics, and interaction history, allowing for targeted retention efforts (Predicting Customer Churn).
  3. Financial Risk Assessment: Evaluating creditworthiness by analyzing factors like income, debt, and credit history (Computer vision models in finance).
  4. Manufacturing Quality Control: Identifying potential defects in products based on sensor readings or process parameters (Improving Manufacturing with Computer Vision).

Relationship to Other Models

Decision trees form the basis for more complex ensemble methods like Random Forests and Gradient Boosted Trees (like XGBoost or LightGBM). Random Forests, for instance, build multiple decision trees on different subsets of data and features and aggregate their predictions, often leading to better accuracy and robustness against overfitting compared to a single tree. While powerful for many tabular data problems, decision trees differ significantly from models like Convolutional Neural Networks (CNNs) or Vision Transformers (ViT) used in computer vision. Models like Ultralytics YOLO11 leverage deep learning architectures optimized for tasks like object detection, image classification, and instance segmentation, which involve processing complex, high-dimensional data like images, a domain where single decision trees are less effective. Understanding foundational models like decision trees provides valuable context within the broader landscape of AI and predictive modeling. Tools like Scikit-learn provide popular implementations for decision trees, while platforms like Ultralytics HUB streamline the development and deployment of advanced vision models.
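The aggregation idea behind Random Forests can be sketched in a few lines. The "stumps" below are hypothetical stand-ins for trees that, trained on different bootstrap samples, disagree slightly near the decision boundary:

```python
from collections import Counter

def bagged_predict(models, x):
    """Random-Forest-style aggregation: each model votes, majority wins."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical threshold stumps with slightly different boundaries:
stumps = [
    lambda x: "a" if x <= 5 else "b",
    lambda x: "a" if x <= 6 else "b",
    lambda x: "a" if x <= 4 else "b",
]
print(bagged_predict(stumps, 5))  # two of three stumps vote 'a' -> a
```

Averaging out the individual trees' disagreements is what gives the ensemble its robustness against the instability and overfitting of any single tree.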
