Glossary

K-Nearest Neighbors (KNN)

Discover how K-Nearest Neighbors (KNN) simplifies machine learning with its intuitive, non-parametric approach to classification and regression tasks.


K-Nearest Neighbors (KNN) is a fundamental algorithm in machine learning (ML), used for both classification and regression tasks. It stands out for its simplicity and intuitive approach, making it a great starting point for understanding instance-based learning. KNN is classified as a non-parametric method because it makes no assumptions about the underlying data distribution. It's also known as a "lazy learning" algorithm because it doesn't build a general model during the training phase; instead, it stores the entire dataset and performs calculations only when a prediction is needed.
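
To make this "lazy" behavior concrete, here is a minimal from-scratch sketch (the class name and toy data are hypothetical, not from any particular library): fitting merely memorizes the training set, and all distance computation is deferred to prediction time.

```python
import numpy as np

class LazyKNN:
    def fit(self, X, y):
        # "Training" builds no model; it simply memorizes the dataset.
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)
        return self

    def predict(self, x, k=3):
        # All work happens here, when a prediction is requested:
        # compute Euclidean distances from x to every stored point.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        # Take the labels of the k nearest neighbors and majority-vote.
        nearest = self.y[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

X = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]]
y = [0, 0, 1, 1]
print(LazyKNN().fit(X, y).predict([1.1, 0.9]))  # -> 0
```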

How KNN Works

The core idea behind KNN is based on similarity, often defined using distance metrics like Euclidean distance. To make a prediction for a new, unseen data point, the algorithm identifies the 'K' closest data points (neighbors) to it from the stored training dataset. The value 'K' is a user-defined integer and represents the number of neighbors considered.

For classification, the new point is assigned to the class that is most common among its K neighbors (majority voting). For regression, the prediction is typically the average value of the K neighbors. The choice of distance metric (e.g., Manhattan, Minkowski) and the value of 'K' are crucial hyperparameters that significantly influence the model's performance. Efficient implementation often relies on data structures like KD-trees or Ball trees to speed up neighbor searches, especially with larger datasets.
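
As a concrete illustration, here is a minimal sketch using Scikit-learn's KNeighborsClassifier and KNeighborsRegressor on made-up toy data. It shows majority voting for classification, neighbor averaging for regression, and the metric and algorithm choices discussed above.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9]]  # toy features
y_cls = [0, 0, 0, 1, 1]                       # class labels
y_reg = [0.1, 0.9, 2.1, 8.2, 8.8]             # continuous targets

# Classification: the new point gets the majority class of its 3 neighbors.
# A KD-tree is used to accelerate the neighbor search.
clf = KNeighborsClassifier(n_neighbors=3, metric="euclidean",
                           algorithm="kd_tree").fit(X, y_cls)

# Regression: the prediction is the mean target of the 3 neighbors,
# here with Manhattan distance instead of Euclidean.
reg = KNeighborsRegressor(n_neighbors=3, metric="manhattan").fit(X, y_reg)

print(clf.predict([[1.5, 1.5]]))  # -> [0]
print(reg.predict([[1.5, 1.5]]))  # -> ~[1.03] (mean of 0.1, 0.9, 2.1)
```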

Choosing the Value of 'K'

Selecting the optimal 'K' is critical. A small 'K' value (e.g., K=1) makes the model highly sensitive to noise and outliers in the data, potentially leading to overfitting, where the model performs well on training data but poorly on unseen data. Conversely, a large 'K' value can oversmooth the decision boundaries, making the model less sensitive to local patterns and potentially leading to underfitting and high computational cost during prediction. Techniques like cross-validation (see the Scikit-learn Cross-validation Guide) are often employed to find a suitable 'K' that balances the bias-variance tradeoff. The Scikit-learn library provides tools for implementing KNN and performing hyperparameter searches, and you can find general tips in the Ultralytics Hyperparameter Tuning Guide.
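
The snippet below sketches such a search with Scikit-learn's GridSearchCV, using 5-fold cross-validation on the built-in Iris dataset; the grid of odd K values is just an illustrative choice (odd values avoid tied votes in binary problems).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate each candidate K with 5-fold cross-validation and keep the best.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```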

Applications of KNN

KNN's simplicity lends itself to a variety of applications, particularly where interpretability is valued:

Advantages and Disadvantages of KNN

KNN offers several advantages but also has limitations:

Advantages:

Disadvantages:

KNN vs. Related Concepts

It is important to distinguish KNN from other algorithms:

While KNN is valuable for certain tasks and understanding fundamental ML concepts, complex problems like real-time object detection often benefit from more advanced models like Ultralytics YOLO, which offer superior speed and performance, especially on large-scale computer vision datasets. You can train and deploy such models using platforms like Ultralytics HUB.
