Keypoints are specific, informative spatial locations within an image that define distinct features of an object or scene. In the realm of computer vision (CV), these points—typically represented as X and Y coordinates—mark significant areas of interest, such as the corners of a building, facial features like the eyes and nose, or the anatomical joints of a human body. Rather than processing every pixel in a dense grid, artificial intelligence (AI) models that focus on these sparse, semantically rich points can efficiently understand geometry, analyze shapes, and track movement with high precision. This concept is foundational to advanced tasks requiring a structural understanding of the subject rather than just its presence or location.
Keypoints serve as the fundamental building blocks for mapping the structure of dynamic objects. When multiple keypoints are detected and connected, they form a skeletal graph or wireframe that represents the object's pose. This is most commonly applied in pose estimation, where deep learning (DL) algorithms predict the location of joints—shoulders, elbows, hips, and knees—to reconstruct human or animal posture.
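To make this concrete, the widely used COCO convention describes a person with 17 named joints. The sketch below defines that keypoint set and one common choice of skeleton edges in Python; exact edge lists vary between implementations, so treat this as illustrative rather than canonical:
# The 17-keypoint COCO convention: each entry is one (x, y) landmark
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
# Skeleton edges as index pairs into COCO_KEYPOINTS (one common wiring)
SKELETON_EDGES = [
    (5, 7), (7, 9),      # left arm: shoulder -> elbow -> wrist
    (6, 8), (8, 10),     # right arm
    (11, 13), (13, 15),  # left leg: hip -> knee -> ankle
    (12, 14), (14, 16),  # right leg
    (5, 6), (11, 12),    # shoulder line and hip line
    (5, 11), (6, 12),    # torso
]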
By leveraging advanced architectures like the Ultralytics YOLO26 model, systems can regress these coordinates directly from input images with remarkable speed. This process involves complex feature extraction, where the neural network learns to identify local patterns invariant to lighting, rotation, and scale. Because keypoints represent a condensed summary of an object's state, they are computationally efficient, making them ideal for real-time inference on edge computing devices.
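A quick back-of-the-envelope calculation illustrates that efficiency: a single 17-joint pose amounts to just 34 numbers, while a typical 640x640 RGB input holds over 1.2 million pixel values. A minimal illustration in NumPy:
import numpy as np

# One person's pose in the 17-keypoint COCO convention: just 34 numbers
pose = np.zeros((17, 2), dtype=np.float32)
print(pose.size)  # 34

# Versus the dense pixel grid of a typical 640x640 RGB input
print(640 * 640 * 3)  # 1228800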
To understand the specific utility of keypoints, it is helpful to compare them with the other primary computer vision tasks found in the Ultralytics Platform: image classification assigns a single label to an entire image, object detection localizes objects with bounding boxes, and instance segmentation traces objects with pixel-level masks, whereas keypoint detection goes a step further by pinpointing the internal structural landmarks of each object.
The ability to track specific points on a subject opens the door to diverse applications across various industries, from fitness apps that count exercise repetitions to gesture recognition interfaces and motion capture for gaming and animation.
Modern libraries make it straightforward to implement keypoint detection using pre-trained models. The ultralytics package provides instant access to state-of-the-art models like YOLO26 and YOLO11, which can be trained on datasets like COCO or Tiger-Pose.
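Because training is exposed through the same high-level API, fine-tuning on one of these pose datasets takes only a few lines. A minimal sketch, assuming the tiger-pose.yaml dataset configuration that ships with the package (coco-pose.yaml works the same way):
from ultralytics import YOLO

# Load a pretrained pose model as the starting point
model = YOLO("yolo11n-pose.pt")

# Fine-tune on the Tiger-Pose dataset config bundled with ultralytics
model.train(data="tiger-pose.yaml", epochs=100, imgsz=640)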
The following example demonstrates how to load a pose estimation model and visualize the detected keypoints using Python:
from ultralytics import YOLO
# Load a pretrained YOLO26n-pose model
model = YOLO("yolo26n-pose.pt")
# Run inference on a local image
results = model("path/to/runner.jpg")
# Visualize the results, showing the skeletal keypoints
results[0].show()
In this workflow, the model outputs a results object containing the coordinates and a confidence score for each detected point. Developers can extract these raw x, y values to build custom logic, such as counting repetitions in a gym application or controlling a game character via motion capture.
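As a hedged sketch of that rep-counting idea, the snippet below reads the keypoint coordinates for the first detected person and computes the left elbow angle, assuming the standard COCO ordering (index 5 is the left shoulder, 7 the left elbow, 9 the left wrist); an application might count one repetition each time this angle crosses a threshold:
import numpy as np
from ultralytics import YOLO

# Re-run the pose model so this snippet stands alone
results = YOLO("yolo26n-pose.pt")("path/to/runner.jpg")

# (x, y) pixel coordinates for the first detected person, shape (17, 2)
kpts = results[0].keypoints.xy[0].cpu().numpy()

# COCO ordering assumed: 5 = left shoulder, 7 = left elbow, 9 = left wrist
shoulder, elbow, wrist = kpts[5], kpts[7], kpts[9]

# Elbow angle from the dot product of the upper-arm and forearm vectors
v1, v2 = shoulder - elbow, wrist - elbow
cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
angle = np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))
print(f"Left elbow angle: {angle:.1f} degrees")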