Visual Prompting

Explore visual prompting to guide AI models with points and boxes. Learn how Ultralytics YOLO and SAM enable precise segmentation and faster data annotation.

Visual prompting is an emerging technique in computer vision in which users provide spatial or visual cues, such as points, bounding boxes, or scribbles, to guide an AI model's focus toward specific objects or regions within an image. Unlike traditional prompt engineering, which relies primarily on text descriptions, visual prompting allows for more precise and intuitive interaction with Artificial Intelligence (AI) systems. This method leverages the capabilities of modern foundation models to perform tasks like segmentation and detection without requiring extensive retraining or large labeled datasets. By effectively "pointing" at what matters, users can adapt general-purpose models to novel tasks at inference time, bridging the gap between human intent and machine perception.

Mechanisms of Visual Prompting

At its core, visual prompting works by injecting spatial information directly into the model's processing pipeline. When a user clicks on an object or draws a box, these inputs are converted into coordinate-based embeddings that the neural network integrates with the image features. This process is central to interactive architectures like the Segment Anything Model (SAM), where the model predicts masks based on geometric prompts.
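
As a rough illustration of how a click might become a coordinate-based embedding, the sketch below encodes a pixel location with random sine/cosine features, a scheme similar in spirit to the positional encoding in SAM's prompt encoder. The function names, the feature count, the seed, and the 810 x 1080 image size are assumptions made for this example and are not part of any library API.

```python
import math
import random


def make_frequency_matrix(num_feats, seed=0):
    """Sample a fixed 2 x num_feats Gaussian projection (hypothetical stand-in for model weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_feats)] for _ in range(2)]


def encode_point(x, y, width, height, freqs):
    """Map a pixel click to a sin/cos feature vector of length 2 * num_feats."""
    # Normalize pixel coordinates to [0, 1], then shift them to [-1, 1].
    u = 2.0 * (x / width) - 1.0
    v = 2.0 * (y / height) - 1.0
    num_feats = len(freqs[0])
    # Project the 2D coordinate onto each random frequency.
    proj = [2.0 * math.pi * (u * freqs[0][k] + v * freqs[1][k]) for k in range(num_feats)]
    # Concatenate sine and cosine responses into one embedding vector.
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


freqs = make_frequency_matrix(num_feats=4)
embedding = encode_point(300, 350, width=810, height=1080, freqs=freqs)  # assumed image size
print(len(embedding))  # 8
```

In a real model, a vector like this would typically be combined with a learned embedding that marks the click as foreground or background before it is fused with the image features.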

The flexibility of visual prompting allows for various interaction types:

  • Point Prompts: A user clicks on a specific pixel to indicate the object of interest. The model then expands this selection to cover the entire object.
  • Box Prompts: Drawing a bounding box provides a coarse localization, signaling the model to segment or classify the object enclosed by that area.
  • Scribble Prompts: Freehand lines drawn over an object can help disambiguate complex scenes where objects overlap or have similar textures.
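
To make the interaction types above concrete, here is a toy sketch of how a point prompt can expand into a mask and how a box prompt can bound the result. It uses a classical flood fill on a tiny grid as a stand-in; it is not how SAM or any neural model computes masks. The function `mask_from_point` and the sample grid are hypothetical.

```python
from collections import deque


def mask_from_point(image, point, box=None):
    """Grow a mask from a clicked pixel over 4-connected pixels of equal value.

    image: list of rows of integer pixel values.
    point: (x, y) click location.
    box:   optional (x1, y1, x2, y2) inclusive bounds that limit the growth.
    """
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box if box is not None else (0, 0, width - 1, height - 1)
    # Clamp the box to the image so indexing stays valid.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width - 1, x2), min(height - 1, y2)
    x0, y0 = point
    if not (x1 <= x0 <= x2 and y1 <= y0 <= y2):
        raise ValueError("point prompt lies outside the allowed region")
    target = image[y0][x0]
    mask = [[0] * width for _ in range(height)]
    mask[y0][x0] = 1
    queue = deque([(x0, y0)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = x1 <= nx <= x2 and y1 <= ny <= y2
            if inside and mask[ny][nx] == 0 and image[ny][nx] == target:
                mask[ny][nx] = 1
                queue.append((nx, ny))
    return mask


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
full = mask_from_point(image, (1, 1))                      # point prompt only
boxed = mask_from_point(image, (1, 1), box=(0, 0, 1, 3))   # point plus box prompt
print(sum(map(sum, full)))   # 4
print(sum(map(sum, boxed)))  # 2
```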

Recent research presented at CVPR 2024 highlights how visual prompting significantly reduces the time required for data annotation, as human annotators can correct model predictions in real time with simple clicks rather than manually tracing polygons.
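
The click-based correction described above can be sketched as a small helper that compares a predicted mask with a reference mask and proposes the next click. This is a simplified illustration: the helper `next_correction_click` is hypothetical, and it picks the first disagreeing pixel in row-major order, whereas practical tools usually rely on the annotator or on the center of the largest error region.

```python
def next_correction_click(predicted, reference):
    """Return (x, y, label) for the first disagreeing pixel, or None if the masks match.

    label 1 = foreground click (the model missed part of the object here),
    label 0 = background click (the model included something it should not).
    """
    for y, (pred_row, ref_row) in enumerate(zip(predicted, reference)):
        for x, (p, r) in enumerate(zip(pred_row, ref_row)):
            if p != r:
                return (x, y, 1 if r == 1 else 0)
    return None


predicted = [
    [0, 1, 1],
    [0, 1, 0],
]
reference = [
    [0, 1, 0],
    [0, 1, 1],
]
print(next_correction_click(predicted, reference))  # (2, 0, 0)
print(next_correction_click(reference, reference))  # None
```

The returned label follows the same convention as point prompts in general: 1 to include a region and 0 to exclude it.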

Visual Prompting vs. Text Prompting

While both techniques aim to guide model behavior, it is important to distinguish visual prompting from text-based methods. Text-to-image generation or zero-shot detection relies on natural language processing (NLP) to interpret semantic descriptions (e.g., "find the red car"). However, language can be ambiguous or insufficient for describing precise spatial locations or abstract shapes.

Visual prompting resolves this ambiguity by grounding the instruction in the pixel space itself. For instance, in medical image analysis, it is far more accurate for a radiologist to click on a suspicious nodule than to attempt to describe its exact coordinates and irregular shape via text. Often, the most powerful workflows combine both approaches—using text for semantic filtering and visual prompts for spatial precision—a concept known as multi-modal learning.

Real-World Applications

The adaptability of visual prompting has led to its rapid adoption across diverse industries:

  • Interactive Medical Diagnostics: Doctors use visual prompting tools to isolate tumors or organs in MRI scans. By simply clicking on a region of interest, they can instantly generate 3D volumetric measurements, aiding in precise tumor detection and surgical planning.
  • Smart Photo Editing: In consumer software like Adobe Photoshop or mobile apps, visual prompting powers "magic select" tools. Users can tap a person or object to remove the background or apply targeted filters, utilizing underlying instance segmentation technologies without needing manual masking skills.
  • Robotic Manipulation: In robotics, machines can be instructed to pick up specific items through a visual interface. An operator clicks on an object in the robot's camera feed, providing a visual prompt that the robot translates into grasping coordinates, facilitating human-in-the-loop automation in warehouses.
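
For the robotics case, one common way to turn a clicked pixel into a 3D target is pinhole back-projection. The sketch below assumes a calibrated camera and a depth reading at the clicked pixel; the function name and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are made up for illustration and do not come from any particular robot or camera.

```python
def pixel_to_camera_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in meters to a 3D point in the camera frame."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Pinhole model: offset from the principal point, scaled by depth over focal length.
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical click at pixel (400, 300) with a 0.8 m depth reading.
target = pixel_to_camera_point(400, 300, depth_m=0.8, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(target)
```

The result is expressed in the camera's coordinate frame; a real system would still need to transform it into the robot's base frame before planning a grasp.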

Implementation with Ultralytics

The Ultralytics ecosystem supports visual prompting workflows, particularly through models like FastSAM and SAM. These models allow developers to pass point or box coordinates programmatically to retrieve segmentation masks.

The following example demonstrates how to use the ultralytics package to apply a point prompt to an image, instructing the model to segment the object located at specific coordinates.

from ultralytics import SAM

# Load a Segment Anything Model checkpoint (here, the SAM 2.1 base weights)
model = SAM("sam2.1_b.pt")

# Apply a visual point prompt to the image
# The 'points' argument accepts [x, y] coordinates
# labels: 1 indicates a foreground point (include), 0 indicates background
results = model("https://ultralytics.com/images/bus.jpg", points=[[300, 350]], labels=[1])

# Display the segmented result
results[0].show()

Advancing Model Agility

Visual prompting represents a shift towards "promptable" computer vision, where models are no longer static "black boxes" but interactive tools. This capability is essential for active learning loops, where models rapidly improve by incorporating user feedback.

For developers looking to integrate these capabilities into production, the Ultralytics Platform offers tools to manage datasets and deploy models that can handle dynamic inputs. As research progresses, we expect to see even tighter integration between visual prompts and large language models (LLMs), enabling systems that can reason about visual inputs with the same fluency with which they currently handle text.
