Getting hands-on with YOLO-World

Learn about YOLO-World, an innovative object detection model that can identify objects through text prompts. Explore how YOLO-World works and its applications, and get hands-on with a quick code example.

Abirami Vina

Computer vision projects often involve spending a lot of time annotating data and training object detection models. That might soon be a thing of the past. Tencent’s AI Lab released YOLO-World, a real-time, open-vocabulary object detection model, on January 31st, 2024. YOLO-World is a zero-shot model, meaning you can run object detection on images without first training it on the classes you care about.

Zero-shot models have the potential to change the way we approach computer vision applications. In this blog, we'll explore how YOLO-World works and its potential uses and share a practical code example to get you started.

A peek into YOLO-World

You can pass an image and a text prompt describing the objects you're looking for to the YOLO-World model. For example, if you're interested in finding "a person wearing a red shirt" within a photo, YOLO-World takes this input and gets to work.

The model’s unique architecture combines three main elements:

  • A detector based on the Ultralytics YOLOv8 object detection model, to analyze the visual content of the image.
  • A text encoder pre-trained with OpenAI’s CLIP, which turns your text prompt into embeddings.
  • A network, the Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN), which integrates the processed image data with the text data.

The YOLO detector scans your input image to identify potential objects. The text encoder transforms your description into a format that the model can understand. These two streams of information are then merged through the RepVL-PAN using multi-level cross-modality fusion. This fusion lets YOLO-World detect and locate the objects described in your prompt within the image.
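
To make the idea of matching image content against text more concrete, here is a toy sketch in plain Python. It is not YOLO-World's actual fusion network. It only assumes that each candidate region and each prompt can be represented as a vector, and that a region is labeled with the prompt whose vector is most similar to it. The three-dimensional embeddings, the function names, and the 0.5 threshold are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_regions_to_prompts(region_embeddings, prompt_embeddings, threshold=0.5):
    """Label each region with its most similar prompt, or None if nothing is close enough."""
    labels = []
    for region in region_embeddings:
        scores = {
            name: cosine_similarity(region, vector)
            for name, vector in prompt_embeddings.items()
        }
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels


# Hypothetical embeddings; real encoders produce vectors with hundreds of dimensions.
prompts = {"red shirt": [1.0, 0.0, 0.0], "bicycle": [0.0, 1.0, 0.0]}
regions = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]

labels = match_regions_to_prompts(regions, prompts)
print(labels)
```

The third region is not similar to either prompt, so it is left unlabeled. That is the behavior you would want from an open-vocabulary detector: only objects that match the vocabulary you supplied are reported.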

An example of results from YOLO-World.

Benefits of choosing YOLO-World

One of the biggest advantages of using YOLO-World is that you don't have to train the model for a specific class. It has already learned from pairs of images and texts, so it knows how to find objects based on descriptions. You can avoid hours of collecting data, annotating data, training on expensive GPUs, and so on.

Here are some other benefits of using YOLO-World:

  • Real-Time Performance - YOLO-World supports real-time performance just like the original YOLO architecture. It’s ideal for applications requiring immediate object detection such as autonomous vehicles and surveillance systems.
  • Instance Segmentation - YOLO-World can neatly outline and separate objects in pictures, even if those objects weren't specifically taught during its training.
  • Efficiency - YOLO-World combines high accuracy with computational efficiency, making it practical for real-world applications. Its streamlined architecture makes rapid object detection possible without excessive demands on processing power.

The applications of YOLO-World

YOLO-World models can be used for a wide variety of applications. Let’s explore some of them.

Quality control in manufacturing

Products manufactured on an assembly line are checked visually for defects before they are packed. Defect detection is often done by hand, which takes time and can lead to mistakes. These mistakes can cause problems like high costs and the need for repairs or recalls. To help with this, special machine vision cameras and AI systems have been created to perform these checks.

YOLO-World models are a big advancement in this area. Thanks to their zero-shot abilities, they can find defects in products even when they haven't been trained for that specific problem. For example, a factory manufacturing water bottles can use YOLO-World to distinguish a bottle that is properly sealed with a cap from one whose cap is missing or faulty.

An example of bottle cap inspection.

Robotics

YOLO-World models allow robots to interact with unfamiliar environments. Without being trained on specific objects that may be in a room, they can still identify what objects are present. So, let’s say a robot enters a room it has never been in before. With a YOLO-World model, it can still recognize and identify objects like chairs, tables, or lamps, even though it hasn't been specifically trained on those items.

In addition to object detection, YOLO-World can also determine the condition of those objects, thanks to its 'prompt-then-detect' feature. For instance, in agricultural robotics, it can be used to distinguish ripe fruit from unripe fruit by prompting the model with a description of each condition.

AI in the automobile industry

The automobile industry involves many moving parts, and YOLO-World can be used for different car applications. For example, when it comes to car maintenance, YOLO-World's ability to recognize a wide variety of objects without manual tagging or extensive pre-training is extremely useful. YOLO-World can be used to identify car parts that need to be replaced. It could even automate tasks like quality checks, spotting defects or missing pieces in new cars.

Another application is zero-shot object detection in self-driving cars. YOLO-World’s zero-shot detection can improve an autonomous vehicle’s ability to detect and classify objects on the road, such as pedestrians, traffic signs, and other vehicles, in real time. By doing so, it can help detect obstacles and prevent accidents for a safer journey.

An example of detecting objects on a road.

Inventory management for retail stores

Identifying objects on shelves in retail stores is an important part of tracking inventory, maintaining stocks, and automating processes. Ultralytics YOLO-World's ability to recognize a wide variety of objects without manual tagging or extensive pre-training is extremely useful for inventory management.

For instance, YOLO-World can swiftly spot and categorize items on a shelf, such as different brands of energy drinks. This helps retail stores keep accurate inventory, manage stock levels efficiently, and smooth out supply chain operations.

All of the applications are unique and show just how extensively YOLO-World can be used. Next, let’s get hands-on with YOLO-World and take a look at a coding example.

A code walkthrough

As we mentioned before, YOLO-World can be used to detect different parts of a car for maintenance. A computer vision application that detects any repairs needed would involve taking a picture of the car, identifying car parts, examining each part of the car for damage, and recommending repairs. Every part of this system would use different AI techniques and approaches. For the purpose of this code walkthrough, let’s focus on the step where car parts are detected.

With YOLO-World, you can identify different car parts in an image in under 5 minutes. You can extend this code to try out different applications using YOLO-World as well! To get started, install the Ultralytics package with pip by running: pip install ultralytics

For more instructions and best practices related to the installation process, check our Ultralytics Installation guide. While installing the required packages for YOLOv8, if you encounter any difficulties, take a look at our Common Issues guide for solutions and tips.

Once you’ve installed the needed package, we can download an image from the Internet to run our inferences on. We are going to use the image below.

Our input image.

Then, we’ll import the needed package, initialize our model, and set the classes we are looking for in our input image. Here, we are interested in the following classes: car, wheel, car door, car mirror, and license plate.
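
A minimal sketch of this step might look like the following. It assumes the `YOLOWorld` class exported by the Ultralytics package and the small `yolov8s-world.pt` checkpoint; the helper name `load_yolo_world` and the choice of checkpoint are ours, so swap in whichever YOLO-World weights you prefer.

```python
# Classes named in the walkthrough above.
CAR_PART_CLASSES = ["car", "wheel", "car door", "car mirror", "license plate"]


def load_yolo_world(classes, weights="yolov8s-world.pt"):
    """Load a YOLO-World checkpoint and restrict it to the given text classes."""
    # Imported inside the function so the file can be imported
    # even on a machine where ultralytics is not installed yet.
    from ultralytics import YOLOWorld

    model = YOLOWorld(weights)   # assumed checkpoint name; typically fetched on first use
    model.set_classes(classes)   # the text prompts become the detectable classes
    return model
```

Calling `model = load_yolo_world(CAR_PART_CLASSES)` gives you a model that only reports the five classes listed.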

We'll then use the predict method to run inference on the image, providing the image's path along with the maximum number of detections (max_det) and thresholds for intersection over union (iou) and confidence (conf). Lastly, the annotated image with the detected objects is saved to a file named 'result.jpg'.
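
Here is a hedged sketch of that call. The function takes an already-initialized Ultralytics model as an argument. The helper name `detect_and_save` and the default threshold values are illustrative assumptions rather than recommended settings, so tune them for your own images.

```python
def detect_and_save(model, image_path, output_path="result.jpg",
                    max_det=100, iou=0.5, conf=0.05):
    """Run inference on one image and save the annotated result to disk."""
    results = model.predict(
        image_path,
        max_det=max_det,  # upper bound on detections kept per image
        iou=iou,          # IoU threshold used by non-maximum suppression
        conf=conf,        # minimum confidence for a detection to be kept
    )
    first = results[0]                 # one Results object per input image
    first.save(filename=output_path)   # writes the image with boxes drawn on it
    return first
```

For example, `detect_and_save(model, "path/to/car.jpg")` writes 'result.jpg' to the current working directory and returns the result object so you can inspect its boxes.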

The following output image will be saved to your files.

Our output image.

If you’d prefer to see what YOLO-World can do without coding, you can go to the YOLO-World Demo page, upload an input image, and enter the custom classes.

Read our docs page on YOLO-World to learn how to save the model with the custom classes so that it can be used directly later without entering custom classes repeatedly.
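
As a rough sketch of what that looks like, the helper below sets the custom classes on a YOLO-World model and then saves it. It relies on the model's `set_classes` and `save` methods; the helper name and the output filename `custom_yolov8s.pt` are placeholders chosen for this example.

```python
def save_with_custom_classes(model, classes, output_path="custom_yolov8s.pt"):
    """Bake a custom vocabulary into a YOLO-World model and save it to disk."""
    model.set_classes(classes)  # fix the vocabulary once
    model.save(output_path)     # the saved checkpoint keeps these classes
    return output_path
```

The saved file can then be loaded like any other Ultralytics checkpoint and used for prediction without calling `set_classes` again.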

Did you notice the car doors weren’t detected?

If you take a look at the output image again, you’ll notice the custom class “car door” wasn’t detected. Despite its great achievements, YOLO-World has certain limitations. To combat these limitations and use the YOLO-World model effectively, it’s important to use the correct types of textual prompts.

Here are some tips for getting better results:

  • YOLO-World may not need high confidence levels for accurate predictions, so reducing confidence thresholds can improve detection rates.
  • Add classes you aren’t interested in. It’ll help improve primary object detection by reducing false positives for secondary objects.
  • Detecting larger objects first before focusing on smaller details can improve detection accuracy.
  • Mention colors in your classes to detect objects based on color cues.
  • Describing object sizes in prompts can also help YOLO-World identify specific objects more accurately.
  • Post-processing methods, such as filtering predictions by size or adjusting confidence levels per class, can further improve object detection results.
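
Several of these tips can be combined in a small post-processing step. The sketch below is plain Python and independent of any detection library. The `Detection` record, the threshold values, and the extra "tree" class are hypothetical and only serve to show per-class confidence thresholds, a minimum box size, and dropping classes that were added purely as distractors.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


def box_area(box):
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def filter_detections(detections, class_thresholds, default_threshold=0.25,
                      min_area=0.0, ignore=()):
    """Keep detections that pass a per-class threshold and a minimum size."""
    kept = []
    for det in detections:
        if det.label in ignore:
            continue  # class was only prompted to absorb false positives
        if det.confidence < class_thresholds.get(det.label, default_threshold):
            continue  # below the threshold chosen for this class
        if box_area(det.box) < min_area:
            continue  # too small to be a plausible object
        kept.append(det)
    return kept


detections = [
    Detection("car", 0.80, (0, 0, 400, 200)),
    Detection("car door", 0.04, (50, 40, 180, 160)),  # low score, but a wanted class
    Detection("wheel", 0.30, (10, 150, 14, 154)),     # tiny box, likely noise
    Detection("tree", 0.60, (300, 0, 400, 100)),      # distractor class
]

kept = filter_detections(
    detections,
    class_thresholds={"car door": 0.03},  # accept weaker evidence for hard classes
    default_threshold=0.25,
    min_area=100,
    ignore={"tree"},
)
print([d.label for d in kept])
```

Here the low-confidence "car door" survives because it has its own relaxed threshold, while the tiny "wheel" box and the distractor "tree" are dropped.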

The limits are endless

Overall, YOLO-World models can be a powerful tool thanks to their advanced object detection capabilities. They offer efficiency and accuracy, and they help automate tasks across various applications, like the example of identifying car parts that we walked through.

Feel free to explore our GitHub repository to learn more about our contributions to computer vision and AI. If you're curious about how AI is reshaping sectors like healthcare technology, check out our solutions pages. The possibilities with innovations like YOLO-World seem to be endless!
