Kubernetes

Discover how Kubernetes streamlines AI/ML workloads with scalable model deployment, distributed training, and efficient resource management.

Kubernetes, often shortened to K8s, is an open-source platform designed to automate deploying, scaling, and operating application containers. Originally designed by Google, Kubernetes is now maintained by the Cloud Native Computing Foundation. In essence, it acts as an orchestrator for containerized applications, managing them across a cluster of computers so that they run efficiently and reliably. For users familiar with machine learning, think of Kubernetes as the conductor of an orchestra, ensuring all the different instruments (your AI/ML application components) play together harmoniously and at scale.

What is Kubernetes?

At its core, Kubernetes is a system for managing containerized applications. Containers package up software code and its dependencies so applications can run uniformly and consistently across different computing environments. Docker is a popular containerization technology often used with Kubernetes. Kubernetes automates many of the manual processes involved in deploying, managing, and scaling these containerized applications. It groups containers that make up an application into logical units for easy management and discovery. These units, called pods, are deployed across a cluster of machines. Kubernetes then handles tasks such as:
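To make the idea of a declarative manifest concrete, here is a minimal sketch of a Pod definition built as a plain Python dictionary and serialized to JSON (which `kubectl apply -f` accepts alongside YAML). All names and the container image below are hypothetical placeholders, not real artifacts:

```python
import json

# A minimal Pod manifest as a Python dict. The name, labels, and image
# are illustrative placeholders only.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "yolo-inference", "labels": {"app": "yolo"}},
    "spec": {
        "containers": [
            {
                "name": "model-server",
                "image": "example.com/yolo-server:latest",  # hypothetical image
                "ports": [{"containerPort": 8080}],
            }
        ]
    },
}

# kubectl accepts JSON as well as YAML, so a JSON dump is enough here.
print(json.dumps(pod, indent=2))
```

In practice such a manifest would be written to a file and submitted with `kubectl apply -f pod.json`; Kubernetes then works to make the cluster match the declared state.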

  • Service discovery and load balancing: Kubernetes can expose a container using a DNS name or its own IP address. If traffic to a container is high, Kubernetes can load balance and distribute the network traffic so that the deployment stays stable.
  • Storage orchestration: Kubernetes allows you to automatically mount the storage system of your choice, such as local storage, public cloud providers, and more.
  • Automated rollouts and rollbacks: You can describe the desired state for your deployed containers using Kubernetes, and it changes the actual state to the desired state at a controlled rate. For example, Kubernetes can automate the creation of new containers for your deployment, remove existing containers, and reassign their resources to the new ones.
  • Automatic bin packing: Kubernetes allows you to specify how much CPU and RAM (memory) each container needs. Kubernetes can fit containers onto your nodes to make the best use of your resources.
  • Self-healing: Kubernetes restarts containers that fail, replaces and reschedules containers when nodes die, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
  • Secret and configuration management: Kubernetes lets you store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys. You can deploy and update secrets and application configuration without rebuilding your container images, and without exposing secrets in your stack configuration.
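Several of the features above map directly onto fields in a workload manifest. As an illustrative sketch (all names and the image are hypothetical), the Deployment below declares CPU and memory requests that the scheduler uses for bin packing, and an HTTP liveness probe that drives self-healing restarts:

```python
import json

def make_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest with resource requests (bin packing)
    and a liveness probe (self-healing). Purely illustrative."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Bin packing: the scheduler places pods onto nodes
                        # based on these requested resources.
                        "resources": {
                            "requests": {"cpu": "500m", "memory": "512Mi"},
                            "limits": {"cpu": "1", "memory": "1Gi"},
                        },
                        # Self-healing: restart the container when the
                        # health endpoint stops responding.
                        "livenessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080},
                            "initialDelaySeconds": 10,
                            "periodSeconds": 15,
                        },
                    }]
                },
            },
        },
    }

manifest = make_deployment("yolo-serve", "example.com/yolo-server:latest", 3)
print(json.dumps(manifest, indent=2))
```

Changing the `image` field in a manifest like this and re-applying it is what triggers the automated rollout behavior described above.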

Why is Kubernetes Relevant to AI and ML?

Kubernetes is particularly relevant in the field of AI and machine learning due to the resource-intensive and scalable nature of ML workloads. Training large models, such as Ultralytics YOLO models for object detection, often requires distributed computing across multiple GPUs or TPUs. Kubernetes provides the infrastructure to manage these distributed resources efficiently.

Furthermore, deploying AI/ML models for inference at scale requires robust and scalable infrastructure. Kubernetes simplifies model deployment by allowing users to containerize their models and serve them through scalable APIs. This is crucial for real-world applications requiring low inference latency and high throughput.
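A common way to get that elasticity is a HorizontalPodAutoscaler, which scales a Deployment between a minimum and maximum replica count based on observed load. The sketch below targets a hypothetical `yolo-serve` Deployment and scales on average CPU utilization:

```python
import json

# Illustrative HorizontalPodAutoscaler manifest; the target Deployment
# name "yolo-serve" is a hypothetical placeholder.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "yolo-serve-hpa"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "yolo-serve",
        },
        "minReplicas": 2,
        "maxReplicas": 20,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                # Add replicas when average CPU use exceeds 70%.
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

print(json.dumps(hpa, indent=2))
```

For inference workloads, custom metrics such as request latency or queue depth are often better scaling signals than CPU, but CPU utilization is the simplest starting point.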

Applications of Kubernetes in AI/ML

  1. Scalable Model Serving: Consider a real-time object detection application, such as a smart city traffic management system using Ultralytics YOLOv8. As the city grows, the demand for processing video feeds from more cameras increases. Kubernetes allows you to scale the model serving infrastructure dynamically. By deploying your YOLOv8 model as a containerized service on Kubernetes, you can easily increase or decrease the number of model instances based on the incoming traffic, ensuring consistent performance even under heavy load. This scalability is essential for maintaining low latency and high availability in real-time AI applications.

  2. Distributed Training: Training state-of-the-art AI models often requires massive datasets and significant computational power. Distributed training across a cluster of machines becomes necessary to reduce training time. Kubernetes can orchestrate distributed training jobs by managing the distribution of the workload across multiple nodes, monitoring progress, and handling failures. For example, you could use Kubernetes to manage a distributed training job for a large image classification model using a dataset like ImageNet. Kubernetes ensures that each training node is properly configured, data is efficiently distributed, and the overall training process is resilient to node failures.
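As a sketch of how a distributed training job could be expressed (the image and command are hypothetical), a batch Job with indexed completion runs a fixed number of worker pods in parallel. Each pod receives a stable rank through the `JOB_COMPLETION_INDEX` environment variable that Kubernetes injects for indexed Jobs, which the training script can use to coordinate workers:

```python
import json

NUM_WORKERS = 4  # illustrative number of training workers

# Illustrative indexed Job manifest; image and command are placeholders.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "distributed-train"},
    "spec": {
        # Indexed completion gives each pod a rank (0..N-1) via the
        # JOB_COMPLETION_INDEX environment variable.
        "completionMode": "Indexed",
        "completions": NUM_WORKERS,
        "parallelism": NUM_WORKERS,
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "trainer",
                    "image": "example.com/trainer:latest",  # hypothetical
                    "command": ["python", "train.py"],
                }],
            }
        },
    },
}

print(json.dumps(job, indent=2))
```

Higher-level operators such as Kubeflow's training operators build on the same primitives to manage framework-specific distributed training, but a plain Job already illustrates how Kubernetes schedules and supervises the workers.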

In summary, Kubernetes is a powerful tool for managing the complexities of AI and ML workloads, providing scalability, resilience, and efficiency for both training and deployment phases. Its ability to orchestrate containerized applications makes it an ideal platform for building and running modern, scalable AI systems.
