Discover how Kubernetes streamlines AI/ML workloads with scalable model deployment, distributed training, and efficient resource management.
Kubernetes, frequently abbreviated as K8s, is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications. Originally designed by engineers at Google and now maintained by the Cloud Native Computing Foundation (CNCF), Kubernetes has evolved into the industry standard for operating resilient, distributed software systems. In the rapidly advancing fields of Artificial Intelligence (AI) and Machine Learning (ML), it serves as a fundamental infrastructure layer, enabling engineering teams to efficiently manage the complete lifecycle of complex models from experimental development to large-scale production.
At its core, Kubernetes manages a cluster of computing machines, known as nodes, that run containerized workloads. It relies heavily on containerization—a technology that packages code along with its dependencies—to ensure that applications run consistently across diverse environments. Kubernetes introduces abstractions such as "Pods," which are the smallest deployable units, and "Deployments," which maintain the desired state of an application. By decoupling software from the underlying hardware, Kubernetes allows computer vision engineers to focus on model performance rather than server maintenance, often via managed services such as Amazon EKS or Google Kubernetes Engine (GKE).
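To make these abstractions concrete, here is a minimal sketch using the official Kubernetes Python client (`pip install kubernetes`) to declare a Deployment that keeps three replicas of an inference Pod running. The image name `example.com/yolo-inference:latest` and the other identifiers are hypothetical placeholders, not part of any Ultralytics release.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes a configured cluster)
config.load_kube_config()

# Declare a Deployment: Kubernetes continuously reconciles the cluster so
# that three replicas of this inference Pod are always running
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="yolo-inference"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "yolo-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "yolo-inference"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="inference",
                        image="example.com/yolo-inference:latest",  # hypothetical image
                    )
                ]
            ),
        ),
    ),
)

# Submit the desired state to the API server
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

The same manifest is more commonly written as YAML and applied with `kubectl apply`, but the declarative "desired state" model is identical either way.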
For Machine Learning Operations (MLOps), Kubernetes is indispensable because it solves the critical challenge of scalability. Modern AI workloads, particularly those involving deep learning, require significant computational resources. Kubernetes facilitates distributed training by intelligently scheduling training jobs across multiple nodes equipped with Graphics Processing Units (GPUs). During the model deployment phase, K8s ensures high availability for inference APIs, automatically scaling the number of running pods up or down based on real-time traffic demands, which optimizes both performance and cost.
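As a rough sketch of both mechanisms, again using the Kubernetes Python client with hypothetical names: the container below requests one GPU, which restricts scheduling to GPU-equipped nodes, and the HorizontalPodAutoscaler keeps the inference Deployment between 2 and 10 replicas based on CPU utilization.

```python
from kubernetes import client, config

config.load_kube_config()

# A training container that requests one GPU; the scheduler will only place
# Pods containing it on nodes that advertise the nvidia.com/gpu resource.
# In practice this container would be embedded in a Job or Deployment spec.
trainer = client.V1Container(
    name="trainer",
    image="example.com/yolo-train:latest",  # hypothetical training image
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

# A HorizontalPodAutoscaler that scales the "yolo-inference" Deployment
# between 2 and 10 replicas, targeting 70% average CPU utilization
hpa = client.V1HorizontalPodAutoscaler(
    api_version="autoscaling/v1",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="yolo-inference-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="yolo-inference"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

In production, inference autoscaling often relies on custom metrics such as request queue depth or GPU utilization via the autoscaling/v2 API rather than CPU alone.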
It is helpful to understand how Kubernetes differs from other common infrastructure tools:

- Kubernetes vs. Docker: Docker builds and runs individual containers on a single host, while Kubernetes orchestrates containers across an entire cluster, handling scheduling, networking, self-healing, and scaling.
- Kubernetes vs. Docker Swarm: Both orchestrate containers, but Kubernetes offers a far larger ecosystem and richer primitives, such as custom resources and horizontal pod autoscaling, at the cost of greater operational complexity.
- Kubernetes vs. serverless platforms: Serverless services such as AWS Lambda hide servers entirely but constrain runtime duration and hardware choices, whereas Kubernetes gives teams fine-grained control over resources like GPUs.
The Kubernetes ecosystem is vast, including tools like Helm for package management and Prometheus for monitoring cluster health. For specialized ML workflows, platforms like Kubeflow run on top of Kubernetes to streamline end-to-end pipelines. Looking to the future, the upcoming Ultralytics Platform is designed to simplify these processes further, offering a comprehensive environment for data management and model training that abstracts underlying infrastructure complexities.
To deploy a model on Kubernetes, you first need a script that performs inference. This Python snippet demonstrates loading a YOLO11 model and running inference; the script could then be packaged into a Docker container and scheduled by K8s.
```python
from ultralytics import YOLO

# Load a pre-trained YOLO11 model
model = YOLO("yolo11n.pt")

# Run inference on an image source
# This script would typically run inside a Kubernetes Pod
results = model("https://ultralytics.com/images/bus.jpg")

# Print the detected class names
for result in results:
    for cls_id in result.boxes.cls:
        print(f"Detected: {result.names[int(cls_id)]}")
```