Model Serving

Learn the essentials of model serving. Deploy AI models for real-time predictions, scalability, and seamless integration into applications.

Model serving is the process of hosting a trained machine learning model and making its functionality available to software applications via a network interface. It acts as the bridge between a static model file saved on a disk and a live system that processes real-world data. Once a model has completed the machine learning (ML) training phase, it must be integrated into a production environment where it can receive inputs—such as images, text, or tabular data—and return predictions. This is typically achieved by wrapping the model in an Application Programming Interface (API), allowing it to communicate with web servers, mobile apps, or IoT devices.
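
A minimal sketch of this pattern is shown below, assuming the FastAPI web framework; the /predict route and its query parameter are illustrative choices for the example, not a fixed Ultralytics interface.

from fastapi import FastAPI

from ultralytics import YOLO

app = FastAPI()
model = YOLO("yolo26n.pt")  # loaded once at startup and reused across requests


@app.post("/predict")
def predict(image_url: str):
    # Run inference on the submitted image and return a minimal JSON summary
    results = model.predict(source=image_url)
    return {"num_objects": len(results[0].boxes)}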

The Role of Model Serving in AI

The primary goal of model serving is to operationalize predictive modeling capabilities effectively. While training focuses on accuracy and loss minimization, serving focuses on performance metrics like latency (how fast a prediction is returned) and throughput (how many requests can be handled per second). Robust serving infrastructure ensures that computer vision (CV) systems remain reliable under heavy loads. It often involves technologies like containerization using tools such as Docker, which packages the model with its dependencies to ensure consistent behavior across different computing environments.
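
As a rough illustration of latency and throughput, the sketch below times repeated predictions with Python's standard library; the request count and the repeated remote image are arbitrary assumptions made for the example.

import time

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
num_requests = 20  # arbitrary sample size for this illustration

start = time.perf_counter()
for _ in range(num_requests):
    model.predict(source="https://ultralytics.com/images/bus.jpg", verbose=False)
elapsed = time.perf_counter() - start

print(f"Average latency: {elapsed / num_requests * 1000:.1f} ms per request")
print(f"Throughput: {num_requests / elapsed:.1f} requests per second")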

Real-World Applications

Model serving powers ubiquitous AI features across diverse industries by enabling immediate, data-driven decision-making.

  • Smart Manufacturing: In industrial settings, AI in manufacturing systems uses served models to inspect assembly lines. High-resolution images of components are sent to a local server, where a YOLO26 model detects defects such as scratches or misalignments and triggers immediate alerts to remove faulty items (see the sketch after this list).
  • Retail Automation: Retailers use AI in retail to improve the customer experience. Cameras equipped with object detection models identify products at checkout, automatically calculating totals without manual barcode scanning.
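
The manufacturing workflow above can be sketched in a few lines. The model file (defect_yolo26.pt), the image path, and the "scratch" class name are hypothetical stand-ins for a custom-trained defect detector.

from ultralytics import YOLO

# Hypothetical custom model trained on defect classes such as "scratch"
model = YOLO("defect_yolo26.pt")

results = model.predict(source="part_image.jpg")  # image path is illustrative
for box in results[0].boxes:
    label = results[0].names[int(box.cls)]
    if label == "scratch":  # hypothetical defect class
        print("ALERT: defect detected, flag part for removal")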

Practical Implementation

To serve a model effectively, it is often beneficial to export models to a standardized format like ONNX, which promotes interoperability between different training frameworks and serving engines. The following example demonstrates how to load a model and run inference, simulating the logic that would exist inside a serving endpoint using Python.

from ultralytics import YOLO

# Load the YOLO26 model (this typically happens once when the server starts)
model = YOLO("yolo26n.pt")

# Simulate an incoming API request with an image source URL
image_source = "https://ultralytics.com/images/bus.jpg"

# Run inference to generate predictions for the user
results = model.predict(source=image_source)

# Process results (e.g., simulating a JSON response to a client)
print(f"Detected {len(results[0].boxes)} objects in the image.")

Choosing the Right Strategy

The choice of serving strategy depends heavily on the specific use case. Online Serving provides immediate responses via protocols like REST or gRPC, which is essential for user-facing web applications. Conversely, Batch Serving processes large volumes of data offline, suitable for tasks like nightly report generation. For applications requiring privacy or low latency without internet dependence, Edge AI moves the serving process directly to the device, utilizing optimized formats like TensorRT to maximize performance on constrained hardware. Many organizations leverage the Ultralytics Platform to simplify the deployment of these models to various endpoints, including cloud APIs and edge devices.
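
For contrast with the online pattern, a batch serving job might look like the sketch below, which loops over stored images offline; the directory name and the printed report format are assumptions made for illustration.

from pathlib import Path

from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Hypothetical directory of images accumulated during the day
image_dir = Path("nightly_batch")

for image_path in image_dir.glob("*.jpg"):
    results = model.predict(source=str(image_path), verbose=False)
    # Store counts for an offline report instead of answering a live request
    print(f"{image_path.name}: {len(results[0].boxes)} objects")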

Differences from Related Terms

While closely related, "Model Serving" is distinct from Model Deployment and Inference.

  • Model Deployment: This refers to the broader lifecycle stage of releasing a model into a production environment. Serving is the specific mechanism or software (like NVIDIA Triton Inference Server or TorchServe) used to execute the deployed model.
  • Inference: This is the mathematical act of calculating a prediction from an input. Model serving provides the infrastructure (networking, scalability, and security) that allows inference to happen reliably for end-users.
  • Microservices: Serving is often architected as a set of microservices, where the model runs as an independent service that other parts of an application can query, often exchanging data in lightweight formats like JSON (see the sketch below).
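
As referenced in the microservices point, predictions are commonly serialized to JSON before crossing service boundaries. The sketch below assumes the tojson() helper available on result objects in recent Ultralytics releases.

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
results = model.predict(source="https://ultralytics.com/images/bus.jpg")

# Serialize detections to JSON, the typical payload between microservices
# (tojson() is available on Results objects in recent Ultralytics releases)
payload = results[0].tojson()
print(payload)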
