Scalability

Explore how scalability empowers AI systems to handle growth. Learn to optimize MLOps with [Ultralytics YOLO26](https://docs.ultralytics.com/models/yolo26/) and the [Ultralytics Platform](https://platform.ultralytics.com) for high-performance, distributed training and seamless deployment.

Scalability refers to the capability of a system, network, or process to handle a growing amount of work by adding resources. In the context of Artificial Intelligence (AI) and Machine Learning (ML), scalability describes a model's or infrastructure's ability to maintain performance levels as demand increases. This demand typically manifests as larger datasets during training, higher user traffic during inference, or increased complexity in computational tasks. A scalable architecture allows for seamless expansion—whether deploying a computer vision model to a single embedded device or serving millions of API requests via cloud clusters—ensuring that inference latency remains low even under heavy load.

The Importance of Scalability in AI

Designing for scalability is a critical component of successful Machine Learning Operations (MLOps). A model that functions perfectly in a controlled research environment may fail when exposed to the high-velocity data streams found in production. Effectively managing Big Data requires systems that can scale horizontally (adding more machines to a cluster) or vertically (adding more power, such as RAM or GPUs, to existing machines).

Key advantages of scalable AI systems include:

  • Reliability: Scalable systems maintain stable service uptime during unexpected traffic spikes, preventing crashes in critical applications.
  • Cost Efficiency: Dynamic scaling shrinks resources during periods of low usage, a capability typically managed through cloud platforms such as AWS or Google Cloud.
  • Future-Proofing: Scalable infrastructure can accommodate newer, more complex algorithms, such as Vision Transformers (ViT), without overhauling the hardware ecosystem.

Strategies for Achieving Scalability

Creating scalable AI solutions requires optimizing both the model architecture and the deployment infrastructure.

  • Distributed Training: When training datasets become too large for a single processor, distributed training splits the workload across multiple Graphics Processing Units (GPUs). Frameworks like PyTorch Distributed allow developers to parallelize computations, significantly reducing the time required to train foundation models. Tools like the Ultralytics Platform simplify this process by managing cloud training resources automatically.
  • Efficient Model Architectures: Choosing the right model architecture is critical for throughput. Ultralytics YOLO26 is engineered to be smaller and faster than its predecessors, allowing it to scale natively across hardware, from Edge AI devices to large server clusters.
  • Containerization and Orchestration: Packaging applications with Docker ensures they run consistently across different environments. For managing large clusters of containers, Kubernetes automates the deployment, scaling, and management of containerized applications.
  • Model Optimization: Techniques such as model quantization and pruning reduce a model's memory footprint and computational cost. Tools such as NVIDIA TensorRT can further accelerate inference, delivering higher throughput on existing hardware (see the sketch after this list).
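
As a rough illustration of the distributed training and model optimization points above, the sketch below uses the Ultralytics Python API. The two-GPU device list, the coco8.yaml dataset, and the TensorRT export settings are assumptions chosen for demonstration; they presume a machine with at least two CUDA GPUs and a TensorRT installation rather than describing a required setup.

from ultralytics import YOLO

# Load the same lightweight YOLO26 model used elsewhere on this page
model = YOLO("yolo26n.pt")

# Distributed training sketch: passing a device list asks Ultralytics to spread
# the training workload across two GPUs (assumes two CUDA devices are available)
model.train(data="coco8.yaml", epochs=10, device=[0, 1])

# Model optimization sketch: export to TensorRT with FP16 precision to raise
# inference throughput on existing NVIDIA hardware (assumes TensorRT is installed)
model.export(format="engine", half=True)

When a device list is passed, Ultralytics handles the distributed setup internally, which applies the same horizontal-scaling idea as adding machines to a cluster, only within a single node.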

Code Example: Scalable Batch Inference

One effective way to improve scalability during inference is to process inputs in batches rather than one at a time. Batching maximizes GPU utilization and raises overall throughput.

from ultralytics import YOLO

# Load a scalable YOLO26 model (smaller 'n' version for speed)
model = YOLO("yolo26n.pt")

# Define a batch of images (URLs or local paths)
# Processing multiple images at once leverages parallel computation
batch_images = ["https://ultralytics.com/images/bus.jpg", "https://ultralytics.com/images/zidane.jpg"]

# Run inference on the batch
results = model(batch_images)

# Print the number of detections for the first image
print(f"Detected {len(results[0].boxes)} objects in the first image.")

Real-World Applications

Scalability is what allows AI technology to move from theoretical research into tools used by industry worldwide.

  • Smart Manufacturing: In AI for manufacturing, automated inspection systems must analyze thousands of components per hour on high-speed assembly lines. Scalable object detection ensures that quality control stays accurate as production speeds rise, without becoming a bottleneck (a minimal streaming sketch follows this list).
  • Retail Recommendation Engines: Major e-commerce platforms use recommendation systems to serve personalized product suggestions to millions of users in real time. Scalable infrastructure lets these platforms absorb events such as Black Friday, where traffic can spike 100x, by dynamically provisioning additional server nodes through cloud providers such as Microsoft Azure.
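
To make the manufacturing example a little more concrete, here is a minimal streaming-inference sketch using the Ultralytics Python API. The video source and confidence threshold are placeholders; stream=True yields results frame by frame, which keeps memory usage flat on long-running feeds.

from ultralytics import YOLO

# Load the lightweight model used in the earlier examples
model = YOLO("yolo26n.pt")

# "assembly_line.mp4" is a placeholder for a camera index or video feed URL
# stream=True returns a generator, so long-running feeds do not exhaust memory
for result in model("assembly_line.mp4", stream=True, conf=0.5):
    # Count detected components per frame; a real system might feed this into a QC dashboard
    print(f"Detections in frame: {len(result.boxes)}")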

Scalability vs. Related Concepts

Although they are often conflated, scalability is distinct from performance and efficiency.

  • Scalability vs. Performance: Performance usually refers to how fast or how accurately a system operates at a given moment (for example, frames per second). Scalability describes the system's ability to maintain that performance as the workload grows.
  • Scalability vs. Efficiency: Efficiency measures the resources consumed to complete a given task (for example, energy per inference). A system can be efficient yet not scalable (if it cannot handle parallel workloads), or scalable yet inefficient (if it consumes excessive resources to absorb growth).
  • Scalability vs. Flexibility: Flexibility lets a system handle different types of tasks, such as YOLO11 performing detection, segmentation, and pose estimation at once. Scalability focuses on handling more of the same type of task.
