Releases · volcano-sh/volcano

01 Jun 03:33

v1.15.0

8fc394c

Latest

Summary

Volcano v1.15.0 further strengthens Volcano as a unified scheduling platform for converged general-purpose and AI computing at scale. As clusters increasingly run batch training, inference, AI Agent, HPC, big-data, and heterogeneous accelerator workloads together, the scheduler must make higher-quality decisions under resource contention while preserving workload-level semantics, queue fairness, topology locality, and operational stability.

The most important addition is Gang-Aware Preemption and Resource Reclamation. It makes eviction decisions gang-aware on both sides: the incoming gang is placed as a unit, and victim jobs are ordered and selected at job/gang granularity to avoid arbitrary partial disruption. v1.15.0 also introduces DRA queue quota in the capacity plugin, pluggable multi-sharding policies, benchmark and performance observability tooling, Kubernetes 1.35 support, NodeGroup preference ordering, Agent Scheduler stability improvements, incremental GPU/vGPU improvements, and opt-in scheduling gates for queue admission control.

What's New

Key Features Overview

Gang-Aware Preemption and Resource Reclamation (Alpha): Makes eviction decisions gang-aware on both sides. Volcano evaluates the preemptor as a whole gang and selects victims at job/gang granularity, reducing random partial disruption across training jobs and supporting HyperNode-scoped victim search when topology is enabled.
DRA Queue Quota in Capacity Plugin: Adds queue quota accounting for Pods that request DRA resources, bringing ResourceClaim usage into the existing capability, deserved, and guarantee model.
Pluggable Multi-Sharding Policy Support (Alpha): Moves sharding configuration from fixed shard parameters to a pluggable policy framework with composable filter, score, and select phases, built-in allocation-rate, warmup, and node-limit policies, and ConfigMap live reload.
Volcano Benchmark and Performance Observability: Provides a benchmark framework for one-click environment setup and performance report collection on Kind/KWOK or existing clusters, helping users establish baseline data and identify scheduler bottlenecks.
Scheduling Gates for Queue Admission (Alpha): Uses opt-in scheduling gates to hold pods blocked by queue capacity, so Cluster Autoscaler and Karpenter do not scale up for quota-only blockers.
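As a rough illustration of the filter, score, and select phases named above, the following Python sketch composes two hypothetical policies. Several pieces are assumptions for illustration and are not Volcano's Go interfaces or configuration keys:

- the `Node` fields;
- the `AllocationRatePolicy` and `WarmupPolicy` classes;
- modeling the node-limit policy as a `node_limit` cap in the select phase.

```python
# Hypothetical sketch of a pluggable filter -> score -> select shard pipeline.
# Node fields, policy classes, and node_limit are illustrative assumptions,
# not Volcano's actual sharding API.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Node:
    name: str
    allocation_rate: float  # fraction of allocatable capacity already requested (0.0-1.0)
    warm: bool = False      # hypothetical flag: node is already warmed up


class AllocationRatePolicy:
    """Filter: keep nodes whose allocation rate is in [low, high]. Score: prefer emptier nodes."""

    def __init__(self, low: float, high: float) -> None:
        self.low, self.high = low, high

    def filter(self, node: Node) -> bool:
        return self.low <= node.allocation_rate <= self.high

    def score(self, node: Node) -> float:
        return 1.0 - node.allocation_rate


class WarmupPolicy:
    """Filter: keep every node. Score: add a fixed bonus for warmed-up nodes."""

    def __init__(self, bonus: float = 0.5) -> None:
        self.bonus = bonus

    def filter(self, node: Node) -> bool:
        return True

    def score(self, node: Node) -> float:
        return self.bonus if node.warm else 0.0


def compute_shard(nodes: Sequence[Node], policies: Sequence, node_limit: int) -> List[str]:
    # Filter phase: a node must pass every policy's filter.
    candidates = [n for n in nodes if all(p.filter(n) for p in policies)]
    # Score phase: sum the scores from all policies (higher is better);
    # ties are broken by node name so the result is deterministic.
    ranked = sorted(candidates, key=lambda n: (-sum(p.score(n) for p in policies), n.name))
    # Select phase: keep at most node_limit nodes.
    return [n.name for n in ranked[:node_limit]]


nodes = [Node("n1", 0.20), Node("n2", 0.50, warm=True), Node("n3", 0.90), Node("n4", 0.10)]
print(compute_shard(nodes, [AllocationRatePolicy(0.0, 0.6), WarmupPolicy()], node_limit=2))
# ['n2', 'n4']
```

Each policy only implements `filter` and `score`, so new policies compose without changing `compute_shard`. The real framework's phase contracts may differ.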

Key Feature Details

Gang-Aware Preemption and Resource Reclamation (Alpha)

Background and Motivation:

Volcano's legacy preempt and reclaim actions are task-centric. For gang-style jobs, evicting individual tasks from many different victim jobs can create wide disruption without guaranteeing that the pending gang can be scheduled afterward. Some scheduling systems only make the preemptor gang-aware: they try to place the incoming gang as a unit, but still choose victims task by task. That can protect the incoming job while randomly breaking multiple victim gangs.

Volcano v1.15.0 makes both sides of the eviction decision gang-aware. Victim jobs are ordered and selected at job/gang granularity, so the scheduler can reason about the disruption cost of breaking a victim gang instead of treating every task as an interchangeable victim. This distinction is important even when HyperNode topology is not used, because the scheduler still avoids spreading arbitrary partial evictions across unrelated jobs.

This matters especially for training-style workloads. A task-by-task victim loop can evict one replica from many different jobs. If each job depends on gang semantics, one scheduling cycle may break every victim job while still failing to place the incoming gang.

Volcano now groups candidate victim tasks by job and evaluates victim bundles. When bundle splitting is available, the scheduler treats resources above the gang requirement (the replicas - minAvailable surplus tasks) as lower-cost safe bundles. It considers those before disrupting a whole job. Bundles are then ordered by the action policy first: gangPreempt is priority-driven, and gangReclaim is fairness-driven. Efficiency, which weighs the local gain inside the selected HyperNode against the global disruption, is used only after those policy constraints are satisfied.
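The victim-side ordering described above can be sketched in a few lines of Python. This is a simplified, hypothetical model, not Volcano's implementation:

- `Task`, `Job`, and `select_victims` are invented names.
- A single GPU count stands in for plugin filtering and placement simulation.
- Only the priority-driven gangPreempt case is modeled, and HyperNode scoping is omitted.
- "Efficiency" is reduced to preferring the bundle that evicts the fewest tasks.

```python
# Minimal sketch of gang-aware victim selection for a priority-driven preemption.
# Task, Job, and select_victims are hypothetical names; the ordering keys are a
# simplification of the policy described in these release notes.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    job: str
    gpus: int


@dataclass
class Job:
    priority: int
    min_available: int


def select_victims(tasks: List[Task], jobs: Dict[str, Job],
                   preemptor_priority: int, needed_gpus: int) -> Optional[List[Task]]:
    # Group candidate victims by job; only lower-priority jobs may be preempted.
    by_job: Dict[str, List[Task]] = {}
    for t in tasks:
        if jobs[t.job].priority < preemptor_priority:
            by_job.setdefault(t.job, []).append(t)

    # Split each victim job into a "safe" bundle (replicas - minAvailable tasks,
    # the gang survives) and a "core" bundle (evicting it breaks the gang).
    bundles = []  # (breaks_gang, victim_priority, tasks)
    for name, job_tasks in by_job.items():
        surplus = max(len(job_tasks) - jobs[name].min_available, 0)
        if surplus:
            bundles.append((False, jobs[name].priority, job_tasks[:surplus]))
        if len(job_tasks) > surplus:
            bundles.append((True, jobs[name].priority, job_tasks[surplus:]))

    # Order: safe bundles first, then policy (lowest victim priority first),
    # then a simplified efficiency signal (fewest evicted tasks first).
    bundles.sort(key=lambda b: (b[0], b[1], len(b[2])))

    victims: List[Task] = []
    freed = 0
    for _, _, bundle in bundles:
        if freed >= needed_gpus:
            break
        victims.extend(bundle)
        freed += sum(t.gpus for t in bundle)
    # Evict nothing unless the whole preemptor gang can be satisfied.
    return victims if freed >= needed_gpus else None


jobs = {"a": Job(priority=1, min_available=2), "b": Job(priority=1, min_available=4),
        "c": Job(priority=5, min_available=1)}
tasks = ([Task(f"a{i}", "a", 1) for i in range(3)] + [Task(f"b{i}", "b", 2) for i in range(4)]
         + [Task(f"c{i}", "c", 1) for i in range(2)])
print([t.name for t in select_victims(tasks, jobs, preemptor_priority=3, needed_gpus=2)])
# ['a0', 'a1', 'a2']
```

With 2 GPUs needed, the sketch evicts all of job `a` (its one surplus task plus its two core tasks) rather than touching job `b`, so only one victim gang is broken.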

When HyperNode topology is configured, the new actions additionally scope victim search to HyperNode candidates. Volcano evaluates preemption and reclaim inside a selected topology scope rather than freely preempting across topology domains. The scheduler then runs plugin filtering and placement simulation before committing evictions and nominations, so eviction and placement are evaluated as one scheduling decision.

Alpha Feature Notice: Gang-aware preemption and reclamation is alpha and must be enabled explicitly through gangPreempt and gangReclaim. Future releases may merge these dedicated actions with the legacy preempt and reclaim actions after the rollout is validated.

Key Capabilities:

Preemptor-gang placement: Evaluates whether the incoming gang can be placed as a whole before eviction is selected.
Victim-gang awareness: Groups victim candidates by job/gang, prioritizes lower-cost victim bundles such as replicas above minAvailable, and avoids spreading partial disruption across many jobs.
Topology-scoped eviction: When HyperNode topology is enabled, searches victims inside the selected topology scope instead of freely preempting across topology domains.
Policy-aware victim ordering: Uses priority for gangPreempt and queue fairness for gangReclaim, with efficiency used as a secondary ordering signal.

Configuration:

actions: "enqueue, allocate, backfill, gangPreempt, gangReclaim"
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: drf
  - name: predicates
  - name: nodeorder
  - name: binpack

Note: Do not configure gangPreempt/gangReclaim together with the legacy preempt/reclaim actions in the same scheduler action list.

PRs: #5250, #4780, #5170
Design Docs: Gang-Aware Eviction Design, EvictableFn Evolution for Gang Eviction
Contributors: @vzhou-p

DRA Queue Quota in Capacity Plugin

Background and Motivation:

Previous Volcano releases already supported scheduling Pods that request Kubernetes Dynamic Resource Allocation resources. The missing part was queue quota: DRA ResourceClaim requests were not accounted against capability, deserved, or guarantee, so queues could not control DRA resource usage the same way they control CPU, memory, and extended resources.

Kubernetes DRA introduces DeviceClass, ResourceClaim, ResourceClaimTemplate, and ResourceSlice, while Volcano queues already manage quota through capability, deserved, and guarantee. v1.15.0 brings DRA resources into that queue quota model instead of requiring a separate DRA-only quota API.

The capacity plugin now accounts DRA resource requests for queue enqueue and allocation decisions. Operators can limit whole devices or consumable device dimensions such as virtual GPU cores and memory. Shared ResourceClaims are deduplicated so multiple pods referencing the same logical claim do not inflate queue usage.
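A minimal Python sketch of the deduplication idea: usage is keyed by the `(namespace, name)` of each ResourceClaim. A shared claim referenced by several pods is therefore counted once, while template-created claims, which get a distinct name per pod, are counted per pod. The `Claim` type and `queue_usage` function are hypothetical. The resource-name keys follow the format used in the Queue capability example in these notes; how the capacity plugin derives them internally is an assumption.

```python
# Minimal sketch of queue-level DRA usage accounting with shared-claim deduplication.
# Claim and queue_usage are hypothetical; the resource-name format mirrors the
# Queue capability example in these notes.
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Claim:
    namespace: str
    name: str
    device_class: str
    count: int = 1  # whole devices requested
    consumable: Tuple[Tuple[str, int], ...] = ()  # e.g. (("cores", 50),)


def queue_usage(pods: Iterable[List[Claim]]) -> Dict[str, int]:
    seen = set()
    usage: Counter = Counter()
    for pod_claims in pods:
        for claim in pod_claims:
            key = (claim.namespace, claim.name)
            if key in seen:
                continue  # shared claim: already counted for another pod
            seen.add(key)
            usage[f"deviceclass/{claim.device_class}"] += claim.count
            for dimension, amount in claim.consumable:
                usage[f"{dimension}.deviceclass/{claim.device_class}"] += amount
    return dict(usage)


shared = Claim("ml", "gpu-shared", "gpu.nvidia.com", count=2)
vgpu = Claim("ml", "worker-0-vgpu", "hami-core-gpu.project-hami.io", consumable=(("cores", 50),))
# Two pods reference the same shared claim; it is counted once.
print(queue_usage([[shared], [shared], [vgpu]]))
```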

Compatibility Note: DRA quota requires Kubernetes DRA support and a DRA-capable driver. Some DRA allocation modes remain outside the first quota-accounting scope.

Key Capabilities:

Whole-device quota: Controls DRA DeviceClass device counts at queue level.
Consumable-capacity quota: Controls device dimensions such as cores or memory through queue quota.
Existing queue semantics: Applies the same capability, deserved, and guarantee model used by other queue resources.
ResourceClaim-aware accounting: Accounts direct claims, template-created claims, and shared claims without inflating queue usage.

Configuration:

kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill, reclaim"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
      - name: capacity
        arguments:
          capacity.DynamicResourceAllocationEnable: true
          capacity.DRAConsumableCapacityEnable: true
      - name: nodeorder

---

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: ml-team
spec:
  reclaimable: true
  capability:
    cpu: "100"
    memory: "200Gi"
    "nvidia.com/gpu": "4"
    "deviceclass/gpu.nvidia.com": "8"
    "cores.deviceclass/hami-core-gpu.project-hami.io": "800"
    "memory.deviceclass/hami-core-gpu.project-hami.io": "320Gi"

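To make the capability semantics concrete, the sketch below checks whether a request still fits under the `ml-team` capability above. It is a hypothetical helper, not the capacity plugin's code, with three limitations:

- It covers only `capability`, not `deserved` or `guarantee`.
- It treats resources missing from `capability` as uncapped (an assumption).
- Its `parse_quantity` understands only plain integers and the binary suffixes `Ki`, `Mi`, `Gi`, and `Ti`.

```python
# Simplified capability check for the ml-team Queue above.
# parse_quantity only handles plain integers and binary suffixes (Ki, Mi, Gi, Ti);
# real Kubernetes quantities support more forms (e.g. "500m", "1.5", "1G").
import re
from typing import Dict

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    m = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", q)
    if m is None:
        raise ValueError(f"unsupported quantity: {q!r}")
    value, suffix = m.groups()
    return int(value) * (_BINARY[suffix] if suffix else 1)


def fits(capability: Dict[str, str], used: Dict[str, str], request: Dict[str, str]) -> bool:
    for resource, amount in request.items():
        if resource not in capability:
            continue  # assumption: a resource absent from capability is not capped here
        total = parse_quantity(used.get(resource, "0")) + parse_quantity(amount)
        if total > parse_quantity(capability[resource]):
            return False
    return True


capability = {
    "cpu": "100",
    "memory": "200Gi",
    "nvidia.com/gpu": "4",
    "deviceclass/gpu.nvidia.com": "8",
    "cores.deviceclass/hami-core-gpu.project-hami.io": "800",
    "memory.deviceclass/hami-core-gpu.project-hami.io": "320Gi",
}
used = {"deviceclass/gpu.nvidia.com": "6", "memory.deviceclass/hami-core-gpu.project-hami.io": "300Gi"}
print(fits(capability, used, {"deviceclass/gpu.nvidia.com": "2"}))  # True
print(fits(capability, used, {"memory.deviceclass/hami-core-gpu.project-hami.io": "32Gi"}))  # False
```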
PRs: #5058
Design Doc: DeviceClass Quota Support in Capacity Plugin
User Guide: DeviceClass Quota User Guide
Contributors: @xu-wentao

Pluggable Multi-Sharding Policy Support (Alpha)

Background and Motivation:

The v1.14.0 Sharding Controller introduced dynamic node scheduling shards for multi-scheduler deployments. v1.15.0 builds on that architecture by adding pluggable multi-sharding policy support. In...

Contributors

goyalankit, praveen0raj, and 42 other contributors

09 May 01:42

JesseStutler

v1.14.2

92d6e4c

Important:
This release fixes a security vulnerability and includes multiple bug fixes. We strongly advise all users to upgrade immediately to protect their systems and data.

Security Fixes

CVE-2026-44247: Webhook Server OOM via unbounded HTTP request body size

A security vulnerability has been discovered in the Volcano webhook server that could allow a pod with network access to the webhook endpoint to cause a denial of service by sending an arbitrarily large HTTP request body, leading to the webhook server being killed by OOM.

Affected Versions:

volcano <= v1.14.1
volcano <= v1.13.2
volcano <= v1.12.3

Fixed Versions:

volcano v1.14.2
volcano v1.13.3
volcano v1.12.4

This vulnerability was reported by @bugbunny-research and mitigated by @JesseStutler.

CVSS Rating: Moderate (6.8) CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H

Bug Fixes

fix: remove duplicated session close call (#5056 @qi-min)
Update KUBE_VERSION to 1.34.1 in webhook-manager Dockerfile (#5063 @hajnalmt)
Update root queue capability and enhance queue validation logic (#5080 @guoqinwill)
Fix shared mutable objects in scheduler snapshot clones (#5092 @zhifei92)
fix: panic and restart of volcano scheduler pods on install (#5144 @Tau721)
Fix Agent Scheduler multi worker optimistic parallel scheduling concurrently conflict error (#5162 @JesseStutler)
Fix inaccurate E2E duration metric in agent-scheduler (#5165 @Copilot)
fix: process panic caused by concurrent map writes (#5182 @zhifei92)
wait event handler completed before start scheduling (#5183 @qi-min)
Rollback unnecessary deepcopy in snapshot (#5185 @zhifei92)
fix event handlers cache (#5188 @qi-min)
fix: highestTierName in partitionPolicy or subGroupPolicy fails to restrict scheduling to specified HyperNode tiers (#5203 @Tau721)
fix(capacity): avoid false exceeds on missing parent scalar keys (#5218 @hajnalmt)
reduce node priority if nodes wait to be checked in binder (#5260 @qi-min)
fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption (#5263 @hajnalmt)
enhancement(scheduler): honor QueueOrderFn in preempt action (#5268 @hajnalmt)
Fix: Stabilize predicates plugin execution order and rollback semantics (#5286 @wangyang0616)

Full Changelog: v1.14.1...v1.14.2

Contributors

qi-min, hajnalmt, and 6 other contributors

09 May 01:47

JesseStutler

v1.13.3

5fe5adf

Important:
This release fixes a security vulnerability and includes multiple bug fixes. We strongly advise all users to upgrade immediately to protect their systems and data.

Security Fixes

CVE-2026-44247: Webhook Server OOM via unbounded HTTP request body size

Affected Versions:

volcano <= v1.14.1
volcano <= v1.13.2
volcano <= v1.12.3

Fixed Versions:

volcano v1.14.2
volcano v1.13.3
volcano v1.12.4

This vulnerability was reported by @bugbunny-research and mitigated by @JesseStutler.

CVSS Rating: Moderate (6.8) CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H

Bug Fixes

Rollback unnecessary deepcopy in snapshot (#5186 @zhifei92)
wait event handler completed before start scheduling (#5200 @qi-min)
fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption (#5265 @hajnalmt)
enhancement(scheduler): honor QueueOrderFn in preempt action (#5269 @hajnalmt)

Full Changelog: v1.13.2...v1.13.3

Contributors

qi-min, hajnalmt, and 3 other contributors

09 May 01:49

JesseStutler

v1.12.4

28043a4

Important:
This release fixes a security vulnerability and includes multiple bug fixes. We strongly advise all users to upgrade immediately to protect their systems and data.

Security Fixes

CVE-2026-44247: Webhook Server OOM via unbounded HTTP request body size

Affected Versions:

volcano <= v1.14.1
volcano <= v1.13.2
volcano <= v1.12.3

Fixed Versions:

volcano v1.14.2
volcano v1.13.3
volcano v1.12.4

This vulnerability was reported by @bugbunny-research and mitigated by @JesseStutler.

CVSS Rating: Moderate (6.8) CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H

Bug Fixes

wait event handler completed before start scheduling (#5201 @qi-min)
fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption and honor QueueOrderFn (#5270 @hajnalmt)

Full Changelog: v1.12.3...v1.12.4

Contributors

qi-min, hajnalmt, and 2 other contributors

30 Mar 13:16

JesseStutler

v1.13.2

8f7a52a

What's Changed

Bug fixes

cherrypick 4829 to release 1.13: keep terminating pod in job by @kingeasternsun in #4860
[release-1.13] fix potential panic on numa resources info updating in snapshot by @qi-min in #4897
[release-1.13] Fix gpu resource error by @sailorvii in #4916
[release-1.13] Update metrics_client_prometheus.go by @nitindhiman314e in #4931
[release-1.13] Fix shared mutable objects in scheduler snapshot clones by @zhifei92 in #5093

Full Changelog: v1.13.1...v1.13.2

Contributors

kingeasternsun, qi-min, and 3 other contributors

14 Feb 01:57

JesseStutler

v1.14.1

49b8f40

What's Changed

Bug fixes

[release-1.14] Fixed issue where jobs with subgroups but not hard networkTopology.mode could not be scheduled. by @JesseStutler in #5041
[release-1.14] fix: The AllocatedHyperNode recovery for SubJobs during scheduler restart may not be the lowest tier. by @ouyangshengjia in #5012

Full Changelog: v1.14.0...v1.14.1

Contributors

JesseStutler and ouyangshengjia

31 Jan 06:34

JesseStutler

v1.14.0

6f86a47

Summary

Volcano v1.14.0 establishes Volcano as a unified scheduling platform for diverse workloads at scale. This release introduces a scalable multi-scheduler architecture with dynamic node scheduling shards, enabling multiple schedulers to coordinate efficiently across large clusters. A new Agent Scheduler provides fast scheduling for latency-sensitive AI Agent workloads while working seamlessly with the Volcano batch scheduler. Network topology aware scheduling gains significant enhancements, including HyperNode-level binpacking, SubGroup policies, and multi-level gang scheduling across Job and SubGroup scopes. Volcano Global integration advances with HyperJob for multi-cluster training and data-aware scheduling. Colocation now supports generic operating systems with CPU Throttling, Memory QoS, and Cgroup V2. Additionally, integrated Ascend vNPU scheduling enables efficient sharing of Ascend AI accelerators.

What's New

Key Features Overview

Scalable Multi-Scheduler with Dynamic Node Scheduling Shard (Alpha): Dynamically compute candidate node pools for schedulers with extensible strategies
Fast Scheduling for AI Agent Workloads (Alpha): A new Agent Scheduler for latency-sensitive AI Agent workloads is introduced, working in coordination with Volcano batch scheduler to establish a unified scheduling platform
Network Topology Aware Scheduling Enhancements: Support HyperNode-level binpacking, SubGroup-level network topology aware scheduling, and multi-level gang scheduling across Job and SubGroup scopes for distributed workloads
Volcano Global Enhancements: HyperJob for multi-cluster training and data-aware scheduling for federated environments
Colocation for Generic OS: CPU Throttling, Memory QoS, CPU Burst with Cgroup V2 support on Ubuntu, CentOS, and other generic operating systems
Ascend vNPU Scheduling: Integrated support for Ascend 310P/910 series vNPU scheduling with MindCluster and HAMi modes

Key Feature Details

Scalable Multi-Scheduler with Dynamic Node Scheduling Shard (Alpha)

Background and Motivation:

As Volcano evolves to support diverse scheduling workloads at massive scale, the single scheduler architecture faces significant challenges. Different workload types (batch training, AI agents, microservices) have distinct scheduling requirements and resource utilization patterns. A single scheduler becomes a bottleneck, and static resource allocation leads to inefficient cluster utilization.

The Sharding Controller introduces a scalable multi-scheduler architecture that dynamically computes candidate node pools for each scheduler. Unlike strict partitioning, the Sharding Controller calculates dynamic candidate node pools rather than enforcing hard isolation between schedulers. This flexible approach enables Volcano to serve as a unified scheduling platform for diverse workloads while maintaining high throughput and low latency.

Alpha Feature Notice: This feature is currently in alpha stage. The NodeShard CRD (Node Scheduling Shard) API structure and the underlying scheduling shard concepts are actively evolving.

Key Capabilities:

Dynamic Node Scheduling Shard Strategies: Compute dynamic candidate node pools based on various policies. Currently supports scheduling shard by CPU utilization, with an extensible design to support more policies in the future.
NodeShard CRD: Manages dynamic candidate node pools for specific schedulers.
Large-scale Cluster Support: Architecture designed to support large-scale clusters by distributing load across multiple schedulers
Scheduler Coordination: Enable seamless coordination among various scheduler combinations (e.g., multiple Batch Schedulers, or a mix of Agent and Batch Schedulers), establishing Volcano as a unified scheduling platform

Configuration:

# Sharding Controller startup flags
--scheduler-configs="volcano:volcano:0.0:0.6:false:2:100,agent-scheduler:agent:0.7:1.0:true:2:100"
--shard-sync-period=60s
--enable-node-event-trigger=true

# Config format: name:type:min_util:max_util:prefer_warmup:min_nodes:max_nodes
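The format comment above can be read with a small Python parser. This sketch is hypothetical: the field names mirror the comment, but the validation rules and the inclusive utilization band in `accepts` are assumptions rather than the Sharding Controller's documented behavior.

```python
# Hypothetical parser for the --scheduler-configs flag value, following the format
# name:type:min_util:max_util:prefer_warmup:min_nodes:max_nodes.
# The validation rules are assumptions, not the Sharding Controller's actual checks.
from dataclasses import dataclass
from typing import List


@dataclass
class SchedulerShardConfig:
    name: str
    type: str
    min_util: float
    max_util: float
    prefer_warmup: bool
    min_nodes: int
    max_nodes: int

    def accepts(self, cpu_util: float) -> bool:
        # Assumption: the utilization band is inclusive on both ends.
        return self.min_util <= cpu_util <= self.max_util


def parse_scheduler_configs(flag: str) -> List[SchedulerShardConfig]:
    configs = []
    for entry in flag.split(","):
        fields = entry.strip().split(":")
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)} in {entry!r}")
        name, kind, lo, hi, warmup, min_nodes, max_nodes = fields
        if warmup not in ("true", "false"):
            raise ValueError(f"prefer_warmup must be true or false, got {warmup!r}")
        cfg = SchedulerShardConfig(name, kind, float(lo), float(hi),
                                   warmup == "true", int(min_nodes), int(max_nodes))
        if not 0.0 <= cfg.min_util <= cfg.max_util <= 1.0:
            raise ValueError(f"invalid utilization range in {entry!r}")
        if not 0 <= cfg.min_nodes <= cfg.max_nodes:
            raise ValueError(f"invalid node bounds in {entry!r}")
        configs.append(cfg)
    return configs


flag = "volcano:volcano:0.0:0.6:false:2:100,agent-scheduler:agent:0.7:1.0:true:2:100"
for cfg in parse_scheduler_configs(flag):
    print(cfg.name, cfg.type, cfg.min_util, cfg.max_util, cfg.prefer_warmup)
# volcano volcano 0.0 0.6 False
# agent-scheduler agent 0.7 1.0 True
```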

PR: #4777
Design Doc: Sharding Controller Design
Contributors: @ssfffss, @Haoran, @qi-min

Fast Scheduling for AI Agent Workloads (Alpha)

Background and Motivation:

AI Agent workloads are latency-sensitive with frequent task creation, requiring ultra-fast scheduling with high throughput. The Volcano batch scheduler is optimized for batch workloads and processes pods at fixed intervals, which cannot guarantee low latency for Agent workloads. To establish Volcano as a unified scheduling platform for both batch and latency-sensitive workloads, we introduce a dedicated Agent Scheduler.

The Agent Scheduler works in coordination with the Volcano batch scheduler through the Sharding Controller, introduced by the "Scalable Multi-Scheduler with Dynamic Node Scheduling Shard" feature. This architecture positions Volcano as a unified scheduling platform capable of handling diverse workload types.

Alpha Feature Notice: This feature is currently in alpha stage and under active development. The Agent Scheduler related APIs, configuration options, and scheduling algorithms may be refined in future releases.

Key Capabilities:

Fast-Path Scheduling: Independent scheduler optimized for latency-sensitive workloads such as AI Agent workloads
Multi-Worker Parallel Scheduling: Multiple workers process pods concurrently from the scheduling queue, increasing throughput
Optimistic Concurrency Control: A Conflict-Aware Binder resolves scheduling conflicts before performing the actual binding
Optimized Scheduling Queue: Enhanced queue mechanism with urgent retry support
Unified Platform Integration: Seamless coordination with Volcano batch scheduler via Sharding Controller
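The optimistic model behind multi-worker scheduling and the Conflict-Aware Binder can be sketched as a sequential Python simulation. `NodeState`, `propose`, and `bind` are hypothetical. Detecting conflicts with a per-node version counter is an assumed technique (a common optimistic-concurrency pattern), not necessarily how the Agent Scheduler's binder works.

```python
# Minimal sequential simulation of optimistic parallel scheduling with a
# conflict-aware binder. NodeState, propose, and bind are hypothetical; the
# per-node version check is one common way to detect conflicts, assumed here.
import copy
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NodeState:
    free_cpu: int
    version: int = 0  # bumped on every committed binding


def propose(pod_cpu: int, snapshot: Dict[str, NodeState]) -> Optional[Tuple[str, int]]:
    """A worker picks the feasible node with the most free CPU in its (possibly stale) snapshot."""
    feasible = [n for n, s in snapshot.items() if s.free_cpu >= pod_cpu]
    if not feasible:
        return None
    best = max(feasible, key=lambda n: (snapshot[n].free_cpu, n))
    return best, snapshot[best].version


def bind(proposals, cluster: Dict[str, NodeState]):
    """Commit a proposal only if the node is unchanged since the worker's snapshot."""
    bound, retry = [], []
    for pod, cpu, node, seen_version in proposals:
        state = cluster[node]
        if state.version == seen_version:
            state.free_cpu -= cpu
            state.version += 1
            bound.append((pod, node))
        else:
            retry.append(pod)  # conflict: re-queue the pod instead of overcommitting
    return bound, retry


cluster = {"n1": NodeState(4), "n2": NodeState(3)}
snapshot = copy.deepcopy(cluster)  # both workers read the same snapshot concurrently
proposals = [(pod, 3, *propose(3, snapshot)) for pod in ("p1", "p2")]
print(bind(proposals, cluster))  # ([('p1', 'n1')], ['p2'])
print(propose(3, cluster))  # ('n2', 0)
```

Both workers see `n1` as the best node. The binder commits the first proposal and rejects the second because `n1` changed; the retried pod then fits on `n2`.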

Issue: #4722
PRs: #4804, #4801, #4805
Design Doc: Agent Scheduler Design
Contributors: @qi-min, @JesseStutler, @handan-yxh

Network Topology Aware Scheduling Enhancements

Background and Motivation:

Volcano v1.14.0 brings significant enhancements to network topology aware scheduling, addressing the growing demands of distributed workloads including LLM training, HPC, and other network-intensive applications.

Key Enhancements:

SubGroup Level Topology Awareness: Support fine-grained network topology constraints at the SubGroup/Partition level.
Flexible Network Tier Configuration: Support highestTierName for specifying maximum network tier constraints by name.
Multi-Level Gang Scheduling: Improved gang scheduling to support both Job-level and SubGroup-level consistency.
Volcano Job Partitioning: Enable partitioning of Volcano Jobs for better resource management and fault isolation.
HyperNode-Level Binpacking: Optimization for resource utilization across network topology boundaries.

Configuration Example - Volcano Job:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: llm-training-job
spec:
  # ...other fields
  networkTopology:
    mode: hard
    highestTierAllowed: 2  # Job can cross up to Tier 2 HyperNodes
  tasks:
  - name: trainer
    replicas: 8
    partitionPolicy:
      totalPartitions: 2    # Split into 2 partitions
      partitionSize: 4      # 4 pods per partition
      minPartitions: 2      # Minimum 2 partitions required
      networkTopology:
        mode: hard
        highestTierAllowed: 1  # Each partition must stay within Tier 1
    template:
      spec:
        containers:
        - name: trainer
          image: training-image:v1
          resources:
            requests:
              nvidia.com/gpu: 8

Configuration Example - PodGroup SubGroupPolicy:

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: llm-training-pg
spec:
  minMember: 4
  networkTopology:
    mode: hard
    highestTierAllowed: 2
  subGroupPolicy:
  - name: "trainer"
    subGroupSize: 4
    labelSelector:
      matchLabels:
        volcano.sh/task-spec: trainer
    matchLabelKeys:
    - volcano.sh/partition-id
    networkTopology:
      mode: hard
      highestTierAllowed: 1
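One way to read this `subGroupPolicy` is sketched below in Python:

- `labelSelector.matchLabels` selects member pods.
- `matchLabelKeys` splits them into subgroups by label value.
- `subGroupSize` is the size each subgroup needs for gang scheduling.

The function and the `volcano.sh/partition-id` values are illustrative assumptions, not Volcano's scheduler code.

```python
# Hypothetical sketch of how a subGroupPolicy entry could partition pods:
# labelSelector.matchLabels picks the member pods, matchLabelKeys splits them
# by label value, and subGroupSize is the per-subgroup gang size.
from collections import defaultdict
from typing import Dict, List, Tuple


def group_subgroups(pods: Dict[str, Dict[str, str]], match_labels: Dict[str, str],
                    match_label_keys: List[str],
                    subgroup_size: int) -> Dict[Tuple, Tuple[List[str], bool]]:
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for pod, labels in sorted(pods.items()):
        if all(labels.get(k) == v for k, v in match_labels.items()):
            # Pods with the same values for every matchLabelKeys key share a subgroup.
            groups[tuple(labels.get(k) for k in match_label_keys)].append(pod)
    # A subgroup is ready for gang scheduling once it has subGroupSize members.
    return {key: (members, len(members) >= subgroup_size) for key, members in groups.items()}


pods = {f"trainer-{i}": {"volcano.sh/task-spec": "trainer", "volcano.sh/partition-id": str(i // 4)}
        for i in range(8)}
pods["ps-0"] = {"volcano.sh/task-spec": "ps"}  # not selected by matchLabels
result = group_subgroups(pods, {"volcano.sh/task-spec": "trainer"}, ["volcano.sh/partition-id"], 4)
for key, (members, ready) in result.items():
    print(key, members, ready)
```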

Issues: #4188, #4368, #4869
PRs: #4721, #4810, #4795, #4785, #4889
Design Doc: Network Topology Aware Scheduling
Contributors: @ouyangshengjia, @3sunny, @zhaoqi, @wangyang0616, @MondayCha, @Tau721

Colocation for Generic OS

This release brings comprehensive improvements to Volcano's colocation capabilities, with a major milestone: support for generic operating systems (Ubuntu, CentOS, etc.) in addition to openEuler. This enables broader adoption of Volcano Agent for resource sharing between online and offline workloads.

New Features in v1.14.0:

CPU Throttling (CPU Suppression)

The CPU usage of online pod...

Contributors

aman-kumar, zhaoqi, and 53 other contributors

18 Jan 08:34

JesseStutler

v1.12.3

afdefdd

What's Changed

Bug fixes

[Cherry-pick v1.12] add hcclrank job plugin by @wangdongyang1 in #4555
Automated cherry pick of #4347: When some scalar resources are 0 in deserved, hierarchical queues validation can not pass by @wuxiaobao in #4586
Automated cherry pick of #4590: add permissions for managing namespaces in admission rules by @suyiiyii in #4594
[Cherry-pick v1.12] fix mpi job plugin panic when mpi job only has master task by @wangdongyang1 in #4619
[Cherry-pick v1.12]Sync kube-scheduler:Improve CSILimits plugin accuracy by using VolumeAttachments by @guoqinwill in #4627
Automated cherry pick of #4599: fix: report all scalar metrics for each queue by @hajnalmt in #4651
[Cherry-pick 1.12] fix: Initialize realCapability field in newQueueAttr by @dafu-wu in #4695
[cherry-pick 1.12]Scheduling main loop blocked and timeout due to un-released PreBind lock in Volcano by @guoqinwill in #4699
[release-1.12] Cherry-pick #4786 and #4792: fix replicaset KubeGroupNameAnnotation handling and replicaSet podgroup update synchronization by @hajnalmt in #4843
Automated cherry pick of #4829: keep terminating pod in job by @wangdongyang1 in #4861
[release-1.12] fix potential panic on numa resources info updating in snapshot by @qi-min in #4898
[release-1.12] Fix gpu resource error by @ChenW66 in #4915
[release-1.12] Fix: Changes to task members in a PodGroup caused task validity checks to fail during scheduling by @ouyangshengjia in #4920
[release-1.12] Fix scheduler panic when metrics are disabled by @Copilot in #4921
[release-1.12] Update metrics_client_prometheus.go by @nitindhiman314e in #4932

Maintenance

[release-1.12] Add Free Disk Space step to E2E workflows by @Copilot in #4851

Full Changelog: v1.12.2...v1.12.3

Contributors

qi-min, hajnalmt, and 8 other contributors

23 Dec 11:32

JesseStutler

v1.13.1

0c6f3bf

What's Changed

Bug fixes

Automated cherry pick of #4670: fix: ci err caused by ray e2e default image by @Wonki4 in #4681
[Cherry-pick 1.13] fix: Initialize realCapability field in newQueueAttr by @dafu-wu in #4694
[cherry-pick 1.13]Scheduling main loop blocked and timeout due to un-released PreBind lock in Volcano by @guoqinwill in #4700
[release-1.13] Fix scheduler panic when metrics are disabled by @Copilot in #4770
Cherry-pick PR #4786 to release-1.13: Fix replicaSet podgroup update synchronization by @jiahuat in #4799
[release-1.13] fix: replicaset KubeGroupNameAnnotation handling by @hajnalmt in #4826
[release-1.13] fix: constant cache warnings by @hajnalmt in #4831
[release-1.13] fix: capacity plugin's preemptivefn logic by @hajnalmt in #4830
[release-1.13] Fix: Changes to task members in a PodGroup caused task validity checks to fail during scheduling by @ouyangshengjia in #4852

Maintenance

[release-1.13] Add Free Disk Space step to E2E workflows by @Copilot in #4763

Full Changelog: v1.13.0...v1.13.1

Contributors

hajnalmt, jiahuat, and 4 other contributors

29 Sep 11:40

Monokaix

v1.13.0

943b8c2

What's New

Welcome to the v1.13.0 release of Volcano! 🚀 🎉 📣
In this release, we have brought a series of significant enhancements that have been long-awaited by community users:

AI Training and Inference Enhancements
Resource Management and Scheduling Enhancements
- Introduce ResourceStrategyFit Plugin
  - Independent Scoring Strategy by Resource Type
  - Scarce Resource Avoidance (SRA)
- Enhance NodeGroup Functionality
Colocation Enhancements
- Decouple Colocation from OS
- Support Custom OverSubscription Resource Names

Support LeaderWorkerSet for Large Model Inference Scenarios

LeaderWorkerSet (LWS) is an API for deploying a group of Pods on Kubernetes. It is primarily used to address multi-host inference in AI/ML inference workloads, especially scenarios that require sharding large language models (LLMs) and running them across multiple devices on multiple nodes.

Since its open-source release, Volcano has actively integrated with upstream and downstream ecosystems, building a comprehensive community ecosystem for batch computing workloads such as AI and big data. LWS v0.7 natively integrates Volcano's AI scheduling capabilities. When used with the new version of Volcano, LWS automatically creates PodGroups, which Volcano then schedules and manages. This enables advanced capabilities such as Gang scheduling for large model inference scenarios.

Looking ahead, Volcano will continue to expand its ecosystem integration capabilities, providing robust scheduling and resource management support for more projects dedicated to enabling distributed inference on Kubernetes.

Usage documentation: LeaderWorkerSet With Gang.

Introduce Cron VolcanoJob

This release introduces support for Cron Volcano Jobs. Users can now periodically create and run Volcano Jobs based on a predefined schedule, similar to native Kubernetes CronJobs, to achieve periodic execution of batch computing tasks like AI and big data. Detailed features are as follows:

Scheduled Execution: Define the execution cycle of jobs using standard Cron expressions (spec.schedule).
Timezone Support: Set the timezone in spec.timeZone to ensure jobs execute at the expected local time.
Concurrency Policy: Control concurrent behavior via spec.concurrencyPolicy:
- AllowConcurrent: Allows concurrent execution of multiple jobs (default).
- ForbidConcurrent: Skips the current scheduled execution if the previous job has not completed.
- ReplaceConcurrent: Terminates the previous job if it is still running and starts a new one.
History Management: Configure the number of successful (successfulJobsHistoryLimit) and failed (failedJobsHistoryLimit) job history records to retain; old jobs are automatically cleaned up.
Missed Schedule Handling: The startingDeadlineSeconds field tolerates scheduling delays up to the configured number of seconds; a run that cannot start within that window is treated as a missed execution.
Status Tracking: The CronJob status (status) tracks currently active jobs, the last scheduled time, and the last successful completion time for easier monitoring and management.
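The concurrency policy, missed-schedule handling, and history cleanup listed above can be approximated with the following Python sketch. The function names, return strings, and datetime arithmetic are hypothetical illustrations of the described behavior, not the Volcano controller's implementation. Apply `jobs_to_clean` separately to successful and failed jobs, using successfulJobsHistoryLimit and failedJobsHistoryLimit respectively.

```python
# Hypothetical sketch of the decision a Cron VolcanoJob controller makes at each
# scheduled time. Function names and return values are illustrative only.
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def decide(policy: str, active_jobs: List[str], scheduled: datetime, now: datetime,
           starting_deadline_seconds: Optional[int] = None) -> str:
    # startingDeadlineSeconds: a run that cannot start within the window is missed.
    if starting_deadline_seconds is not None and now - scheduled > timedelta(seconds=starting_deadline_seconds):
        return "missed"
    if not active_jobs or policy == "AllowConcurrent":
        return "create"
    if policy == "ForbidConcurrent":
        return "skip"  # previous job still running
    if policy == "ReplaceConcurrent":
        return "replace"  # terminate the running job(s), then create a new one
    raise ValueError(f"unknown concurrencyPolicy: {policy}")


def jobs_to_clean(finished: List[Tuple[str, datetime]], history_limit: int) -> List[str]:
    # Keep the newest history_limit finished jobs; return the older ones for deletion.
    oldest_first = sorted(finished, key=lambda job: job[1])
    return [name for name, _ in oldest_first[:max(len(oldest_first) - history_limit, 0)]]


t0 = datetime(2025, 1, 1, 0, 0)
print(decide("ForbidConcurrent", ["job-1"], t0, t0 + timedelta(seconds=5)))  # skip
print(decide("ReplaceConcurrent", ["job-1"], t0, t0 + timedelta(seconds=90),
             starting_deadline_seconds=60))  # missed
```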

Related PRs: volcano-sh/apis#192, #4560, @GoingCharlie, @hwdef, @Monokaix

Usage example: Cron Volcano Job Example.

Support Label-based HyperNode Auto Discovery

Volcano officially launched network topology-aware scheduling capability in v1.12 and pioneered the UFM auto-discovery mechanism based on InfiniBand (IB) networks. However, for hardware clusters that do not support IB networks or use other network architectures (such as Ethernet), manually maintaining the network topology remains cumbersome.

To address this issue, the new version introduces a Label-based HyperNode auto-discovery mechanism. This feature provides users with a universal and flexible way to describe network topology, transforming complex topology management tasks into simple node label management.

This mechanism allows users to define the correspondence between topology levels and node labels in the volcano-controller-configmap. The Volcano controller periodically scans all nodes in the cluster and automatically performs the following tasks based on their labels:

Automatic Topology Construction: Automatically builds multi-layer HyperNode topology structures from top to bottom (e.g., rack -> switch -> node) based on a set of labels on the nodes.
Dynamic Maintenance: When node labels change, or nodes are added or removed, the controller automatically updates the members and structure of the HyperNodes, ensuring the topology information remains consistent with the cluster state.
Support for Multiple Topology Types: Allows users to define multiple independent network topologies simultaneously to adapt to different hardware clusters (e.g., GPU clusters, NPU clusters) or different network partitions.

Configuration example:

# volcano-controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-controller-configmap
  namespace: volcano-system
data:
  volcano-controller.conf: |
    networkTopologyDiscovery:
      - source: label
        enabled: true
        interval: 10m # Discovery interval
        config:
          networkTopologyTypes:
            # Define a topology type named topology-A
            topology-A:
              # Define topology levels, ordered from top to bottom
              - nodeLabel: "volcano.sh/hypercluster" # Top-level HyperNode
              - nodeLabel: "volcano.sh/hypernode"   # Middle-level HyperNode
              - nodeLabel: "kubernetes.io/hostname" # Bottom-level physical node

This feature is enabled by adding the label source to the Volcano controller's ConfigMap. The above configuration defines a three-layer topology structure named topology-A:

Top Level (Tier 2): Defined by the volcano.sh/hypercluster label.
Middle Level (Tier 1): Defined by the volcano.sh/hypernode label.
Bottom Level: Physical nodes, identified by the Kubernetes built-in kubernetes.io/hostname label.

When a node is labeled as follows, it will be automatically recognized and classified into the topology path cluster-s4 -> node-group-s0:

# Labels for node node-0
labels:
  kubernetes.io/hostname: node-0
  volcano.sh/hypernode: node-group-s0
  volcano.sh/hypercluster: cluster-s4
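The top-down construction can be sketched in Python using the topology-A levels and node labels above. Two choices are assumptions: HyperNodes are named directly after label values, and nodes that lack any level label are skipped. The real controller's naming scheme and edge-case handling may differ.

```python
# Minimal sketch of label-based HyperNode discovery: levels are listed top to
# bottom as in topology-A; the last level identifies the physical node.
# Naming HyperNodes directly after label values is an assumption for illustration.
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_hypernodes(nodes: Dict[str, Dict[str, str]],
                     levels: List[str]) -> Dict[str, Tuple[int, List[str]]]:
    members: Dict[str, Set[str]] = defaultdict(set)
    tiers: Dict[str, int] = {}
    for labels in nodes.values():
        if not all(level in labels for level in levels):
            continue  # assumption: nodes missing any level label are skipped
        for i in range(len(levels) - 1):
            parent, child = labels[levels[i]], labels[levels[i + 1]]
            tiers[parent] = len(levels) - 1 - i  # top level gets the highest tier
            members[parent].add(child)
    return {name: (tiers[name], sorted(children)) for name, children in sorted(members.items())}


levels = ["volcano.sh/hypercluster", "volcano.sh/hypernode", "kubernetes.io/hostname"]
nodes = {
    "node-0": {"kubernetes.io/hostname": "node-0", "volcano.sh/hypernode": "node-group-s0",
               "volcano.sh/hypercluster": "cluster-s4"},
    "node-1": {"kubernetes.io/hostname": "node-1", "volcano.sh/hypernode": "node-group-s1",
               "volcano.sh/hypercluster": "cluster-s4"},
}
for name, (tier, children) in build_hypernodes(nodes, levels).items():
    print(f"tier {tier}: {name} -> {children}")
# tier 2: cluster-s4 -> ['node-group-s0', 'node-group-s1']
# tier 1: node-group-s0 -> ['node-0']
# tier 1: node-group-s1 -> ['node-1']
```

Because the tree is rebuilt from current labels on each call, running it every discovery interval picks up label changes and node additions or removals. This approximates the Dynamic Maintenance behavior described earlier.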

The label-based network topology auto-discovery feature offers excellent generality and flexibility. It is not dependent on specific network hardware (like IB), making it suitable for various heterogeneous clusters, and allows users to flexibly define hierarchical structures of any depth through labels. It automates complex topology maintenance tasks into simple node label management, significantly reducing operational costs and the risk of errors. Furthermore, this mechanism dynamically adapts to changes in cluster nodes and labels, maintaining the accuracy of topology information in real-time without manual intervention.

Related PR: #4629, @zhaoqi612

Usage documentation: HyperNode Auto Discovery.

Add Native Ray Framework Support

Ray is an open-source unified distributed computing framework whose core goal is to simplify parallel computing from single machines to large-scale clusters, especially suitable for scaling Python and AI applications. To manage and run Ray on Kubernetes, the community provides KubeRay—an operator specifically designed for Kubernetes. It acts as a bridge between Kubernetes and the Ray framework, greatly simplifying the deployment and management of Ray clusters and jobs.

Historically, running Ray workloads on Kubernetes primarily relied on the KubeRay Operator. KubeRay integrated Volcano in its v0.4.0 release (released in 2022) for scheduling and resource management of Ray Clusters, addressing issues like resource deadlocks in distributed training scenarios. With this new version of Volcano, users can now directly create and manage Ray clusters and submit computational tasks through native Volcano Jobs. This provides Ray users with an alternative usage scheme, allowing them to more directly utilize Volcano's capabilities such as Gang Scheduling, queue management and fair scheduling, and job lifecycle management for runni...