[Bugfix] Fix cuda graph sizes when running with speculative decoding #30330
Conversation
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
The pull request addresses a bug where CUDA graph sizes were not correctly captured when speculative decoding was enabled, particularly when max_seq_len was small and num_speculative_tokens was greater than 1. The fix correctly incorporates the number of speculative tokens into the calculation of max_cudagraph_capture_size, ensuring that the system captures appropriate graph sizes for speculative decoding scenarios. This improves the correctness and efficiency of CUDA graph utilization under these conditions.
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Hi @PatrykSaffer, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
…llm-project#30330) Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai> Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
…llm-project#30330) Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai> Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com> Signed-off-by: Nathan Price <nathan@abridge.com>
Purpose
When specifying max_seq_len and running with num_speculative_tokens > 1, not all CUDA graph sizes are captured. E.g. with max_seq_len == 1 and num_speculative_tokens == 3, only the graph for batch_size == 2 is captured.
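The intended behavior can be sketched as follows. This is a minimal illustration, not vLLM's actual code: the function name and parameters here are assumptions chosen for clarity. The idea is that with speculative decoding, each sequence can present 1 + num_speculative_tokens tokens to the model per step, so the largest CUDA graph must cover that many tokens rather than just the batch size.

```python
def max_cudagraph_capture_size(max_num_seqs: int,
                               num_speculative_tokens: int) -> int:
    """Hypothetical sketch of the fix, not vLLM's real implementation.

    With speculative decoding, each sequence contributes one target
    token plus num_speculative_tokens draft tokens per decode step,
    so the largest graph must cover that total token count.
    """
    tokens_per_seq = 1 + num_speculative_tokens
    return max_num_seqs * tokens_per_seq


# For the example from this PR: a single sequence with 3 speculative
# tokens needs a graph covering 4 tokens, not just 2.
print(max_cudagraph_capture_size(1, 3))
```

Without accounting for speculative tokens, the capture size is derived from the batch size alone, which explains why only the batch_size == 2 graph was captured in the example above.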
Test Plan
Manual test
Test Result
Tested manually