

@PatrykSaffer PatrykSaffer commented Dec 9, 2025

Purpose

When max_seq_len is set and num_speculative_tokens > 1, not all CUDA graph sizes are captured.

For example, with max_seq_len == 1 and num_speculative_tokens == 3, only the graph for batch_size == 2 is captured.

Test Plan

Manual test

Test Result

Tested manually

Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request addresses a bug where CUDA graph sizes were not correctly captured when speculative decoding was enabled, particularly when max_seq_len was small and num_speculative_tokens was greater than 1. The fix correctly incorporates the number of speculative tokens into the calculation of max_cudagraph_capture_size, ensuring that the system captures appropriate graph sizes for speculative decoding scenarios. This improves the correctness and efficiency of CUDA graph utilization under these conditions.
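The review's description of the fix can be illustrated with a minimal sketch. It assumes that each sequence in a decode step contributes one verified token plus num_speculative_tokens draft tokens, so the largest graph worth capturing is the sequence limit multiplied by (1 + num_speculative_tokens). The helper names capture_limit and capture_sizes, the candidate size list, and the parameter max_num_seqs (standing in for the per-step sequence limit the PR refers to as max_seq_len) are hypothetical; this is not vLLM's actual implementation.

```python
from typing import Iterable, List


def capture_limit(max_num_seqs: int, num_speculative_tokens: int) -> int:
    # Hypothetical: each sequence may carry 1 verified token plus
    # num_speculative_tokens draft tokens in a single step.
    return max_num_seqs * (1 + num_speculative_tokens)


def capture_sizes(
    candidate_sizes: Iterable[int],
    max_num_seqs: int,
    num_speculative_tokens: int,
) -> List[int]:
    # Keep every candidate graph size that can actually occur, i.e. those
    # not exceeding the speculative-decoding-aware limit.
    limit = capture_limit(max_num_seqs, num_speculative_tokens)
    return sorted(s for s in candidate_sizes if s <= limit)
```

Under these assumptions, capture_sizes([1, 2, 4, 8, 16], max_num_seqs=1, num_speculative_tokens=3) returns [1, 2, 4], whereas a limit that ignores speculative tokens (num_speculative_tokens=0) would keep only [1].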

Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
@mergify

mergify bot commented Dec 9, 2025

Hi @PatrykSaffer, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
@njhill njhill requested a review from benchislett December 9, 2025 22:02
@github-project-automation github-project-automation bot moved this to In review in NVIDIA Dec 9, 2025
@benchislett benchislett added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 9, 2025
@benchislett benchislett enabled auto-merge (squash) December 9, 2025 22:26
@benchislett benchislett merged commit 4c2e10e into vllm-project:main Dec 10, 2025
47 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Dec 10, 2025
shaharmor98 pushed a commit to shaharmor98/smor-vllm that referenced this pull request Dec 11, 2025
…llm-project#30330)

Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
TheCodeWrangler pushed a commit to TheCodeWrangler/vllm that referenced this pull request Dec 12, 2025
…llm-project#30330)

Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Nathan Price <nathan@abridge.com>
TheCodeWrangler pushed a commit to TheCodeWrangler/vllm that referenced this pull request Dec 12, 2025
…llm-project#30330)

Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Nathan Price <nathan@abridge.com>

Signed-off-by: Nathan Price <nathan@abridge.com>
TheCodeWrangler pushed a commit to TheCodeWrangler/vllm that referenced this pull request Dec 12, 2025
…llm-project#30330)

Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: Nathan Price <nathan@abridge.com>

Labels

nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done


3 participants