
Conversation

@heheda12345 heheda12345 commented Dec 9, 2025

Purpose

The draft tokens produced by the model runner are currently not passed back to the scheduler. This PR fixes that.
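
For context, a minimal sketch of the data flow this PR restores. The names (`update_draft_token_ids`, `draft_token_ids`, `model_output`) are illustrative assumptions and may not match the exact vLLM symbols:

```python
# Illustrative sketch only (assumed names): each engine step, the draft
# tokens proposed by the model runner must be handed back to the scheduler,
# otherwise the next step schedules zero draft tokens and speculative
# decoding silently degenerates to ordinary decoding.
def engine_step(scheduler, model_executor):
    scheduler_output = scheduler.schedule()
    model_output = model_executor.execute_model(scheduler_output)
    # The missing hop: propagate the proposed draft tokens to the scheduler.
    scheduler.update_draft_token_ids(model_output.draft_token_ids)
    return scheduler.update_from_output(scheduler_output, model_output)
```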

Test Plan

VLLM_ENABLE_V1_MULTIPROCESSING=0 python3 examples/offline_inference/spec_decode.py --test

Test Result

Before this PR:

--------------------------------------------------
total_num_output_tokens: 247913
num_drafts: 0
num_draft_tokens: 0
num_accepted_tokens: 0
mean acceptance length: 1.00
--------------------------------------------------
acceptance at token 0: 0.00
acceptance at token 1: 0.00

After this PR:

--------------------------------------------------
total_num_output_tokens: 247893
num_drafts: 122887
num_draft_tokens: 245774
num_accepted_tokens: 124787
mean acceptance length: 2.02
--------------------------------------------------
acceptance at token 0: 0.69
acceptance at token 1: 0.33
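
As a sanity check, the "after" numbers are consistent with computing the mean acceptance length as one target token plus the accepted draft tokens per draft (illustrative arithmetic, not the exact script code):

```python
# Rough consistency check on the reported stats (assumed formula).
num_drafts = 122887
num_draft_tokens = 245774          # 2 proposed draft tokens per draft
num_accepted_tokens = 124787

mean_acceptance_length = 1 + num_accepted_tokens / num_drafts
print(f"{mean_acceptance_length:.2f}")  # 2.02
```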

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@mergify mergify bot added the v1 label Dec 9, 2025
@heheda12345 heheda12345 requested a review from njhill December 9, 2025 08:27

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix speculative decoding when not using multiprocessing by adding a call to post_step. While the change correctly identifies the missing call, the implementation hardcodes model_executed=True. This could lead to issues if step_fn is called without the model actually executing. I've provided a suggestion to use the model_executed flag returned by step_fn to ensure correctness and maintain consistency with the multiprocessing implementation.
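
Roughly, the suggestion is to thread the flag through instead of hardcoding it (a sketch based on the comment above, assuming step_fn returns the model_executed flag as in the multiprocessing path; not the exact diff):

```python
# Sketch of the suggested change: reuse the flag returned by step_fn
# rather than passing model_executed=True unconditionally.
outputs, model_executed = step_fn()
# Only report a model execution to post_step when the model actually ran,
# keeping this in-process path consistent with the multiprocessing loop.
self.post_step(model_executed=model_executed)
```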

Signed-off-by: Chen Zhang <zhangch99@outlook.com>

@njhill njhill left a comment

Thanks @heheda12345!

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 12, 2025
@njhill njhill enabled auto-merge (squash) December 12, 2025 16:07

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1
