@llcnt llcnt commented Dec 23, 2025

Description

This PR is inspired by the vLLM benchmarks (the benchmark_config function is copied from there) and enables one to tune the MoE (Triton) kernel used in vLLM.
The new MoeKernelTuner algorithm does not modify the model. It generates a tuned configuration that is saved in the following locations (a sketch of such a configuration follows the list):

  • the vLLM configs folder (so that running the model on the same GPU afterwards makes vLLM use this optimized config);
  • the RedHatAI kernel folder on the HF Hub (so that using the MoE kernels from the kernels library will pick up the optimized config);
  • the smash_config (to be saved and later re-used without waiting for tuning again).
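For reference, a minimal sketch of what such a tuned configuration looks like once written to disk. The file-name pattern and keys follow vLLM's fused-MoE config convention; the directory, expert count, intermediate size, device name, and all parameter values below are illustrative, not the ones produced by this PR:

```python
import json
from pathlib import Path

# Illustrative tuned configuration: one entry per benchmarked batch size (M),
# each holding the Triton launch parameters selected by the tuner.
tuned_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}

# vLLM resolves configs by expert count (E), intermediate size (N) and device,
# e.g. "E=8,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json".
config_dir = Path("fused_moe_configs")  # stand-in for vLLM's configs folder
config_dir.mkdir(exist_ok=True)
filename = "E=8,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json"
(config_dir / filename).write_text(json.dumps(tuned_config, indent=2))
```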

The core modifications are in:

  • the new moe_kernel_tuner.py file (...);
  • the smash_config.py file (adding new artifacts for saving any additional dict into the SmashConfig);
  • the load.py file (for re-saving the tuned config inside the vLLM/HF cache when loading a smashed model; see the sketch below).
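As an illustration of that re-saving step, a minimal sketch of what load.py could do for each tuned config carried in the smash_config. The function name, signature, and overwrite policy are assumptions for illustration, not the actual implementation:

```python
import json
from pathlib import Path

def resave_tuned_moe_config(tuned_config: dict, filename: str, vllm_config_dir: Path) -> None:
    """Write a tuned fused-MoE config back into vLLM's config folder so the
    optimized kernel parameters are picked up without re-running the tuner."""
    vllm_config_dir.mkdir(parents=True, exist_ok=True)
    target = vllm_config_dir / filename
    if not target.exists():  # assumption: never clobber an existing tuned config
        target.write_text(json.dumps(tuned_config, indent=2))
```

At load time, this would be called once per tuned configuration stored in the smashed model's artifacts.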

TODO:

  • clean PR description
  • add model pruna-test/qwen3_coder_tiny for unit tests

Related Issue

Fixes #(issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

A notebook for testing with vLLM is available here.
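For quick local testing, once the tuned config is in vLLM's config folder it is picked up automatically at model load time, so a plain vLLM run is enough to exercise it. The model name below is a placeholder:

```python
from vllm import LLM, SamplingParams

# The tuned fused-MoE config is resolved by vLLM from the GPU and model
# geometry, so no extra flag is needed here.
llm = LLM(model="pruna-test/qwen3_coder_tiny")  # placeholder model name
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```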
