@llcnt llcnt commented Dec 23, 2025

Description

This PR is inspired by the vLLM benchmarks (the benchmark_config function is copied from there) and enables one to tune the MoE (Triton) kernel used in vLLM.
The new MoeKernelTuner algorithm does not modify the model. It generates a tuned configuration that is saved in the following locations (a sketch of such a configuration follows the list):

  • the vLLM configs folder (so that running the model on the same GPU afterwards makes vLLM use this optimized config);
  • the RedHatAI kernel folder on the HF Hub (so that using the MoE kernels from the kernels library will pick up the optimized config);
  • the smash_config (to be saved and later re-used without waiting for tuning again).
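For reference, a minimal sketch of what such a tuned configuration looks like once written to disk. The file-name pattern and keys follow vLLM's fused-MoE config convention; the directory, expert count, intermediate size, device name, and all parameter values below are illustrative, not the ones produced by this PR:

```python
import json
from pathlib import Path

# Illustrative tuned configuration: one entry per benchmarked batch size (M),
# each holding the Triton launch parameters selected by the tuner.
tuned_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}

# vLLM resolves configs by expert count (E), intermediate size (N) and device,
# e.g. "E=8,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json".
config_dir = Path("fused_moe_configs")  # stand-in for vLLM's configs folder
config_dir.mkdir(exist_ok=True)
filename = "E=8,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json"
(config_dir / filename).write_text(json.dumps(tuned_config, indent=2))
```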

The core modifications are in:

  • the new moe_kernel_tuner.py file (...);
  • the smash_config.py file (adding new artifacts for saving any additional dict into the SmashConfig);
  • the load.py file (for re-saving the tuned config inside the vLLM/HF cache when loading a smashed model; see the sketch below).
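As an illustration of that re-saving step, a minimal sketch of what load.py could do for each tuned config carried in the smash_config. The function name, signature, and overwrite policy are assumptions for illustration, not the actual implementation:

```python
import json
from pathlib import Path

def resave_tuned_moe_config(tuned_config: dict, filename: str, vllm_config_dir: Path) -> None:
    """Write a tuned fused-MoE config back into vLLM's config folder so the
    optimized kernel parameters are picked up without re-running the tuner."""
    vllm_config_dir.mkdir(parents=True, exist_ok=True)
    target = vllm_config_dir / filename
    if not target.exists():  # assumption: never clobber an existing tuned config
        target.write_text(json.dumps(tuned_config, indent=2))
```

At load time, this would be called once per tuned configuration stored in the smashed model's artifacts.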

TODO:

  • clean PR description
  • add model pruna-test/qwen3_coder_tiny for unit tests

Related Issue

Fixes #(issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

A notebook for testing with vLLM is available here.
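For quick local testing, once the tuned config is in vLLM's config folder it is picked up automatically at model load time, so a plain vLLM run is enough to exercise it. The model name below is a placeholder:

```python
from vllm import LLM, SamplingParams

# The tuned fused-MoE config is resolved by vLLM from the GPU and model
# geometry, so no extra flag is needed here.
llm = LLM(model="pruna-test/qwen3_coder_tiny")  # placeholder model name
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```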
