feat: reduce nb experts per token in moe architectures #450

llcnt · 2025-12-02T15:48:22Z

Description

This PR adds a small tool that acts only on MoE models (LLMs and Hunyuan3Image for now) by reducing the number of experts that are triggered for each token.
All models have been trained with a fixed number of active experts per token, and decreasing this number alters the output of the model. The idea was tested on Hunyuan3Image and gptoss_120b. For Hunyuan3Image (default is 8 out of 128 experts), 1 and 2 experts give very weird images; 4 experts seems OK and yields a 15% speedup. For gptoss_120b (default is 4 experts), 1 and 2 experts give very weird texts and yield no speedup. The approach is applicable to any MoE, e.g. Mixtral, QwenNext, etc.
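To make the effect of the number of active experts concrete, here is a minimal sketch of top-k routing for a single token in plain Python. It is not the code from this PR: the names `route_token` and `moe_forward` are hypothetical, and the softmax-then-renormalise scheme is an assumption, since routing details differ between architectures.

```python
import math


def route_token(router_logits, num_experts_per_tok):
    """Return [(expert_index, weight), ...] for the top-k experts of one token."""
    # Softmax over all experts (shifted by the max for numerical stability).
    shift = max(router_logits)
    exps = [math.exp(logit - shift) for logit in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:num_experts_per_tok]
    # Renormalise the kept weights so they sum to 1 (assumed scheme).
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(x, experts, router_logits, num_experts_per_tok):
    """Weighted sum of the outputs of the selected experts only."""
    picks = route_token(router_logits, num_experts_per_tok)
    return sum(weight * experts[i](x) for i, weight in picks)


# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, 1.0, -1.0]

two_experts = moe_forward(1.0, experts, logits, num_experts_per_tok=2)
one_expert = moe_forward(1.0, experts, logits, num_experts_per_tok=1)
```

With fewer experts per token, fewer expert forward passes run, which is where a speedup can come from. The output also changes, because the experts that were dropped no longer contribute to the weighted sum.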

Related Issue

Fixes #(issue number)

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Additional Notes

A notebook to test the new feature is available here.

cursor

src/pruna/algorithms/reduce_noe.py

src/pruna/algorithms/red_noe.py

llcnt · 2025-12-12T13:31:41Z

@cursor review

simlang

Love it, super straightforward algorithm. Commented on some higher-level stuff.

.gitignore

src/pruna/algorithms/red_noe.py

llcnt force-pushed the feat/reduce_nb_experts_per_token branch from a93a18c to b3a0ac5 Compare December 3, 2025 16:41

llcnt marked this pull request as ready for review December 4, 2025 09:41

llcnt requested a review from sharpenb December 4, 2025 09:41

cursor bot reviewed Dec 4, 2025

src/pruna/algorithms/reduce_noe.py

src/pruna/algorithms/reduce_noe.py

src/pruna/algorithms/red_noe.py (outdated)

gsprochette mentioned this pull request Dec 12, 2025

build: delete uv.lock and gitignore it #457

Merged

10 tasks

simlang reviewed Dec 12, 2025

.gitignore

src/pruna/algorithms/red_noe.py (outdated)

src/pruna/algorithms/red_noe.py (outdated)

llcnt added 9 commits December 23, 2025 16:23

feat: draft rednoe

eb9a911

feat: use tmpdir to save load with modified config

592174b

feat: add unit test

855d328

feat: make check fn more general and fix device

3aea1ec

feat: del uv.lock to avoid transformers pined version

8964d9b

feat: upd numpydoc version to avoid sphinx version errors

3e71ec6

fix: add _apply for pipeleines

f0e48b0

feat: change name to reduceNOE

69389d9

feat: simplify usage with unconstrained hyperparameters

04ee0cf

llcnt force-pushed the feat/reduce_nb_experts_per_token branch from e4e719b to 04ee0cf Compare December 23, 2025 16:46

fix: adapt path to new name

61a954a
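The commit "feat: use tmpdir to save load with modified config" suggests a save, edit, reload round trip through a temporary directory. Below is a minimal standard-library sketch of that idea, not the PR's implementation. The helper `set_experts_per_token` is hypothetical, and the config keys `num_experts_per_tok` and `num_local_experts` are assumptions borrowed from Mixtral-style Hugging Face configs; other architectures may name them differently.

```python
import json
import tempfile
from pathlib import Path


def set_experts_per_token(config_path, target, key="num_experts_per_tok"):
    """Rewrite `key` in a JSON config file and return the previous value."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if key not in config:
        raise KeyError(f"{key!r} not found; this config may not describe an MoE model")
    previous = config[key]
    # Only allow reducing the count, never increasing it or going below 1.
    if not 1 <= target <= previous:
        raise ValueError(f"target must be between 1 and {previous}, got {target}")
    config[key] = target
    path.write_text(json.dumps(config, indent=2))
    return previous


with tempfile.TemporaryDirectory() as tmpdir:
    config_path = Path(tmpdir) / "config.json"
    # Stand-in for the config file a real model would write when saved.
    config_path.write_text(json.dumps({"num_local_experts": 128, "num_experts_per_tok": 8}))
    previous = set_experts_per_token(config_path, target=4)
    # A real workflow would reload the model from tmpdir at this point.
    reloaded = json.loads(config_path.read_text())
```

Editing a saved copy leaves the original checkpoint untouched, and the temporary directory is cleaned up automatically when the `with` block exits.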

llcnt requested a review from simlang December 23, 2025 16:53


3 participants
