
Conversation


@llcnt llcnt commented Dec 2, 2025

Description

This PR adds a small tool that acts only on MoE models (LLMs and Hunyuan3Image for now) by reducing the number of experts that are triggered for each token.
All of these models were trained with a fixed number of active experts per token, and decreasing that number alters the model's output. The idea was tested on Hunyuan3Image and gptoss_120b:

  • Hunyuan3Image (default: 8 active experts out of 128): 1 or 2 active experts give very weird images; 4 experts seems OK and yields a 15% speedup.
  • gptoss_120b (default: 4 active experts): 1 or 2 active experts give very weird text and yield no speedup.

The approach is applicable to any MoE, e.g. Mixtral, QwenNext, etc.
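For reviewers who want the gist without opening the diff: the change boils down to lowering the top-k used by the MoE router when selecting experts per token. Below is a minimal sketch of that idea in PyTorch; the function name, shapes, and values are illustrative assumptions, not the actual patch.

```python
import torch

def route_tokens(router_logits: torch.Tensor, num_experts_per_token: int):
    """Select the top-k experts for each token and renormalise their weights.

    router_logits: (num_tokens, num_experts) scores from the gating network.
    num_experts_per_token: k; lowering it below the value the model was
        trained with trades output quality for speed.
    """
    probs = torch.softmax(router_logits, dim=-1)
    weights, expert_ids = torch.topk(probs, k=num_experts_per_token, dim=-1)
    # Renormalise so the kept experts' weights still sum to 1 per token.
    weights = weights / weights.sum(dim=-1, keepdim=True)
    return weights, expert_ids

# Hypothetical usage: shrink k from the trained default (e.g. 8) to 4.
logits = torch.randn(16, 128)  # 16 tokens, 128 experts (Hunyuan3Image-scale)
weights, expert_ids = route_tokens(logits, num_experts_per_token=4)
```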

Related Issue

Fixes #(issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

(Screenshot of the test run attached in the PR.)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

A notebook to test the new feature is available here.

@llcnt llcnt force-pushed the feat/reduce_nb_experts_per_token branch from a93a18c to b3a0ac5 Compare December 3, 2025 16:41
@llcnt llcnt marked this pull request as ready for review December 4, 2025 09:41
@llcnt llcnt requested a review from sharpenb December 4, 2025 09:41

@cursor cursor bot left a comment

Comment @cursor review or bugbot run to trigger another review on this PR


llcnt commented Dec 12, 2025

@cursor review


@simlang simlang left a comment

Love it, super straightforward algorithm. Commented on some higher-level stuff.

@llcnt llcnt force-pushed the feat/reduce_nb_experts_per_token branch from e4e719b to 04ee0cf Compare December 23, 2025 16:46
@llcnt llcnt requested a review from simlang December 23, 2025 16:53