feat: Sage Attention Algorithm #455
base: main
Conversation
Force-pushed from e2e6e9f to 69c9679
johannaSommer left a comment
First PR and already almost flawless, big 👏🏻👏🏻👏🏻 coming your way soon!
src/pruna/algorithms/sage_attn.py
Outdated
```python
runs_on: list[str] = ["cuda", "accelerate"]
dataset_required: bool = False
compatible_before: Iterable[str] = []
compatible_after: Iterable[str] = ["torch_compile"]
```
compatible_after would also include tags.CACHERS, and compatible_before probably also tags.QUANTIZERS.
Then add this compatibility in the other algorithms as well.
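A minimal sketch of what the suggested declaration could look like; only the `tags.CACHERS` / `tags.QUANTIZERS` names come from the comment above, and the import path is an assumption, not verified against the Pruna codebase:

```python
# Sketch only: the import path for `tags` is an assumption, not verified against Pruna.
from collections.abc import Iterable

from pruna.algorithms import tags  # hypothetical import location for algorithm tags

runs_on: list[str] = ["cuda", "accelerate"]
dataset_required: bool = False
# Quantizers may run before sage attention is applied ...
compatible_before: Iterable[str] = [*tags.QUANTIZERS]
# ... while cachers and torch_compile remain compatible afterwards.
compatible_after: Iterable[str] = ["torch_compile", *tags.CACHERS]
```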
src/pruna/algorithms/sage_attn.py
Outdated
```python
    return False

return any(
    hasattr(component, "set_attention_backend") and component.dtype in [torch.bfloat16, torch.float16]
```
I recall this dtype check for the components from flash attention (because attention needs to be computed in this precision for FA3 to work); did we double-check that this is also the case here?
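For context, a self-contained sketch of the eligibility check being discussed; the function name and the component iteration are illustrative, only `set_attention_backend` and the half-precision dtype check come from the diff above:

```python
import torch


def model_check_fn(model) -> bool:
    """Illustrative sketch: accept the model if at least one pipeline component
    exposes the diffusers attention-backend hook and runs in half precision."""
    components = getattr(model, "components", {})
    if not components:
        return False
    return any(
        hasattr(component, "set_attention_backend")
        and getattr(component, "dtype", None) in (torch.bfloat16, torch.float16)
        for component in components.values()
    )
```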
src/pruna/algorithms/sage_attn.py
Outdated
```python
# We simply apply the sage attention backend from diffusers
# Furthermore, we use the sage attention kernel from the hub as the default sageattn function
# is broken (at least at the moment)
for component in model.components.values():
```
As discussed, let's add target modules here as well :)
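As a rough illustration of the suggestion, a hedged sketch of applying the backend only to targeted components; the backend string and the simple name-based filtering are assumptions for illustration, only `set_attention_backend` and iterating `model.components` are taken from the diff:

```python
def apply_sage_backend(model, target_modules=None, backend: str = "sage") -> None:
    """Sketch: switch targeted pipeline components to a sage attention backend.

    The backend name and the name-based filtering are assumptions; the PR itself
    is meant to use Pruna's target-module machinery instead.
    """
    for name, component in getattr(model, "components", {}).items():
        if not hasattr(component, "set_attention_backend"):
            continue
        # Only touch components the user targeted (None means "all eligible").
        if target_modules is not None and name not in target_modules:
            continue
        component.set_attention_backend(backend)
```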
…antizers as compatible after and before, add sage_attn in corresponding cachers and quantizers algorithms as compatible, add dtype check as sage_attn only works for float/bfloat16 (double checked), add target modules (but not fully finished yet)
…ast attention block per attention component. Remove dtype guard as the dtypes of q, k, and v per attn module are implicitly checked by the sage attention kernel.
src/pruna/algorithms/sage_attn.py
Outdated
```python
    configuration system.
    """
    return [
        Boolean(
```
This is actually not needed and we can remove it, as the user can specify this exactly through the target modules anyway (there is a smash config interface for this).
src/pruna/algorithms/sage_attn.py
Outdated
```python
    The wrapped model.
    """
    target_modules = smash_config["target_modules"]
    exclude_first_and_last_transformer_blocks = smash_config["exclude_first_and_last_transformer_blocks"]
```
For the target modules, let's please use the functionality we already have; otherwise we have a lot of duplicate code here.
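For readers unfamiliar with the pattern, a generic illustration of what target-module filtering means (plain `fnmatch` matching on qualified module names); this is not Pruna's existing helper, which the comment asks the author to reuse rather than re-implement:

```python
from fnmatch import fnmatch

import torch.nn as nn


def iter_target_modules(root: nn.Module, patterns: list[str]):
    """Yield (name, module) pairs whose qualified name matches any pattern."""
    for name, module in root.named_modules():
        if any(fnmatch(name, pattern) for pattern in patterns):
            yield name, module
```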
Description
Integration of the Sage Attention algorithm into the Pruna framework. The current version applies the attention backend from Diffusers, choosing the Sage Attention kernel from the Kernel Hub. This is because the original sageattn function appears to be broken (its outputs were pure noise). Additionally, tests for the Sage Attention algorithm were implemented.
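A minimal usage sketch of the Diffusers hook this PR builds on; the model id and the "sage" backend string are assumptions for illustration, and the PR itself routes to the Kernel Hub sage attention kernel rather than the default sageattn function:

```python
import torch
from diffusers import DiffusionPipeline

# Sketch only: model id and backend name are assumptions for illustration.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Diffusers exposes a per-component hook to swap the attention implementation;
# the sage attention integration switches eligible components to such a backend.
pipe.transformer.set_attention_backend("sage")

image = pipe("a photo of an astronaut riding a horse").images[0]
```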
Related Issue
No issues were fixed.
Type of Change
How Has This Been Tested?
Reuse of the flashattn3 tests, adapted to Sage Attention.
Checklist
Additional Notes
/