Releases: PrunaAI/pruna
Release v0.3.0
🚀 Pruna 0.3.0 — Structural Refactor and Interface Upgrade
Today, the pruna package is getting a long-awaited upgrade!
To support a growing variety of algorithms in the long term, we have refactored the internal structure that defines how algorithms are organized and applied.
Why the Refactor
In previous versions, certain algorithm groups — such as cachers or quantizers — were tightly coupled to the package’s structure. This rigid grouping made it difficult to introduce new types of algorithms, or to combine them in flexible ways.
Starting with Pruna 0.3.0, we’ve reworked this system so that such classifications are no longer hard constraints. Instead, they now serve as supplementary metadata, enabling a more modular, composable, and future-proof design. This refactor lays the groundwork for integrating new optimization techniques and custom pipelines without structural limitations.
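To illustrate the idea of groups as supplementary metadata rather than hard structure, here is a minimal, hypothetical sketch (the registry and function names are invented for illustration, not pruna's actual internals):

```python
# Illustrative sketch only: group labels become queryable metadata
# instead of a hard class hierarchy. All names here are hypothetical.

ALGORITHM_REGISTRY = {
    # algorithm name -> supplementary metadata tags
    "deepcache":     {"groups": {"cacher"}},
    "hqq":           {"groups": {"quantizer"}},
    "torch_compile": {"groups": {"compiler"}},
}

def algorithms_in_group(group: str) -> list[str]:
    """Grouping is now a query over metadata, not a package boundary."""
    return sorted(
        name for name, meta in ALGORITHM_REGISTRY.items()
        if group in meta["groups"]
    )

print(algorithms_in_group("quantizer"))  # ['hqq']
```

Because grouping is just metadata, a new algorithm type only needs new tags, not a new place in the package hierarchy.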
New Interface
This release also introduces a more flexible configuration interface. You can now define your SmashConfig either as a simple list of algorithm names:
```python
from pruna import SmashConfig

config = SmashConfig(["torch_compile", "deepcache"])
```

or as a dictionary with detailed per-algorithm parameters:
```python
from pruna import SmashConfig

config = SmashConfig({
    "hqq": {
        "weight_bits": 4,
        "compute_dtype": "torch.bfloat16"
    },
    "torch_compile": {
        "fullgraph": True,
        "mode": "max-autotune"
    }
})
```

Algorithm Ordering and Compatibility
Another major change is how algorithm application order is determined.
Previously, the execution sequence was dictated by the hierarchy of algorithm classes and a global ordering. In 0.3.0, this has been replaced by a more atomic and declarative system: each algorithm now specifies its own compatibility rules and ordering constraints.
This makes the algorithm pipeline more self-organizing, robust to new extensions, and capable of resolving valid combinations dynamically.
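Conceptually, per-algorithm ordering constraints can be resolved with a topological sort. The sketch below is only an illustration of that idea under made-up constraints, not pruna's actual resolution code:

```python
# Illustrative sketch of a declarative ordering system: each algorithm
# declares which others must run before it, and a topological sort
# resolves a valid pipeline. The constraints here are invented.
from graphlib import TopologicalSorter

CONSTRAINTS = {
    # algorithm -> algorithms that must run before it
    "torch_compile": {"hqq", "deepcache"},  # compilation goes last
    "deepcache": set(),
    "hqq": set(),
}

def resolve_order(selected: set[str]) -> list[str]:
    # Only keep constraints among the algorithms actually selected.
    ts = TopologicalSorter(
        {name: CONSTRAINTS[name] & selected for name in selected}
    )
    return list(ts.static_order())

order = resolve_order({"torch_compile", "hqq"})
assert order[-1] == "torch_compile"
```

Each algorithm owning its own constraints means adding a new one never requires touching a global ordering table.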
Full Changelog: v0.2.11...v0.3.0
v0.2.11
The juiciest bits 🧃
October was all about Hacktoberfest, and wow, our community really showed up 🥺
From shiny new features to better docs and stronger evals, Pruna got a whole lot more powerful (and prettier) this month.
Let’s dive into what everyone’s been cooking 🔮
@R055A leveled up our experimentation game ✨
@kirdmiv brought the vibes to our evaluation module 🎨
Type checking, but make it sleek 🪶:
- feat: migrate from `mypy` to `ty` by @supakornn in #360
- style: add jaxtyping annotations and contributor documentation by @vaishnaviparabkar90 @begumcig in #423
@DevManpreet5 was both adding features and fixing our bugs 🧰 :
- feat: add progress bars to `EvaluationAgent` by @DevManpreet5 in #348
- docs: fix incorrect clone link in README by @DevManpreet5 in #347
TinyIMDB dataset plus new logging system with adjustable levels by @pranayyb 🎬 :
- feat: Add TinyIMDB dataset for lightweight experiments by @pranayyb in #374
- feat: add `set_logging_level` functionality by @pranayyb in #398
@Almonok made sure our repo stays squeaky clean 🛡️:
DINO has joined our evaluation zoo! Thanks to @Prashankavi @begumcig 🦖:
- feat: dino score by @Prashankavi @begumcig in #354
We have more tutorials, making getting started with Pruna even easier, thanks to @ParagEkbote:
- Add an End-To-End Tutorial for Efficient-Large-Model/Sana_600M_512px_diffusers by @ParagEkbote in #322
- Create Compatibility Matrix for Algorithm in Docs by @ParagEkbote in #403
Our very own Pruners were also busy making amazing contributions this month!: 🟣
Target modules are extending!:
- feat: add target modules to bnb quantizers by @gsprochette in #333
We have more datasets for experimentation and benchmarking:
- feat: add prompt-only image generation datasets by @nifleisch in #310
- feat: vbench datamodule by @begumcig in #397
EvaluationAgent now supports multi-GPU inference and multi-GPU latency metrics.
More evaluation metric features:
PrunaModel and SmashConfig improvements:
- feat: allow loading models without a smash config by @sharpenb in #340
- feat: add a get device type function to utils by @simlang in #416
Pruning some bugs 🐞 and maintenance 👩‍🌾:
- docs: removing `pruna_pro` mentions by @Mel-Alm in #387
- telemetry: enable metrics and update OTLP endpoint to staging environment by @gtregoat in #394
- docs: update contributor setup instructions by @johannaSommer in #355
- docs: add target modules documentation by @gsprochette in #331
- ci: migrate test models to pruna-test by @gsprochette in #385
- fix: change vbench dependency to exclude macOS by @begumcig in #426
- fix: add dataset and inferencer for janus tests by @gsprochette in #327
- Reduce Flakiness in CI by configuring HF Token and add caching for HF… by @davidberenstein1957 in #410
- fix: remove outdated prunamodel interface from deploying Sana notebook by @begumcig in #420
🌱 New faces in the garden
- @DevManpreet5 made their first contribution in #347
- @R055A made their first contribution in #362
- @pranayyb made their first contribution in #374
- @Almonok made their first contribution in #383
- @supakornn made their first contribution in #360
- Our @Mel-Alm made her first contribution 💪 in #387
- @sharpenb is back with their first contribution since open-sourcing 🚀 in #340
- @gtregoat made his first contribution too, usually holding it down on the backend, but nice to see him in the main repo! ⚙️ in #394
Full Changelog: v0.2.10...v0.2.11
v0.2.10
The juiciest bits 🚀
feat: add unconstrained hyperparameter by @gsprochette in #263
Introduces target modules, so you can pass custom configs while still keeping dependencies intact.
feat: new quantizer for vllm by @llcnt in #239
Adds new config options (patch_for_inference, default_to_hf) so vLLM models play nicer with quantization workflows.
feat: add pre-smash-hook for model preparation by @simlang in #309
Adds a hook so algorithms can prep or tweak models before smashing, making customization easier.
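The general shape of such a hook mechanism can be sketched as follows. This is a generic pattern with hypothetical names (the model is stood in for by a plain dict), not pruna's actual hook API:

```python
# Illustrative pre-smash hook pattern (hypothetical names, not
# pruna's API): algorithms register a callable that prepares or
# tweaks the model before the smashing pipeline runs.
from typing import Callable

PRE_SMASH_HOOKS: list[Callable[[dict], dict]] = []

def register_pre_smash_hook(hook: Callable[[dict], dict]) -> None:
    PRE_SMASH_HOOKS.append(hook)

def prepare_model(model: dict) -> dict:
    # Apply every registered hook in registration order.
    for hook in PRE_SMASH_HOOKS:
        model = hook(model)
    return model

# An algorithm could, for example, flag the model for eval mode:
register_pre_smash_hook(lambda m: {**m, "eval_mode": True})
print(prepare_model({"name": "demo"}))
```

The benefit of a hook is that each algorithm keeps its preparation logic local instead of patching a shared setup path.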
Our documentation got a huge glow up 💅 thanks to @sdiazlor and @davidberenstein1957:
- docs: create end to end reasoning tutorial by @sdiazlor in #283
- docs: create end to end video tutorial by @sdiazlor in #233
- docs: fix discord broken link by @sdiazlor in #305
- docs: updates gtm by @davidberenstein1957 in #316
Pruning some bugs 🐞 and maintenance 🧑🌾
- torch.load always to cpu first by @simlang in #308
- Rework Model Context by @simlang in #323
- fix: make qkv compatible with torch.compile in next diffusers release by @llcnt in #302
- fix: hqq diffusers saving and loading forget non linear layers by @llcnt in #275
- fix: namespace package conflict of optimum and optimum-quanto by @ParagEkbote in #298
- fix: deprecated call types & fixture bug by @begumcig in #313
- fix: nightly tests llmcompressor and gptq by @llcnt in #315
- fix: update datamodules for datasets v4.0.0 by @begumcig in #328
- fix: update model card tags to include 'pruna-ai' by default by @davidberenstein1957 in #334
Full Changelog: v0.2.9...v0.2.10
Release v0.2.9
The juiciest bits 🚀
feat: add flash_attention 3 kernel for diffusers pipelines by @johannaSommer in #287
We've added flash attention 3 to our new algorithm group "kernels". With the help of huggingface's kernel hub and pruna, you can now use flash attention 3 for any diffusers pipeline. Speed ups will vary based on the pipeline you are smashing, but we recommend it specifically for video generation pipelines like Wan!
feat: enhance model checks for transformers pipelines by @davidberenstein1957 in #281
We extended multiple algorithms to support smashing a transformers pipeline directly, without you having to extract the underlying model first: simply give it to smash() and we will do the rest.
replace os.path with pathlib.Path by @GreatBahram in #260
@GreatBahram finally helped us migrate to pathlib, and the code is looking cleaner than ever! 💅🏻
Pruning some bugs 🐞 and maintenance 🧑🌾
- test: connect inference/eval tests to algorithms by @begumcig in #181
- tests: update fixtures for algorithm and evaluation tests by @johannaSommer in #288
- fix: device placement with indexed devices by @davidberenstein1957 in #205
- feat: 277 feature update modelcard to include a snippet and base model by @davidberenstein1957 in #282
- tests: update audio datasets, add sdxl as example model by @johannaSommer in #293
- fix: failures from device indexing and evaluation testing by @johannaSommer in #300
Full Changelog: v0.2.8...v0.2.9
Release v0.2.8
The juiciest bits 🚀
feat: add arniqa by @begumcig in #183 | feat: clipiqa metric by @begumcig in #259 | feat: add sharpness metric by @begumcig in #261
This pruna release was all about metrics - @begumcig and @davidberenstein1957 integrated several new image generation metrics into pruna, which you can now use together with the EvaluationAgent to compare your smashed models.
feat: run test cases in parallel by @GreatBahram in #246
It wouldn't be a pruna release without @GreatBahram making the lives of contributors easier. Our Github Actions test cases now run in parallel, allowing us to merge & ship new algorithms even faster!
Pruning some bugs 🐞 and maintenance 🧑🌾
- fix: device state of input model for memory metrics by @johannaSommer in #256
- chore: 248 doc update the documentation tests to run more efficiently by @davidberenstein1957 in #250
- fix: warnings related to `chunk_length_s` in `ifw` by @johannaSommer in #258
- build: update hqq requirement from <0.2.7 to <0.2.8 by @dependabot[bot] in #206
- fix: pin `torchmetric` until PSNR adjustment by @johannaSommer in #274
- fix: handle new transformers version by @johnrachwan123 in #280
- chore: remove deprecated metrics, metric modes and calltypes by @begumcig in #264
- docs: fix documentation warnings when building by @sdiazlor in #267
- fix: metrics only return scalars by @begumcig in #284
Full Changelog: v0.2.7...v0.2.8
Release v0.2.7
The juiciest bits 🚀
feat: add janus support for quantization+torch.compile combo(s) by @llcnt in #145
You can now decrease the memory impact and the latency of the autoregressive image generation model Janus (Pro-7B) by quantizing and compiling it.
feat: modular pruning by @begumcig in #154
Pruning is now agnostic to submodules! This recent update enables safe, module-level pruning by finding each target module’s interior, boundary, and exterior, pruning only the interior while auto-patching surrounding shapes.
feat: extend accelerate compatibility by @johannaSommer in #234
We are rolling out more support for base models distributed with accelerate. Several cachers, factorizers and more quantizers are now compatible!
feat: enhance model saving functionality with pro support by @davidberenstein1957 in #200
Saving your smashed models to Huggingface Hub just became easier - we added additional support to better distinguish models that were smashed with pruna and pruna_pro
feat: simplify the setup (tests + dev = dev) by @GreatBahram in #210
@GreatBahram was at it again this release, simplifying the installation and setup for contributors by removing dependency groups - a simple uv sync --extra dev does the job and you can start contributing!
build: reduce core dependencies in pyproject.toml by @ParagEkbote in #227
@ParagEkbote made pruna more lightweight and reduced the package dependencies by 20%!
Pruning some bugs 🐞 and maintenance 🧑🌾
- ci: add mission permission set to `package_build.yaml` by @johannaSommer in #229
- tests: simplify diffusers fixture construction by @johannaSommer in #159
- build: reduce core dependencies in `pyproject.toml` by @ParagEkbote in #227
- feat: Refactor CI to use shared setup-uv-project action by @GreatBahram in #211
- fix: `device_map` specification for `accelerate`-compatible quantizers by @johannaSommer in #226
- chore: add github documentation on privately reporting vulnerabilities by @SaboniAmine in #217
- test: add pytest and code coverage configuration in `pyproject.toml` by @ParagEkbote in #230
- build: fix MacOS installation issues with bnb and `uv` index resolution by @johannaSommer in #235
- chore: enhance CI workflows with linting and concurrency controls by @davidberenstein1957 in #237
- ci: update algorithm generation workflow by @johannaSommer in #241
- ci: cleanup workflows 🧹 by @GreatBahram in #212
- build: pin `datasets` version by @johannaSommer in #251
- tests: add `durations` argument to pytest config by @johannaSommer in #252
- fix: change janus import into automodel import by @llcnt in #243
- fix: `gptq` and `llmcompressor` tests by @johannaSommer in #231
- docs: review documentation failures before release by @sdiazlor in #238
- ci: add nightly workflow to mark PRs as stale by @johannaSommer in #242
- fix: installation test for gliner dependency by @begumcig in #255
New Contributors
- @ParagEkbote made their first contribution in #227
- @sdiazlor made their first contribution in #238
Full Changelog: v0.2.6...v0.2.7
Release v0.2.6
The juiciest bits 🚀
feat: accelerate support by @johannaSommer in #128
Pruna now supports smashing base models that are distributed across several GPUs with accelerate! Enjoy quantizing your big models from two GPUs to just one. We will roll out support for more algorithms as well as compatibility with the EvaluationAgent in the following releases.
feat: switch pruna from poetry to uv by @johnrachwan123 in #164
uv needs no introduction, and you can now finally install pruna at lightning speed!
feat: streamline import failure handling by @johannaSommer in #152
We have streamlined the handling of algorithm-specific packages - we verify their correct installation before smashing and now guide the user better through the installation steps if a package is missing.
feat: add dependabot by @GreatBahram in #166
To make sure our dependencies are always up to date and support the newest versions, @GreatBahram introduced a dependabot to the pruna repository! 🤖
feat: improve overall device placement handling by @davidberenstein1957 in #148
To further improve user-experience, we now assist with choosing the best device available for smashing your models.
feat: improve the pre-commit configuration by @GreatBahram in #160
@GreatBahram improved the experience of contributors by overhauling our pre-commit configuration.
feat: update EvaluationAgent to support direct parameters and depreca… by @Ayyanaruto in #188
In their first contribution to the pruna repository, @Ayyanaruto improved the interface of the EvaluationAgent so that users can now directly specify metrics and parameters through the agent's constructor!
Pruning some bugs 🐞 and maintenance 🧑🌾
- fix: torchao rejection test by @johannaSommer in #132
- fix: rewrapping pruna model bug by @begumcig in #174
- fix: pin hqq dependency to avoid model re-loading bug by @davidberenstein1957 in #178
- chore: update pyproject.toml for optional dependencies and bitsandbytes by @davidberenstein1957 in #175
- fix: update poetry setup for external collaborators by @johannaSommer in #191
- docs: improve the documentation by @GreatBahram in #163
- build: update ctranslate2 requirement from ==4.5.0 to ==4.6.0 by @dependabot in #199
- docs: fix note on algorithm argument checking by @johannaSommer in #158
- build: update pytest requirement from 7.4.4 to 8.4.0 by @dependabot in #201
- fix: 168 bug device placement does not work with torchmetric by @davidberenstein1957 in #169
- fix: move import check after availability checks by @johannaSommer in #203
- fix: failing docs tests by @davidberenstein1957 in #139
- docs: change contributors list by @johnrachwan123 in #204
- fix: fix fp8dqrow setting by @nifleisch in #156
- refactor: update function signatures to accept both str and Path types by @Ayyanaruto in #187
- fix: update gptqmodel installation by @johnrachwan123 in #215
- docs: update Flux tutorial as a more general image generation tutorial by @davidberenstein1957 in #127
- docs: update LLM tutorial to optimize and evaluate large language models by @davidberenstein1957 in #126
- fix: code blocks in docs and the code block test by @begumcig in #218
- fix: device of FID torch metric by @johannaSommer in #223
No longer supported 👋
- refactor: remove `torch_static` quantizer 👋 by @johannaSommer in #140
New Contributors
- @dependabot made their first contribution in #193
- @Ayyanaruto made their first contribution in #188
Full Changelog: v0.2.5...v0.2.6
v0.2.5
The juiciest bits 🚀
refactor: metric attributes by @begumcig in #81
We improved the evaluation framework for a cleaner UX: every metric is now atomic (returns a single float) with clear attributes (name, units, higher_is_better) and a unified MetricResult wrapper. All metrics consistently accept single or pairwise calls, name alignment is enforced, and the EvaluationAgent shares inference across metrics to avoid redundant computation.
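The described design can be sketched in a few lines. This is an illustrative reconstruction of the attributes listed above (the real pruna classes and signatures may differ):

```python
# Illustrative sketch of atomic metrics: a single float wrapped with
# its name, units, and comparison direction. Not pruna's actual code.
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str
    units: str
    higher_is_better: bool
    value: float

def better(a: MetricResult, b: MetricResult) -> MetricResult:
    # Name alignment is enforced before any pairwise comparison.
    assert a.name == b.name, "cannot compare different metrics"
    if a.higher_is_better:
        return a if a.value >= b.value else b
    return a if a.value <= b.value else b

base = MetricResult("latency", "ms", higher_is_better=False, value=41.2)
smashed = MetricResult("latency", "ms", higher_is_better=False, value=17.8)
print(better(base, smashed).value)  # 17.8
```

Making `higher_is_better` part of the result means comparison logic never needs per-metric special cases.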
refactor: remove deprecated awq algorithm by @johnrachwan123 in #143 | feat: add llm_compressor quantizer by @johnrachwan123 in #144
To continue supporting the popular AWQ quantizer, we moved from auto-awq to llm_compressor.
Fixing some bugs 🐞
- fix: enhance model saving functionality to load multiple JSON configs present in diffusers by @davidberenstein1957 in #98
Full Changelog: v0.2.4...v0.2.5
v0.2.4
The juiciest bits 🚀
- feat: automate loading arguments by @johannaSommer in #77
- feat: `max_batch_size` refactoring by @johannaSommer in #67
- feat: add fastercache and pab by @nifleisch in #92
- refactor: move argument compatibility checks by @johannaSommer in #102
- docs: general structure refactor by @davidberenstein1957 in #90
- feat: add option for compilation for module lists by @johnrachwan123 in #105
- feat: add fora cacher by @nifleisch in #106
- test: use tiny random models to speed up tests by @nifleisch in #109
- refactor: remove deprecated algorithm names by @johannaSommer in #104
- feat: add device validation utility for improved device management by @davidberenstein1957 in #103
- feat: qkv fusing by @llcnt in #75
- feat: add torchao quantizer by @nifleisch in #110
Fixing some bugs 🐞
- fix: lower-bound ConfigSpace version by @johannaSommer in #101
- fix: fastercache and pab compatibility by @begumcig in #112
- fix: memory cleanup bug by @johannaSommer in #131
Full Changelog: v0.2.3...v0.2.4
Release v0.2.3
The juiciest bits 🚀
- feat: Add Hugging Face integration to save and load models by @davidberenstein1957 in #44
- feat: compile forward pass llm by @llcnt @johnrachwan123 in #51
- feat: support portable torch compilation by @johannaSommer in #69
- feat: add comfy support sfast by @agNikitaras in #66
Pruning some bugs 🐞
- fix: pin huggingface `datasets` version by @johannaSommer in #79
- fix: broken links docs by @begumcig in #74
- fix: `GPUMemory` metric memory bug by @johannaSommer in #78
- fix: `collate_fn` compatibility check logging by @johannaSommer in #76
New Contributors
- @agNikitaras made their first contribution in #66
Full Changelog: v0.2.2...v0.2.3