Releases: PrunaAI/pruna
Release v0.3.0
🚀 Pruna 0.3.0 — Structural Refactor and Interface Upgrade
Today, the pruna package is getting a long-awaited upgrade!
To support a growing variety of algorithms in the long term, we have refactored the internal structure that defines how algorithms are organized and applied.
Why the Refactor
In previous versions, certain algorithm groups — such as cachers or quantizers — were tightly coupled to the package’s structure. This rigid grouping made it difficult to introduce new types of algorithms, or to combine them in flexible ways.
Starting with Pruna 0.3.0, we’ve reworked this system so that such classifications are no longer hard constraints. Instead, they now serve as supplementary metadata, enabling a more modular, composable, and future-proof design. This refactor lays the groundwork for integrating new optimization techniques and custom pipelines without structural limitations.
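To illustrate the idea of groups as supplementary metadata rather than hard structure, here is a minimal, hypothetical sketch (the registry and function names are invented for illustration, not pruna's actual internals):

```python
# Illustrative sketch only: group labels become queryable metadata
# instead of a hard class hierarchy. All names here are hypothetical.

ALGORITHM_REGISTRY = {
    # algorithm name -> supplementary metadata tags
    "deepcache":     {"groups": {"cacher"}},
    "hqq":           {"groups": {"quantizer"}},
    "torch_compile": {"groups": {"compiler"}},
}

def algorithms_in_group(group: str) -> list[str]:
    """Grouping is now a query over metadata, not a package boundary."""
    return sorted(
        name for name, meta in ALGORITHM_REGISTRY.items()
        if group in meta["groups"]
    )

print(algorithms_in_group("quantizer"))  # ['hqq']
```

Because grouping is just metadata, a new algorithm type only needs new tags, not a new place in the package hierarchy.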
New Interface
This release also introduces a more flexible configuration interface. You can now define your SmashConfig either as a simple list of algorithm names:
```python
from pruna import SmashConfig

config = SmashConfig(["torch_compile", "deepcache"])
```

or as a dictionary with detailed per-algorithm parameters:
```python
from pruna import SmashConfig

config = SmashConfig({
    "hqq": {
        "weight_bits": 4,
        "compute_dtype": "torch.bfloat16"
    },
    "torch_compile": {
        "fullgraph": True,
        "mode": "max-autotune"
    }
})
```

Algorithm Ordering and Compatibility
Another major change is how algorithm application order is determined.
Previously, the execution sequence was dictated by the hierarchy of algorithm classes and a global ordering. In 0.3.0, this has been replaced by a more atomic and declarative system: each algorithm now specifies its own compatibility rules and ordering constraints.
This makes the algorithm pipeline more self-organizing, robust to new extensions, and capable of resolving valid combinations dynamically.
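Conceptually, per-algorithm ordering constraints can be resolved with a topological sort. The sketch below is only an illustration of that idea under made-up constraints, not pruna's actual resolution code:

```python
# Illustrative sketch of a declarative ordering system: each algorithm
# declares which others must run before it, and a topological sort
# resolves a valid pipeline. The constraints here are invented.
from graphlib import TopologicalSorter

CONSTRAINTS = {
    # algorithm -> algorithms that must run before it
    "torch_compile": {"hqq", "deepcache"},  # compilation goes last
    "deepcache": set(),
    "hqq": set(),
}

def resolve_order(selected: set[str]) -> list[str]:
    # Only keep constraints among the algorithms actually selected.
    ts = TopologicalSorter(
        {name: CONSTRAINTS[name] & selected for name in selected}
    )
    return list(ts.static_order())

order = resolve_order({"torch_compile", "hqq"})
assert order[-1] == "torch_compile"
```

Each algorithm owning its own constraints means adding a new one never requires touching a global ordering table.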
Full Changelog: v0.2.11...v0.3.0
v0.2.11
The juiciest bits 🧃
October was all about Hacktoberfest, and wow, our community really showed up 🥺
From shiny new features to better docs and stronger evals, Pruna got a whole lot more powerful (and prettier) this month.
Let’s dive into what everyone’s been cooking 🔮
@R055A leveled up our experimentation game ✨
@kirdmiv brought the vibes to our evaluation module 🎨
Type checking, but make it sleek 🪶:
- feat: migrate from `mypy` to `ty` by @supakornn in #360
- style: add jaxtyping annotations and contributor documentation by @vaishnaviparabkar90 @begumcig in #423
@DevManpreet5 was both adding features and fixing our bugs 🧰 :
- feat: add progress bars to `EvaluationAgent` by @DevManpreet5 in #348
- docs: fix incorrect clone link in README by @DevManpreet5 in #347
TinyIMDB dataset plus new logging system with adjustable levels by @pranayyb 🎬 :
- feat: Add TinyIMDB dataset for lightweight experiments by @pranayyb in #374
- feat: add `set_logging_level` functionality by @pranayyb in #398
@Almonok made sure our repo stays squeaky clean 🛡️:
DINO has joined our evaluation zoo! Thanks to @Prashankavi @begumcig 🦖:
- feat: dino score by @Prashankavi @begumcig in #354
We have more tutorials, making getting started with Pruna even easier, thanks to @ParagEkbote:
- Add an End-To-End Tutorial for Efficient-Large-Model/Sana_600M_512px_diffusers by @ParagEkbote in #322
- Create Compatibility Matrix for Algorithm in Docs by @ParagEkbote in #403
Our very own Pruners were also busy making amazing contributions this month!: 🟣
Target modules are extending!:
- feat: add target modules to bnb quantizers by @gsprochette in #333
We have more datasets for experimentation and benchmarking:
- feat: add prompt-only image generation datasets by @nifleisch in #310
- feat: vbench datamodule by @begumcig in #397
EvaluationAgent now supports multi-GPU inference and multi-GPU latency metrics.
More evaluation metric features:
PrunaModel and SmashConfig improvements:
- feat: allow loading models without a smash config by @sharpenb in #340
- feat: add a get device type function to utils by @simlang in #416
Pruning some bugs 🐞 and maintenance 👩‍🌾:
- docs: removing `pruna_pro` mentions by @Mel-Alm in #387
- telemetry: enable metrics and update OTLP endpoint to staging environment by @gtregoat in #394
- docs: update contributor setup instructions by @johannaSommer in #355
- docs: add target modules documentation by @gsprochette in #331
- ci: migrate test models to pruna-test by @gsprochette in #385
- fix: change vbench dependency to exclude macOS by @begumcig in #426
- fix: add dataset and inferencer for janus tests by @gsprochette in #327
- Reduce Flakiness in CI by configuring HF Token and add caching for HF… by @davidberenstein1957 in #410
- fix: remove outdated prunamodel interface from deploying Sana notebook by @begumcig in #420
🌱 New faces in the garden
- @DevManpreet5 made their first contribution in #347
- @R055A made their first contribution in #362
- @pranayyb made their first contribution in #374
- @Almonok made their first contribution in #383
- @supakornn made their first contribution in #360
- Our @Mel-Alm made her first contribution 💪 in #387
- @sharpenb is back with their first contribution since open-sourcing 🚀 in #340
- @gtregoat made his first contribution too, usually holding it down on the backend, but nice to see him in the main repo! ⚙️ in #394
Full Changelog: v0.2.10...v0.2.11
v0.2.10
The juiciest bits 🚀
feat: add unconstrained hyperparameter by @gsprochette in #263
Introduces target modules, so you can pass custom configs while still keeping dependencies intact.
feat: new quantizer for vllm by @llcnt in #239
Adds new config options (patch_for_inference, default_to_hf) so vLLM models play nicer with quantization workflows.
feat: add pre-smash-hook for model preparation by @simlang in #309
Adds a hook so algorithms can prep or tweak models before smashing, making customization easier.
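The general shape of such a hook mechanism can be sketched as follows. This is a generic pattern with hypothetical names (the model is stood in for by a plain dict), not pruna's actual hook API:

```python
# Illustrative pre-smash hook pattern (hypothetical names, not
# pruna's API): algorithms register a callable that prepares or
# tweaks the model before the smashing pipeline runs.
from typing import Callable

PRE_SMASH_HOOKS: list[Callable[[dict], dict]] = []

def register_pre_smash_hook(hook: Callable[[dict], dict]) -> None:
    PRE_SMASH_HOOKS.append(hook)

def prepare_model(model: dict) -> dict:
    # Apply every registered hook in registration order.
    for hook in PRE_SMASH_HOOKS:
        model = hook(model)
    return model

# An algorithm could, for example, flag the model for eval mode:
register_pre_smash_hook(lambda m: {**m, "eval_mode": True})
print(prepare_model({"name": "demo"}))
```

The benefit of a hook is that each algorithm keeps its preparation logic local instead of patching a shared setup path.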
Our documentation got a huge glow up 💅 thanks to @sdiazlor and @davidberenstein1957:
- docs: create end to end reasoning tutorial by @sdiazlor in #283
- docs: create end to end video tutorial by @sdiazlor in #233
- docs: fix discord broken link by @sdiazlor in #305
- docs: updates gtm by @davidberenstein1957 in #316
Pruning some bugs 🐞 and maintenance 🧑🌾
- torch.load always to cpu first by @simlang in #308
- Rework Model Context by @simlang in #323
- fix: make qkv compatible with torch.compile in next diffusers release by @llcnt in #302
- fix: hqq diffusers saving and loading forget non linear layers by @llcnt in #275
- fix: namespace package conflict of optimum and optimum-quanto by @ParagEkbote in #298
- fix: deprecated call types & fixture bug by @begumcig in #313
- fix: nightly tests llmcompressor and gptq by @llcnt in #315
- fix: update datamodules for datasets v4.0.0 by @begumcig in #328
- fix: update model card tags to include 'pruna-ai' by default by @davidberenstein1957 in #334
Full Changelog: v0.2.9...v0.2.10
Release v0.2.9
The juiciest bits 🚀
feat: add flash_attention 3 kernel for diffusers pipelines by @johannaSommer in #287
We've added flash attention 3 to our new algorithm group "kernels". With the help of huggingface's kernel hub and pruna, you can now use flash attention 3 for any diffusers pipeline. Speed ups will vary based on the pipeline you are smashing, but we recommend it specifically for video generation pipelines like Wan!
feat: enhance model checks for transformers pipelines by @davidberenstein1957 in #281
We extended multiple algorithms to support smashing a transformers pipeline directly, without you having to extract the underlying model first: simply give it to smash() and we will do the rest.
replace os.path with pathlib.Path by @GreatBahram in #260
@GreatBahram finally helped us migrate to pathlib, and the code is looking cleaner than ever! 💅🏻
Pruning some bugs 🐞 and maintenance 🧑🌾
- test: connect inference/eval tests to algorithms by @begumcig in #181
- tests: update fixtures for algorithm and evaluation tests by @johannaSommer in #288
- fix: device placement with indexed devices by @davidberenstein1957 in #205
- feat: 277 feature update modelcard to include a snippet and base model by @davidberenstein1957 in #282
- tests: update audio datasets, add sdxl as example model by @johannaSommer in #293
- fix: failures from device indexing and evaluation testing by @johannaSommer in #300
Full Changelog: v0.2.8...v0.2.9
Release v0.2.8
The juiciest bits 🚀
feat: add arniqa by @begumcig in #183 | feat: clipiqa metric by @begumcig in #259 | feat: add sharpness metric by @begumcig in #261
This pruna release was all about metrics - @begumcig and @davidberenstein1957 integrated several new image generation metrics into pruna, which you can now use together with the EvaluationAgent to compare your smashed models.
feat: run test cases in parallel by @GreatBahram in #246
It wouldn't be a pruna release without @GreatBahram making the lives of contributors easier. Our Github Actions test cases now run in parallel, allowing us to merge & ship new algorithms even faster!
Pruning some bugs 🐞 and maintenance 🧑🌾
- fix: device state of input model for memory metrics by @johannaSommer in #256
- chore: 248 doc update the documentation tests to run more efficiently by @davidberenstein1957 in #250
- fix: warnings related to `chunk_length_s` in `ifw` by @johannaSommer in #258
- build: update hqq requirement from <0.2.7 to <0.2.8 by @dependabot[bot] in #206
- fix: pin `torchmetric` until PSNR adjustment by @johannaSommer in #274
- fix: handle new transformers version by @johnrachwan123 in #280
- chore: remove deprecated metrics, metric modes and calltypes by @begumcig in #264
- docs: fix documentation warnings when building by @sdiazlor in #267
- fix: metrics only return scalars by @begumcig in #284
Full Changelog: v0.2.7...v0.2.8
Release v0.2.7
The juiciest bits 🚀
feat: add janus support for quantization+torch.compile combo(s) by @llcnt in #145
You can now decrease the memory impact and the latency of the autoregressive image generation model Janus (Pro-7B) by quantizing and compiling it.
feat: modular pruning by @begumcig in #154
Pruning is now agnostic to submodules! This recent update enables safe, module-level pruning by finding each target module’s interior, boundary, and exterior, pruning only the interior while auto-patching surrounding shapes.
feat: extend accelerate compatibility by @johannaSommer in #234
We are rolling out more support for base models distributed with accelerate. Several cachers, factorizers and more quantizers are now compatible!
feat: enhance model saving functionality with pro support by @davidberenstein1957 in #200
Saving your smashed models to Huggingface Hub just became easier - we added additional support to better distinguish models that were smashed with pruna and pruna_pro
feat: simplify the setup (tests + dev = dev) by @GreatBahram in #210
@GreatBahram was at it again this release, simplifying the installation and setup for contributors by removing dependency groups - a simple uv sync --extra dev does the job and you can start contributing!
build: reduce core dependencies in pyproject.toml by @ParagEkbote in #227
@ParagEkbote made pruna more lightweight and reduced the package dependencies by 20%!
Pruning some bugs 🐞 and maintenance 🧑🌾
- ci: add mission permission set to `package_build.yaml` by @johannaSommer in #229
- tests: simplify diffusers fixture construction by @johannaSommer in #159
- build: reduce core dependencies in `pyproject.toml` by @ParagEkbote in #227
- feat: Refactor CI to use shared setup-uv-project action by @GreatBahram in #211
- fix: `device_map` specification for `accelerate`-compatible quantizers by @johannaSommer in #226
- chore: add github documentation on privately reporting vulnerabilities by @SaboniAmine in #217
- test: add pytest and code coverage configuration in `pyproject.toml` by @ParagEkbote in #230
- build: fix MacOS installation issues with bnb and `uv` index resolution by @johannaSommer in #235
- chore: enhance CI workflows with linting and concurrency controls by @davidberenstein1957 in #237
- ci: update algorithm generation workflow by @johannaSommer in #241
- ci: cleanup workflows 🧹 by @GreatBahram in #212
- build: pin `datasets` version by @johannaSommer in #251
- tests: add `durations` argument to pytest config by @johannaSommer in #252
- fix: change janus import into automodel import by @llcnt in #243
- fix: `gptq` and `llmcompressor` tests by @johannaSommer in #231
- docs: review documentation failures before release by @sdiazlor in #238
- ci: add nightly workflow to mark PRs as stale by @johannaSommer in #242
- fix: installation test for gliner dependency by @begumcig in #255
New Contributors
- @ParagEkbote made their first contribution in #227
- @sdiazlor made their first contribution in #238
Full Changelog: v0.2.6...v0.2.7
Release v0.2.6
The juiciest bits 🚀
feat: accelerate support by @johannaSommer in #128
Pruna now supports smashing base models that are distributed across several GPUs with accelerate! Enjoy quantizing your big models from two GPUs to just one. We will roll out support for more algorithms as well as compatibility with the EvaluationAgent in the following releases.
feat: switch pruna from poetry to uv by @johnrachwan123 in #164
uv needs no introduction, and you can now finally install pruna at lightning speed!
feat: streamline import failure handling by @johannaSommer in #152
We have streamlined the handling of algorithm-specific packages - we verify their correct installation before smashing and now guide the user better through the installation steps if a package is missing.
feat: add dependabot by @GreatBahram in #166
To make sure our dependencies are always up to date and support the newest versions, @GreatBahram introduced a dependabot to the pruna repository! 🤖
feat: improve overall device placement handling by @davidberenstein1957 in #148
To further improve user-experience, we now assist with choosing the best device available for smashing your models.
feat: improve the pre-commit configuration by @GreatBahram in #160
@GreatBahram improved the experience of contributors by overhauling our pre-commit configuration.
feat: update EvaluationAgent to support direct parameters and depreca… by @Ayyanaruto in #188
In their first contribution to the pruna repository, @Ayyanaruto improved the interface of the EvaluationAgent so that users can now directly specify metrics and parameters through the agent's constructor!
Pruning some bugs 🐞 and maintenance 🧑🌾
- fix: torchao rejection test by @johannaSommer in #132
- fix: rewrapping pruna model bug by @begumcig in #174
- fix: pin hqq dependency to avoid model re-loading bug by @davidberenstein1957 in #178
- chore: update pyproject.toml for optional dependencies and bitsandbytes by @davidberenstein1957 in #175
- fix: update poetry setup for external collaborators by @johannaSommer in #191
- docs: improve the documentation by @GreatBahram in #163
- build: update ctranslate2 requirement from ==4.5.0 to ==4.6.0 by @dependabot in #199
- docs: fix note on algorithm argument checking by @johannaSommer in #158
- build: update pytest requirement from 7.4.4 to 8.4.0 by @dependabot in #201
- fix: 168 bug device placement does not work with torchmetric by @davidberenstein1957 in #169
- fix: move import check after availability checks by @johannaSommer in #203
- fix: failing docs tests by @davidberenstein1957 in #139
- docs: change contributors list by @johnrachwan123 in #204
- fix: fix fp8dqrow setting by @nifleisch in #156
- refactor: update function signatures to accept both str and Path types by @Ayyanaruto in #187
- fix: update gptqmodel installation by @johnrachwan123 in #215
- docs: update Flux tutorial as a more general image generation tutorial by @davidberenstein1957 in #127
- docs: update LLM tutorial to optimize and evaluate large language models by @davidberenstein1957 in #126
- fix: code blocks in docs and the code block test by @begumcig in #218
- fix: device of FID torch metric by @johannaSommer in #223
No longer supported 👋
- refactor: remove `torch_static` quantizer 👋 by @johannaSommer in #140
New Contributors
- @dependabot made their first contribution in #193
- @Ayyanaruto made their first contribution in #188
Full Changelog: v0.2.5...v0.2.6
v0.2.5
The juiciest bits 🚀
refactor: metric attributes by @begumcig in #81
We improved the evaluation framework for a cleaner UX: every metric is now atomic (returns a single float) with clear attributes (name, units, higher_is_better) and a unified MetricResult wrapper. All metrics consistently accept single or pairwise calls, name alignment is enforced, and the EvaluationAgent shares inference across metrics to avoid redundant computation.
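The described design can be sketched in a few lines. This is an illustrative reconstruction of the attributes listed above (the real pruna classes and signatures may differ):

```python
# Illustrative sketch of atomic metrics: a single float wrapped with
# its name, units, and comparison direction. Not pruna's actual code.
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str
    units: str
    higher_is_better: bool
    value: float

def better(a: MetricResult, b: MetricResult) -> MetricResult:
    # Name alignment is enforced before any pairwise comparison.
    assert a.name == b.name, "cannot compare different metrics"
    if a.higher_is_better:
        return a if a.value >= b.value else b
    return a if a.value <= b.value else b

base = MetricResult("latency", "ms", higher_is_better=False, value=41.2)
smashed = MetricResult("latency", "ms", higher_is_better=False, value=17.8)
print(better(base, smashed).value)  # 17.8
```

Making `higher_is_better` part of the result means comparison logic never needs per-metric special cases.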
refactor: remove deprecated awq algorithm by @johnrachwan123 in #143 | feat: add llm_compressor quantizer by @johnrachwan123 in #144
To continue supporting the popular AWQ quantizer, we moved from auto-awq to llm_compressor.
Fixing some bugs 🐞
- fix: enhance model saving functionality to load multiple JSON configs present in diffusers by @davidberenstein1957 in #98
Full Changelog: v0.2.4...v0.2.5
v0.2.4
The juiciest bits 🚀
- feat: automate loading arguments by @johannaSommer in #77
- feat: `max_batch_size` refactoring by @johannaSommer in #67
- feat: add fastercache and pab by @nifleisch in #92
- refactor: move argument compatibility checks by @johannaSommer in #102
- docs: general structure refactor by @davidberenstein1957 in #90
- feat: add option for compilation for module lists by @johnrachwan123 in #105
- feat: add fora cacher by @nifleisch in #106
- test: use tiny random models to speed up tests by @nifleisch in #109
- refactor: remove deprecated algorithm names by @johannaSommer in #104
- feat: add device validation utility for improved device management by @davidberenstein1957 in #103
- feat: qkv fusing by @llcnt in #75
- feat: add torchao quantizer by @nifleisch in #110
Fixing some bugs 🐞
- fix: lower-bound ConfigSpace version by @johannaSommer in #101
- fix: fastercache and pab compatibility by @begumcig in #112
- fix: memory cleanup bug by @johannaSommer in #131
Full Changelog: v0.2.3...v0.2.4
Release v0.2.3
The juiciest bits 🚀
- feat: Add Hugging Face integration to save and load models by @davidberenstein1957 in #44
- feat: compile forward pass llm by @llcnt @johnrachwan123 in #51
- feat: support portable torch compilation by @johannaSommer in #69
- feat: add comfy support sfast by @agNikitaras in #66
Pruning some bugs 🐞
- fix: pin huggingface `datasets` version by @johannaSommer in #79
- fix: broken links docs by @begumcig in #74
- fix: `GPUMemory` metric memory bug by @johannaSommer in #78
- fix: `collate_fn` compatibility check logging by @johannaSommer in #76
New Contributors
- @agNikitaras made their first contribution in #66
Full Changelog: v0.2.2...v0.2.3