Releases: PrunaAI/pruna

Release v0.3.0

10 Nov 09:50
02f9779


🚀 Pruna 0.3.0 — Structural Refactor and Interface Upgrade

Today, the pruna package is getting a long-awaited upgrade!
To support a growing variety of algorithms in the long term, we have refactored the internal structure that defines how algorithms are organized and applied.

Why the Refactor

In previous versions, certain algorithm groups — such as cachers or quantizers — were tightly coupled to the package’s structure. This rigid grouping made it difficult to introduce new types of algorithms, or to combine them in flexible ways.

Starting with Pruna 0.3.0, we’ve reworked this system so that such classifications are no longer hard constraints. Instead, they now serve as supplementary metadata, enabling a more modular, composable, and future-proof design. This refactor lays the groundwork for integrating new optimization techniques and custom pipelines without structural limitations.
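A minimal sketch of what metadata-based classification can look like (all names here are illustrative, not pruna's actual internals): groups such as "quantizer" or "cacher" become plain attributes on each algorithm rather than a rigid class hierarchy, so new groups can be added without restructuring the package.

```python
# Hypothetical sketch: algorithm groups as supplementary metadata
# instead of a structural constraint. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class Algorithm:
    name: str
    groups: tuple  # metadata tags, not a class hierarchy


REGISTRY = {
    a.name: a
    for a in [
        Algorithm("hqq", groups=("quantizer",)),
        Algorithm("deepcache", groups=("cacher",)),
        Algorithm("torch_compile", groups=("compiler",)),
    ]
}


def by_group(group):
    """Filter the registry by metadata instead of by module layout."""
    return [a.name for a in REGISTRY.values() if group in a.groups]
```

With this shape, adding a new group is just a new tag on an algorithm, which is the kind of flexibility the refactor is after.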

New Interface

This release also introduces a more flexible configuration interface. You can now define your SmashConfig either as a simple list of algorithm names:

from pruna import SmashConfig

config = SmashConfig(["torch_compile", "deepcache"])

or as a dictionary with detailed per-algorithm parameters:

from pruna import SmashConfig

config = SmashConfig({
    "hqq": {
        "weight_bits": 4,
        "compute_dtype": "torch.bfloat16",
    },
    "torch_compile": {
        "fullgraph": True,
        "mode": "max-autotune",
    },
})

Algorithm Ordering and Compatibility

Another major change is how algorithm application order is determined.
Previously, the execution sequence was dictated by the hierarchy of algorithm classes and a global ordering. In 0.3.0, this has been replaced by a more atomic and declarative system: each algorithm now specifies its own compatibility rules and ordering constraints.

This makes the algorithm pipeline more self-organizing, robust to new extensions, and capable of resolving valid combinations dynamically.
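The resolution step can be pictured as a topological sort over per-algorithm constraints. The sketch below is hypothetical (the constraint names and data shapes are illustrative, not pruna's actual interface), but it shows how declared "runs after" relations yield a valid order without any global hierarchy:

```python
# Hypothetical sketch of declarative ordering: each algorithm declares
# which others it must run after, and the pipeline resolves a valid
# execution order with a topological sort. Names are illustrative.
from graphlib import TopologicalSorter

RUNS_AFTER = {
    "torch_compile": {"hqq"},  # compile only after quantization
    "deepcache": set(),
    "hqq": set(),
}


def resolve_order(selected):
    """Resolve a valid execution order for the selected algorithms."""
    graph = {
        name: RUNS_AFTER.get(name, set()) & set(selected)
        for name in selected
    }
    return list(TopologicalSorter(graph).static_order())
```

Because each algorithm carries its own constraints, adding a new algorithm only requires declaring its local rules; the global order falls out automatically.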

Full Changelog: v0.2.11...v0.3.0

v0.2.11

04 Nov 16:48


october_prune

The juiciest bits 🧃

October was all about Hacktoberfest, and wow, our community really showed up 🥺
From shiny new features to better docs and stronger evals, Pruna got a whole lot more powerful (and prettier) this month.
Let’s dive into what everyone’s been cooking 🔮


@R055A leveled up our experimentation game ✨


@kirdmiv brought the vibes to our evaluation module 🎨


Type checking, but make it sleek 🪶:


@DevManpreet5 was both adding features and fixing our bugs 🧰:


TinyIMDB dataset plus a new logging system with adjustable levels by @pranayyb 🎬:

  • feat: Add TinyIMDB dataset for lightweight experiments by @pranayyb in #374
  • feat: add set_logging_level functionality by @pranayyb in #398

@Almonok made sure our repo stays squeaky clean 🛡️:


DINO has joined our evaluation zoo! Thanks to @Prashankavi @begumcig 🦖:


More tutorials make getting started with Pruna even easier, thanks to @ParagEkbote 📚:

  • Add an End-To-End Tutorial for Efficient-Large-Model/Sana_600M_512px_diffusers by @ParagEkbote in #322
  • Create Compatibility Matrix for Algorithm in Docs by @ParagEkbote in #403

Our very own Pruners were also busy making amazing contributions this month! 🟣


Target module support is expanding:


We have more datasets for experimentation and benchmarking:


EvaluationAgent now supports multi-GPU inference and multi-GPU latency metrics.


More evaluation metric features:

  • feat: expose metric information in MetricResult by @sdiazlor in #326

PrunaModel and SmashConfig improvements:

  • feat: allow loading models without a smash config by @sharpenb in #340
  • feat: add a get device type function to utils by @simlang in #416

Pruning some bugs 🐞 and maintenance 👩‍🌾:

🌱 New faces in the garden

Full Changelog: v0.2.10...v0.2.11

v0.2.10

17 Sep 09:57
5881538


The juiciest bits 🚀

feat: add unconstrained hyperparameter by @gsprochette in #263

Introduces target modules, so you can pass custom configs while still keeping dependencies intact.

feat: new quantizer for vllm by @llcnt in #239

Adds new config options (patch_for_inference, default_to_hf) so vLLM models play nicer with quantization workflows.

feat: add pre-smash-hook for model preparation by @simlang in #309

Adds a hook so algorithms can prep or tweak models before smashing, making customization easier.
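A minimal sketch of the hook pattern this describes (the class and method names are illustrative, not pruna's exact interface): the orchestrator runs each algorithm's preparation hook, if present, before applying the algorithms themselves.

```python
# Hypothetical sketch of a pre-smash hook. Illustrative names only;
# the model is a plain dict here to keep the example self-contained.
class ToyAlgorithm:
    def pre_smash_hook(self, model):
        # e.g. fuse layers, move tensors, or patch configs beforehand
        model["prepared"] = True
        return model

    def apply(self, model):
        model["smashed"] = True
        return model


def smash(model, algorithms):
    """Run every preparation hook, then apply each algorithm."""
    for algo in algorithms:
        hook = getattr(algo, "pre_smash_hook", None)
        if hook is not None:
            model = hook(model)
    for algo in algorithms:
        model = algo.apply(model)
    return model
```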

Our documentation got a huge glow up 💅 thanks to @sdiazlor and @davidberenstein1957:

Pruning some bugs 🐞 and maintenance 🧑‍🌾

  • torch.load always to cpu first by @simlang in #308
  • Rework Model Context by @simlang in #323
  • fix: make qkv compatible with torch.compile in next diffusers release by @llcnt in #302
  • fix: hqq diffusers saving and loading forget non linear layers by @llcnt in #275
  • fix: namespace package conflict of optimum and optimum-quanto by @ParagEkbote in #298
  • fix: deprecated call types & fixture bug by @begumcig in #313
  • fix: nightly tests llmcompressor and gptq by @llcnt in #315
  • fix: update datamodules for datasets v4.0.0 by @begumcig in #328
  • fix: update model card tags to include 'pruna-ai' by default by @davidberenstein1957 in #334

Full Changelog: v0.2.9...v0.2.10

Release v0.2.9

13 Aug 14:35
d37c2d3


The juiciest bits 🚀

feat: add flash_attention 3 kernel for diffusers pipelines by @johannaSommer in #287

We've added Flash Attention 3 to our new algorithm group, "kernels". With the help of Hugging Face's Kernel Hub and pruna, you can now use Flash Attention 3 with any diffusers pipeline. Speedups will vary based on the pipeline you are smashing, but we recommend it especially for video generation pipelines like Wan!

feat: enhance model checks for transformers pipelines by @davidberenstein1957 in #281

Multiple algorithms cannot smash a transformers pipeline directly without first extracting the underlying model, so smash() now does the extraction for you: simply hand it the pipeline and we will do the rest.
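The unwrapping logic can be pictured like this (the classes below are stand-ins, not transformers' real ones; transformers pipelines do expose the underlying model via a `.model` attribute):

```python
# Hypothetical sketch: accept either a model or a pipeline-like object
# and return the underlying model. Illustrative stand-in classes.
class FakeModel:
    pass


class FakePipeline:
    def __init__(self, model):
        self.model = model  # transformers pipelines expose .model


def unwrap(obj):
    """Return the underlying model, whether given a model or a pipeline."""
    return obj.model if hasattr(obj, "model") else obj
```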

replace os.path with pathlib.Path by @GreatBahram in #260

@GreatBahram finally helped us migrate to pathlib, and the code is looking cleaner than ever! 💅🏻
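A small before/after showing the kind of cleanup this migration brings (a generic example, not actual pruna code):

```python
# Generic illustration of the os.path -> pathlib migration.
import os.path
from pathlib import Path


def cache_file_old(root, name):
    return os.path.join(root, "cache", name + ".json")


def cache_file_new(root, name):
    # Path objects compose with "/" and format cleanly
    return Path(root) / "cache" / f"{name}.json"
```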

Pruning some bugs 🐞 and maintenance 🧑‍🌾

Full Changelog: v0.2.8...v0.2.9

Release v0.2.8

29 Jul 14:08
d96bdf9


The juiciest bits 🚀

feat: add arniqa by @begumcig in #183 | feat: clipiqa metric by @begumcig in #259 | feat: add sharpness metric by @begumcig in #261

This pruna release was all about metrics - @begumcig and @davidberenstein1957 integrated several new image generation metrics into pruna, which you can now use together with the EvaluationAgent to compare your smashed models.

feat: run test cases in parallel by @GreatBahram in #246

It wouldn't be a pruna release without @GreatBahram making the lives of contributors easier. Our GitHub Actions test cases now run in parallel, allowing us to merge & ship new algorithms even faster!

Pruning some bugs 🐞 and maintenance 🧑‍🌾

Full Changelog: v0.2.7...v0.2.8

Release v0.2.7

14 Jul 09:46
87eae01


The juiciest bits 🚀

feat: add janus support for quantization+torch.compile combo(s) by @llcnt in #145

You can now decrease the memory impact and the latency of the autoregressive image generation model Janus (Pro-7B) by quantizing and compiling it.

feat: modular pruning by @begumcig in #154

Pruning is now agnostic to submodules! This recent update enables safe, module-level pruning by finding each target module’s interior, boundary, and exterior, pruning only the interior while auto-patching surrounding shapes.
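The interior/boundary/exterior split can be sketched as a simple classification over who consumes each channel (a hypothetical illustration with made-up names, not pruna's actual pruning code): channels consumed only inside the target module are safe to prune, while channels visible to surrounding modules would require shape patching.

```python
# Hypothetical sketch of interior/boundary/exterior classification.
# A channel is "interior" if every consumer lives inside the target
# module, "boundary" if some consumer is outside, "exterior" if the
# target does not touch it at all. Names are illustrative only.
def classify_channels(consumers_by_channel, all_channels):
    interior, boundary, exterior = set(), set(), set()
    for ch in all_channels:
        consumers = consumers_by_channel.get(ch, set())
        if not consumers:
            exterior.add(ch)
        elif all(c.startswith("target.") for c in consumers):
            interior.add(ch)  # safe to prune outright
        else:
            boundary.add(ch)  # pruning would need shape patching
    return interior, boundary, exterior
```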

feat: extend accelerate compatibility by @johannaSommer in #234

We are rolling out more support for base models distributed with accelerate. Several cachers, factorizers and more quantizers are now compatible!

feat: enhance model saving functionality with pro support by @davidberenstein1957 in #200

Saving your smashed models to the Hugging Face Hub just became easier: we added additional support to better distinguish models that were smashed with pruna and pruna_pro.

feat: simplify the setup (tests + dev = dev) by @GreatBahram in #210

@GreatBahram was at it again this release, simplifying the installation and setup for contributors by removing dependency groups - a simple uv sync --extra dev does the job and you can start contributing!

build: reduce core dependencies in pyproject.toml by @ParagEkbote in #227

@ParagEkbote made pruna more lightweight and reduced the package dependencies by 20%!

Pruning some bugs 🐞 and maintenance 🧑‍🌾

New Contributors

Full Changelog: v0.2.6...v0.2.7

Release v0.2.6

30 Jun 18:20
034d474


The juiciest bits 🚀

feat: accelerate support by @johannaSommer in #128

Pruna now supports smashing base models that are distributed across several GPUs with accelerate! Enjoy quantizing your big models down from two GPUs to just one. We will roll out support for more algorithms as well as compatibility with the EvaluationAgent in the following releases.

feat: switch pruna from poetry to uv by @johnrachwan123 in #164

uv needs no introduction, and you can now finally install pruna at lightning speed!

feat: streamline import failure handling by @johannaSommer in #152

We have streamlined the handling of algorithm-specific packages - we verify their correct installation before smashing and now guide the user better through the installation steps if a package is missing.
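A minimal sketch of this kind of pre-flight check (hypothetical helper name, not pruna's exact code): verify an algorithm's optional dependency before smashing, and point the user at the install step if it is missing.

```python
# Hypothetical sketch: fail early, with an actionable message, when an
# algorithm-specific package is not installed. Illustrative names only.
from importlib.util import find_spec


def require_package(package, install_hint):
    """Raise a helpful ImportError if `package` cannot be imported."""
    if find_spec(package) is None:
        raise ImportError(
            f"This algorithm requires '{package}'. "
            f"Install it with: {install_hint}"
        )
```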

feat: add dependabot by @GreatBahram in #166

To make sure our dependencies are always up to date and support the newest versions, @GreatBahram introduced a dependabot to the pruna repository! 🤖

feat: improve overall device placement handling by @davidberenstein1957 in #148

To further improve user-experience, we now assist with choosing the best device available for smashing your models.

feat: improve the pre-commit configuration by @GreatBahram in #160

@GreatBahram improved the experience of contributors by overhauling our pre-commit configuration.

feat: update EvaluationAgent to support direct parameters and depreca… by @Ayyanaruto in #188

In their first contribution to the pruna repository, @Ayyanaruto improved the interface of the EvaluationAgent so that users can now directly specify metrics and parameters through the agent's constructor!

Pruning some bugs 🐞 and maintenance 🧑‍🌾

No longer supported 👋

New Contributors

Full Changelog: v0.2.5...v0.2.6

v0.2.5

28 May 13:44
df91e11


The juiciest bits 🚀

refactor: metric attributes by @begumcig in #81

We improved the evaluation framework for a cleaner UX: every metric is now atomic (returns a single float) with clear attributes (name, units, higher_is_better) and a unified MetricResult wrapper. All metrics consistently accept single or pairwise calls, name alignment is enforced, and the EvaluationAgent shares inference across metrics to avoid redundant computation.
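The description above can be sketched as a small dataclass (illustrative fields matching the description, not the exact pruna class): one float per metric, explicit attributes, and name alignment enforced when comparing results.

```python
# Hypothetical sketch of an atomic metric result. Field names follow
# the release notes; the comparison helper is illustrative.
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    value: float           # every metric returns a single float
    units: str
    higher_is_better: bool

    def is_improvement_over(self, other):
        # name alignment is enforced before comparing
        if self.name != other.name:
            raise ValueError("cannot compare results of different metrics")
        if self.higher_is_better:
            return self.value > other.value
        return self.value < other.value
```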

refactor: remove deprecated awq algorithm by @johnrachwan123 in #143 | feat: add llm_compressor quantizer by @johnrachwan123 in #144

To continue supporting the popular AWQ quantizer, we moved from auto-awq to llm_compressor.

Fixing some bugs 🐞

  • fix: enhance model saving functionality to load multiple JSON configs present in diffusers by @davidberenstein1957 in #98

Full Changelog: v0.2.4...v0.2.5

v0.2.4

19 May 14:13
40e6ae8


The juiciest bits 🚀

Fixing some bugs 🐞

Full Changelog: v0.2.3...v0.2.4

Release v0.2.3

02 May 12:15
6b2d997


The juiciest bits 🚀

Pruning some bugs 🐞

New Contributors

Full Changelog: v0.2.2...v0.2.3