feat: vbench integration #434

begumcig · 2025-11-10T17:30:39Z

Description

Combines #404, #395, and #400.

Related Issue

Fixes #(issue number)

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Additional Notes

cursor

Comment `@cursor review` or `bugbot run` to trigger another review on this PR.

cursor · 2025-11-10T17:33:15Z

src/pruna/evaluation/task.py

+            raise ValueError("The task should have a single modality across all quality metrics.")
+        else:  # More than one modality, fine for evaluation, can't save artifacts (for now).
+            return "general"
+


Bug: Misplaced Method Breaks Class Functionality

The validate_and_get_task_modality method is incorrectly nested inside the _safe_build_metrics function instead of being a method of the Task class. This causes an AttributeError when self.validate_and_get_task_modality() is called in the Task.__init__ method, since the method doesn't exist on the Task instance.
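A minimal sketch of the failure mode described above, not the actual pruna code: the class names `BrokenTask` and `FixedTask`, the body of `_safe_build_metrics`, and the placeholder return value are all assumptions made for illustration. It shows why a function defined inside another function is not reachable through `self`, and how moving it to class level resolves the lookup.

```python
def _safe_build_metrics(names):
    """Hypothetical stand-in for the helper named in the review."""

    def validate_and_get_task_modality(self):
        # Nested definition: a local name of this call only, never a Task method.
        return "general"

    return list(names)


class BrokenTask:
    """The method lives inside the helper, so the attribute lookup fails."""

    def __init__(self, names):
        self.metrics = _safe_build_metrics(names)
        self.modality = self.validate_and_get_task_modality()  # raises AttributeError


class FixedTask:
    """The method is defined at class level, so `self` can resolve it."""

    def __init__(self, names):
        self.metrics = _safe_build_metrics(names)
        self.modality = self.validate_and_get_task_modality()

    def validate_and_get_task_modality(self):
        # Placeholder logic; the real validation is not reproduced here.
        return "general"


try:
    BrokenTask(["some_metric"])
except AttributeError as err:
    print(f"AttributeError: {err}")

print(FixedTask(["some_metric"]).modality)
```

The fix is purely a matter of indentation: dedenting the method so it sits directly in the class body.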

cursor · 2025-11-10T17:33:16Z

src/pruna/evaluation/metrics/metric_vbench_background_consistency.py

+    default_call_type: str = "y"  # We just need the outputs
+    higher_is_better: bool = True
+    runs_on: List[str] = ["cuda", "cpu"]
+    modality: List[str] = ["video"]


Bug: Modality Type Conflict Prevents Validation

The modality attribute is declared as List[str] but the base class StatefulMetric expects set[str]. This type mismatch causes issues when validate_and_get_task_modality performs set operations like set.intersection on metric modalities, resulting in a TypeError since lists don't support set intersection.
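The type error can be reproduced in isolation. The sketch below assumes the validation calls the unbound `set.intersection` on the collected modalities; the helper name `shared_modalities` and the sample values are hypothetical and not taken from the PR.

```python
def shared_modalities(metric_modalities):
    """Return the modalities common to every metric (hypothetical helper)."""
    # The unbound set.intersection requires its first argument to be a set.
    return set.intersection(*metric_modalities)


as_sets = [{"video"}, {"video", "image"}]
as_lists = [["video"], ["video", "image"]]

print(shared_modalities(as_sets))  # {'video'}

try:
    shared_modalities(as_lists)
except TypeError as err:
    print(f"TypeError: {err}")
```

Declaring the attribute as `modality: set[str] = {"video"}` in the metric class would keep it consistent with the base class and let the call above succeed.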

cursor · 2025-11-10T17:33:16Z

src/pruna/evaluation/metrics/metric_vbench_dynamic_degree.py

+    # https://github.com/Vchitect/VBench/blob/dc62783c0fb4fd333249c0b669027fe102696682/evaluate.py#L111
+    # explicitly sets the device to cuda. We respect this here.
+    runs_on: List[str] = ["cuda"]
+    modality: List[str] = ["video"]


Bug: Modality Type Mismatch Halts Set Operations

The modality attribute is declared as List[str] but the base class StatefulMetric expects set[str]. This type mismatch causes issues when validate_and_get_task_modality performs set operations like set.intersection on metric modalities, resulting in a TypeError since lists don't support set intersection.

cursor · 2025-11-10T17:33:16Z

src/pruna/evaluation/evaluation_agent.py

+                    canonical_paths = []
+                    # We have to save the artifacts for each sample in the batch.
+                    for processed_output in processed_outputs:
+                        canonical_path = self.artifact_saver.save_artifact(processed_output)


Bug: Artifact saving ignores custom configurations.

The save_artifact call doesn't pass self.saving_kwargs even though the agent stores it and the method signature expects it. This means user-provided saving parameters like fps won't be applied when saving video artifacts, causing videos to be saved with default settings instead of the requested configuration.
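One way the loop could forward the stored options is sketched below. `RecordingSaver`, `save_batch`, and the keyword name `saving_kwargs` on `save_artifact` are assumptions for illustration only; the real saver's signature is not shown in this conversation and may differ.

```python
from typing import Any, Dict, List, Optional


class RecordingSaver:
    """Hypothetical saver that only records the options it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def save_artifact(self, data: Any, saving_kwargs: Optional[Dict[str, Any]] = None) -> str:
        # Store a copy so later mutation by the caller cannot alter the record.
        self.calls.append(dict(saving_kwargs or {}))
        return f"artifact_{len(self.calls) - 1}"


def save_batch(saver: RecordingSaver, processed_outputs: List[Any], saving_kwargs: Dict[str, Any]) -> List[str]:
    """Save every sample in the batch, forwarding the user-provided options."""
    canonical_paths = []
    for processed_output in processed_outputs:
        canonical_path = saver.save_artifact(processed_output, saving_kwargs=saving_kwargs)
        canonical_paths.append(canonical_path)
    return canonical_paths


saver = RecordingSaver()
print(save_batch(saver, ["clip_a", "clip_b"], {"fps": 8}))
print(saver.calls)
```

A recording stand-in like this also doubles as a test fixture: asserting on `saver.calls` would catch the regression if the options were dropped again.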

cursor · 2025-11-10T17:33:16Z

src/pruna/evaluation/evaluation_agent.py

+        seed_strategy: Literal["per_sample", "no_seed"] = "no_seed",
+        global_seed: int | None = None,
+        artifact_saver_export_format: str | None = None,
+        saving_kwargs: dict = dict(),


Bug: Mutable Default Arguments: Shared State Surprise

Using dict() as a default parameter creates a mutable default argument that's shared across all function calls. If the dictionary is modified, subsequent calls without providing saving_kwargs will see the modified version instead of an empty dict, leading to unexpected behavior.
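The shared-state behaviour and the usual `None` sentinel fix can be shown with two small functions. Both function names and the `fps` default of 16 are hypothetical; only the `saving_kwargs` parameter name comes from the diff above.

```python
from typing import Optional


def configure_shared(saving_kwargs: dict = dict()) -> dict:
    """Problematic: the default dict is created once, at definition time."""
    saving_kwargs.setdefault("fps", 16)
    return saving_kwargs


def configure_fresh(saving_kwargs: Optional[dict] = None) -> dict:
    """Safer: build a new dict on every call when none is supplied."""
    saving_kwargs = dict(saving_kwargs) if saving_kwargs is not None else {}
    saving_kwargs.setdefault("fps", 16)
    return saving_kwargs


first = configure_shared()
first["fps"] = 1                    # mutates the shared default object
print(configure_shared()["fps"])    # 1, the change leaked into the next call

fresh = configure_fresh()
fresh["fps"] = 1
print(configure_fresh()["fps"])     # 16, each call starts from a clean dict
```

Copying the caller's dict in `configure_fresh` is a deliberate choice here so that the function never mutates an argument it does not own.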

github-actions · 2025-11-21T00:07:41Z

This PR has been inactive for 10 days and is now marked as stale.

* feat: remove algorithm groups from algorithms folder
* feat: simply new algorithm registration to smash space
* refactor: add new smash config interface
* refactor: remove unused tokenizer name function
* refactor: adjust order implementation
* feat: add new graph-based path finding for algorithm execution order
* tests: add first version of pre-smash-routines tests
* tests: narrow down pre-smash routine tests
* refactor: rename PRUNA_ALGORITHMS
* refactor: enhance algorithm tags
* refactor: remove `incompatible` specification
* feat: add `smash_config` utility
* style: initial fix all linting complaints
* tests: adjust test structure to new refactoring
* style: address PR comments
* fix: conditionally register algorithms
* fix: adjust smash config access in algorithms
* fix: support older smash configs
* fix: handle target module exception
* fix: deprecated save/load imports
* tests: update to fit recent interface changes
* fix: add `global_utils` exception to algorithm registry
* fix: extending compatible methods
* fix: deprecate old hyperparameter interface properly
* tests: add symmetry checks for algorithm order
* style: address PR comments
* feat: add utility to register custom algorithm
* fix: insufficient docstring descriptions
* fix: test references to HQQ
* style: fix remaining linting errors
* style: fix typing error w.r.t. compatibility setter
* style: import sorting
* fix: return type of registry function
* fix: model context docstring
* fix: some final bugs
* fix: duplicate pyproject.toml key
* fix: address cursorbot slander
* style: move inline comments
* fix: unify registry logic
* feat: additional check in algorithm order overwrite
* fix: documentation wording
* fix: device function patching in tests

github-actions · 2025-12-22T00:08:54Z

This PR has been inactive for 10 days and is now marked as stale.

cursor bot reviewed Nov 10, 2025

github-actions bot added stale and removed stale labels Nov 21, 2025

begumcig force-pushed the feat/vbench-integration branch from c602d6a to b581c31 on November 27, 2025, 16:09

begumcig and others added 25 commits December 11, 2025 13:45

feat: 2 vbench dimensions and vbench dependencies

e90ac49

test: vbench metric tests

f408e6a

docs: add more comprehensive docstring explanations for important parameters

ac35d6e

feat: add additional helper tools to utilities

dfc7b8f

refactor: small updates to utilities and docstrings

4e2d49b

refactor: add support for more calltypes in video eval utils

51da9e7

refactor: make utilities more vbench independent and fix small things in the metric implementations.

e550edd

refactor: address PR comments

3cd54ea

test: adding more tests for dynamic degree and background consistency

41a7e26

feat: artifact saving and vbench related agent updates

7a65fb6

test: add tests for the artifact savers

4c24c9a

test: add artifact related evaluation tests and task modality tests

5731102

refactor: add some comments

6d65f3d

refactor: better initialization for artifact savers

46fe2c8

test: add more dtype tests for artifact saver

ca5e58a

feat: metric modalities as sets

31ce1b3

refactor: comments tests task modality

3f84abf

feat: add video inference support and seeding strategies to inference handler

31efb1c

feat: remove per evaluation seed and add tests

081b095

chore: add comments

64dcf40

fix: bfloats cannot be moved to cpu error in cmmd metric

c2a1a8d

fix: pre commit file fix

b1ab3cc

refactor: configure seeding and tests

f15a6b2

feat:stratification to vbench datasets

b152371

begumcig added 2 commits December 11, 2025 13:49

feat: data stratification by indexing

df79ae8

chore: fix ruff errors

c31f6bd

begumcig force-pushed the feat/vbench-integration branch from 5143f67 to fbdd925 on December 11, 2025, 13:50

refactor: more sampling features, fixed docstrings

69079b2

begumcig force-pushed the feat/vbench-integration branch from fbdd925 to 69079b2 on December 11, 2025, 16:48

github-actions bot added the stale label Dec 22, 2025


3 participants
