Skip to content

Conversation

@begumcig
Copy link
Member

@begumcig begumcig commented Nov 10, 2025

Description

Combines #404 #395 and #400

Related Issue

Fixes #(issue number)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment @cursor review or bugbot run to trigger another review on this PR

raise ValueError("The task should have a single modality across all quality metrics.")
else: # More than one modality, fine for evaluation, can't save artifacts (for now).
return "general"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Misplaced Method Breaks Class Functionality

The validate_and_get_task_modality method is incorrectly nested inside the _safe_build_metrics function instead of being a method of the Task class. This causes an AttributeError when self.validate_and_get_task_modality() is called in the Task.__init__ method, since the method doesn't exist on the Task instance.

Fix in Cursor Fix in Web

default_call_type: str = "y" # We just need the outputs
higher_is_better: bool = True
runs_on: List[str] = ["cuda", "cpu"]
modality: List[str] = ["video"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Modality Type Conflict Prevents Validation

The modality attribute is declared as List[str] but the base class StatefulMetric expects set[str]. This type mismatch causes issues when validate_and_get_task_modality performs set operations like set.intersection on metric modalities, resulting in a TypeError since lists don't support set intersection.

Fix in Cursor Fix in Web

# https://github.com/Vchitect/VBench/blob/dc62783c0fb4fd333249c0b669027fe102696682/evaluate.py#L111
# explicitly sets the device to cuda. We respect this here.
runs_on: List[str] = ["cuda"]
modality: List[str] = ["video"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Modality Type Mismatch Halts Set Operations

The modality attribute is declared as List[str] but the base class StatefulMetric expects set[str]. This type mismatch causes issues when validate_and_get_task_modality performs set operations like set.intersection on metric modalities, resulting in a TypeError since lists don't support set intersection.

Fix in Cursor Fix in Web

canonical_paths = []
# We have to save the artifacts for each sample in the batch.
for processed_output in processed_outputs:
canonical_path = self.artifact_saver.save_artifact(processed_output)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Artifact saving ignores custom configurations.

The save_artifact call doesn't pass self.saving_kwargs even though the agent stores it and the method signature expects it. This means user-provided saving parameters like fps won't be applied when saving video artifacts, causing videos to be saved with default settings instead of the requested configuration.

Fix in Cursor Fix in Web

seed_strategy: Literal["per_sample", "no_seed"] = "no_seed",
global_seed: int | None = None,
artifact_saver_export_format: str | None = None,
saving_kwargs: dict = dict(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Mutable Default Arguments: Shared State Surprise

Using dict() as a default parameter creates a mutable default argument that's shared across all function calls. If the dictionary is modified, subsequent calls without providing saving_kwargs will see the modified version instead of an empty dict, leading to unexpected behavior.

Fix in Cursor Fix in Web

@github-actions
Copy link

This PR has been inactive for 10 days and is now marked as stale.

@github-actions github-actions bot added stale and removed stale labels Nov 21, 2025
@begumcig begumcig force-pushed the feat/vbench-integration branch from c602d6a to b581c31 Compare November 27, 2025 16:09
begumcig and others added 25 commits December 11, 2025 13:45
* feat: remove algorithm groups from algorithms folder

* feat: simply new algorithm registration to smash space

* refactor: add new smash config interface

* refactor: remove unused tokenizer name function

* refactor: adjust order implementation

* feat: add new graph-based path finding for algorithm execution order

* tests: add first version of pre-smash-routines tests

* tests: narrow down pre-smash routine tests

* refactor: rename PRUNA_ALGORITHMS

* refactor: enhance algorithm tags

* refactor: remove `incompatible` specification

* feat: add `smash_config` utility

* style: initial fix all linting complaints

* tests: adjust test structure to new refactoring

* style: address PR comments

* fix: conditionally register algorithms

* fix: adjust smash config access in algorithms

* fix: support older smash configs

* fix: handle target module exception

* fix: deprecated save/load imports

* tests: update to fit recent interface changes

* fix: add `global_utils` exception to algorithm registry

* fix: extending compatible methods

* fix: deprecate old hyperparameter interface properly

* tests: add symmetry checks for algorithm order

* style: address PR comments

* feat: add utility to register custom algorithm

* fix: insufficient docstring descriptions

* fix: test references to HQQ

* style: fix remaining linting errors

* style: fix typing error w.r.t. compatibility setter

* style: import sorting

* fix: return type of registry function

* fix: model context docstring

* fix: some final bugs

* fix: duplicate pyproject.toml key

* fix: address cursorbot slander

* style: move inline comments

* fix: unify registry logic

* feat: additional check in algorithm order overwrite

* fix: documentation wording

* fix: device function patching in tests
@begumcig begumcig force-pushed the feat/vbench-integration branch from 5143f67 to fbdd925 Compare December 11, 2025 13:50
@begumcig begumcig force-pushed the feat/vbench-integration branch from fbdd925 to 69079b2 Compare December 11, 2025 16:48
@github-actions
Copy link

This PR has been inactive for 10 days and is now marked as stale.

@github-actions github-actions bot added the stale label Dec 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants