-
Notifications
You must be signed in to change notification settings - Fork 75
feat: vbench integration #434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment @cursor review or bugbot run to trigger another review on this PR
| raise ValueError("The task should have a single modality across all quality metrics.") | ||
| else: # More than one modality, fine for evaluation, can't save artifacts (for now). | ||
| return "general" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Misplaced Method Breaks Class Functionality
The validate_and_get_task_modality method is incorrectly nested inside the _safe_build_metrics function instead of being a method of the Task class. This causes an AttributeError when self.validate_and_get_task_modality() is called in the Task.__init__ method, since the method doesn't exist on the Task instance.
| default_call_type: str = "y" # We just need the outputs | ||
| higher_is_better: bool = True | ||
| runs_on: List[str] = ["cuda", "cpu"] | ||
| modality: List[str] = ["video"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Modality Type Conflict Prevents Validation
The modality attribute is declared as List[str] but the base class StatefulMetric expects set[str]. This type mismatch causes issues when validate_and_get_task_modality performs set operations like set.intersection on metric modalities, resulting in a TypeError since lists don't support set intersection.
| # https://github.com/Vchitect/VBench/blob/dc62783c0fb4fd333249c0b669027fe102696682/evaluate.py#L111 | ||
| # explicitly sets the device to cuda. We respect this here. | ||
| runs_on: List[str] = ["cuda"] | ||
| modality: List[str] = ["video"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Modality Type Mismatch Halts Set Operations
The modality attribute is declared as List[str] but the base class StatefulMetric expects set[str]. This type mismatch causes issues when validate_and_get_task_modality performs set operations like set.intersection on metric modalities, resulting in a TypeError since lists don't support set intersection.
| canonical_paths = [] | ||
| # We have to save the artifacts for each sample in the batch. | ||
| for processed_output in processed_outputs: | ||
| canonical_path = self.artifact_saver.save_artifact(processed_output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Artifact saving ignores custom configurations.
The save_artifact call doesn't pass self.saving_kwargs even though the agent stores it and the method signature expects it. This means user-provided saving parameters like fps won't be applied when saving video artifacts, causing videos to be saved with default settings instead of the requested configuration.
| seed_strategy: Literal["per_sample", "no_seed"] = "no_seed", | ||
| global_seed: int | None = None, | ||
| artifact_saver_export_format: str | None = None, | ||
| saving_kwargs: dict = dict(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Mutable Default Arguments: Shared State Surprise
Using dict() as a default parameter creates a mutable default argument that's shared across all function calls. If the dictionary is modified, subsequent calls without providing saving_kwargs will see the modified version instead of an empty dict, leading to unexpected behavior.
|
This PR has been inactive for 10 days and is now marked as stale. |
c602d6a to
b581c31
Compare
… in the metric implementations.
* feat: remove algorithm groups from algorithms folder * feat: simply new algorithm registration to smash space * refactor: add new smash config interface * refactor: remove unused tokenizer name function * refactor: adjust order implementation * feat: add new graph-based path finding for algorithm execution order * tests: add first version of pre-smash-routines tests * tests: narrow down pre-smash routine tests * refactor: rename PRUNA_ALGORITHMS * refactor: enhance algorithm tags * refactor: remove `incompatible` specification * feat: add `smash_config` utility * style: initial fix all linting complaints * tests: adjust test structure to new refactoring * style: address PR comments * fix: conditionally register algorithms * fix: adjust smash config access in algorithms * fix: support older smash configs * fix: handle target module exception * fix: deprecated save/load imports * tests: update to fit recent interface changes * fix: add `global_utils` exception to algorithm registry * fix: extending compatible methods * fix: deprecate old hyperparameter interface properly * tests: add symmetry checks for algorithm order * style: address PR comments * feat: add utility to register custom algorithm * fix: insufficient docstring descriptions * fix: test references to HQQ * style: fix remaining linting errors * style: fix typing error w.r.t. compatibility setter * style: import sorting * fix: return type of registry function * fix: model context docstring * fix: some final bugs * fix: duplicate pyproject.toml key * fix: address cursorbot slander * style: move inline comments * fix: unify registry logic * feat: additional check in algorithm order overwrite * fix: documentation wording * fix: device function patching in tests
5143f67 to
fbdd925
Compare
fbdd925 to
69079b2
Compare
|
This PR has been inactive for 10 days and is now marked as stale. |
Description
Combines #404 #395 and #400
Related Issue
Fixes #(issue number)
Type of Change
How Has This Been Tested?
Checklist
Additional Notes