Add archive verification command #18

Ulthran · 2025-11-18T16:21:33Z

Summary

add a verification command and helper to validate archived files against recorded md5 sums
expose the new verification entry point via pyproject scripts
cover verification success and failure cases with new tests

Testing

pytest

Copilot

Pull Request Overview

This PR adds archive verification functionality to validate that archived fastq files match their recorded MD5 checksums. The implementation includes a new verify_archive helper function, a command-line entry point verify_main, and corresponding tests for both success and failure scenarios.

Key Changes:

Added verify_archive function to validate MD5 checksums of archived files
Exposed verification functionality via new verify_backup CLI command
Added tests covering successful verification and corruption detection

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
seqBackupLib/backup.py	Implements `verify_archive` function for MD5 validation and `verify_main` CLI entry point
test/test_backup.py	Adds tests for successful verification and detection of file corruption
pyproject.toml	Registers new `verify_backup` script entry point

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-18T16:25:18Z

seqBackupLib/backup.py

+        for line in md5_file:
+            expected = line.strip().split("\t")
+            if len(expected) != 2:
+                raise ValueError(f"Invalid md5 line in {md5_fp}: {line}")


The error message could be more helpful. When an invalid MD5 line is encountered, the error includes the entire line in the message, but doesn't indicate which specific field or format issue was detected.

Consider improving the error message to be more specific:

raise ValueError( f"Invalid md5 line in {md5_fp}: expected 2 tab-separated values, " f"got {len(expected)} in line: {line.strip()}" )

Suggested change

raise ValueError(f"Invalid md5 line in {md5_fp}: {line}")

raise ValueError(

f"Invalid md5 line in {md5_fp}: expected 2 tab-separated values, "

f"got {len(expected)} in line: {line.strip()}"

)

Copilot · 2025-11-18T16:25:18Z

seqBackupLib/backup.py



+def verify_archive(archive_dir: Path) -> bool:
+    md5_files = list(archive_dir.glob("*.md5"))
+    if not md5_files:
+        raise FileNotFoundError(f"No md5 file found in {archive_dir}")
+
+    if len(md5_files) > 1:
+        warnings.warn(
+            f"Multiple md5 files found in {archive_dir}. Using {md5_files[0].name}."
+        )
+
+    md5_fp = md5_files[0]
+    missing_files = []
+    mismatched_hashes = []
+
+    with open(md5_fp) as md5_file:
+        for line in md5_file:
+            expected = line.strip().split("\t")
+            if len(expected) != 2:
+                raise ValueError(f"Invalid md5 line in {md5_fp}: {line}")
+            filename, expected_md5 = expected
+            file_fp = archive_dir / filename
+
+            if not file_fp.is_file():
+                missing_files.append(filename)
+                continue
+
+            computed_md5 = return_md5(file_fp)
+            if computed_md5 != expected_md5:


[nitpick] Potential performance issue: The function processes all files before reporting any errors. For large archives with many files, this means the entire verification must complete even if the first file fails.

While collecting all errors provides comprehensive feedback, consider whether early termination on first failure would be more appropriate for this use case. If comprehensive reporting is desired, the current approach is acceptable but could benefit from progress logging for large archives.

Copilot · 2025-11-18T16:25:18Z

seqBackupLib/backup.py

+
+    with open(md5_fp) as md5_file:
+        for line in md5_file:
+            expected = line.strip().split("\t")


The MD5 file parsing doesn't handle empty lines or lines with only whitespace. If the MD5 file contains blank lines, they will cause a ValueError when len(expected) != 2.

Consider adding a check to skip empty lines:

for line in md5_file: line = line.strip() if not line: # Skip empty lines continue expected = line.split("\t") if len(expected) != 2: raise ValueError(f"Invalid md5 line in {md5_fp}: {line}")

Suggested change

expected = line.strip().split("\t")

line = line.strip()

if not line: # Skip empty or whitespace-only lines

continue

expected = line.split("\t")

seqBackupLib/backup.py

Add archive verification command

31bd200

Copilot AI review requested due to automatic review settings November 18, 2025 16:21

Ulthran added the codex label Nov 18, 2025 — with ChatGPT Codex Connector

Copilot started reviewing on behalf of Ulthran November 18, 2025 16:21 View session

Make archived files owner-writable

d20271f

Copilot finished reviewing on behalf of Ulthran November 18, 2025 16:24

Copilot AI reviewed Nov 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add archive verification command #18

Add archive verification command #18

Uh oh!

Ulthran commented Nov 18, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 18, 2025

Uh oh!

Copilot AI Nov 18, 2025

Uh oh!

Copilot AI Nov 18, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-            expected = line.strip().split("\t")
+            line = line.strip()
+            if not line:  # Skip empty or whitespace-only lines
+                continue
+            expected = line.split("\t")

Add archive verification command #18

Are you sure you want to change the base?

Add archive verification command #18

Uh oh!

Conversation

Ulthran commented Nov 18, 2025

Summary

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants