Conversation


@alpe alpe commented Dec 16, 2025

Overview

The queue state should be the same after a restart.

Note to reviewer:
This PR contains a breaking change to the queue store. I am not sure whether we need a migration.

@alpe alpe marked this pull request as ready for review December 16, 2025 14:18

github-actions bot commented Dec 16, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Dec 19, 2025, 8:31 PM

@gemini-code-assist

Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where the BatchQueue could lose its deterministic state after a restart. By implementing a new mechanism for generating and managing sequential keys for persisted batches, the queue can now reliably restore its exact order and content from storage, ensuring consistent behavior and preventing potential data reordering or loss in crash recovery scenarios.

Highlights

  • Deterministic Persistence: Batches are now persisted using deterministically generated sequential keys instead of content hashes, ensuring the queue state is consistent across restarts.
  • Queued Item Struct: A new queuedItem struct has been introduced to store both the batch and its unique persistence key, allowing the queue to manage and retrieve items based on their ordered keys.
  • Sequence Number Management: The BatchQueue now tracks nextAddSeq and nextPrependSeq to generate unique, ordered keys for new batches. These sequence counters are correctly re-initialized upon loading from persistent storage to prevent key collisions.
  • Ordered Loading: The Load operation has been enhanced to query the datastore with OrderByKey, ensuring batches are loaded in their correct order. It also reconstructs the in-memory queue and sequence counters based on the persisted, ordered keys.


claude bot commented Dec 16, 2025

Claude finished @alpe's task in 4m 6s.


Code Review Complete

I've completed a comprehensive review of the deterministic queue implementation. This PR makes an important architectural improvement by switching from content-hash-based keys to sequence-number-based keys, ensuring deterministic queue restoration after restarts.


Summary

Overall Assessment: APPROVED with recommendations

The implementation is well-designed and addresses a critical issue with queue state persistence. The code is production-ready with good test coverage. My main recommendations focus on improving observability (structured logging), error handling resilience, and addressing a few edge cases.


1. Code Quality & Architecture ✅

Strengths:

  • Clean sequence number design: Using initialSeqNum = 0x8000000000000000 (middle of uint64 range) is elegant, allowing both append (increment) and prepend (decrement) operations without collision risk
  • Deterministic key generation: seqToKey() uses big-endian encoding so that lexicographic sort order matches numeric order, which is crucial for correct queue ordering (see the sketch after this list)
  • Migration strategy: Automatic detection and migration of legacy hash-based keys is well-implemented
  • Memory efficiency: The compaction logic in Next() (sequencers/single/queue.go:146-154) prevents memory leaks by releasing processed items

Concerns:

  1. Missing structured logging (echoes gemini-code-assist's feedback)

    • Lines 160, 189, 199, 228, 234, 239 use fmt.Printf for error logging
    • Impact: In production, these errors go to stdout with no severity levels, making monitoring difficult
    • Recommendation: Inject a structured logger (e.g., log/slog, zap) into BatchQueue for proper observability
    • Priority: HIGH - critical for production debugging
  2. Sequence counter overflow handling

    • While uint64 provides massive headroom (roughly 9.2 quintillion operations in each direction from the midpoint), there's no overflow detection
    • Scenario: If nextPrependSeq underflows or nextAddSeq overflows, keys would wrap around and potentially collide
    • Recommendation: Add overflow checks before incrementing/decrementing sequences:
      if bq.nextAddSeq == math.MaxUint64 {
          return fmt.Errorf("sequence number overflow")
      }
      if bq.nextPrependSeq == 0 {
          return fmt.Errorf("sequence number underflow")
      }
    • Priority: LOW - extremely unlikely in practice, but good defensive programming

2. Error Handling & Resilience ⚠️

Strengths:

  • Error wrapping is consistent (line 182: fmt.Errorf("error querying datastore: %w", err))
  • Load() gracefully handles malformed entries by logging and continuing (lines 188-191, 198-201)

Concerns:

  1. Silent error handling in critical paths (sequencers/single/queue.go:158-161)

    if err := bq.db.Delete(ctx, ds.NewKey(item.Key)); err != nil {
        // Log the error but continue
        fmt.Printf("Error deleting processed batch: %v\n", err)
    }
    • Issue: If deletion fails, the batch remains in persistent storage and will be reloaded on restart, causing duplicate processing
    • Impact: Queue state diverges from disk state, potentially leading to transaction reprocessing after crash
    • Recommendation: Either:
      • Return the error to caller (breaking change, more correct)
      • Track failed deletions and retry on next Load() (see the sketch after this list)
      • Add metrics/alerts for deletion failures
    • Priority: HIGH - affects data integrity
  2. Migration error handling (sequencers/single/queue.go:233-240)

    if err := bq.persistBatch(ctx, item.Batch, newKeyName); err != nil {
        fmt.Printf("Failed to migrate legacy item %s: %v\n", item.Key, err)
        continue  // Skips migration but doesn't track failure
    }
    • Issue: Failed migrations are silently dropped - the legacy item stays in DB but isn't loaded into queue
    • Impact: Transaction loss on migration failure
    • Recommendation:
      • Return error if any migration fails OR
      • Track partial migration state and retry on next startup
    • Priority: MEDIUM - migration is one-time but critical
  3. Context cancellation not checked in loops

    • The Load() loop (lines 187-224) doesn't check ctx.Done(), potentially blocking shutdown
    • Recommendation: Add select { case <-ctx.Done(): return ctx.Err(); default: } in the loop
    • Priority: LOW - Load() is typically fast
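
A minimal sketch of the retry-tracking option from concern 1 above; the failedDeletes field and both helpers are hypothetical, not part of this PR:

    package single

    import (
        "context"

        ds "github.com/ipfs/go-datastore"
    )

    // batchQueue sketch: remember keys whose Delete failed so Load() can retry.
    type batchQueue struct {
        db            ds.Datastore
        failedDeletes map[string]struct{}
    }

    func (bq *batchQueue) deleteProcessed(ctx context.Context, key string) {
        if err := bq.db.Delete(ctx, ds.NewKey(key)); err != nil {
            bq.failedDeletes[key] = struct{}{} // keep for retry instead of only logging
        }
    }

    // retryFailedDeletes would run at the start of Load(), before querying.
    func (bq *batchQueue) retryFailedDeletes(ctx context.Context) {
        for key := range bq.failedDeletes {
            if err := bq.db.Delete(ctx, ds.NewKey(key)); err == nil {
                delete(bq.failedDeletes, key)
            }
        }
    }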

3. Security 🔒

Overall: Good, no major concerns

  • ✅ No credential handling or sensitive data exposure
  • ✅ Input validation present (key length check on line 208: len(keyName) == 16)
  • ✅ No SQL injection risk (protobuf-based storage)
  • ✅ No obvious DoS vectors (maxQueueSize limits memory usage)

Minor observation:

  • The migration code (lines 228-244) processes all legacy items without batching, which could cause memory pressure if thousands of legacy items exist
  • Recommendation: Consider processing migrations in batches if you expect large legacy datasets
  • Priority: LOW - unlikely scenario

4. Performance & Resource Efficiency ⚡

Strengths:

  • Efficient compaction heuristic (line 146): Only compacts when head > len(queue)/2 && head > 100, avoiding frequent reallocations (reconstructed in the sketch after this list)
  • Big-endian encoding allows direct lexicographical ordering without conversion
  • Prepend optimization (lines 116-122): Reuses head space when available instead of always prepending to slice
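
An illustrative reconstruction of that compaction step, combining the heuristic above with the zeroing loop quoted under concern 2 below; field names are assumed:

    // Inside Next(), after advancing bq.head past a processed item:
    if bq.head > len(bq.queue)/2 && bq.head > 100 {
        remaining := copy(bq.queue, bq.queue[bq.head:]) // shift live items to the front
        for i := remaining; i < len(bq.queue); i++ {
            bq.queue[i] = queuedItem{} // drop references so the GC can reclaim batches
        }
        bq.queue = bq.queue[:remaining]
        bq.head = 0
    }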

Concerns:

  1. OrderByKey query cost (line 178)

    q := query.Query{
        Orders: []query.Order{query.OrderByKey{}},
    }
    • Depending on datastore backend, this could require full key sort on every Load()
    • Recommendation: Document expected datastore performance characteristics
    • Priority: LOW - likely acceptable for restart scenarios
  2. Memory zeroing overhead (lines 149-151)

    for i := remaining; i < len(bq.queue); i++ {
        bq.queue[i] = queuedItem{}
    }
    • This zeroing loop is good for GC but adds O(n) overhead
    • Alternative: bq.queue = bq.queue[:remaining:remaining] to prevent slice reuse (forces new allocation on append)
    • Priority: VERY LOW - current approach is safer

5. Testing & Reliability 🧪

Strengths:

  • ✅ Excellent test coverage with migration-specific tests (queue_migration_test.go)
  • TestBatchQueue_Load_SetsSequencesProperly (line 309) validates critical sequence counter restoration
  • ✅ Concurrency tests present (TestConcurrency, TestBatchQueue_QueueLimit_Concurrency)
  • ✅ Edge cases covered: empty queue, multiple prepends, persistence across restarts

Observations:

  1. Codecov shows 83.60% patch coverage with 10 lines missing

    • Missing lines likely in error paths (migration failures, delete errors)
    • Recommendation: Add tests that inject datastore failures to cover error paths
    • Priority: MEDIUM - important for production confidence
  2. Migration test validation (queue_migration_test.go:106-108)

    // We expect 0 keys in DB because we called Next() 3 times which deletes them from DB as well!
    require.Empty(keys, "Expected DB to be empty after processing all items")
    • This test validates successful migration, but doesn't test failure scenarios
    • Recommendation: Add test for partial migration failure (mock datastore errors)
    • Priority: MEDIUM
  3. Sequence collision test missing

    • No test validates behavior when loading items with sequence numbers near overflow boundaries
    • Recommendation: Add test with sequences near 0 and math.MaxUint64
    • Priority: LOW - edge case

6. Documentation & Developer Experience 📚

Strengths:

  • ✅ Clear comment on initialSeqNum explaining the design choice (lines 25-28)
  • ✅ Function comments explain behavior (e.g., Prepend bypassing maxQueueSize, lines 100-101)
  • ✅ Migration test comments explain expected behavior

Recommendations:

  1. Document migration process

    • Add a comment in Load() explaining the migration logic and what happens to legacy keys
    • Mention that migration is automatic and one-time
    • Priority: MEDIUM
  2. Document deletion failure implications

    • Add comment at line 158 explaining why deletion errors are logged but not returned
    • Document the potential for duplicate processing on restart if deletion fails
    • Priority: MEDIUM
  3. Breaking change documentation

    • The PR description mentions this is a breaking change but doesn't specify impact
    • Recommendation: Add a comment or doc explaining:
      • Old queues will auto-migrate on first Load()
      • No manual migration needed
      • Old hash-based keys are deleted after successful migration
    • Priority: HIGH - critical for operators

7. Concurrency Safety 🔐

Assessment: Well-protected

  • ✅ All public methods acquire bq.mu lock
  • Load() resets sequences under lock (lines 168-175)
  • ✅ Test coverage for concurrent operations (TestConcurrency)

No issues found in this area.


Critical Issues Summary

Priority | Issue | Location | Recommendation
HIGH | Structured logging missing | Lines 160, 189, 199, 228, 234, 239 | Inject logger into BatchQueue
HIGH | Deletion failures cause state drift | Lines 158-161 | Track failures or return error
HIGH | Breaking change not documented | PR description | Document migration behavior
MEDIUM | Migration failures silently drop data | Lines 233-240 | Return error or track failures
MEDIUM | Missing error path test coverage | N/A | Add datastore failure injection tests

Minor Suggestions

  1. Line 186: Consider extracting magic number 16 to a constant:

    const seqKeyLength = 16  // hex-encoded uint64
  2. Line 208: The validation logic is buried in a loop - consider extracting:

    func isValidSeqKey(key string) (uint64, bool) {
        if len(key) != seqKeyLength {
            return 0, false
        }
        seq, err := strconv.ParseUint(key, 16, 64)
        return seq, err == nil
    }
  3. Prepend comment (line 100): The comment says "NOTE: Prepend intentionally bypasses..." but doesn't explain why. Consider adding rationale (prevents transaction loss when batch size limits are hit).


Migration Consideration (Response to PR Author's Question)

"I am not sure if we need a migration."

Answer: You already have automatic migration

The code in Load() (lines 186-246) implements automatic migration:

  1. Detects legacy keys (non-16-char hex strings)
  2. Re-persists with new sequence-based keys
  3. Deletes old hash-based keys
  4. Updates sequence counters

This is the correct approach. Manual migration scripts would be more complex and error-prone.

However, consider adding:

  1. Migration observability: Emit a log/metric when migration happens with a count of migrated items (see the sketch after this list)
  2. Migration validation: After migration, verify all legacy keys are gone (add to test)
  3. Rollback documentation: If migration fails and operators need to rollback, document that they'd lose the deterministic queue benefit
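
A sketch of the observability hook from point 1; legacyItems, nextLegacyKey, and the injected logger are illustrative names layered on the persistBatch call shown earlier, not necessarily the PR's identifiers:

    // Hypothetical sketch of the migration tail in Load(): count outcomes and
    // emit a single structured record once migration completes.
    func (bq *batchQueue) migrateLegacy(ctx context.Context, legacyItems []queuedItem) {
        var migrated, failed int
        for _, item := range legacyItems {
            if err := bq.persistBatch(ctx, item.Batch, bq.nextLegacyKey()); err != nil {
                failed++
                continue
            }
            migrated++
        }
        bq.logger.Info("migrated legacy queue entries", "migrated", migrated, "failed", failed)
    }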

Reference: Issue #2795

Per @julienrbrt's comment, this relates to #2795 for tracking legacy code removal before v1. The legacy key detection code (lines 205-223) should be tracked for eventual removal once all deployments have migrated.


Final Recommendation

APPROVE with the following action items before merge:

  1. MUST: Add structured logging (addresses gemini-code-assist feedback + production needs)
  2. SHOULD: Improve deletion error handling in Next() to prevent state drift
  3. SHOULD: Document the breaking change and migration behavior for operators
  4. NICE-TO-HAVE: Add error injection tests to improve coverage to >90%

The core logic is sound and the tests demonstrate correctness. The issues identified are about production robustness and observability.


Great work on solving the deterministic queue restoration problem! The sequence number approach is much more robust than content hashing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the persistent queue to use deterministic sequence numbers for keys instead of content-based hashes. This is a significant improvement for robustness, ensuring the queue state can be reliably restored after a restart. The changes are well-implemented, including updating the Load function to correctly handle the new key scheme and resume sequencing. A new test case has been added to validate this new logic. My main feedback is to replace fmt.Printf calls with a structured logger for better observability in production.

Comment on lines 190 to 218

    for result := range results.Next() {
        if result.Error != nil {
            fmt.Printf("Error reading entry from datastore: %v\n", result.Error)
            continue
        }
        // We care about the last part of the key (the sequence number).
        // ds.Key usually has a leading slash.
        key := ds.NewKey(result.Key).Name()

        pbBatch := &pb.Batch{}
        err := proto.Unmarshal(result.Value, pbBatch)
        if err != nil {
            fmt.Printf("Error decoding batch for key '%s': %v. Skipping entry.\n", key, err)
            continue
        }

        batch := coresequencer.Batch{Transactions: pbBatch.Txs}
        bq.queue = append(bq.queue, queuedItem{Batch: batch, Key: key})

        // Update sequences based on loaded keys to avoid collisions
        if seq, err := strconv.ParseUint(key, 16, 64); err == nil {
            if seq >= bq.nextAddSeq {
                bq.nextAddSeq = seq + 1
            }
            if seq <= bq.nextPrependSeq {
                bq.nextPrependSeq = seq - 1
            }
        }
    }


Severity: medium

Throughout the Load and Next functions, errors are logged using fmt.Printf. In a library or production service, it's better to use a structured logging library (e.g., log/slog, zap, zerolog). This provides several benefits:

  • Controllable Output: Consumers of the package can direct logs to files, network services, etc.
  • Log Levels: You can assign severity levels (e.g., error, warn, info) to messages.
  • Structured Data: You can include key-value pairs (like the key in this case) for easier parsing and filtering by log analysis tools.

Consider injecting a logger into the BatchQueue to handle these cases.
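
A minimal sketch of that injection, assuming log/slog; the constructor signature is illustrative, not the package's actual API:

    package single

    import (
        "log/slog"

        ds "github.com/ipfs/go-datastore"
    )

    type BatchQueue struct {
        logger *slog.Logger
        // ... existing fields unchanged
    }

    func NewBatchQueue(db ds.Datastore, logger *slog.Logger) *BatchQueue {
        if logger == nil {
            logger = slog.Default() // safe fallback so call sites never need nil checks
        }
        return &BatchQueue{logger: logger}
    }

    // Call sites then replace fmt.Printf with leveled, structured records, e.g.:
    // bq.logger.Error("decoding batch failed", "key", key, "err", err)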


codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 83.60656% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.19%. Comparing base (742b8bb) to head (ff87318).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
sequencers/single/queue.go | 83.60% | 5 missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2938      +/-   ##
==========================================
+ Coverage   59.06%   59.19%   +0.13%     
==========================================
  Files          90       90              
  Lines        8596     8627      +31     
==========================================
+ Hits         5077     5107      +30     
- Misses       2937     2938       +1     
  Partials      582      582              
Flag | Coverage Δ
combined | 59.19% <83.60%> (+0.13%) ⬆️


@julienrbrt julienrbrt changed the title from "fix: deterministic queue" to "fix(sequencers/single): deterministic queue" Dec 16, 2025
@julienrbrt julienrbrt previously approved these changes Dec 18, 2025
@julienrbrt julienrbrt previously approved these changes Dec 19, 2025
@julienrbrt julienrbrt enabled auto-merge December 19, 2025 20:15
@julienrbrt julienrbrt disabled auto-merge December 19, 2025 20:31
@julienrbrt julienrbrt merged commit d4a2da9 into main Dec 19, 2025
19 of 20 checks passed
@julienrbrt julienrbrt deleted the alex/deterministic_queue branch December 19, 2025 20:31
alpe added a commit that referenced this pull request Dec 22, 2025
* main:
  fix: make evm_execution more robust (#2942)
  fix(sequencers/single): deterministic queue (#2938)
  fix(block): fix init logic sequencer for da epoch fetching (#2926)