-
Notifications
You must be signed in to change notification settings - Fork 246
fix: make evm_execution more robust #2942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Claude finished @tac0turtle's task in 2m 30s —— View job Code Review: EVM Execution Robustness ImprovementsI've completed a comprehensive review of this PR. The changes introduce critical improvements for preventing fork explosions and enabling crash recovery. Overall, the implementation is solid with good error handling patterns, but there are several important issues to address. Critical Issues1. Missing Concurrency Protection for ExecuteTxs 🔴Location: The PR description mentions "Added Problem: Without serialization, concurrent calls to
Evidence:
Recommendation: Add an c.executeMu.Lock()
defer c.executeMu.Unlock()This is essential to prevent the fork explosion scenario described in the PR. 2. Transaction Validation Gap in Idempotency Check 🟡Location: Problem: The idempotency check at line 686 only verifies Attack Scenario:
Current Code: // Line 686-691: Only checks timestamp, not transaction content
if existingTimestamp == uint64(timestamp.Unix()) {
c.logger.Info()...Msg("ExecuteTxs: reusing existing block at height (EL idempotency)")
// Returns existing stateRoot WITHOUT verifying txs match
return existingStateRoot.Bytes(), nil, true, nil
}Recommendation: Option B (Defensive): If EL block exists at height with matching timestamp, retrieve the actual block transactions from EL and compare. If mismatch, log critical warning and reject. The High Priority Issues3. Unused Public APIs Without Clear Purpose 🟡Locations:
Issue: As @alpe noted, these public methods are not called anywhere in the codebase. The PR description mentions they're for "DA inclusion loop" integration, but:
Recommendation:
4. Safe Block Lag Semantics Need Clarification 🟡Location: Issue: The constants
Recommendation:
Code Quality & Architecture5. Store Integration Pattern ✅ (Resolved)The discussion between @julienrbrt and others about store dependencies was resolved by:
This is a good pattern that keeps execution/evm modular while still using ev-node's datastore abstraction. 6. Error Handling: Good Retry Logic ✅The exponential backoff retry pattern (
Minor suggestion: Consider making retry constants configurable for testing. 7. Logging: Generally Good ✅Structured logging with appropriate levels. Good use of contextual fields. Minor nit: Some debug logs at line 451-453, 472-475 could be elevated to Security Considerations8. JWT Secret Handling ✅JWT secret management in
9. Transaction Filtering ✅
10. State Root Validation ✅Line 316-318: Validates that prevStateRoot matches execution layer state. Good defensive check. Testing11. Test Coverage: Adequate but Missing Concurrency Tests 🟡
Missing:
Recommendation: Add tests for: // Concurrent execution attempts
func TestConcurrentExecuteTxs(t *testing.T) { ... }
// Idempotency with tx mismatch
func TestIdempotencyDifferentTxs(t *testing.T) { ... }
// Crash recovery
func TestResumePayloadAfterCrash(t *testing.T) { ... }Documentation12. Code Comments: Excellent ✅Function-level documentation is thorough and explains:
The comment style aligns with Go conventions. Performance13. Block Hash Cache Management ✅
14. Potential Optimization: Store Writes 🟡
Suggestion: Consider batching these writes or making intermediate stages optional. The critical saves are "started" (for crash recovery) and "promoted" (for idempotency). Summary of Required ChangesMust Fix Before Merge:
Should Address: Nice to Have: Overall AssessmentThis PR makes important progress on preventing fork explosions and enabling crash recovery. The core idempotency design with The code quality is generally high with good error handling, logging, and documentation. Once the critical issues are fixed and tests are added, this will be a solid improvement to execution layer robustness. Recommendation: Request changes for issues #1 and #2 before merging. |
Summary of ChangesHello @tac0turtle, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of EVM execution by addressing potential "fork explosions" and improving crash recovery mechanisms. It introduces a multi-layered approach including serialization of block builds, idempotent execution checks, and the ability to resume interrupted payload creations. Furthermore, it refines the handling of safe and finalized block states, ensuring more stable and predictable chain progression. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
9e47304 to
f2a5390
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant changes to enhance the robustness and idempotency of block execution, particularly for EVM-based chains. Key changes include adding an ExecMeta struct and associated storage (pkg/store) to track the execution state of blocks at different stages (started, promoted), enabling crash recovery and preventing the creation of sibling blocks during retries. The Executor now saves and retrieves this metadata, and a new payloadResumer interface allows EVM execution clients to resume in-progress payload builds. The EngineClient in execution/evm has been updated to serialize ExecuteTxs calls, implement idempotency checks by reusing existing blocks, and manage forkchoice state more granularly with SafeBlockLag and new setHead, SetSafe, SetFinalized, and ResumePayload methods. Mock generation configurations (.mockery.yaml) and protobuf definitions (proto/evnode/v1/state.proto) were updated to support these new structures and interfaces, and the buf.gen.yaml configuration was simplified by removing the clean: true directive.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2942 +/- ##
==========================================
+ Coverage 59.19% 59.27% +0.08%
==========================================
Files 90 90
Lines 8627 8627
==========================================
+ Hits 5107 5114 +7
+ Misses 2938 2931 -7
Partials 582 582
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // This provides a buffer for reorg protection - safe blocks won't be reorged | ||
| // under normal operation. A value of 2 means when head is at block N, | ||
| // safe is at block N-2. | ||
| SafeBlockLag = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SafeBlockLag = 2 | |
| //TODO: make this work with da inclusion loop or syncer when using da. | |
| SafeBlockLag = 2 |
julienrbrt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reviewing the core logic. My comments are still about moving most of it outside of ev-node.
| err = c.engineClient.CallContext(ctx, &payloadResult, "engine_getPayloadV4", *payloadID) | ||
| if err != nil { | ||
| return nil, 0, fmt.Errorf("get payload failed: %w", err) | ||
| // Save ExecMeta with payloadID for crash recovery (Stage="started") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove this nil check and assume store is always set. it is in the constructor.
| return c.doForkchoiceUpdate(ctx, args, "setHead") | ||
| } | ||
|
|
||
| func (c *EngineClient) setFinal(ctx context.Context, blockHash common.Hash, isFinal bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit weird to have two private setFinal and setFinalWithHeight. Can we merge them?
alpe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work.
It would be great to get rid of the store nil checks in the code
| // Uses cached block hashes when available to avoid RPC calls. Falls back to RPC on cache miss | ||
| // (e.g., during restart before cache is warmed). | ||
| // Returns nil if the height doesn't exist yet (block not produced). | ||
| func (c *EngineClient) SetSafeByHeight(ctx context.Context, height uint64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and SetSafe are public methods but I could not find where they are called from. Do you need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will need to a be a follow up in order to build it against the da inclusion loop.
| // SetFinalized explicitly sets the finalized block hash. | ||
| // This allows the derivation layer to advance finalization independently. | ||
| // Finalized indicates a block that will never be reorged (e.g., included in DA with sufficient confirmations). | ||
| func (c *EngineClient) SetFinalized(ctx context.Context, blockHash common.Hash) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here and other public methods. I could not find where they are called.
execution/evm/execution.go
Outdated
| // - err: error during checks | ||
| func (c *EngineClient) checkIdempotency(ctx context.Context, height uint64, timestamp time.Time, txs [][]byte) (stateRoot []byte, payloadID *engine.PayloadID, found bool, err error) { | ||
| // 1. Check ExecMeta (if store is configured) | ||
| if c.store != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TXs are unused.
The function checks if a block exists at a certain height with a specific timestamp. However, it does not verify if the transactions (txs) match what was previously executed or what exists on the Execution Layer (EL).
Risk: If the sequencer sends a different set of transactions for the same height/timestamp (e.g., due to a bug or malicious actor), this function will return the stateRoot of the previous execution, leading to state divergence between nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I
f there is an already executed block we shouldnt override it as this would be considered a fork.
2a1d79a to
bf5c9fa
Compare
bf5c9fa to
ecec4e7
Compare
alpe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates and additional informations.
Change look good. Please fix the test before merging
Overview
Summary
Prevents fork explosions where multiple EL blocks are created for the same app block height by adding three layers of protection:
setFinal(isFinal=false)no longer incorrectly advances safe block; safe now lags head bySafeBlockLag(2 blocks)Changes
executeMu sync.Mutexto serializeExecuteTxsandResumePayloadcallsExecuteTxsusing height + timestamp matchingcurrentSafeBlockHash/currentFinalizedBlockHashinstead ofprevBlockHashSafeBlockLagconstant (2 blocks) for automatic safe block advancementSetSafe(),SetSafeByHeight(),SetFinalized(),ResumePayload()