
Conversation

@cfallin
Member

@cfallin cfallin commented Dec 19, 2025

The debug support in Wasmtime so far is structured around async callbacks that occur at certain kinds of events, like breakpoints. This is a suitable foundation, but it makes a top-level debugger awkward to implement: such a debugger likely has an event loop dealing with user commands (via a UI or a protocol connection) and expects to perform actions such as "run until next breakpoint".

This PR introduces a new crate that wraps a `Store` in a `Debugger`. The wrapper holds an inner async body that can perform whatever actions it likes on the `Store` passed back into it; this inner body is spawned as an async task. The debugger wrapper registers its own `DebugHandler` callback that communicates with the outside world via bidirectional command/response queues.

On the "outside", the `Debugger` presents an interface suitable for inserting into a debug protocol server or UI: an async method that runs until the next event and returns that event, and a method that permits querying or modifying the store whenever the `run` method is not executing. The latter operates by sending a closure over the queue, because the `Store` must continue to be owned by the async task that is (still) running and suspended in async callbacks.
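
To make the shape concrete, here is a rough sketch of what such an interface could look like. The names and signatures (`DebugEvent`, `run`, `with_store`) are illustrative assumptions, not the new crate's actual API:

```rust
use anyhow::Result;
use wasmtime::Store;

/// Hypothetical event yielded when the debugged guest stops.
pub enum DebugEvent {
    Breakpoint,
    Finished,
}

pub struct Debugger<T> {
    // Stand-in for the bidirectional command/response queues connecting this
    // handle to the spawned task that owns the `Store<T>`.
    _queues: std::marker::PhantomData<T>,
}

impl<T: Send + 'static> Debugger<T> {
    /// Resume execution until the next event (breakpoint, completion, ...).
    pub async fn run(&mut self) -> Result<DebugEvent> {
        unimplemented!("send a resume command over the queue; await the next event")
    }

    /// Query or modify the store while `run` is not executing. The closure is
    /// shipped over the queue because the `Store` stays owned by the task that
    /// is suspended inside the async debug callback.
    pub async fn with_store<R, F>(&mut self, _f: F) -> Result<R>
    where
        F: FnOnce(&mut Store<T>) -> R + Send + 'static,
        R: Send + 'static,
    {
        unimplemented!("box the closure, send it over the queue, await the result")
    }
}
```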

Right now, this is exercised only via a few unit tests, but the intent is to next build up the "top half" of the debugger using this abstraction, e.g. by running a gdbstub protocol server (likely as a Wasm component in a "debug-main WIT world" -- RFC needed for this).

Also, when we eventually move debugging over to native use of `run_concurrent`, this paradigm should remain mostly unchanged at this level of API: there can still be an object that has an async method that runs and yields the next event, and there can still be a method that takes a closure that can operate (within its scope only) on the `Store`.

A few warts that I could use feedback on:

  • Cancelation safety is weird. Fibers panic when dropped before execution of their body completes, and this seems to mean that we can't allow a `Debugger` to be dropped early (or, at least, we can't let the `tokio::test` unit test that owns the runtime running the async task finish before the debugged body completes!). If there is a better way to handle cancelation safety here, I'm all ears.

  • It's not clear to me if the boxed-closure-and-`Any` approach to providing access to the `Store` is the best we can do, but I suspect it is (a self-contained sketch of the idea follows below).
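
For the second point, a minimal, self-contained sketch of the boxed-closure-and-`Any` pattern using only the standard library (a thread owning a `String` stands in for the async task owning the `Store`; this illustrates the idea, not the PR's code):

```rust
use std::any::Any;
use std::sync::mpsc;
use std::thread;

// A type-erased operation on the owned state, with a type-erased result.
type StoreOp = Box<dyn FnOnce(&mut String) -> Box<dyn Any + Send> + Send>;

fn main() {
    let (op_tx, op_rx) = mpsc::channel::<StoreOp>();
    let (res_tx, res_rx) = mpsc::channel::<Box<dyn Any + Send>>();

    // "Inside" half: the task that owns the state runs closures sent to it.
    let owner = thread::spawn(move || {
        let mut store = String::from("wasm state");
        while let Ok(op) = op_rx.recv() {
            res_tx.send(op(&mut store)).unwrap();
        }
    });

    // "Outside" half: ship a closure over the queue, downcast the reply.
    op_tx
        .send(Box::new(|store: &mut String| {
            store.push_str(", inspected");
            let result: Box<dyn Any + Send> = Box::new(store.len());
            result
        }))
        .unwrap();
    let len = *res_rx.recv().unwrap().downcast::<usize>().unwrap();
    println!("store length after edit: {len}");

    drop(op_tx); // close the command queue so the owner exits
    owner.join().unwrap();
}
```

The result comes back as a boxed `Any` and is downcast on the outside, which is essentially the part of the design the bullet above is asking about.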

@cfallin cfallin requested a review from alexcrichton December 19, 2025 02:56
@cfallin cfallin requested review from a team as code owners December 19, 2025 02:56
Member

@fitzgen fitzgen left a comment


Have to context switch, but want to get my comments thus far posted.

Member

@alexcrichton alexcrichton left a comment


this seems to mean that we can't allow a Debugger to drop early

This is definitely a bug in Wasmtime that we need to fix. Can you add a failing test or gist one that I can help poke around?

@cfallin
Member Author

cfallin commented Dec 19, 2025

This is definitely a bug in Wasmtime that we need to fix. Can you add a failing test or gist one that I can help poke around?

Sure thing -- with this commit and

cargo test -p wasmtime-internal-debugger -- basic_debugger

I see

thread 'test::basic_debugger' (1129985) panicked at /home/cfallin/work/wasmtime/crates/wasmtime/src/runtime/fiber.rs:199:70:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@cfallin
Member Author

cfallin commented Dec 19, 2025

(It's entirely possible that I'm holding some part of the async machinery wrong, either in this PR or in earlier work to add the async debugger hook -- happy to have your opinions on that if so!)

@alexcrichton
Member

alexcrichton commented Dec 22, 2025

That'll get fixed by this:

diff --git a/crates/wasmtime/src/runtime/fiber.rs b/crates/wasmtime/src/runtime/fiber.rs
index 68dce6884f..1ab5120ef3 100644
--- a/crates/wasmtime/src/runtime/fiber.rs
+++ b/crates/wasmtime/src/runtime/fiber.rs
@@ -358,7 +358,7 @@ impl StoreOpaque {
     }

     /// Returns whether `block_on` will succeed or panic.
-    #[cfg(feature = "call-hook")]
+    #[cfg(any(feature = "call-hook", feature = "debug"))]
     pub(crate) fn can_block(&mut self) -> bool {
         self.fiber_async_state_mut().current_future_cx.is_some()
     }
diff --git a/crates/wasmtime/src/runtime/store.rs b/crates/wasmtime/src/runtime/store.rs
index 0ce69fc6c3..ac6929d376 100644
--- a/crates/wasmtime/src/runtime/store.rs
+++ b/crates/wasmtime/src/runtime/store.rs
@@ -2839,6 +2839,9 @@ unsafe impl<T> VMStore for StoreInner<T> {
     #[cfg(feature = "debug")]
     fn block_on_debug_handler(&mut self, event: crate::DebugEvent<'_>) -> anyhow::Result<()> {
         if let Some(handler) = self.debug_handler.take() {
+            if !self.can_block() {
+                bail!("couldn't grab async_cx for debug handler")
+            }
             log::trace!("about to raise debug event {event:?}");
             StoreContextMut(self).with_blocking(|store, cx| {
                 cx.block_on(Pin::from(handler.handle(store, event)).as_mut())

tl;dr: when a store is dropped, all fibers are resumed with a trap that says "please die immediately", but in some circumstances internal code attempts something async while that error is propagating out of everything. The async attempt fails because the blocking context (rightfully) wasn't configured.

This happened before with call hooks, for example, where we would try to invoke the call hook while the fiber was exiting to say "hey, a trap happened". The fix here is to just skip debug events/call hooks/etc. in this situation, since if the store is being dropped that's not interesting anyway.

(if you wouldn't mind this'd be a good thing to add a test for too)

@cfallin
Member Author

cfallin commented Dec 22, 2025

Awesome, thanks @alexcrichton for that fix! Updated and added a test with an early drop of debugger/store.

Member

@alexcrichton alexcrichton left a comment


👍 from me (I defer to you/Nick to possibly await his approval too)

@cfallin
Member Author

cfallin commented Dec 22, 2025

Thanks! I'll go ahead and merge and we can always tweak anything we need as we build on top of this.

@cfallin cfallin added this pull request to the merge queue Dec 22, 2025
@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Dec 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 22, 2025
cfallin added a commit to cfallin/wasmtime that referenced this pull request Dec 23, 2025
These Pulley instructions are an optimization: they incorporate moves
from other registers to x0/x1/x2/x3 as part of the call itself.

This works fine for the default calling convention, but the logic that
absorbs these moves and removes them from the list given to the rest of
the VCode/ABI machinery embeds a tacit assumption: the argument
registers are volatile (caller-saved). In particular, this is because
the call{1,2,3,4} instructions are *not* marked as clobbering
x0/x1/x2/x3 explicitly.

To fix this issue, this PR simply omits the magical folding that happens
in the emission code. An alternative would be to try to add the clobbers
to the special instructions, but actually we don't want these to be just
"clobbers": we want the usual regalloc constraints to take effect and
the values to be represented as continuing valid/live in the registers
after return, and the proper way to do that is to leave the `args` on
the call and let the regalloc operand pass add the fixed-reg operands.

Discovered while trying to land bytecodealliance#12183.
@cfallin
Member Author

cfallin commented Dec 23, 2025

CI failure on i686 (i.e., pulley32) is due to a problem with PreserveAll interacting poorly with Pulley's call1/2/3/4 optimization; #12213 fixes it (tested locally). I'll merge that into this branch once that PR lands.

github-merge-queue bot pushed a commit that referenced this pull request Dec 23, 2025
…BI. (#12213)

@cfallin cfallin enabled auto-merge December 23, 2025 19:23
@cfallin cfallin added this pull request to the merge queue Dec 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 23, 2025
@alexcrichton
Member

The CI error there is:

Running: `"curl" "--user-agent" "bytecodealliance/wasmtime auto-publish script" "https://crates.io/api/v1/crates/wasmtime-internal-debugger/owners"`

thread 'main' (2599) panicked at scripts/publish.rs:623:17:

crate wasmtime-internal-debugger is not owned by wasmtime-publish, please run:

    cargo owner -a wasmtime-publish wasmtime-internal-debugger

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

notably we verify that all crates are owned by wasmtime-publish and already published to crates.io -- this ensures the auto-publish later succeeds. Can you publish a stub crate and invite the wasmtime-publish owner?

@cfallin
Member Author

cfallin commented Dec 23, 2025

Yep, I've just published the crate and invited wasmtime-publish -- let me know when ready to re-attempt landing.

@cfallin cfallin added this pull request to the merge queue Dec 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 23, 2025
@cfallin
Member Author

cfallin commented Dec 23, 2025

I'm not sure I understand the failure now -- I added wasmtime-publish and it shows as an owner; I removed myself as an owner; it's still failing. @alexcrichton any ideas?

@alexcrichton
Member

Oh, that's wasmtime-publish-the-GitHub-team, not wasmtime-publish-the-GitHub-user. If you run the command it printed out, it should be resolved.

@cfallin
Member Author

cfallin commented Dec 23, 2025

Hmm. Unfortunately I removed myself from the crate after wasmtime-publish accepted the invite. So I'm not able to make any more permissions changes.

Do you have credentials for that user so you can log in and invite the team? Or have we lost the crate name forever?

(FWIW, I want to reiterate my suggestion from earlier that we have a runbook for all of this now that we have such a complex publishing and permissions scheme, because this was all very ambiguous and undocumented to me)

@alexcrichton
Member

Yeah, the BA has a 1Password with credentials for wasmtime-publish (that's how I accepted the invite in the first place). I added the team. Agreed this can be documented better (like pretty much everything else infrastructure-related), and it's something I hope to handle in January with the move to trusted publishing.

@alexcrichton alexcrichton added this pull request to the merge queue Dec 24, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 24, 2025
- Do not link to non-existent README in new crate.
- Remove a few other attributes that our internal crates don't have (now
  the set of attributes at the top level is the same as for the new
  error crate).
@cfallin cfallin enabled auto-merge December 24, 2025 18:47
@cfallin
Member Author

cfallin commented Dec 24, 2025

Thanks!

@cfallin cfallin added this pull request to the merge queue Dec 24, 2025
Merged via the queue into bytecodealliance:main with commit 405ab55 Dec 24, 2025
45 checks passed
@cfallin cfallin deleted the debugger-crate branch December 24, 2025 19:22