distinguish probes across NVMe devices #993
Conversation
NVMe commands are unique in a device across queue ID (`qid` in various parts of Propolis) and command ID (`cid` in various parts of Propolis), but are ambiguous when looking at an entire VM: two different NVMe devices will have a "queue 1", and their distinct "queue 1" may also have "command 1234" submitted at the same time.

Pick a device ID for NVMe controllers, and plumb that through to distinguish queues similar to how we distinguish queues for block devices: a `devq_id` construction that makes a device and queue unique across the VM.

In the NVMe case, a `devq_id` is not uniquely identifying across the life of a VM, as the queues can be destroyed, recreated, the controller reset, etc. But absent such an administrative operation, this helps untangle what's happening across a VM with many disks.

Note that block attachment IDs and NVMe IDs are similar in name, and probably similar in practice, but are tracked distinctly. This is similar to the difference between block queue IDs and NVMe queue IDs: the architecture is similar, but the mapping between them is more arbitrary and may change as an implementation detail of Propolis.
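As a rough illustration of the idea (a sketch only; the field widths and names here are assumptions, not Propolis's actual encoding), a `devq_id` can be built by packing a per-VM device ID above the 16-bit NVMe queue ID so each (device, queue) pair yields a distinct u64 probe argument:

```rust
/// Hypothetical sketch of a `devq_id` construction: pack an assumed
/// per-VM NVMe device ID above the 16-bit queue ID so each
/// (device, queue) pair maps to a distinct u64.
fn devq_id(device_id: u32, qid: u16) -> u64 {
    (u64::from(device_id) << 16) | u64::from(qid)
}

fn main() {
    // Two controllers' "queue 1" no longer collide in probe output.
    assert_ne!(devq_id(0, 1), devq_id(1, 1));
}
```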
-    sqid,
-    u16::from(block_qid),
+    devsq_id,
+    block_devqid,
bringing block_qid -> block_devqid along here because otherwise we'd know which NVMe device notified a block queue ... somewhere ... with no way to find out which block attachment we'd just poked.
Force-pushed from b6f258d to f39a608.
ok, that's what I get for building it and seeing the VM run, but not …
ahl left a comment
good stuff
    // must have changed the type of QueueId such that this is no longer absurd!
    #[allow(clippy::absurd_extreme_comparisons)]
    {
        static_assertions::const_assert!(QueueId::MAX <= u16::MAX);
In Clippy's defense, u16::MAX <= u16::MAX is pretty absurd...
    }
}
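For context, a standalone version of that guard might look like the sketch below, assuming `QueueId` currently aliases `u16` and using a plain const assertion where the real code uses the static_assertions crate:

```rust
// Sketch only: QueueId is assumed to alias u16 here.
type QueueId = u16;

// Trivially true while QueueId is u16 (hence clippy's "absurd" lint),
// but this becomes a compile error if QueueId is ever widened beyond
// what a u16 probe argument can represent.
#[allow(clippy::absurd_extreme_comparisons)]
const _: () = assert!(QueueId::MAX <= u16::MAX);
```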
/// Combine an NVMe device and queue ID into a single u64 for probes
... to make a globally unique ID?
don't want to go quite that far: the NVMe controller can be reset, which will destroy queues, abandon any in-flight work, and probably have queues with the same ID re-created when the guest sets up the controller again. but I'll adjust this to say a bit more.
lib/propolis/src/hw/nvme/queue.rs (Outdated)
/// The ID of the device this is a completion queue for. Not semantically
/// interesting for completion queues, but useful context in probes.
Suggested change:
- /// The ID of the device this is a completion queue for. Not semantically
- /// interesting for completion queues, but useful context in probes.
+ /// The ID of the device associated with this completion queue. Not semantically
+ /// interesting for completion queues, but useful context in probes.
Not trying to be grammatically picayune--took me a few reads before I understood the meaning.
much appreciated. as much as I try not to, I invent novel sentence structures sometimes... played boggle with the words here and I hope it's much more legible now.
fn nvme_read_enqueue(devsq_id: u64, idx: u16, cid: u16, off: u64, sz: u64) {
}
apologies to literally everyone for the grating formatting.
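For readers wondering about the empty-bodied function: Propolis declares its DTrace probes with the usdt crate, where a provider module lists probe signatures as no-op functions and a generated macro fires them. A sketch of the pattern, with an assumed provider module name and made-up argument values:

```rust
// Sketch of the usdt probe pattern; the provider name and call site are
// assumptions, only the probe signature comes from the diff above.
#[usdt::provider]
mod propolis {
    fn nvme_read_enqueue(devsq_id: u64, idx: u16, cid: u16, off: u64, sz: u64) {}
}

fn main() {
    // Probes must be registered before they can fire.
    usdt::register_probes().unwrap();
    // The closure is only evaluated when the probe is enabled, so
    // argument construction costs nothing when nobody is tracing.
    propolis::nvme_read_enqueue!(|| (0x0001_0001u64, 0u16, 1234u16, 0u64, 4096u64));
}
```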
Force-pushed from 6c5a37f to c47deb6.
ah, well, if office internet is flaky i suppose it will be hard for that computer to get artifacts
a few of us (@ahl among others) were looking at the nvme probes as a "top-of-propolis" view of submission->completion time and found, to our surprise, that sometimes we'd get really bogus data. it took a bit of thinking to realize that we were probably seeing duplicate `(qid, cid)` tuples since we were loading a bunch of disks in the VM. I want to see that not happen any more for good measure, but from first principles the probes don't seem sufficient for analysis with multiple disks at the moment.

`nvme-trace.d` only looked at `cid`, not the queue ID, so it'd be even more ambiguous. it probably could use a sprinkling of `devq_id`/`arg0` too.
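To make that failure mode concrete, here's a hypothetical sketch (not the actual analysis script, which is D, not Rust) of what keying in-flight commands by `(qid, cid)` alone does once two disks are loaded: the second disk's submission clobbers the first's, and the eventual completion is matched against the wrong timestamp.

```rust
use std::collections::HashMap;

fn main() {
    // Hypothetical analysis state: submission timestamps keyed only by
    // (qid, cid), which is all the old probes exposed.
    let mut inflight: HashMap<(u16, u16), u64> = HashMap::new();

    inflight.insert((1, 1234), 100); // disk A submits (qid 1, cid 1234) at t=100
    inflight.insert((1, 1234), 250); // disk B's identical tuple clobbers disk A's

    // Disk A completes at t=300, but the latency is computed against
    // disk B's submission time.
    let latency = 300 - inflight[&(1, 1234)];
    assert_eq!(latency, 50); // bogus: disk A's real latency was 200
}
```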