distinguish probes across NVMe devices #993
Conversation
NVMe commands are unique in a device across queue ID (`qid` in various parts of Propolis) and command ID (`cid` in various parts of Propolis), but are ambiguous when looking at an entire VM: two different NVMe devices will have a "queue 1", and their distinct "queue 1" may also have "command 1234" submitted at the same time.

Pick a device ID for NVMe controllers, and plumb that through to distinguish queues similar to how we distinguish queues for block devices: a `devq_id` construction that makes a device and queue unique across the VM.

In the NVMe case, a `devq_id` is not uniquely identifying across the life of a VM, as the queues can be destroyed, recreated, the controller reset, etc. But absent such an administrative operation, this helps untangle what's happening across a VM with many disks.

Note that block attachment IDs and NVMe IDs are similar in name, and probably similar in practice, but are tracked distinctly. This is similar to the difference between block queue IDs and NVMe queue IDs: the architecture is similar, but the mapping between them is more arbitrary and may change as an implementation detail of Propolis.
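As a rough illustration of the idea (a sketch only; the field widths and names here are assumptions, not Propolis's actual encoding), a `devq_id` can be built by packing a per-VM device ID above the 16-bit NVMe queue ID so each (device, queue) pair yields a distinct u64 probe argument:

```rust
/// Hypothetical sketch of a `devq_id` construction: pack an assumed
/// per-VM NVMe device ID above the 16-bit queue ID so each
/// (device, queue) pair maps to a distinct u64.
fn devq_id(device_id: u32, qid: u16) -> u64 {
    (u64::from(device_id) << 16) | u64::from(qid)
}

fn main() {
    // Two controllers' "queue 1" no longer collide in probe output.
    assert_ne!(devq_id(0, 1), devq_id(1, 1));
}
```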
-    sqid,
-    u16::from(block_qid),
+    devsq_id,
+    block_devqid,
bringing block_qid -> block_devqid along here because otherwise we'd know which NVMe device notified a block queue ... somewhere ... with no way to find out which block attachment we'd just poked.
Force-pushed from b6f258d to f39a608.
ok, that's what I get for building it and seeing the VM run, but not …
ahl left a comment
good stuff
    // must have changed the type of QueueId such that this is no longer absurd!
    #[allow(clippy::absurd_extreme_comparisons)]
    {
        static_assertions::const_assert!(QueueId::MAX <= u16::MAX);
In Clippy's defense, u16::MAX <= u16::MAX is pretty absurd...
    }
}
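For context, a standalone version of that guard might look like the sketch below, assuming `QueueId` currently aliases `u16` and using a plain const assertion where the real code uses the static_assertions crate:

```rust
// Sketch only: QueueId is assumed to alias u16 here.
type QueueId = u16;

// Trivially true while QueueId is u16 (hence clippy's "absurd" lint),
// but this becomes a compile error if QueueId is ever widened beyond
// what a u16 probe argument can represent.
#[allow(clippy::absurd_extreme_comparisons)]
const _: () = assert!(QueueId::MAX <= u16::MAX);
```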
/// Combine an NVMe device and queue ID into a single u64 for probes
... to make a globally unique ID?
don't want to go quite that far: the NVMe controller can be reset, which will destroy queues, abandon any in-flight work, and probably have queues with the same ID re-created when the guest sets up the controller again. but I'll adjust this to say a bit more.
lib/propolis/src/hw/nvme/queue.rs (Outdated)
/// The ID of the device this is a completion queue for. Not semantically
/// interesting for completion queues, but useful context in probes.
Suggested change:
- /// The ID of the device this is a completion queue for. Not semantically
- /// interesting for completion queues, but useful context in probes.
+ /// The ID of the device associated with this completion queue. Not semantically
+ /// interesting for completion queues, but useful context in probes.
Not trying to be grammatically picayune--took me a few reads before I understood the meaning.
much appreciated. as much as I try not to, I invent novel sentence structures sometimes... played boggle with the words here and I hope it's much more legible now.
fn nvme_read_enqueue(devsq_id: u64, idx: u16, cid: u16, off: u64, sz: u64) {
}
apologies to literally everyone for the grating formatting.
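For readers wondering about the empty-bodied function: Propolis declares its DTrace probes with the usdt crate, where a provider module lists probe signatures as no-op functions and a generated macro fires them. A sketch of the pattern, with an assumed provider module name and made-up argument values:

```rust
// Sketch of the usdt probe pattern; the provider name and call site are
// assumptions, only the probe signature comes from the diff above.
#[usdt::provider]
mod propolis {
    fn nvme_read_enqueue(devsq_id: u64, idx: u16, cid: u16, off: u64, sz: u64) {}
}

fn main() {
    // Probes must be registered before they can fire.
    usdt::register_probes().unwrap();
    // The closure is only evaluated when the probe is enabled, so
    // argument construction costs nothing when nobody is tracing.
    propolis::nvme_read_enqueue!(|| (0x0001_0001u64, 0u16, 1234u16, 0u64, 4096u64));
}
```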
Force-pushed from 6c5a37f to c47deb6.
ah, well, if office internet is flaky i suppose it will be hard for that computer to get artifacts
a few of us (@ahl among others) were looking at the nvme probes as a "top-of-propolis" view of submission->completion time and found, to our surprise, that sometimes we'd get really bogus data. it took a bit of thinking to realize that we were probably seeing duplicate `(qid, cid)` tuples since we were loading a bunch of disks in the VM. I want to see that not happen any more for good measure, but from first principles the probes don't seem sufficient for analysis with multiple disks at the moment.

`nvme-trace.d` only looked at `cid`, not the queue ID, so it'd be even more ambiguous. it probably could use a sprinkling of `devq_id`/`arg0` too.
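To make that failure mode concrete, here's a hypothetical sketch (not the actual analysis script, which is D, not Rust) of what keying in-flight commands by `(qid, cid)` alone does once two disks are loaded: the second disk's submission clobbers the first's, and the eventual completion is matched against the wrong timestamp.

```rust
use std::collections::HashMap;

fn main() {
    // Hypothetical analysis state: submission timestamps keyed only by
    // (qid, cid), which is all the old probes exposed.
    let mut inflight: HashMap<(u16, u16), u64> = HashMap::new();

    inflight.insert((1, 1234), 100); // disk A submits (qid 1, cid 1234) at t=100
    inflight.insert((1, 1234), 250); // disk B's identical tuple clobbers disk A's

    // Disk A completes at t=300, but the latency is computed against
    // disk B's submission time.
    let latency = 300 - inflight[&(1, 1234)];
    assert_eq!(latency, 50); // bogus: disk A's real latency was 200
}
```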