Skip to content

Conversation

@chweelinchoong
Copy link

The existing LLDP operation forcefully enables verify_enabled during MAC Merge (MM) verification whenever preemption support is advertised by the link partner. This behavior overrides the intended configuration and conflicts with IEEE 802.3br Section 99.4.3, which allows verification to be disabled.

This fix ensures that the verify_enabled configuration is preserved and remains a MAC Merge sublayer responsibility, aligning with IEEE 802.3br Tables 30-8 and 30-9.

The existing LLDP operation forcefully enables verify_enabled
during MAC Merge (MM) verification whenever preemption support
is advertised by the link partner. This behavior overrides the
intended configuration and conflicts with IEEE 802.3br Section
99.4.3, which allows verification to be disabled.

This fix ensures that the verify_enabled configuration is
preserved and remains a MAC Merge sublayer responsibility,
aligning with IEEE 802.3br Tables 30-8 and 30-9.

Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com>
@apconole
Copy link
Contributor

apconole commented Jan 8, 2025

Fixes: 5d54c31 ("lldpad: ieee8023: add Additional Ethernet Capabilities TLV logic")

@vladimiroltean Please double check the change.

@vladimiroltean
Copy link
Contributor

Thank you for the patch, @chweelinchoong, and thanks @apconole for notifying me of it.

I'm sorry, I simply do not have access to a copy of the obsolete IEEE 802.3br-2016 standard. I believe that the equivalent clause you are referring to is IEEE 802.3-2022 99.4.3 Verifying preemption operation. Indeed, there it says that verification can be disabled, and it can be disabled if you control the MAC Merge layer manually with ethtool, and not with LLDP. It also says:

Verification may be disabled. Verification disable is intended for engineered closed networks (i.e., where it
is ensured by design that links can support preemption capability).

I don't think LLDP is an engineered closed network, quite the opposite in fact.

The standards, in my reading, were not very clear on the exact LLDP flow with the MAC Merge feature. But I did have a justification for force-enabling the TX verification feature. Let me quote from the commit message of the blamed patch:

    When we support the MAC Merge layer and the link partner supports it
    too (confusingly called "preemption capability support" in IEEE 802.3),
    we go ahead and try to enable preemption in the TX direction. The link
    partner may do that too or may do something else.

    The preemption verification is a feature through which the port, once TX
    preemption was enabled, will generate some special SMD-V frames and wait
    for SMD-R replies from the link partner. When an SMD-R frame comes back,
    the verification state machine comples, and TX preemption is not only
    enabled, but also active.

    Using the verification feature allows us to ensure that, even though we
    enabled TX preemption without explicit agreement from the LP, we have a
    backpressure mechanism which prevents us from sending packets that the
    LP is simply going to drop as invalid.

In other words: my intended usage flow with LLDP was as follows. Any station capable of frame preemption can advertise this capability. BUT at the same time, it can keep its pMAC turned off by default, for power saving reasons. Only when it connects to a link partner also advertising the frame preemption capability, it turns on its pMAC, so that if the link partner chooses to send preemptible traffic and/or verification frames, it can receive them successfully. Since the pMAC, when enabled, is obligated to respond to verification frames (see rcv_v variable and Respond state diagram), then using the verification feature is a reliable way of waiting until the link partner has finished enabling its RX pMAC.

Corollary: if the verification feature were not used, it would mean that implementations would have to keep their pMAC RX always enabled, since there would otherwise exist a race condition leading to packet loss between the moment the link partner computes (based on the LLDP TLV) that frame preemption can be used in its local TX direction (and subsequently starts sending preemptible traffic), and the moment that we compute (based on the LLDP TLV) that the link partner might want to send preemptible traffic, and get prepared for it. With the implementation I chose, packet loss is still possible, but only for verification frames (SMD-V) and thus not data packets. For SMD-V, that is quite ok though, because the standard defines a clear retry mechanism with a generous timeout budget (3 * 128 ms).

I don't see a problem with making use of the verification feature, and to my knowledge it is a required MAC Merge functionality, so these were the constraints which led me to make the design choices I made. But maybe I don't understand your motivation for wanting to operate LLDP with verification turned off?

@chweelinchoong
Copy link
Author

@vladimiroltean,

We understand your concern regarding potential race conditions in non-engineered networks and agree that verification is a critical feature in such scenarios to ensure both devices are preemption-active simultaneously.

However, the IEEE 802.3-2022 standard (99.4.3) explicitly allows verification to be disabled in engineered closed networks, where race conditions are assumed to be mitigated by design. In such cases, LLDP with verification disabled remains valuable for auxiliary purposes, such as exchanging addFragSize, detecting link failures, and maintaining other operational parameters.

We believe making this behavior configurable would better align with the standard and provide LLDP with the flexibility to cater to both open and engineered networks.

Additionally, we are preparing a patch for the kernel ethtool utility (ethtool: Allow FPE Verify Enabled to be set before Tx Enabled) to improve how verification is managed. With this patch, verification can be enabled using ethtool before TX preemption (tx_enabled) when necessary. However, the verification process (handled separately by the NIC driver) would only begin after receiving the appropriate LLDP messages from the link partner, ensuring compatibility and avoiding potential packet loss.

We welcome further discussion to refine this approach and ensure it meets both standard compliance and practical needs.

Regards,
CL

@vladimiroltean
Copy link
Contributor

The way in which you define the boundaries of an engineered network seem arbitrary to me. The whole point of LLDP and verification is that you do not control what the link partner does. If you are instead in control of the link partner, don't you also know the addFragSize? I don't see the purpose of LLDP. IMO it is counter-productive to offer a configuration option of LLDP + verification disabled, because it makes an assumption about the implementation of the LLDP agent on the link partner which just doesn't hold water (that it is ready to accept preemptible traffic as soon as you receive a TLV from it advertising the capability, but it hasn't yet received any TLV from you stating that you also advertise it).

Let's take even the simple case where both ends run the openlldp implementation. Unless you patch openlldp to keep pmac_enabled at start time, your patch is simply buggy because it does not prevent the race condition I was talking about. And then, even if you do patch openlldp, there exist potentially other LLDP implementations, which you have no idea what they do. You just want to introduce a configuration option which pushes the responsibility of knowing what the link partner does to the end user. That is very fragile and I see no real benefit to justify it.

Even if you force pmac_enabled=true regardless of whether the link partner is preemption-capable, you will need an analysis of the impact that this will have upon existing device drivers. Let aside power saving. For current-generation Intel NICs like i226, verification is handled in software and involves adding extra tests in the packet data path for detecting SMD-V or SMD-R packets. These tests cost CPU cycles and will degrade packet throughput, so we are discussing options to turn them off, using static keys, for the normal path where pmac_enabled=false.
https://lore.kernel.org/netdev/20241216064720.931522-7-faizal.abdul.rahim@linux.intel.com/
The implication of your proposal would be that frame preemption for those NICs would incur an avoidable penalty on performance, even when not used.

Additionally, we are preparing a patch for the kernel ethtool utility (ethtool: Allow FPE Verify Enabled to be set before Tx Enabled) to improve how verification is managed. With this patch, verification can be enabled using ethtool before TX preemption (tx_enabled) when necessary. However, the verification process (handled separately by the NIC driver) would only begin after receiving the appropriate LLDP messages from the link partner, ensuring compatibility and avoiding potential packet loss.

What would be the purpose of this change? Why would the kernel allow a configuration where TX verification is enabled, but preemptible TX in general is disabled? What would it even mean, and why would that be useful? Is it connected at all to the patch proposed here, or is it an altogether separate topic?

@chweelinchoong
Copy link
Author

@vladimiroltean,

Our proposed change aims to ensure compliance with the Avnu Alliance Profile Agnostic Conformance Test Plan for Interspersed Express Traffic (IET) IEEE 802.1Qbu/.3br (Revision: 1.2). The test plan includes specific scenarios where LLDP TLV exchange occur while verification is disabled on the DUT. For instance:

  1. Test PACT-IET-1.0-com-1.1 Additional Ethernet Capabilities TLV Exchange:

    • Part A: Exchange of Additional Ethernet Capabilities TLV message using LLDP message exchange, verification disabled in the DUT
  2. Test PACT-IET-1.0-com-1.2 Additional Ethernet Capabilities TLV Exchange:

    • Part D: Verify DUT’s capability to send Respond (SMD-R) frames in response to a valid Verify (SMD-V) frames from the Test Tool. Verification is disabled at the DUT.
    • Part E: Verify that the DUT does not send Respond (SMD-R) frames in response to invalid Verify (SMD-V) frames from the Test Tool. Verification is disabled at the DUT.

However, if you believe that LLDP + disabled verification represents an invalid scenario in real-world deployments, we are open to engaging with Avnu Alliance to express these concerns and seek clarification on the test requirements.

Regards,
CL

@penguin359
Copy link
Contributor

I do have access to the IEEE 802.3br-2016 standard and others, but I'll have to review it to remember how this works. It's been a while since I've dug in.

@faizal-rahim
Copy link

faizal-rahim commented Feb 6, 2025

Corollary: if the verification feature were not used, it would mean that implementations would have to keep their pMAC RX always enabled, since there would otherwise exist a race condition leading to packet loss between the moment the link partner computes (based on the LLDP TLV) that frame preemption can be used in its local TX direction (and subsequently starts sending preemptible traffic), and the moment that we compute (based on the LLDP TLV) that the link partner might want to send preemptible traffic, and get prepared for it. With the implementation I chose, packet loss is still possible, but only for verification frames (SMD-V) and thus not data packets. For SMD-V, that is quite ok though, because the standard defines a clear retry mechanism with a generous timeout budget (3 * 128 ms).

Hi @vladimiroltean ,

Thanks for the detailed explanation.
We’d like to confirm our understanding of the race condition you mentioned.
To clarify, I've created a simple diagram illustrating our interpretation.
Could you help to check it and let us know if it reflects what you meant ?


a. The race condition that would happen without forceful verification
WithoutVerificationForceful drawio (10)


b. Your design that enable forceful verification to mitigate race condition
WithForcefulverification drawio (5)

@vladimiroltean
Copy link
Contributor

Hi Faizal,
Your diagrams are a correct representation of what I am trying to say, yes.
It is important to realize what can and what cannot be transmitted through the Additional Ethernet Capabilities TLV (802.3-2022 table 79-20):

Bit Function Value/meaning openlldp mapping
0 preemption capability support 1 = supported
0 = not supported
LLDP_8023_ADD_ETH_CAPS_PREEMPT_SUPPORT
True if NIC responds without error to
ethtool_mm_get_state()
1 preemption capability status 1 = enabled
0 = not enabled
LLDP_8023_ADD_ETH_CAPS_PREEMPT_STATUS
Maps over tx_enabled
2 preemption capability active 1 = active
0 = not active
LLDP_8023_ADD_ETH_CAPS_PREEMPT_ACTIVE
Maps over tx_active
4:3 additional fragment size A 2-bit integer value indicating, in units
of 64 octets, the minimum number of
octets over 64 octets required in
non-final fragments by the receiver
LLDP_8023_ADD_ETH_CAPS_ADD_FRAG_SIZE
Maps over rx_min_frag_size
15:5 reserved

Bits 1 and 2 on their own are useless/unreliable from a protocol standpoint, because by the time you get an LLDP TLV where the link partner had set them, you don't necessarily have time to react and enable your RX - it may be too late. So no protocol decision is being made based on them.
Bit 0 of the link partner is what permits the delayed activation of the pMAC, but that itself is also unreliable if the link partner is not going to use verification to effectively introduce a barrier and effectively close the feedback loop. There simply is no other mechanism through the LLDP TLVs through which one can signal that our receiver is ready. The link partner has to set its bit 1 in blind, and with no verification, bit 2 also would get set immediately.
You may say "just enable your pMAC whenever you set bit 0, so that it's ready ahead of time", but then, you're basically going to run with the pMAC enabled even if the link partner isn't going to set its bit 0 (and preemption will never be used in either direction), so that's a net loss from an efficiency point of view.
This is just my interpretation of the standard, I'm happy to be pointed towards things I haven't considered.

@vladimiroltean
Copy link
Contributor

Essentially, IMO, the Additional Ethernet Capabilities TLV is just a plug-and-play way to communicate using packets sent over the eMAC that the pMAC also exists and is a viable communication channel. When each side sees that preemption is supported on both ends, it should have the courtesy of enabling its RX direction, and then, each side independently has the option of enabling TX if it wants to. LLDP does not have any mechanism that replaces verification, quite the opposite: verification is what makes a plug and play feature "just work". It's a foot gun to want to disable it, and I just don't see any good reason why a user might want to.

@faizal-rahim
Copy link

faizal-rahim commented Feb 10, 2025

Hi @vladimiroltean, thanks for your quick response! Sorry for the slow reply.
Just wanted to bounce an idea off you.
Instead of forcefully enabling verification, what if we do something like this:

WithoutVerificationForceful drawio (12)

Some benefits I see:
a) Still keeps things plug-and-play.
b) Race condition is still handled, it is in fact more flexible since we can adjust the timer according to future needs.
c) Relying only on existing verification timeout duration might not be ideal—especially for drivers like i225/i226, where enabling pmac_enabled for the first time also triggers an adapter reset, which could take longer than what verification timeout allows.
I remember having trouble getting verification to succeed when launching OpenLLDP on both end stations with i226 at the same time. I haven’t looked into it further, but I highly suspect it’s due to the reset adapter since doing it manually with ethtool worked fine.
e) Compliant with the AVNU test plan that Chwee Lin mentioned.
f) Allows verification to remain disabled, making OpenLLDP more flexible for engineered closed networks where verification is not always needed. This way, it can work for both open and closed networks.
IEEE snippet for reference:
image

Thoughts ?

@vladimiroltean
Copy link
Contributor

I think the positions on both sides have been heard, and I'd like Avnu to make a comment - I know this thread has been brought to their attention.
I'm skeptical of LLDP without verification, it seems an escape hatch for the i226, where verification is possible, it's just that its implementation is very weak and you'd rather have users not enable it, so you found a loop hole where there apparently exists an Avnu test case for it. To me, it still seems like something which twists the intended use, but it will probably end up being the documented norm for i226.
I really don't want to spend too much time trying to find counter arguments to your delay-based alternative proposal, but a very basic difference is in the type of failures that can occur. If the link partner fails to enable its pMAC reception in the time specified by the standard, the verification-based implementation will see that, and the delay-based implementation won't.

@vladimiroltean
Copy link
Contributor

There was an Avnu Silicon Validation and Certification Task Group meeting today, where this topic was briefly touched, although not settled yet.
Some key points:

  • Verification was mainly created to detect intermediate dumb bridges which are invisible to LLDP, but don't understand frame preemption. Yet, verification takes place independently in both transmit directions, which makes it more useful than just that, and it is not invalid to use it beyond this purpose.
  • The ability to use TX verification is required; the ability to disable TX verification is optional.
  • The standard, as currently written, expects implementations to enable reception through their pMAC as soon as they advertise preemption capability support (LLDP TLV bit 0), regardless of whether the link partner has also advertised it or not.
  • It was acknowledged that bit 0 collectively signifies support for the RX and the TX direction.
  • The openlldp implementation enables its pMAC RX only once its link partner also advertises bit 0. This is what requires its link partner to use verification. This, however, does not hamper any device's ability to transmit preemptible traffic asymetrically.
  • The openlldp behavior with the enabling of pMAC RX on an as-needed basis makes practical sense, but is outside what the standard originally intended and requires being brought up with 802.3 people (outside the scope of the Avnu group who is working on tests).
  • If there was a test case that would capture whether the device receives preemptible traffic as soon as it advertises bit 0 and independently of what the link partner had advertised (if anything), openlldp would fail that test. But that test does not exist, and practically speaking this test would not be very useful, because it does not capture a relevant condition in real use. However, its existence would impede something more practically useful.
  • The Avnu test case presented earlier (LLDP with disabled verification) in this thread could be made conditional on the device's ability to support disabling verification. This could help, in the sense that a vendor could choose to not declare it has support for disabling verification, but at the same time, it is counter-productive because that capability exists, and vendors might want to use it in other circumstances - it is just that it is not compatible with the delayed pMAC RX activation in openlldp.
  • Topic to be continued in next weekly work group meeting.

@vladimiroltean
Copy link
Contributor

So I've asked myself a lot of questions while implementing this feature, and I still stand by my reasoning while answering them, but one question I admit I've never asked myself is apparently a very fundamental one.
Is the exchange of LLDP TLVs something based on which the local system can make functionally useful decisions (change its local state), or is it just something for diagnostics that you should put in a dashboard and just look at it?
If it's the former (which I always expected it should be, although the standard does not explicitly say one way or another), then I fully expect the LLDP agent would auto-configure the local system in a way that maximizes the common capabilities with the link partner. Not dissimilar to how, when you plug a cable between two gigabit Ethernet PHYs, they just link at gigabit without any input from you.
If the latter is true, and you're of the opinion that LLDP is functionally useless and you can't extract something useful out of it by design, then what they're saying certainly makes a lot of sense, which is that the LLDP agent shouldn't impose a given configuration of the local device (TX verification enabled, in this case). It should just subscribe (somehow) to changes made by some external processes to the local device configuration, advertise those as they are, and also process and display TLVs from the link partner. That is absolutely not the philosophy behind the Additional Ethernet Capabilities TLV in openlldp.
Where the two philosophies differ is exactly the case where one link partner supports and wants to use preemption and the other doesn't. In the former scenario, the side that supports preemption isn't going to enable it because it reacts to the side that doesn't support it. In the latter scenario, enabling preemption is an administrative decision. In the best case, preemption will be turned off if the administrator doesn't forget about it, but the MAC Merge layer can just as well sit uselessly idle. Don't say "let some external process monitor the TLV exchange and react to the presence/absence of a link partner with preemption support", because we'd just be moving this debate to another Github project's issue tracker.
@apconole, I guess you're in the best position to comment which of these 2 roles is more common and expected of the LLDP protocol in general, for other TLVs?

@vladimiroltean
Copy link
Contributor

After the Avnu meeting today, I believe the topic was clarified abd I would like to explain here as well.
The standard expects that when we advertise the "preemption capable" bit we already can receive preemptible packets. This is because pmac_enabled is a Linux extension based on real hardware out in the wild. But as far as the 802.3 state machine for receive processing is concerned, the express filter and pMAC should always be active (as if pmac_enabled were always true).
There is no debate that having a pmac_enable extension of the standard is useful and makes sense, but having the pMAC initially disabled is exactly what forces the link partner to use verification. It has to be brought up with the standards committee to see whether standardizing pmac_enabled, and managing expectations w.r.t. verification, makes sense, but that is outside current Avnu tests based on current standards.

There was something which had escaped me during last meeting: LLDP is required for all Avnu compliance tests. There is a set of tests with +LLDP +verification, a set of tests with +LLDP -verification, but no tests -LLDP.

Thus, this option which was proposed last time:

The Avnu test case presented earlier (LLDP with disabled verification) in this thread could be made conditional on the device's ability to support disabling verification. This could help, in the sense that a vendor could choose to not declare it has support for disabling verification, but at the same time, it is counter-productive because that capability exists, and vendors might want to use it in other circumstances - it is just that it is not compatible with the delayed pMAC RX activation in openlldp

makes a lot more sense now. So the proposal as it stands today is to make the tests with +LLDP -verification "conditional mandatory", dependent on the general ability to disable verification (with LLDP). So, since openlldp does not support disabling verification, this test will be skipped.

I now do not reject the future possibility to add a patch to openlldp which goes something like this: add a "strict" mode, originally false, which when set to true, sets pmac_enabled=true regardless of the link partner's advertised TLVs, and does allow verification to be either enabled or disabled. This is just as a matter of principle. Things would have to be sorted out with the standards body as well, to see whether pmac_enabled can become part of the standard management objects, and therefore how are we going to "brand" this feature.

In the meantime, I understand that the topic is relevant not because of some special use case requiring verification disabled, but because of the Avnu test which can now be bypassed, so I just want to confirm that everyone is ok with this temporary conclusion.

@faizal-rahim
Copy link

Hi @vladimiroltean, I'm good with the conclusion. Glad to see the impacted test plan is now conditional. Appreciate your time and effort in explaining the practical deployment versus standard offerings.

@apconole
Copy link
Contributor

I think the conclusion of this thread is that the semantic proposed in this patch needs to be re-worked to be a tri-state, like enforcement: 'strict', 'loose', 'disabled' and that the behavior of turning on the verify_enabled macmerge setting will be based on that configuration knob. Did I understand it correctly? Will @chweelinchoong be reworking the proposed patch?

@vladimiroltean
Copy link
Contributor

Aaron, I'm a bit confused by your reference to a tri-state, so here is a more detailed explanation (repeating some things):

  • the Avnu test suite will be reworked to tolerate the openlldp behavior as is
  • to better adhere to the current 802.3 standard, where the Linux pmac_enabled extension does not exist, a "bool strict" (defaulting to false) can probably be communicated between lldptool and lldpad through set_config_setting() / get_config_setting(). When strict=true, lldpad will emulate a system without the Linux pmac_enabled extension, i.e. it will call ethtool_mm_change_pmac_enabled() from somewhere around ieee8023_bld_preempt_status() - somewhere where it is known that is_tlv_txenabled(LLDP_8023_ADD_ETH_CAPS) is true and that ethtool_mm_get_state() indicates MM is supported, but irrespective of what the link partner advertises for frame preemption (which we see in ieee8023_rchange_add_eth_caps()).
  • In this sense, the "bool strict" + the "is_tlv_txenabled(LLDP_8023_ADD_ETH_CAPS)" could perhaps be thought of as creating a logical tri-state (when the TLV TX is enabled and strict=false, you could call that "loose", I suppose). But that doesn't have to explicitly exist as such.
  • When strict=true, lldpad will not force verify_enabled=true in ieee8023_rchange_add_eth_caps(), once the common capabilities are resolved. It will work with whatever verify_enabled setting was configured outside of lldptool - aka manually through ethtool, or driver default value (not necessarily uniform across drivers). The part which alters the ETHTOOL_A_MM_VERIFY_ENABLED netlink attribute will have to be split out of ethtool_mm_change_tx_enabled(), since the rest of it needs to be called even in strict mode.
  • An instance of lldpad with strict=true will also expect the link partner to be strict. Otherwise we haven't gotten rid of the aforementioned race condition.
  • A strict=false implementation will work well in its local TX side with a strict=true implementation.
  • "Strict" doesn't mean "better", it just means paying lip service to the standard. Two "strict=true" implementations aren't guaranteed to work with each other. It is a mode which lets you turn off verification, which you may think you don't need and that the reception of the TLVs is sufficient. But those TLVs could have been forwarded by an in-between LLDP-unaware, preemption-unaware dumb switch. With disabled verification, which the strict implementation permits, each side would end up having tx_active=true, which results in preemptible packets getting sent on the wire, and dropped by that in-between preemption-unaware switch, with no indication that something is wrong (TLVs on both sides show "preemption capability supported, enabled, active").
  • The "strict" information is not communicated through the TLV. It will have to be rigorously documented, and a recommendation to be made to users. My recommendation remains "don't enable it, you gain extra functionality by not doing so, you're not really losing anything practical, and it's protecting you against configurations you probably do not intend to make".
  • I am intentionally not addressing the question about reworking the patch which was specifically intended to @chweelinchoong.

@apconole
Copy link
Contributor

Okay - given the above, I don't know who will work on the pmac emulation patch, but it doesn't matter. i think that means this PR can be closed.

@apconole apconole closed this Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants