Commit 9404fc3

Merge pull request #1441 from 072020127/KmeshUpdateDev
Proposal: Kmesh-daemon upgrades traffic without disruption
2 parents 927b2e5 + d2fd2b8 commit 9404fc3

2 files changed: 195 additions, 0 deletions
---
title: Kmesh-daemon upgrades traffic without disruption
authors:
- "@072020127"
reviews:
-
approves:
-

create-date: 2025-07-08

---

## Kmesh-daemon upgrades traffic without disruption

### Summary

Add traffic-preserving upgrades to Kmesh-daemon.

### Motivation

Currently, Kmesh supports traffic-preserving restarts but not traffic-preserving upgrades. During an upgrade, existing eBPF map state may be discarded if the map definitions change, which can lead to dropped connections, policy resets, or lost performance metrics.

This proposal improves the upgrade experience by:

- Preserving important state (flows, policies, metrics) across versions
- Allowing safe, autonomous rolling upgrades in Kubernetes environments
- Reducing operational risk and improving reliability in production deployments

#### Goals

Enable seamless traffic continuity during version upgrades by detecting map changes and migrating data safely.

### Design Details

#### Map Compatibility Detection

1. **Runtime MapSpec Loader**: The comparison begins by loading the `MapSpec` of each map compiled into the running daemon, which includes `MapType`, `KeySize`, `ValueSize`, `MaxEntries`, `Key` and `Value`.
   This is done by calling `loadCompileTimeSpecs`, which loads each embedded `CollectionSpec` generated by bpf2go. The function iterates over the enabled BPF engines (e.g., KernelNative, DualEngine, General) based on the config and returns a nested registry keyed first by a logical package name (e.g., KmeshCgroupSock) and then by map name.

```go
func loadCompileTimeSpecs(config *options.BpfConfig) (map[string]map[string]*ebpf.MapSpec, error) {
	// Registry keyed by logical package name, then by map name.
	specs := make(map[string]map[string]*ebpf.MapSpec)

	if config.KernelNativeEnabled() {
		// KernelNative: cgroup_sock
		if coll, err := kernelnative.LoadKmeshCgroupSock(); err != nil {
			return nil, fmt.Errorf("load KernelNative KmeshCgroupSock spec: %w", err)
		} else {
			specs["KmeshCgroupSock"] = coll.Maps
		}
		// ... other KernelNative specs
	} else if config.DualEngineEnabled() {
		// DualEngine: cgroup_sock workload
		if coll, err := dualengine.LoadKmeshCgroupSockWorkload(); err != nil {
			return nil, fmt.Errorf("load DualEngine KmeshCgroupSockWorkload spec: %w", err)
		} else {
			specs["KmeshCgroupSockWorkload"] = coll.Maps
		}
		// ... other DualEngine specs
	}

	// General: tc_mark_encrypt
	if coll, err := general.LoadKmeshTcMarkEncrypt(); err != nil {
		return nil, fmt.Errorf("load General KmeshTcMarkEncrypt spec: %w", err)
	} else {
		specs["KmeshTcMarkEncrypt"] = coll.Maps
	}
	// ... other General specs

	return specs, nil
}
```
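
Before any struct-level comparison, the scalar fields of each map can be compared directly. The following is a minimal sketch of that first pass, not the proposal's implementation: `mapMeta`, `scalarMismatches` and `changedMaps` are hypothetical names, and `mapMeta` is a trimmed stand-in for the scalar part of `ebpf.MapSpec` so the example runs with the standard library only.

```go
package main

import "fmt"

// mapMeta is a hypothetical stand-in for the scalar fields of a map spec.
// It is not a Kmesh or cilium/ebpf type.
type mapMeta struct {
	Type       string
	KeySize    uint32
	ValueSize  uint32
	MaxEntries uint32
}

// scalarMismatches lists the scalar fields that differ between two specs.
func scalarMismatches(oldM, newM mapMeta) []string {
	var diffs []string
	if oldM.Type != newM.Type {
		diffs = append(diffs, "Type")
	}
	if oldM.KeySize != newM.KeySize {
		diffs = append(diffs, "KeySize")
	}
	if oldM.ValueSize != newM.ValueSize {
		diffs = append(diffs, "ValueSize")
	}
	if oldM.MaxEntries != newM.MaxEntries {
		diffs = append(diffs, "MaxEntries")
	}
	return diffs
}

// changedMaps walks the nested registry (package name, then map name) and
// returns "pkg/map" -> differing fields for maps present in both versions.
func changedMaps(oldSpecs, newSpecs map[string]map[string]mapMeta) map[string][]string {
	changed := make(map[string][]string)
	for pkg, maps := range newSpecs {
		for name, newM := range maps {
			oldM, ok := oldSpecs[pkg][name]
			if !ok {
				continue // map is new in this version: nothing to migrate
			}
			if d := scalarMismatches(oldM, newM); len(d) > 0 {
				changed[pkg+"/"+name] = d
			}
		}
	}
	return changed
}

func main() {
	// "example_map" is an illustrative map name, not a real Kmesh map.
	oldSpecs := map[string]map[string]mapMeta{
		"KmeshCgroupSock": {"example_map": {Type: "Hash", KeySize: 4, ValueSize: 8, MaxEntries: 1024}},
	}
	newSpecs := map[string]map[string]mapMeta{
		"KmeshCgroupSock": {"example_map": {Type: "Hash", KeySize: 4, ValueSize: 16, MaxEntries: 1024}},
	}
	fmt.Println(changedMaps(oldSpecs, newSpecs))
}
```

A scalar mismatch is enough to flag a map for migration; equal sizes still need the field-level comparison, since fields can be reordered or retyped without changing the total size.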

2. **MapSpec Snapshot**: During Kmesh-daemon startup (both normal and Update-type startup), each `MapSpec` generated from the compiled BPF object is stored in a user-space registry. Because raw `btf.Type` objects can't be marshaled directly, a custom representation is used:

   1. MemberInfo: records each struct field's name, typeName, offset, and bitfieldSize. If the field is itself a struct, it carries a nested StructInfo.

   2. StructInfo: represents a whole struct, storing its name and a slice of MemberInfo entries. If the Key/Value type is not a struct (e.g., int), `Name` holds the type name and `Members` is null.

   3. PersistedMapSpec: stores the metadata for each map (name, type, sizes, max entries, flags) along with the `StructInfo` for its key and value.

   The structure that is finally written to disk is `PersistedSnapshot`, which is keyed first by a logical package name (e.g., KmeshCgroupSock) and then by map name.

```go
type MemberInfo struct {
	Name         string      `json:"name"`
	TypeName     string      `json:"typeName"`
	Offset       uint32      `json:"offset"`
	BitfieldSize uint32      `json:"bitfieldsize"` // set only when the field is a bitfield
	Nested       *StructInfo `json:"nested,omitempty"`
}

type StructInfo struct {
	Name    string       `json:"name"`
	Members []MemberInfo `json:"members"`
}

type PersistedMapSpec struct {
	Name       string     `json:"name"`
	Type       string     `json:"type"` // MapType.String()
	KeySize    uint32     `json:"keySize"`
	ValueSize  uint32     `json:"valueSize"`
	MaxEntries uint32     `json:"maxEntries"`
	Flags      uint32     `json:"flags"`
	KeyInfo    StructInfo `json:"keyInfo"` // derived from btf.Struct
	ValueInfo  StructInfo `json:"valueInfo"`
}

type PersistedSnapshot struct {
	Maps map[string]map[string]PersistedMapSpec `json:"maps"`
}
```

3. **Persisted MapSpec Loader**: The daemon reads the previously written snapshot file and unmarshals the JSON into the `PersistedSnapshot` structure. This provides the baseline set of old map specs used for compatibility checking against the newly compiled specs.

4. **Layout Diffing**: A recursive function, `diffStructInfoAgainstBTF`, compares the old and new `btf.Struct` definitions field by field. It detects field additions, removals, type changes, offset shifts, and nested structure changes, and uses a visited map to avoid infinite recursion on recursive types. The result is a fine-grained structural diff that guides compatibility decisions.

```go
type StructDiff struct {
	Removed       bool // fields present in A but missing in B
	Added         bool // fields present in B but missing in A
	TypeChanged   bool // same-name fields whose type changed
	OffsetChanged bool // same-name fields whose offset changed
	NestedChanged bool // same-name fields of struct type whose nested layout changed
}
```
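
To make the field-by-field comparison concrete, the sketch below diffs two `StructInfo` trees and fills in a `StructDiff`. It is a simplified illustration, not `diffStructInfoAgainstBTF`: the hypothetical `diffStructInfo` compares two persisted-style trees rather than a persisted tree against live BTF, and it omits the visited map on the assumption that trees decoded from JSON cannot contain cycles. The `Changed` method is also an addition for this example.

```go
package main

import "fmt"

type MemberInfo struct {
	Name         string
	TypeName     string
	Offset       uint32
	BitfieldSize uint32
	Nested       *StructInfo
}

type StructInfo struct {
	Name    string
	Members []MemberInfo
}

type StructDiff struct {
	Removed, Added, TypeChanged, OffsetChanged, NestedChanged bool
}

// Changed reports whether any difference was recorded (helper added for this sketch).
func (d StructDiff) Changed() bool {
	return d.Removed || d.Added || d.TypeChanged || d.OffsetChanged || d.NestedChanged
}

// diffStructInfo compares layout a (old) against layout b (new) by field name.
func diffStructInfo(a, b StructInfo) StructDiff {
	var d StructDiff

	// Non-struct keys/values have no members; only the type name can differ.
	if len(a.Members) == 0 && len(b.Members) == 0 && a.Name != b.Name {
		d.TypeChanged = true
	}

	bByName := make(map[string]MemberInfo, len(b.Members))
	for _, m := range b.Members {
		bByName[m.Name] = m
	}

	seen := make(map[string]bool, len(a.Members))
	for _, am := range a.Members {
		seen[am.Name] = true
		bm, ok := bByName[am.Name]
		if !ok {
			d.Removed = true
			continue
		}
		if am.TypeName != bm.TypeName || am.BitfieldSize != bm.BitfieldSize {
			d.TypeChanged = true
		}
		if am.Offset != bm.Offset {
			d.OffsetChanged = true
		}
		// Recurse only when both sides carry a nested struct layout.
		if am.Nested != nil && bm.Nested != nil && diffStructInfo(*am.Nested, *bm.Nested).Changed() {
			d.NestedChanged = true
		}
	}
	for _, bm := range b.Members {
		if !seen[bm.Name] {
			d.Added = true
		}
	}
	return d
}

func main() {
	// Illustrative layouts: the new version appends one field.
	oldV := StructInfo{Name: "example_value", Members: []MemberInfo{
		{Name: "count", TypeName: "__u32", Offset: 0},
	}}
	newV := StructInfo{Name: "example_value", Members: []MemberInfo{
		{Name: "count", TypeName: "__u32", Offset: 0},
		{Name: "flags", TypeName: "__u32", Offset: 32},
	}}
	fmt.Printf("%+v\n", diffStructInfo(oldV, newV))
}
```

Keeping the five flags separate lets the caller treat, for example, a pure append differently from an offset shift when deciding how to migrate.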

#### Map Migration Logic

1. **New Map Creation**: When a layout change is detected, a new map is created from the latest `MapSpec` and temporarily pinned at the old map's pin path with "_tmp" appended. If no change is detected, the existing map is left intact and no further action is taken.

2. **Atomic Pin Swap**: Once data migration completes, the daemon unpins the old map, closes its file descriptor, attempts to remove the old pin file, and finally renames the new map's temporary pin path to the original pin path.

```go
// Unpin and close the old map; failures here are logged but not fatal.
if err := oldMap.Unpin(); err != nil && !os.IsNotExist(err) {
	log.Warnf("failed to unpin old map %s: %v (continuing)", pinPath, err)
}
if err := oldMap.Close(); err != nil {
	log.Warnf("failed to close old map FD: %v (continuing)", err)
}
// Make sure the old pin file is gone, then move the new map into its place.
if err := os.Remove(pinPath); err != nil && !os.IsNotExist(err) {
	return nil, fmt.Errorf("remove old pin %s failed: %w", pinPath, err)
}
if err := os.Rename(tmpPinPath, pinPath); err != nil {
	return nil, fmt.Errorf("rename tmp pin %s to old pin %s failed: %w", tmpPinPath, pinPath, err)
}
```
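
The data migration step that precedes the pin swap is not spelled out above. The following is one possible sketch under a strong assumption: the new value layout only appends fields to the old one, so existing bytes keep their offsets and the new fields start zeroed. `migrateValue` and `migrateEntries` are hypothetical names, and a plain Go map stands in for the old and new eBPF maps so the example runs without kernel access.

```go
package main

import "fmt"

// migrateValue copies an old value into a zeroed buffer of the new size.
// Assumption: fields were only appended, so old offsets remain valid.
func migrateValue(oldVal []byte, newSize int) ([]byte, error) {
	if newSize < len(oldVal) {
		return nil, fmt.Errorf("new value size %d is smaller than old size %d", newSize, len(oldVal))
	}
	newVal := make([]byte, newSize)
	copy(newVal, oldVal)
	return newVal, nil
}

// migrateEntries converts every entry of the old map into the new layout.
// The in-memory maps are stand-ins for the pinned old map and the "_tmp" map.
func migrateEntries(oldMap map[string][]byte, newSize int) (map[string][]byte, error) {
	newMap := make(map[string][]byte, len(oldMap))
	for k, v := range oldMap {
		nv, err := migrateValue(v, newSize)
		if err != nil {
			return nil, fmt.Errorf("migrate key %q: %w", k, err)
		}
		newMap[k] = nv
	}
	return newMap, nil
}

func main() {
	oldMap := map[string][]byte{"conn-1": {0x01, 0x02, 0x03, 0x04}}
	newMap, err := migrateEntries(oldMap, 8)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(newMap["conn-1"]), newMap["conn-1"][:4])
}
```

Layout changes that remove, retype, or move fields would need a per-field copy driven by the structural diff instead of a single prefix copy.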

#### Hot Program Replacement

**Atomic Swap**: Once all maps are migrated, the new BPF programs are attached. The upgrade process uses `utils.BpfProgUpdate()` to atomically swap the loaded program with the new one. `BpfProgUpdate(progPinPath, cgopt)` performs two steps:

1. LoadPinnedLink: reopens the existing `bpf_link` from its pinned path, recovering the same link object that the kernel already has attached.

2. link.Update(newProgFD): atomically swaps the BPF program on that link to `cgopt.Program`, preserving the existing hook and any accumulated state.

This keeps the hook attached throughout the transition, so no packets are lost. Take `BpfSockOps` as an example: if the start type is Restart or Update, the existing pinned link is recovered and updated with the new program:

```go
func (sc *BpfSockOps) Attach() error {
	var err error
	cgopt := link.CgroupOptions{
		Path:    sc.Info.Cgroup2Path,
		Attach:  sc.Info.AttachType,
		Program: sc.KmeshSockopsObjects.SockopsProg,
	}
	// pin bpf_link
	progPinPath := filepath.Join(sc.Info.BpfFsPath, constants.Prog_link)
	if restart.GetStartType() == restart.Restart || restart.GetStartType() == restart.Update {
		// Restart/Update: reuse the pinned link and swap in the new program.
		if sc.Link, err = utils.BpfProgUpdate(progPinPath, cgopt); err != nil {
			return err
		}
	} else {
		// Normal start: attach a fresh link and pin it for later restarts.
		sc.Link, err = link.AttachCgroup(cgopt)
		if err != nil {
			return err
		}
		if err = sc.Link.Pin(progPinPath); err != nil {
			return err
		}
	}
	return nil
}
```

#### Workflow

![Kmesh-daemon upgrades workflow](./pics/kmesh-daemon_upgrades_without_disruption_workflow.png)

#### Testing Plan

1. **Unit Tests**: Validate the key functions, including `LoadCompileTimeSpecs`, `diffStructInfoAgainstBTF`, `SnapshotSpecsByPkg` and `LoadPersistedSnapshot`.

2. **E2E Tests**: Run Kmesh upgrades under live traffic and verify data continuity, no packet loss, and zero connection resets.