Commit 3aef771

Add README to xpumanager sidecar and reference to main README
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
1 parent 3922aa1 commit 3aef771

2 files changed

+79
-0
lines changed

README.md

Lines changed: 7 additions & 0 deletions
@@ -23,6 +23,7 @@ Table of Contents
2323
* [DLB device plugin](#dlb-device-plugin)
2424
* [IAA device plugin](#iaa-device-plugin)
2525
* [Device Plugins Operator](#device-plugins-operator)
26+
* [XeLink XPU-Manager sidecar](#xelink-xpu-manager-sidecar)
2627
* [Demos](#demos)
2728
* [Workload Authors](#workload-authors)
2829
* [Developers](#developers)
@@ -203,6 +204,12 @@ The [Device plugins operator README](cmd/operator/README.md) gives the installat
203204

204205
The [Device plugins Operator for OCP](cmd/operator/ocp_quickstart_guide/README.md) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736).
205206

207+
## XeLink XPU-Manager Sidecar
208+
209+
To support interconnected GPUs in Kubernetes, the XeLink sidecar is needed.
210+
211+
The [XeLink XPU-Manager sidecar README](cmd/xpumanager_sidecar/README.md) describes how the sidecar works and how to use it.
212+
206213
## Demos
207214

208215
The [demo subdirectory](demo/readme.md) contains a number of demonstrations for

cmd/xpumanager_sidecar/README.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
1+
# XeLink sidecar for Intel XPU Manager
2+
3+
Table of Contents
4+
5+
* [Introduction](#introduction)
6+
* [Modes and Configuration Options](#modes-and-configuration-options)
7+
* [Installation](#installation)
8+
* [Install XPU-Manager with the Sidecar](#install-xpu-manager-with-the-sidecar)
9+
* [Install Sidecar to an Existing XPU-Manager](#install-sidecar-to-an-existing-xpu-manager)
10+
* [Verify Sidecar Functionality](#verify-sidecar-functionality)
11+
12+
## Introduction
13+
14+
Intel GPUs can be interconnected via XeLink. Some workloads perform best when they use GPUs that are XeLinked together. XeLink information is provided by [Intel XPU Manager](https://www.github.com/intel/xpumanager) via its metrics API. The XeLink sidecar retrieves the information from XPU Manager and stores it on the node under `/etc/kubernetes/node-feature-discovery/features.d/` as a feature label file. [NFD](https://github.com/kubernetes-sigs/node-feature-discovery) reads this file and converts it to Kubernetes node labels. These labels are then used by [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling) to make [scheduling decisions](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#multi-gpu-allocation-with-xe-link-connections) for Pods.
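
As a rough illustration of the hand-off to NFD, the sketch below prints one `name=value` line of the kind an NFD feature file contains. The helper name `make_xelink_label` and the scratch file name in the usage comment are hypothetical, and the label value is copied from the verification example in this README; the real sidecar derives it from XPU Manager metrics.

```bash
# Sketch only: build a feature-file line for an XeLink label.
# make_xelink_label is a hypothetical helper, not part of the sidecar.
#
# Arguments:
#   $1  label namespace (assumed default: gpu.intel.com)
#   $2  label value, e.g. 0.0-1.0_0.1-1.1
make_xelink_label() {
  printf '%s/xe-links=%s\n' "${1:-gpu.intel.com}" "$2"
}

# Usage (writes to a scratch directory, not the real features.d path):
#   dir="$(mktemp -d)"
#   make_xelink_label gpu.intel.com 0.0-1.0_0.1-1.1 > "$dir/xelink-labels.txt"
```

Writing to a scratch directory keeps the example from interfering with files that NFD is actually watching.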
15+
16+
## Modes and Configuration Options
17+
18+
| Flag | Argument | Default | Meaning |
19+
|:---- |:-------- |:------- |:------- |
20+
| -lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted |
21+
| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) |
22+
| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
23+
| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels, e.g. **gpu.intel.com**/xe-links |
24+
25+
The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.
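
The sidecar's filtering logic is internal to it and not shown here. As a minimal sketch of what a minimum lane count means, the helper below assumes a made-up input format of one `LINK LANE_COUNT` pair per line and keeps only the links that meet the threshold. The function name `filter_links` and the input format are assumptions for illustration.

```bash
# Sketch only: keep links whose lane count is at least the minimum.
# filter_links and the "LINK LANE_COUNT" input format are hypothetical.
#
# Arguments:
#   $1  minimum lane count (defaults to 4, like -lane-count)
# Input (stdin): one "LINK LANE_COUNT" pair per line
# Output: the LINK field of every accepted line
filter_links() {
  awk -v min="${1:-4}" '$2 + 0 >= min + 0 { print $1 }'
}

# Usage:
#   printf '0.0-1.0 4\n0.1-1.1 2\n' | filter_links 4
```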
26+
27+
## Installation
28+
29+
The following sections detail how to obtain, deploy and test the XPU-Manager XeLink sidecar.
30+
31+
### Pre-built Images
32+
33+
[Pre-built images](https://hub.docker.com/r/intel/intel-xpumanager-sidecar)
34+
of this component are available on the Docker hub. These images are automatically built and uploaded
35+
to the hub from the latest main branch of this repository.
36+
37+
Release tagged images of the components are also available on the Docker hub, tagged with their
38+
release version numbers in the format `x.y.z`, corresponding to the branches and releases in this
39+
repository.
40+
41+
Note: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.
42+
43+
See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin.
44+
45+
#### Install XPU-Manager with the Sidecar
46+
47+
Install the XPU-Manager daemonset with the XeLink sidecar:
48+
49+
```bash
50+
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>'
51+
```
52+
53+
See the XPU-Manager Kubernetes files for additional [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes) information.
54+
55+
#### Install Sidecar to an Existing XPU-Manager
56+
57+
Use `kubectl patch` to add the sidecar to the existing XPU-Manager daemonset.
58+
59+
```bash
60+
$ curl -fsSLO 'https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/<RELEASE_VERSION>/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml' && kubectl patch daemonsets.apps intel-xpumanager --patch-file kustom_xpumanager.yaml
61+
```
62+
63+
NOTE: The sidecar patch will remove other resources from the XPU-Manager container. If your XPU-Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
64+
65+
#### Verify Sidecar Functionality
66+
67+
You can verify the sidecar's functionality by checking the nodes' xe-links labels:
68+
69+
```bash
70+
$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}"
71+
master,0.0-1.0_0.1-1.1
72+
```
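
To read the label value more easily, it can be split into individual links. The sketch below assumes the value is a `_`-separated list of links, each written as `<gpu>.<tile>-<gpu>.<tile>`; that is an interpretation of the sample output above, not something this README states. The function name `parse_xe_links` is hypothetical.

```bash
# Sketch only: print one "source -> destination" line per link.
# Assumes links are separated by "_" and endpoints by "-".
#
# Arguments:
#   $1  xe-links label value, e.g. 0.0-1.0_0.1-1.1
parse_xe_links() {
  printf '%s\n' "$1" | tr '_' '\n' | while IFS= read -r link; do
    # Skip empty fields, e.g. when the label is unset.
    [ -n "$link" ] || continue
    # ${link%%-*} is the text before the first "-", ${link#*-} the text after it.
    printf '%s -> %s\n' "${link%%-*}" "${link#*-}"
  done
}

# Usage:
#   parse_xe_links "0.0-1.0_0.1-1.1"
```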
