| 1 | +# XeLink sidecar for Intel XPU Manager |
| 2 | + |
| 3 | +Table of Contents |
| 4 | + |
| 5 | +* [Introduction](#introduction) |
| 6 | +* [Modes and Configuration Options](#modes-and-configuration-options) |
| 7 | +* [Installation](#installation) |
| 8 | + * [Install XPU-Manager with the Sidecar](#install-xpu-manager-with-the-sidecar) |
| 9 | + * [Install Sidecar to an Existing XPU-Manager](#install-sidecar-to-an-existing-xpu-manager) |
| 10 | +* [Verify Sidecar Functionality](#verify-sidecar-functionality) |
| 11 | + |
| 12 | +## Introduction |
| 13 | + |
| 14 | +Intel GPUs can be interconnected via XeLink. For some workloads, using GPUs that are XeLinked together gives the best performance. XeLink information is provided by [Intel XPU Manager](https://www.github.com/intel/xpumanager) via its metrics API. The XeLink sidecar retrieves this information from XPU Manager and stores it on the node under `/etc/kubernetes/node-feature-discovery/features.d/` as a feature label file. [NFD](https://github.com/kubernetes-sigs/node-feature-discovery) reads this file and converts it to Kubernetes node labels. These labels are then used by [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling) to make [scheduling decisions](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#multi-gpu-allocation-with-xe-link-connections) for Pods. |
| 15 | + |
| 16 | +## Modes and Configuration Options |
| 17 | + |
| 18 | +| Flag | Argument | Default | Meaning | |
| 19 | +|:---- |:-------- |:------- |:------- | |
| 20 | +| -lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted | |
| 21 | +| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) | |
| 22 | +| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) | |
| 23 | +| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels, e.g. **gpu.intel.com**/xe-links |
| 24 | + |
| 25 | +The sidecar also accepts a number of other arguments. Use the `-h` option to see the complete list. |
| 26 | + |
| 27 | +## Installation |
| 28 | + |
| 29 | +The following sections detail how to obtain, deploy and test the XPU-Manager XeLink sidecar. |
| 30 | + |
| 31 | +### Pre-built Images |
| 32 | + |
| 33 | +[Pre-built images](https://hub.docker.com/r/intel/intel-xpumanager-sidecar) |
| 34 | +of this component are available on Docker Hub. These images are automatically built and uploaded |
| 35 | +to the hub from the latest main branch of this repository. |
| 36 | + |
| 37 | +Release-tagged images of the component are also available on Docker Hub, tagged with their |
| 38 | +release version numbers in the format `x.y.z`, corresponding to the branches and releases in this |
| 39 | +repository. |
| 40 | + |
| 41 | +Note: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. |
| 42 | + |
| 43 | +See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin. |
| 44 | + |
| 45 | +### Install XPU-Manager with the Sidecar |
| 46 | + |
| 47 | +Install the XPU-Manager daemonset with the XeLink sidecar: |
| 48 | + |
| 49 | +```bash |
| 50 | +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>' |
| 51 | +``` |
| 52 | + |
| 53 | +See the XPU-Manager Kubernetes deployment files for additional information on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes). |
| 54 | + |
| 55 | +### Install Sidecar to an Existing XPU-Manager |
| 56 | + |
| 57 | +Use `kubectl patch` to add the sidecar to an existing XPU-Manager daemonset: |
| 58 | + |
| 59 | +```bash |
| 60 | +$ kubectl patch daemonsets.apps intel-xpumanager --patch-file <(curl -fsSL 'https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/<RELEASE_VERSION>/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml') |
| 61 | +``` |
| 62 | + |
| 63 | +NOTE: The sidecar patch will remove other resources from the XPU-Manager container. If your XPU-Manager daemonset is using, for example, the smarter device manager resources, those will be removed. |
| 64 | + |
| 65 | +## Verify Sidecar Functionality |
| 66 | + |
| 67 | +You can verify that the sidecar works by checking the nodes' xe-links labels: |
| 68 | + |
| 69 | +```bash |
| 70 | +$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}" |
| 71 | +master,0.0-1.0_0.1-1.1 |
| 72 | +``` |
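To make the label value easier to read, it can be split into individual links. The following is only a minimal sketch: it assumes that links in the value are separated by `_` and that the two endpoints of a link are separated by `-`, as the sample output above suggests. The function name `split_xe_links` is hypothetical and not part of the sidecar.

```bash
# Hypothetical helper: print one XeLink connection per line.
# Assumption: links are separated by "_" and endpoints by "-".
split_xe_links() {
    local value="$1"
    local link
    local IFS='_'   # split the label value on underscores
    for link in $value; do
        # Text before the first "-" is one endpoint, the rest is the other.
        printf '%s <-> %s\n' "${link%%-*}" "${link#*-}"
    done
}

split_xe_links "0.0-1.0_0.1-1.1"
```

For the sample label above, this lists two links. A node without an xe-links label yields an empty value and therefore no output.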