Commit 215ba37

Merge pull request #1280 from tkatila/xpum-sidecar-pr
Gather XeLink mapping info to nodes via xpu-manager sidecar
2 parents 1380d24 + 3aef771 commit 215ba37

10 files changed: +847 -2 lines changed

.github/workflows/ci.yaml

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@ jobs:
 - intel-idxd-config-initcontainer
 - intel-dlb-plugin
 - intel-dlb-initcontainer
+- intel-xpumanager-sidecar

 # Demo images
 - crypto-perf

README.md

Lines changed: 7 additions & 0 deletions
@@ -23,6 +23,7 @@ Table of Contents
 * [DLB device plugin](#dlb-device-plugin)
 * [IAA device plugin](#iaa-device-plugin)
 * [Device Plugins Operator](#device-plugins-operator)
+* [XeLink XPU-Manager sidecar](#xelink-xpu-manager-sidecar)
 * [Demos](#demos)
 * [Workload Authors](#workload-authors)
 * [Developers](#developers)
@@ -203,6 +204,12 @@ The [Device plugins operator README](cmd/operator/README.md) gives the installat

 The [Device plugins Operator for OCP](cmd/operator/ocp_quickstart_guide/README.md) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736).

+## XeLink XPU-Manager Sidecar
+
+To support interconnected GPUs in Kubernetes, the XeLink sidecar is needed.
+
+The [XeLink XPU-Manager sidecar README](cmd/xpumanager_sidecar/README.md) explains how the sidecar works and how to use it.
+
 ## Demos

 The [demo subdirectory](demo/readme.md) contains a number of demonstrations for
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
## This is a generated file, do not edit directly. Edit build/docker/templates/intel-xpumanager-sidecar.Dockerfile.in instead.
##
## Copyright 2022 Intel Corporation. All Rights Reserved.
##
## Licensed under the Apache License, Version 2.0 (the "License");
## you may not use this file except in compliance with the License.
## You may obtain a copy of the License at
##
## http://www.apache.org/licenses/LICENSE-2.0
##
## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.
###
ARG CMD=xpumanager_sidecar
## FINAL_BASE can be used to configure the base image of the final image.
##
## This is used in two ways:
## 1) make <image-name> BUILDER=<docker|buildah>
## 2) docker build ... -f <image-name>.Dockerfile
##
## The project default is 1) which sets FINAL_BASE=gcr.io/distroless/static
## (see build-image.sh).
## 2) and the default FINAL_BASE is primarily used to build Redhat Certified Openshift Operator container images that must be UBI based.
## The RedHat build tool does not allow additional image build parameters.
ARG FINAL_BASE=registry.access.redhat.com/ubi8-micro:latest
###
##
## GOLANG_BASE can be used to make the build reproducible by choosing an
## image by its hash:
## GOLANG_BASE=golang@sha256:9d64369fd3c633df71d7465d67d43f63bb31192193e671742fa1c26ebc3a6210
##
## This is used on release branches before tagging a stable version.
## The main branch defaults to using the latest Golang base image.
ARG GOLANG_BASE=golang:1.19-bullseye
###
FROM ${GOLANG_BASE} as builder
ARG DIR=/intel-device-plugins-for-kubernetes
ARG GO111MODULE=on
ARG BUILDFLAGS="-ldflags=-w -s"
ARG GOLICENSES_VERSION
ARG EP=/usr/local/bin/intel_xpumanager_sidecar
ARG CMD
WORKDIR ${DIR}
COPY . .
RUN (cd cmd/${CMD}; GO111MODULE=${GO111MODULE} CGO_ENABLED=0 go install "${BUILDFLAGS}") && install -D /go/bin/${CMD} /install_root${EP}
RUN install -D ${DIR}/LICENSE /install_root/licenses/intel-device-plugins-for-kubernetes/LICENSE \
&& if [ ! -d "licenses/$CMD" ] ; then \
GO111MODULE=on go run github.com/google/go-licenses@${GOLICENSES_VERSION} save "./cmd/$CMD" \
--save_path /install_root/licenses/$CMD/go-licenses ; \
else mkdir -p /install_root/licenses/$CMD/go-licenses/ && cd licenses/$CMD && cp -r * /install_root/licenses/$CMD/go-licenses/ ; fi
###
FROM ${FINAL_BASE}
COPY --from=builder /install_root /
ENTRYPOINT ["/usr/local/bin/intel_xpumanager_sidecar"]
LABEL vendor='Intel®'
LABEL version='devel'
LABEL release='1'
LABEL name='intel-xpumanager-sidecar'
LABEL summary='Intel® xpumanager sidecar'
LABEL description='The xpumanager sidecar creates NFD labels from xpumanager topology information'
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
#define _ENTRYPOINT_ /usr/local/bin/intel_xpumanager_sidecar
ARG CMD=xpumanager_sidecar

#include "default_plugin.docker"

LABEL name='intel-xpumanager-sidecar'
LABEL summary='Intel® xpumanager sidecar'
LABEL description='The xpumanager sidecar creates NFD labels from xpumanager topology information'

cmd/xpumanager_sidecar/README.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# XeLink sidecar for Intel XPU Manager

Table of Contents

* [Introduction](#introduction)
* [Modes and Configuration Options](#modes-and-configuration-options)
* [Installation](#installation)
    * [Install XPU-Manager with the Sidecar](#install-xpu-manager-with-the-sidecar)
    * [Install Sidecar to an Existing XPU-Manager](#install-sidecar-to-an-existing-xpu-manager)
    * [Verify Sidecar Functionality](#verify-sidecar-functionality)

## Introduction

Intel GPUs can be interconnected via XeLink. For some workloads, using GPUs that are XeLinked together gives the best performance. XeLink information is provided by [Intel XPU Manager](https://www.github.com/intel/xpumanager) via its metrics API. The XeLink sidecar retrieves this information from XPU Manager and stores it on the node under `/etc/kubernetes/node-feature-discovery/features.d/` as a feature label file. [NFD](https://github.com/kubernetes-sigs/node-feature-discovery) reads this file and converts it to Kubernetes node labels. These labels are then used by [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling) to make [scheduling decisions](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#multi-gpu-allocation-with-xe-link-connections) for Pods.
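
The hand-off to NFD described above can be sketched in Go. This is a minimal illustration, not the sidecar's actual code: the function names `formatLabel` and `writeFeatureFile`, the file name `xpum-sidecar-labels.txt`, and the write-then-rename approach are all assumptions made for this example. It relies only on NFD's convention that feature files contain `name=value` lines, and it uses a temporary directory in place of the real `features.d` path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// formatLabel builds one feature-file line, for example
// "gpu.intel.com/xe-links=0.0-1.0_0.1-1.1".
func formatLabel(namespace, name, value string) string {
	return fmt.Sprintf("%s/%s=%s\n", namespace, name, value)
}

// writeFeatureFile (hypothetical helper) writes content to dir/fileName
// through a temporary file in the same directory followed by a rename,
// so a reader does not observe a half-written file.
func writeFeatureFile(dir, fileName, content string) error {
	tmp, err := os.CreateTemp(dir, ".tmp-"+fileName+"-*")
	if err != nil {
		return err
	}
	// After a successful rename the temporary name no longer exists,
	// so this cleanup only has an effect on the error paths.
	defer os.Remove(tmp.Name())

	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, fileName))
}

func main() {
	// A temporary directory stands in for
	// /etc/kubernetes/node-feature-discovery/features.d/ in this sketch.
	dir, err := os.MkdirTemp("", "features.d")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	line := formatLabel("gpu.intel.com", "xe-links", "0.0-1.0_0.1-1.1")
	if err := writeFeatureFile(dir, "xpum-sidecar-labels.txt", line); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(line)
}
```

Writing to a temporary file first is one common way to avoid NFD reading a partial file; whether the sidecar does the same is not stated in this README.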

## Modes and Configuration Options

| Flag | Argument | Default | Meaning |
|:---- |:-------- |:------- |:------- |
| -lane-count | int | 4 | Minimum lane count for an XeLink interconnect to be accepted |
| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) |
| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels, e.g. **gpu.intel.com**/xe-links |

The sidecar also accepts a number of other arguments. Use the `-h` option to see the complete list.
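
As an illustration of the table above, the four documented flags and their limits could be parsed with Go's standard `flag` package roughly as follows. This is a sketch under assumptions, not the sidecar's source: the `config` struct, the `parseArgs` function and the error messages are invented for the example; only the flag names, defaults and bounds come from the table.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config is a hypothetical container for the documented options.
type config struct {
	laneCount      int
	interval       int
	startupDelay   int
	labelNamespace string
}

// parseArgs parses the documented flags and enforces the bounds
// given in the table (interval >= 1, startup-delay >= 0).
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("sidecar-sketch", flag.ContinueOnError)
	fs.IntVar(&c.laneCount, "lane-count", 4, "minimum lane count for an XeLink interconnect to be accepted")
	fs.IntVar(&c.interval, "interval", 10, "topology fetch and label write interval in seconds")
	fs.IntVar(&c.startupDelay, "startup-delay", 10, "delay before the first topology fetch in seconds")
	fs.StringVar(&c.labelNamespace, "label-namespace", "gpu.intel.com", "namespace or prefix for the labels")

	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.interval < 1 {
		return c, errors.New("interval must be at least 1 second")
	}
	if c.startupDelay < 0 {
		return c, errors.New("startup-delay must not be negative")
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", c)
}
```

Running the sketch with `-interval 30`, for example, keeps the other three values at their documented defaults.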

## Installation

The following sections detail how to obtain, deploy and test the XPU-Manager XeLink sidecar.

### Pre-built Images

[Pre-built images](https://hub.docker.com/r/intel/intel-xpumanager-sidecar)
of this component are available on Docker Hub. These images are automatically built and uploaded
to the hub from the latest main branch of this repository.

Release-tagged images of this component are also available on Docker Hub, tagged with their
release version numbers in the format `x.y.z`, corresponding to the branches and releases in this
repository.

Note: In the commands below, replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.

See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the sidecar.

#### Install XPU-Manager with the Sidecar

Install the XPU-Manager daemonset together with the XeLink sidecar:

```bash
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>'
```

See the XPU-Manager Kubernetes files for additional information on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).

#### Install Sidecar to an Existing XPU-Manager

Use `kubectl patch` to add the sidecar to an existing XPU-Manager daemonset:

```bash
$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref=<RELEASE_VERSION>'
```

NOTE: The sidecar patch removes other resources from the XPU-Manager container. If your XPU-Manager daemonset uses, for example, the smarter device manager resources, those will be removed.

#### Verify Sidecar Functionality

You can verify that the sidecar works by checking the nodes' xe-links labels:

```bash
$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}"
master,0.0-1.0_0.1-1.1
```
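
To make the label value in the output above easier to read, the following Go sketch splits it into individual links. It assumes, and this README does not state it, that the value is an underscore-separated list of links and that each link has the form `<gpu>.<tile>-<gpu>.<tile>`. The `endpoint` and `link` types and the `parseXeLinks` function are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// endpoint is one side of a link. The GPU/tile interpretation of the
// two numbers is an assumption of this sketch.
type endpoint struct {
	GPU  string
	Tile string
}

// link connects two endpoints.
type link struct {
	A, B endpoint
}

// parseEndpoint splits "0.1" into GPU "0" and tile "1".
func parseEndpoint(s string) (endpoint, error) {
	gpu, tile, ok := strings.Cut(s, ".")
	if !ok || gpu == "" || tile == "" {
		return endpoint{}, fmt.Errorf("malformed endpoint %q", s)
	}
	return endpoint{GPU: gpu, Tile: tile}, nil
}

// parseXeLinks splits a label value such as "0.0-1.0_0.1-1.1"
// into its individual links.
func parseXeLinks(value string) ([]link, error) {
	var links []link
	for _, item := range strings.Split(value, "_") {
		left, right, ok := strings.Cut(item, "-")
		if !ok {
			return nil, fmt.Errorf("malformed link %q", item)
		}
		a, err := parseEndpoint(left)
		if err != nil {
			return nil, err
		}
		b, err := parseEndpoint(right)
		if err != nil {
			return nil, err
		}
		links = append(links, link{A: a, B: b})
	}
	return links, nil
}

func main() {
	links, err := parseXeLinks("0.0-1.0_0.1-1.1")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range links {
		fmt.Printf("GPU %s tile %s <-> GPU %s tile %s\n", l.A.GPU, l.A.Tile, l.B.GPU, l.B.Tile)
	}
}
```

Under that assumption, the sample value describes two links between GPU 0 and GPU 1, one per tile.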
