gpu: add new nfd + monitoring + shared-dev deployment option

tkatila · uniemimu · commit d1e8350c6eb1 · 2023-01-05T14:13:13.000+02:00
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
diff --git a/cmd/gpu_plugin/README.md b/cmd/gpu_plugin/README.md
@@ -15,6 +15,7 @@ Table of Contents
     * [Pre-built Images](#pre-built-images)
          * [Install to all nodes](#install-to-all-nodes)
          * [Install to nodes with Intel GPUs with NFD](#install-to-nodes-with-intel-gpus-with-nfd)
+         * [Install to nodes with NFD, Monitoring and Shared-dev](#install-to-nodes-with-nfd-monitoring-and-shared-dev)
          * [Install to nodes with Intel GPUs with Fractional resources](#install-to-nodes-with-intel-gpus-with-fractional-resources)
               * [Fractional resources details](#fractional-resources-details)
     * [Verify Plugin Registration](#verify-plugin-registration)
@@ -167,6 +168,21 @@ $ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes
 $ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/nfd_labeled_nodes?ref=<RELEASE_VERSION>'
 ```
 
+#### Install to nodes with NFD, Monitoring and Shared-dev
+
+Same as above, but configures the GPU plugin with logging, [monitoring and shared-dev](#modes-and-configuration-options) features enabled. This option is useful for retrieving GPU metrics from nodes, for example with [XPU-Manager](https://github.com/intel/xpumanager/) or [collectd](https://github.com/collectd/collectd/tree/collectd-6.0).
+
+```bash
+# Start NFD - if your cluster doesn't have NFD installed yet
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref=<RELEASE_VERSION>'
+
+# Create NodeFeatureRules for detecting GPUs on nodes
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref=<RELEASE_VERSION>'
+
+# Create GPU plugin daemonset
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/?ref=<RELEASE_VERSION>'
+```
+
 #### Install to nodes with Intel GPUs with Fractional resources
 
 With the experimental fractional resource feature you can use additional kubernetes extended
diff --git a/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/add-args.yaml b/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/add-args.yaml
@@ -0,0 +1,13 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: intel-gpu-plugin
+spec:
+  template:
+    spec:
+      containers:
+      - name: intel-gpu-plugin
+        args:
+        - "-shared-dev-num=30"
+        - "-enable-monitoring"
+        - "-v=2"
diff --git a/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/add-nodeselector-intel-gpu.yaml b/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/add-nodeselector-intel-gpu.yaml
@@ -0,0 +1,9 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: intel-gpu-plugin
+spec:
+  template:
+    spec:
+      nodeSelector:
+        intel.feature.node.kubernetes.io/gpu: "true"
diff --git a/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/kustomization.yaml b/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/kustomization.yaml
@@ -0,0 +1,5 @@
+bases:
+  - ../../base
+patches:
+  - path: add-args.yaml
+  - path: add-nodeselector-intel-gpu.yaml
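
Note on consuming this overlay: the sketch below shows how workloads might request the resources the plugin exposes once it runs with the arguments from `add-args.yaml`. It is not part of this commit. The Pod names and images are hypothetical placeholders, and the resource names `gpu.intel.com/i915` and `gpu.intel.com/i915_monitoring` are assumed from the plugin's documented behaviour, so check them against `kubectl describe node` on a labeled node.

```yaml
# Hypothetical metrics collector. With -enable-monitoring the plugin is
# assumed to register a gpu.intel.com/i915_monitoring resource that gives
# the container access to all Intel GPU devices on the node.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-metrics-example            # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: metrics-collector
    image: example.com/gpu-metrics-collector:latest   # placeholder image
    resources:
      limits:
        gpu.intel.com/i915_monitoring: 1
---
# Hypothetical regular workload. With -shared-dev-num=30, up to 30
# containers requesting gpu.intel.com/i915: 1 can share one GPU.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-example           # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: workload
    image: example.com/gpu-workload:latest            # placeholder image
    resources:
      limits:
        gpu.intel.com/i915: 1
```

Sharing through `-shared-dev-num` does not isolate or partition the GPU between containers, so a value of 30 is a scheduling cap, not a performance guarantee.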