cmd/gpu_plugin/README.md (21 lines changed: 21 additions & 0 deletions)
@@ -19,6 +19,7 @@ Table of Contents
 * [CDI support](#cdi-support)
 * [KMD and UMD](#kmd-and-umd)
 * [Health management](#health-management)
+* [By-path mounting](#by-path-mounting)
 * [Issues with media workloads on multi-GPU setups](#issues-with-media-workloads-on-multi-gpu-setups)
 * [Workaround for QSV and VA-API](#workaround-for-qsv-and-va-api)
 
@@ -60,6 +61,7 @@ For workloads on different KMDs, see [KMD and UMD](#kmd-and-umd).
 | -allow-ids | string | "" | A list of PCI Device IDs that are allowed to be registered as resources. Default is empty (=all registered). Cannot be used together with `deny-ids`. |
 | -deny-ids | string | "" | A list of PCI Device IDs that are denied to be registered as resources. Default is empty (=all registered). Cannot be used together with `allow-ids`. |
 | -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: _balanced_ mode spreads workloads among GPU devices, _packed_ mode fills one GPU fully before moving to next, and _none_ selects first available device from kubelet. Default is _none_. |
+| -bypath | string | single | 3 possible values: single, none, all. Default is single. Changes how the by-path symlinks are handled by the plugin. More [info](#by-path-mounting). |
 
 The plugin also accepts a number of other arguments (common to all plugins) related to logging.
 Please use the -h option to see the complete list of logging related options.
@@ -258,6 +260,25 @@ Kubernetes Device Plugin API allows passing device's healthiness to Kubelet. By
 
 Temperature limit can be provided via the command line argument, default is 100C.
 
+### By-path mounting
+
+The DRM devices for Intel GPUs register `by-path` symlinks under `/dev/dri/by-path`. For each GPU character device, there is a corresponding symlink in the by-path directory:
+```
+$ ls -l /dev/dri/by-path/
+lrwxrwxrwx 1 root root 8 oct x 13:09 pci-0000:00:02.0-card -> ../card1
+lrwxrwxrwx 1 root root 13 oct x 13:09 pci-0000:00:02.0-render -> ../renderD128
+```
+
+In some cases, the Intel GPU UMD uses these symlinks to detect hardware properties. The Device Plugin API (DP API) cannot mount the by-path symlinks as __symlinks__: when they are mounted via the DP API, they are mounted as the actual devices, and the information carried by the symlink (the PCI address) is lost.
+
+To support as many use cases as possible, the GPU plugin allows changing the by-path mounting method. The options are:
+* `single` - Symlinks are mounted individually per device. Default.
+  * Mostly works, but is known to have issues with some PyTorch workloads. See [issue](https://github.com/intel/intel-device-plugins-for-kubernetes/issues/2158).
+* `none` - No symlinks are mounted.
+  * Aligned with how Docker `privileged` mode uses devices.
+* `all` - Mounts the whole DRM `by-path` directory. Pro: symlink file types are preserved. Con: symlinks are present for all devices.
+  * Optimal for scale-up workloads that use all the GPUs.
+
 ### Issues with media workloads on multi-GPU setups
 
 OneVPL media API, 3D and compute APIs provide device discovery
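
The by-path entries described in the added README section encode a PCI address in the link name and point at a device node through the link target. The sketch below shows one way a consumer could read that mapping. It is an illustration only, not the UMD's or the plugin's actual logic: the helper names `map_by_path` and `build_fake_dri` are hypothetical, the name pattern is inferred from the `ls -l` listing in the diff, and the demo builds a throwaway directory instead of touching `/dev/dri`.

```python
import os
import re
import tempfile

# Name pattern inferred from the README listing, e.g. "pci-0000:00:02.0-card".
# Assumption: only the "card" and "render" suffixes shown there are handled.
BY_PATH_NAME = re.compile(r"^pci-(?P<addr>[0-9a-fA-F:.]+)-(?P<kind>card|render)$")


def map_by_path(by_path_dir):
    """Return {(pci_address, kind): device_name} for symlink entries only."""
    result = {}
    for name in sorted(os.listdir(by_path_dir)):
        match = BY_PATH_NAME.match(name)
        full = os.path.join(by_path_dir, name)
        # An entry that is not a symlink has no link target to read,
        # so it cannot be mapped this way and is skipped.
        if match is None or not os.path.islink(full):
            continue
        target = os.path.basename(os.readlink(full))
        result[(match["addr"], match["kind"])] = target
    return result


def build_fake_dri(root):
    """Create a throwaway tree mimicking the README listing (no real devices)."""
    by_path = os.path.join(root, "by-path")
    os.makedirs(by_path)
    for dev in ("card1", "renderD128"):
        # Plain empty files stand in for the character devices.
        open(os.path.join(root, dev), "w").close()
    os.symlink("../card1", os.path.join(by_path, "pci-0000:00:02.0-card"))
    os.symlink("../renderD128", os.path.join(by_path, "pci-0000:00:02.0-render"))
    return by_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for key, dev in map_by_path(build_fake_dri(root)).items():
            print(key, "->", dev)
```

Running the same function against a directory where the entries are regular files or device nodes rather than symlinks returns an empty mapping, which is a rough way to see why the README distinguishes between mounting the links individually and mounting the whole `by-path` directory.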