complex execution modes and dynamic shapes. If not specified, all are enabled by default.
`ENABLE_TENSOR_FUSER`

### Support

#### Model Instance Group Kind

The PyTorch backend supports the following kinds of
[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
where the input tensors are placed as follows:

* `KIND_GPU`: Inputs are prepared on the GPU device associated with the model
instance.

* `KIND_CPU`: Inputs are prepared on the CPU.

* `KIND_MODEL`: Inputs are prepared on the CPU. When loading the model, the
backend does not choose the GPU device for the model; instead, it respects the
device(s) specified in the model and uses them as they are during inference.
This is useful when the model internally utilizes multiple GPUs, as demonstrated
in this
[example model](https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_instance_group_kind_model/gen_models.py).
If no device is specified in the model, the backend uses the first available
GPU device. This feature is available starting in the 23.06 release.

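As a sketch of how `KIND_MODEL` is selected, a model configuration might look like the following (the model name and instance count here are hypothetical, not taken from the example above):

```
# config.pbtxt (illustrative sketch): KIND_MODEL tells the backend to
# honor the device placements baked into the TorchScript model itself
# rather than assigning the instance to a single GPU.
name: "multi_gpu_model"   # hypothetical model name
backend: "pytorch"
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]
```

With `KIND_GPU` the `gpus` field of the instance group would control placement; with `KIND_MODEL` no `gpus` field is given, since the model's own `.to(device)` calls decide where each submodule runs.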
### Important Notes

* The execution of PyTorch models on GPU is asynchronous in nature. See