@@ -218,9 +218,28 @@ complex execution modes and dynamic shapes. If not specified, all are enabled by
 state and a restart of the server may be required to continue serving
 successfully.
 
-* PyTorch does not support Tensor of Strings but it does support models that accept
-  a List of Strings as input(s) / produces a List of String as output(s). For these models
-  Triton allows users to pass String input(s)/receive String output(s) using the String
-  datatype. As a limitation of using List instead of Tensor for String I/O, only for
-  1-dimensional input(s)/output(s) are supported for I/O of String type.
-
+* PyTorch does not support a Tensor of Strings, but it does support models
+  that accept a List of Strings as input(s) / produce a List of Strings as
+  output(s). For these models Triton allows users to pass String input(s) and
+  receive String output(s) using the String datatype. Because a List is used
+  instead of a Tensor for String I/O, only 1-dimensional String
+  input(s)/output(s) are supported (see the sketch below the diff).
+
+* In a multi-GPU environment, a potential runtime issue can occur when using
+  [Tracing](https://pytorch.org/docs/stable/generated/torch.jit.trace.html)
+  to generate a
+  [TorchScript](https://pytorch.org/docs/stable/jit.html) model. The issue
+  arises from a device mismatch between the model instance and the tensor. By
+  default, Triton creates one execution instance of the model for each
+  available GPU. A runtime error occurs when a request is sent to a model
+  instance whose GPU device differs from the one used during TorchScript
+  generation. To address this problem, it is highly recommended to use
+  [Scripting](https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script)
+  instead of Tracing for model generation in a multi-GPU environment.
+  Scripting avoids the device mismatch and ensures compatibility with
+  different GPUs when used with Triton (see the sketch below). However, if
+  using Tracing is unavoidable, there is a workaround: explicitly specify the
+  GPU device for the model instance in the
+  [model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
+  so that the model instance and the tensors used for inference are assigned
+  to the same GPU device on which the model was traced.