
Commit a7a2413

Add the documentation for the issue when using traced model in multi-… (#104)
* Add the documentation for the issue when using traced model in multi-GPU environment
* Address comment
1 parent d600509 commit a7a2413

File tree

1 file changed: +25 −6 lines


README.md

Lines changed: 25 additions & 6 deletions
@@ -218,9 +218,28 @@ complex execution modes and dynamic shapes. If not specified, all are enabled by
   state and a restart of the server may be required to continue serving
   successfully.
 
-* PyTorch does not support Tensor of Strings but it does support models that accept
-  a List of Strings as input(s) / produces a List of String as output(s). For these models
-  Triton allows users to pass String input(s)/receive String output(s) using the String
-  datatype. As a limitation of using List instead of Tensor for String I/O, only for
-  1-dimensional input(s)/output(s) are supported for I/O of String type.
+* PyTorch does not support Tensors of Strings, but it does support models that
+  accept a List of Strings as input(s) / produce a List of Strings as output(s).
+  For these models, Triton allows users to pass String input(s) / receive
+  String output(s) using the String datatype. Because a List is used instead
+  of a Tensor for String I/O, only 1-dimensional input(s)/output(s) of String
+  type are supported.
+
+* In a multi-GPU environment, a potential runtime issue can occur when using
+  [Tracing](https://pytorch.org/docs/stable/generated/torch.jit.trace.html)
+  to generate a [TorchScript](https://pytorch.org/docs/stable/jit.html) model.
+  The issue arises from a device mismatch between the model instance and the
+  tensor: by default, Triton creates one execution instance of the model per
+  available GPU, and a runtime error occurs when a request is sent to a model
+  instance whose GPU device differs from the one used during TorchScript
+  generation. To avoid this problem, it is highly recommended to use
+  [Scripting](https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script)
+  instead of Tracing to generate the model in a multi-GPU environment.
+  Scripting avoids the device mismatch and ensures compatibility across GPUs
+  when used with Triton. If using Tracing is unavoidable, a workaround is to
+  explicitly specify the GPU device for the model instance in the
+  [model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
+  so that the model instance and the tensors used for inference are assigned
+  to the same GPU device on which the model was traced.
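The 1-dimensional String I/O pattern described in the first bullet can be sketched as a TorchScript model. This is a minimal illustration, not part of the commit; the `AppendMark` module and its behavior are hypothetical.

```python
import torch
from typing import List

class AppendMark(torch.nn.Module):
    """Hypothetical toy model with 1-D String input and output."""
    def forward(self, xs: List[str]) -> List[str]:
        # TorchScript models may accept/produce a List of Strings;
        # only this 1-dimensional form is supported for String I/O.
        return [s + "!" for s in xs]

# Scripting preserves the List[str] -> List[str] signature.
scripted = torch.jit.script(AppendMark())
print(scripted(["a", "b"]))  # ['a!', 'b!']
```

A Triton client would then send such inputs/outputs with the String datatype, subject to the 1-D restriction noted above.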

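The Scripting recommendation in the second bullet can be sketched as follows. The `AddBias` module and shapes here are illustrative assumptions, not from the commit; the point is that Scripting compiles the source, so device expressions stay dynamic instead of being frozen at export time.

```python
import torch

class AddBias(torch.nn.Module):
    """Illustrative model that creates a tensor inside forward()."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Under Tracing, `x.device` is evaluated once for the example
        # input and baked into the graph as a constant (e.g. cuda:0),
        # which can trigger the device-mismatch error described above
        # when another per-GPU instance serves the request.
        return x + torch.ones(x.shape, device=x.device)

model = AddBias().eval()

# Recommended: Scripting keeps `x.device` a runtime expression, so the
# same exported model can run on any of Triton's per-GPU instances.
scripted = torch.jit.script(model)

# Tracing alternative (ties the recorded graph to the example input):
# traced = torch.jit.trace(model, torch.zeros(4))

print(scripted(torch.zeros(4)))  # tensor([1., 1., 1., 1.])
```

If Tracing is unavoidable, the linked model-configuration workaround amounts to pinning instances via `instance_group` (`kind: KIND_GPU` with an explicit `gpus` list) so requests only reach instances on the GPU used for tracing.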
0 commit comments
