Description
When running large 3D grids (e.g., 1295^3 points), Devito fails due to integer overflow in the C structure definition in devito/types/dense.py.
The problem originates from the use of ctypes.c_int (a 32-bit signed integer) for the size fields, which overflows once the number of grid points exceeds 2^31-1, even if index-mode=int64 and linearize=True are enabled.
This results in incorrect memory size interpretation and crashes in GPU memory allocation.
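The overflow can be illustrated without Devito at all. The following is a minimal sketch using only the standard library; the variable names are illustrative and it assumes the usual 4-byte c_int. It shows that the point count of a 1295^3 grid no longer fits in a signed 32-bit integer and that ctypes silently wraps it to a negative value:

```python
import ctypes

INT32_MAX = 2**31 - 1            # largest value a signed 32-bit int can hold
n_points = 1295**3               # grid points in the 1295^3 example

# ctypes integer types perform no overflow checking, so the value wraps
# (assumes sizeof(c_int) == 4, as on common platforms).
wrapped = ctypes.c_int(n_points).value

print(n_points, n_points > INT32_MAX, wrapped)
```

Any size computed or stored through such a field would therefore be negative before it ever reaches the allocator.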
File
devito/types/dense.py
Affected Section
_C_ctype = POINTER(type(_C_structname, (Structure,),
                        {'_fields_': [(_C_field_data, c_restrict_void_p),
                                      (_C_field_size, POINTER(c_int)),
                                      (_C_field_nbytes, c_ulong),
                                      (_C_field_nopad_size, POINTER(c_ulong)),
                                      (_C_field_domain_size, POINTER(c_ulong)),
                                      (_C_field_halo_size, POINTER(c_int)),
                                      (_C_field_halo_ofs, POINTER(c_int)),
                                      (_C_field_owned_ofs, POINTER(c_int)),
                                      (_C_field_dmap, c_void_p)]}))
Proposed Fix
Replace c_int with c_long in the _C_ctype struct definition to safely handle arrays larger than 2^31 elements:
from ctypes import c_long

_C_ctype = POINTER(type(_C_structname, (Structure,),
                        {'_fields_': [(_C_field_data, c_restrict_void_p),
                                      (_C_field_size, POINTER(c_long)),
                                      (_C_field_nbytes, c_ulong),
                                      (_C_field_nopad_size, POINTER(c_ulong)),
                                      (_C_field_domain_size, POINTER(c_ulong)),
                                      (_C_field_halo_size, POINTER(c_long)),
                                      (_C_field_halo_ofs, POINTER(c_long)),
                                      (_C_field_owned_ofs, POINTER(c_long)),
                                      (_C_field_dmap, c_void_p)]}))
After this modification, all large-domain runs succeed without overflow.
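One thing worth checking when applying this change: the width of c_long depends on the platform ABI (8 bytes on 64-bit Linux, but 4 bytes on 64-bit Windows), whereas ctypes.c_int64 is 8 bytes everywhere. The sketch below is not part of Devito; it simply prints the widths on the current machine so the assumption behind the fix can be verified:

```python
import ctypes

# Byte widths of the candidate field types on this platform.
widths = {
    name: ctypes.sizeof(getattr(ctypes, name))
    for name in ("c_int", "c_long", "c_int64")
}

for name, nbytes in widths.items():
    print(f"{name}: {nbytes} bytes")
```

On the Linux cluster described in this report, c_long should come out as 8 bytes, which is why the replacement works there.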
Steps to Reproduce:
1. Run a large 3D simulation (> 2^31 grid points, e.g. 1295^3).
2. Observe the runtime error (a CUDA memory allocation failure, triggered by an invalid size).
3. Inspect dense.py - note c_int usage for size fields.
Observed Behavior
Out of memory allocating 18446744065653020036 bytes of device memory
Failing in Thread:1
Accelerator Fatal Error: call to cuMemAlloc returned error 2 (CUDA_ERROR_OUT_OF_MEMORY)
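The reported allocation size is clearly invalid (≈ 1.8×10^19 bytes, just below 2^64): it is what a negative size looks like when reinterpreted as an unsigned 64-bit value, consistent with signed 32-bit overflow.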
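The number in the error message can be decoded directly. As a small sketch (standard library only, variable names are illustrative), reinterpreting the reported byte count as a signed 64-bit integer shows that the allocator was actually handed a negative size:

```python
import ctypes

reported = 18446744065653020036          # byte count from the error message

# Reinterpret the unsigned 64-bit value as signed 64-bit.
as_signed = ctypes.c_int64(reported).value

print(as_signed)                          # a negative byte count
print(reported == 2**64 + as_signed)      # the two views describe the same bits
```

A negative byte count of about 8×10^9 points to a wrapped size rather than a genuine shortage of device memory.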
Expected Behavior
Correct allocation for 24.5 GB domain (~2.17×10^9 elements), no overflow, normal execution.
Environment
• Devito version: 4.8.20
• Backend: OpenACC / NVHPC 25.1
• MPI: Enabled
• Hardware: NVIDIA H100 (80 GB HBM)
• OS: HPC cluster environment - Linux
• Python: 3.10 (NVHPC env)
Fix verified on GPU backends - replacing c_int with c_long solves the issue fully.