Reading/writing a CuPy array across multiple GPUs

2024-05-15 • Q&A

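I have one large array. I need to process parts of it on multiple GPUs rather than copy the same array to every GPU. I have four GPUs; I created a CuPy array on device 0, and I can also get the array's pointer. But I don't know how to dereference that pointer on the other devices. My sample code is: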

import cupy as cp

with cp.cuda.Device(0):
    x     = cp.zeros((4,4),dtype=cp.int32)
    x_ptr = x.data.ptr     # CuPy array pointer
with cp.cuda.Device(1):
    print(x)               # prints the array with no error
    x[0,0] = 99           # Error: Array device must be same as the current device: array device = 0 while current = 1

Clearly, writing to GPU 0 from GPU 1 is not straightforward and requires dereferencing the pointer, which I am not sure how to do with CuPy.
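One possible workaround, sketched here under assumptions, is to move only the block of rows each GPU needs instead of the whole array. The helper names `row_chunks` and `process_in_chunks` are hypothetical and not part of CuPy. The sketch relies on `ndarray.copy()` placing its result on the current device and on `cp.asarray` copying an array that lives on another device. The loop runs one device after another for simplicity and is not tuned for overlap.

```python
def row_chunks(n_rows, n_devices):
    """Split range(n_rows) into n_devices contiguous (start, stop) pairs."""
    base, extra = divmod(n_rows, n_devices)
    bounds, start = [], 0
    for i in range(n_devices):
        # The first `extra` devices take one additional row.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def process_in_chunks(x, device_ids, fn):
    """Apply fn to one row block of x per device and write the results back into x.

    Hypothetical helper: x is a cupy.ndarray, fn maps a block to a block
    of the same shape and dtype.
    """
    import cupy as cp  # imported lazily so row_chunks works without a GPU

    chunks = row_chunks(x.shape[0], len(device_ids))
    for dev, (lo, hi) in zip(device_ids, chunks):
        with cp.cuda.Device(dev):
            block = x[lo:hi].copy()   # only this slice is copied to device `dev`
            block = fn(block)         # computed on device `dev`
        with x.device:
            # Bring the result back to x's own device, then assign in place.
            x[lo:hi] = cp.asarray(block)
```

With the 4x4 array from the question and four GPUs, `process_in_chunks(x, [0, 1, 2, 3], lambda b: b + 1)` would hand one row to each device. This still copies data, but only one slice per GPU rather than the full array.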

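I tried converting the CuPy array to ctypes, since there is a known way to dereference a pointer with the code below, but I keep getting a segmentation fault (core dumped), so I could not get any further: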

import ctypes
import cupy as cp

with cp.cuda.Device(0):
    x        = cp.zeros((4,4),dtype=cp.int32)
    x_ctypes = ctypes.cast(x.data.ptr,ctypes.POINTER(ctypes.c_int32*x.size))

with cp.cuda.Device(1):
    print(x_ctypes)              # get: <__main__.LP_c_int_Array_16 object at 0x7f29d2841050>
    print(x_ctypes.contents)     # get: <__main__.c_int_Array_16 object at 0x7f29d28410e0>
    print(x_ctypes.contents[0])  # get: Segmentation fault (core dumped)

Is there an efficient way to do this without copying the array x to GPU 1? Note: peer access is enabled between the GPUs.
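A note on the ctypes attempt: `x.data.ptr` is an address in GPU memory, while `ctypes` dereferences pointers on the CPU, so reading `x_ctypes.contents[0]` makes the host touch an address it cannot access, which would explain the segmentation fault. A raw device pointer can instead be wrapped back into a CuPy array without a copy. The sketch below uses `cp.cuda.UnownedMemory`, `cp.cuda.MemoryPointer`, and `cp.cuda.runtime.deviceCanAccessPeer`; the helper names `nbytes`, `view_from_pointer`, and `peer_access_available` are hypothetical. Whether a kernel launched on GPU 1 may then write through such a view depends on peer access being enabled and on the CuPy version, so treat this as an assumption to verify.

```python
import math


def nbytes(shape, itemsize):
    """Size in bytes of a dense array with the given shape and element size."""
    return math.prod(shape) * itemsize


def view_from_pointer(ptr, shape, dtype, owner, device_id):
    """Wrap a raw device pointer in a cupy.ndarray without copying.

    Hypothetical helper: `owner` is the object that keeps the memory alive
    (for example the original array), `device_id` is where the memory lives.
    """
    import cupy as cp  # imported lazily so nbytes works without a GPU

    size = nbytes(shape, cp.dtype(dtype).itemsize)
    mem = cp.cuda.UnownedMemory(ptr, size, owner, device_id)
    return cp.ndarray(shape, dtype=dtype, memptr=cp.cuda.MemoryPointer(mem, 0))


def peer_access_available(device, peer):
    """Ask the CUDA runtime whether `device` can address memory on `peer`."""
    import cupy as cp

    return bool(cp.cuda.runtime.deviceCanAccessPeer(device, peer))
```

For the array in the question, `view_from_pointer(x.data.ptr, x.shape, x.dtype, x, 0)` would give a second array object that refers to the same memory on GPU 0, and `peer_access_available(1, 0)` reports whether GPU 1 could address it at all.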
