I am trying to launch raw kernels in cupy on multiple GPUs.
Example 1 (RawKernel):
```python
import numpy as np
import cupy

gpu_indices = (0, 1)
n_gpu = len(gpu_indices)

source = """
extern "C" {
__global__
void my_add(const float *x1, const float *x2, float *y, const int n)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
}
"""

# compile the kernel on each device
add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        add_kernel[gpu_id] = cupy.RawKernel(source, 'my_add')

# allocate inputs, outputs and a stream per device
x1 = [None] * n_gpu
x2 = [None] * n_gpu
y = [None] * n_gpu
streams = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        x1[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        x2[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        y[gpu_id] = cupy.zeros((5, 5), dtype=cupy.float32)
        streams[gpu_id] = cupy.cuda.stream.Stream()

# launch on each device on its own stream
for gpu_id in gpu_indices:
    cupy.cuda.Device(gpu_id).use()
    streams[gpu_id].use()
    add_kernel[gpu_id]((5,), (5,), (x1[gpu_id], x2[gpu_id], y[gpu_id], x1[gpu_id].size), stream=streams[gpu_id])
    streams[gpu_id].synchronize()
    print(y[gpu_id])
```
This code runs without errors on my machine with both `gpu_indices = (0,)` (single GPU) and `gpu_indices = (0, 1)` (dual GPU).
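Just to confirm that both devices really computed the sum, here is a quick host-side check (a sketch assuming the `gpu_indices` and `y` defined above) comparing each result against NumPy:

```python
import numpy as np

# each y[gpu_id] should be x1 + x2 = 2 * arange(25), computed on its own device
expected = 2 * np.arange(25, dtype=np.float32).reshape(5, 5)
for gpu_id in gpu_indices:
    np.testing.assert_allclose(cupy.asnumpy(y[gpu_id]), expected)
```

Both assertions pass, so Example 1 behaves as expected on each device.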
Example 2 (RawModule):
The only difference between the two examples is the way the kernel is obtained: in Example 2, `cupy.RawModule()` is called to compile `source`, and `get_function()` is used to get the kernel.

```python
import numpy as np
import cupy

gpu_indices = (0, 1)
n_gpu = len(gpu_indices)

source = """
extern "C" {
__global__
void my_add(const float *x1, const float *x2, float *y, const int n)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
}
"""

# compile once, then fetch the kernel on each device
module = cupy.RawModule(code=source)
add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        add_kernel[gpu_id] = module.get_function('my_add')

# same as example 1
x1 = [None] * n_gpu
x2 = [None] * n_gpu
y = [None] * n_gpu
streams = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        x1[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        x2[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        y[gpu_id] = cupy.zeros((5, 5), dtype=cupy.float32)
        streams[gpu_id] = cupy.cuda.stream.Stream()

for gpu_id in gpu_indices:
    cupy.cuda.Device(gpu_id).use()
    streams[gpu_id].use()
    add_kernel[gpu_id]((5,), (5,), (x1[gpu_id], x2[gpu_id], y[gpu_id], x1[gpu_id].size), stream=streams[gpu_id])
    streams[gpu_id].synchronize()
    print(y[gpu_id])
```
However, this code fails with `gpu_indices = (0, 1)`:
```
[[ 0.  2.  4.  6.  8.]
 [10. 12. 14. 16. 18.]
 [20. 22. 24. 26. 28.]
 [30. 32. 34. 36. 38.]
 [40. 42. 44. 46. 48.]]
Traceback (most recent call last):
  File "test_rawmodule.py", line 42, in <module>
    add_kernel[gpu_id]((5,), (5,), (x1[gpu_id], x2[gpu_id], y[gpu_id], x1[gpu_id].size), stream=streams[gpu_id])
  File "cupy/core/raw.pyx", line 66, in cupy.core.raw.RawKernel.__call__
  File "cupy/cuda/function.pyx", line 162, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 144, in cupy.cuda.function._launch
  File "cupy/cuda/driver.pyx", line 293, in cupy.cuda.driver.launchKernel
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
```
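So GPU 0 prints the correct result and the launch on GPU 1 fails. My reading of the traceback (an assumption on my part, not something I have confirmed in the cupy sources) is that the single `module`, and hence the function handle returned by `get_function()`, belongs to whichever device was current at compile time, so launching it on the other device raises `CUDA_ERROR_INVALID_HANDLE`. A minimal sketch that isolates just that suspected failure mode:

```python
import cupy

code = r'''
extern "C" __global__ void noop() { }
'''

with cupy.cuda.Device(0):
    mod = cupy.RawModule(code=code)
    kern = mod.get_function('noop')
    kern((1,), (1,), ())  # handle and current device match: works

with cupy.cuda.Device(1):
    kern((1,), (1,), ())  # handle belongs to device 0: fails as above
```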
Can someone provide a working example of `cupy.RawModule` on multiple GPUs? For several reasons, `RawModule` is preferable to `RawKernel` in my case.

This is my environment:

- cupy 7.3, numpy 1.18.1
- CUDA 10.2 (10.2.89_440.33.01)
- Python 3.6.10 (Anaconda)
- Linux Mint 19.1 Tessa (based on Ubuntu 18.04)
- two GTX 1080 Ti GPUs (they work fine with native CUDA C/C++ programming)

Added: If I try to compile `source` on each GPU, like this:
```python
module = [None] * n_gpu
add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        module[gpu_id] = cupy.RawModule(code=source)
        add_kernel[gpu_id] = module[gpu_id].get_function('my_add')
```
then I get the following error message:

```
Traceback (most recent call last):
  File "cupy/core/raw.pyx", line 252, in cupy.core.raw.RawModule.__init__
  File "cupy/core/carray.pxi", line 146, in cupy.core.core.compile_with_cache
  File "cupy/core/carray.pxi", line 194, in cupy.core.core.compile_with_cache
  File "/home/kejang/anaconda3/envs/python3.6/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 287, in compile_with_cache
    extra_source, backend)
  File "/home/kejang/anaconda3/envs/python3.6/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 335, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 197, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 199, in cupy.cuda.function.Module.load
  File "cupy/cuda/driver.pyx", line 240, in cupy.cuda.driver.moduleLoadData
  File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_CONTEXT_IS_DESTROYED: context is destroyed
```
This is strange, because I thought `module[gpu_id] = cupy.RawModule(code=source)` should compile `source` on each GPU, just as in Example 1.
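One experiment I can think of (purely a guess that some source-keyed compilation cache is handing back the module built for the first device): make `source` textually unique per device, e.g. by appending a device-specific comment, so each `RawModule` is forced to compile separately. A sketch:

```python
module = [None] * n_gpu
add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        # hypothetical cache-buster: a per-device comment makes the source unique
        unique_source = source + '\n// device {}\n'.format(gpu_id)
        module[gpu_id] = cupy.RawModule(code=unique_source)
        add_kernel[gpu_id] = module[gpu_id].get_function('my_add')
```

I have not verified that this avoids the error; I mention it only to show what I have been considering.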