I am working on a project in which I was given a Python module that implements an iterative process, where some of the computations are performed on the GPU using TensorFlow 2.0. The module works fine when used on its own in a single process.
Since I have to perform several runs with different parameters, I would like to parallelise the calls, but when I call the module (which imports tensorflow) from different processes I get CUDA_ERROR_OUT_OF_MEMORY followed by an endless loop of CUDA_ERROR_NOT_INITIALIZED, so the spawned processes hang forever.
I have of course tried limiting the GPU memory, which works fine if I run two different Python scripts from separate interpreters, but it does not seem to work in my case. In particular, if I use
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
I get an endless loop of CUDA_ERROR_NOT_INITIALIZED, whereas if I use:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
else:
    print("No GPU found, model running on CPU")
the processes also hang, but each spawned process reports the out-of-memory error.
Reading the tensorflow console output, the first spawned process seems to allocate memory on the GPU, but then it hangs, and the other processes complain that memory is exhausted. Strangely, in nvidia-smi the GPU memory does not appear to be exhausted at all:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:03:00.0  On |                  N/A |
| 29%   42C    P8    28W / 250W |    755MiB / 12035MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
I managed to write a minimal reproducible example:
File "tf_module.py":
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
else:
    print("Running on CPU")

def run(x, y):
    return tf.add(x, y).numpy()
File "run.py":
from multiprocessing import Pool

import tf_module as experiment

def run_exp(params):
    a, b = params
    return experiment.run(a, b)

pool = Pool(2)
params = [(a, b) for a in range(3) for b in range(3)]
results = pool.map(run_exp, params)
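For reference, the parallelisation pattern I am after looks roughly like the sketch below. It defers the heavy import into the worker and uses the "spawn" start method, so each process would initialise CUDA on its own; here a plain-Python stand-in replaces experiment.run, and I have not verified that this actually avoids the CUDA errors in the real pipeline:

```python
from multiprocessing import get_context

def run_exp(params):
    # In the real pipeline, `import tf_module as experiment` would go here,
    # so that TensorFlow (and CUDA) is initialised inside the worker process
    # rather than in the parent.
    a, b = params
    return a + b  # stand-in for experiment.run(a, b)

if __name__ == "__main__":
    params = [(a, b) for a in range(3) for b in range(3)]
    # "spawn" starts each worker from a fresh interpreter instead of
    # forking the parent's (possibly CUDA-initialised) state.
    with get_context("spawn").Pool(2) as pool:
        print(pool.map(run_exp, params))
```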
Moving the TF computation out of the module is not feasible, since it is part of a complex pipeline that also involves numpy, so I need to parallelise this code as it is.
Am I missing something?
Thanks in advance.