TensorFlow - Multiple GPUs are not used for model(inputs) or when computing gradients

2024-05-15 • Q&A

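When I use multiple GPUs to run inference with a model (i.e. calling model(inputs)) and compute its gradients, the machine uses only one GPU and the others stay idle.

For example, in the following code snippet: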

import os

# Must be set before TensorFlow initializes the GPUs
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import tensorflow as tf

# Make the tf-data
path_filename_records = 'your_path_to_records'
bs = 128

# parse_record is your own function that decodes one TFRecord example
dataset = tf.data.TFRecordDataset(path_filename_records)
dataset = (dataset
           .map(parse_record, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(bs)
           .prefetch(tf.data.AUTOTUNE)
          )

# Load model trained using MirroredStrategy
path_to_resnet = 'your_path_to_resnet'
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    resnet50 = tf.keras.models.load_model(path_to_resnet)

for pre_images, true_label in dataset:
    with tf.GradientTape() as tape:
        tape.watch(pre_images)  # inputs are not variables, so watch them explicitly
        outputs = resnet50(pre_images)
    # Gradient of the outputs with respect to the input images
    grads = tape.gradient(outputs, pre_images)

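only one GPU is used. You can inspect the GPUs' behavior with nvidia-smi. I don't know whether this is expected, i.e. that neither model(inputs) nor tape.gradient supports multiple GPUs. But if it is, that is a big problem: if you have a large dataset and need to compute gradients with respect to the inputs (e.g. for explainability methods), a single GPU can take days. Another thing I tried was model.predict(), but it cannot be used with tf.GradientTape.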
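A minimal sketch of why this happens, assuming TensorFlow 2.x: entering strategy.scope() only controls where the model's variables are created (one mirrored copy per replica). A plain model(x) call is still a single computation on the default device; only a function passed to strategy.run is executed once per replica. The tiny Dense model below is a hypothetical stand-in for resnet50 so the sketch also runs on a CPU-only machine, where MirroredStrategy has a single replica.

```python
import numpy as np
import tensorflow as tf

# Uses every visible GPU, or the CPU when no GPU is available.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Hypothetical tiny model standing in for resnet50; the scope only
    # affects where its variables are created (one copy per replica).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(4),
    ])

num_replicas = strategy.num_replicas_in_sync
global_batch = 4 * num_replicas
x = np.random.rand(global_batch, 8).astype("float32")

# 1) Plain call: one computation on a single (default) device.
plain_out = model(x)
print("plain call ->", plain_out.device)

# 2) The same input-gradient computation, executed once per replica.
def step_fn(inputs):
    with tf.GradientTape() as tape:
        tape.watch(inputs)  # inputs are tensors, not variables
        outputs = model(inputs)
    return tape.gradient(outputs, inputs)

@tf.function
def distributed_step(inputs):
    return strategy.run(step_fn, args=(inputs,))

dist_ds = strategy.experimental_distribute_dataset(
    tf.data.Dataset.from_tensor_slices(x).batch(global_batch))

replica_shapes = []
for batch in dist_ds:
    per_replica = distributed_step(batch)
    # One result tensor per replica, each living on its replica's device.
    for grads in strategy.experimental_local_results(per_replica):
        print("replica ->", grads.device, grads.shape)
        replica_shapes.append(tuple(grads.shape))
```

On a machine with two visible GPUs the plain call should report a single device, while the loop should print two results on different GPUs, each holding half of the global batch.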

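What I have tried so far, without success:

Putting all the code inside the mirrored strategy scope.
Using different GPUs: I tried A100, A6000 and RTX5000. I also changed the number of GPUs and the batch size.
Specifying the list of GPUs, e.g. strategy = tf.distribute.MirroredStrategy(['/gpu:0','/gpu:1']).
Adding, as suggested by @Kaveh, strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.HierarchicalCopyAllReduce()).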

How do I know that only one GPU is working?

I ran the command watch -n 1 nvidia-smi in a terminal and observed that only one GPU was at 100% while the others were at 0%.
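To record the same observation from a script rather than by watching the terminal, one option is nvidia-smi's CSV query mode (nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits), which prints one "index, utilization" line per GPU. The sketch below parses that output and lists the GPUs below a utilization threshold; the 5% threshold and the function names are illustrative assumptions, not part of any library.

```python
import shutil
import subprocess


def idle_gpus(csv_text: str, threshold: int = 5) -> list[int]:
    """Return indices of GPUs whose utilization (%) is below `threshold`.

    Expects lines like "0, 100" as printed by
    nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
    """
    idle = []
    for line in csv_text.strip().splitlines():
        index_str, util_str = (field.strip() for field in line.split(","))
        if not util_str.isdigit():
            continue  # e.g. "[N/A]" on GPUs that do not report utilization
        if int(util_str) < threshold:
            idle.append(int(index_str))
    return idle


def query_nvidia_smi() -> str:
    """Run nvidia-smi once and return its raw CSV output."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Output shaped like the situation described above: GPU 0 busy, GPU 1 idle.
sample = "0, 100\n1, 0\n"
print(idle_gpus(sample))  # [1]

if shutil.which("nvidia-smi"):
    print("idle GPUs right now:", idle_gpus(query_nvidia_smi()))
```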

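Working example

Below you can find a working example with a CNN trained on the cats_vs_dogs dataset. You don't need to download the dataset manually, since I use the tfds version, and you don't need to train the model.

Notebook: Working Example.ipynb

Saved model: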

Answer: create a distributed dataset, run the gradient computation on every replica with strategy.run, and collect the per-replica results with strategy.gather:

import numpy as np
import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()
print(f'using distribution strategy\nnumber of gpus:{mirrored_strategy.num_replicas_in_sync}')

# ResNet50 expects images of shape (224, 224, 3)
images = np.random.rand(64, 224, 224, 3).astype('float32')
dataset = tf.data.Dataset.from_tensor_slices(images).batch(8)

# create distributed dataset
ds = mirrored_strategy.experimental_distribute_dataset(dataset)

# make variables mirrored
with mirrored_strategy.scope():
    resnet50 = tf.keras.applications.resnet50.ResNet50()

def step_fn(pre_images):
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(pre_images)
        outputs = resnet50(pre_images)[:, 0:1]
    # squeeze only the singleton output axis, so a per-replica batch of 1 keeps its batch axis
    return tf.squeeze(tape.batch_jacobian(outputs, pre_images), axis=1)

# define distributed step function using strategy.run and strategy.gather
@tf.function
def distributed_step_fn(pre_images):
    per_replica_grads = mirrored_strategy.run(step_fn, args=(pre_images,))
    return mirrored_strategy.gather(per_replica_grads, 0)

# loop over distributed dataset with distributed_step_fn
for result in map(distributed_step_fn, ds):
    print(result.numpy().shape)
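The distributed code above relies on two pieces of bookkeeping: experimental_distribute_dataset divides every global batch along the batch axis among the replicas, and strategy.gather(value, 0) concatenates the per-replica results back along axis 0. The NumPy-only sketch below mimics that bookkeeping under the assumption that shards are contiguous and in replica order; np.array_split is only an approximation of TensorFlow's rebatching, and fake_step_fn is a hypothetical stand-in for step_fn.

```python
import numpy as np


def split_global_batch(batch: np.ndarray, num_replicas: int) -> list[np.ndarray]:
    # Approximates how each global batch is divided along axis 0:
    # one contiguous shard per replica, in replica order.
    return np.array_split(batch, num_replicas, axis=0)


def fake_step_fn(images: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for step_fn: one gradient-shaped array per image.
    return 2.0 * images


def gather(per_replica: list[np.ndarray], axis: int = 0) -> np.ndarray:
    # Mirrors strategy.gather(value, axis): concatenate the replica results.
    return np.concatenate(per_replica, axis=axis)


global_batch = np.arange(8 * 3, dtype=np.float32).reshape(8, 3)
shards = split_global_batch(global_batch, num_replicas=2)
result = gather([fake_step_fn(s) for s in shards])

print([s.shape for s in shards])  # [(4, 3), (4, 3)]
print(result.shape)               # (8, 3)
```

Because each image's input gradient depends only on that image, the gathered result matches what a single-device loop would produce. This assumes layers such as BatchNormalization run in inference mode, which is the default for a plain model(x) call, so samples in a batch do not interact.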

