How to resolve the conflict between "Expected all tensors to be on the same device" and "can't convert CUDA tensor to numpy"

2024-05-16 • Q&A

I am trying to run yolov3 on multiple GPUs...


    def evaluate(self):
        self.models.eval()

        labels = []
        sample_metrics = []  # List of tuples (TP,confs,pred)
        for batch_i,(_,imgs,targets) in enumerate(tqdm.tqdm(self.valid_dataloader,desc="Detecting objects")):
            # Extract labels
            labels += targets[:,1].tolist()
            # Rescale target
            targets[:,2:] = xywh2xyxy(targets[:,2:])
            targets[:,2:] *= self.img_size

            #targets = targets.cuda()

            #imgs = Variable(imgs.type(Tensor),requires_grad=False)

            imgs = imgs.cuda()

            with torch.no_grad():
                outputs = self.models(imgs)
                outputs = non_max_suppression(outputs,conf_thres=self.conf_thres,nms_thres=self.nms_thres)

            sample_metrics += get_batch_statistics(outputs,targets,iou_threshold=self.iou_thres)

        # concatenate sample statistics
        true_positives,pred_scores,pred_labels = [np.concatenate(x,0) for x in list(zip(*sample_metrics))]
        precision,recall,AP,f1,ap_class = ap_per_class(true_positives,pred_labels,labels)

        return precision,ap_class

If I uncomment the two commented-out lines, I get

TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first

But if I leave them commented out, I get

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

How can I solve this?

Traceback:

    Traceback (most recent call last):
      File "D:/Code/DeblurGANv2-master/train.py", line 415, in <module>
        trainer.train()
      File "D:/Code/DeblurGANv2-master/train.py", line 253, in train
        self._run_epoch(epoch)
      File "D:/Code/DeblurGANv2-master/train.py", line 287, in _run_epoch
        loss_detect = self.calculate()
      File "D:/Code/DeblurGANv2-master/train.py", line 224, in calculate
        precision,ap_class = self.evaluate()
      File "D:/Code/DeblurGANv2-master/train.py", line 160, in evaluate
        sample_metrics += get_batch_statistics(outputs,iou_threshold=self.iou_thres)
      File "D:\Code\DeblurGANv2-master\util\utils.py", line 172, in get_batch_statistics
        if pred_label not in target_labels:
      File "D:\soft\Anaconda3\envs\DeblurGANv2-master\lib\site-packages\torch\tensor.py", line 659, in __contains__
        return (element == self).any().item()  # type: ignore[union-attr]
      File "D:\soft\Anaconda3\envs\DeblurGANv2-master\lib\site-packages\torch\tensor.py", line 27, in wrapped
        return f(*args,**kwargs)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I assume the problem is with sample_metrics: whether I put this tensor on the GPU or on the CPU, something always breaks...
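The two errors may not actually conflict. The traceback points at the comparison `pred_label not in target_labels`, where the detections live on cuda:0 and the targets on the CPU, while the NumPy error appears once tensors on the GPU reach `np.concatenate`. One possible way to satisfy both is to leave `targets` on the CPU and copy the detections back to the CPU right after non-maximum suppression. The sketch below is only an illustration under assumptions: `detections_to_cpu` is a hypothetical helper, the tensors are stand-ins for real data, and it assumes `non_max_suppression` returns a list with one tensor (or `None`) per image.

```python
import torch


def detections_to_cpu(outputs):
    """Copy each per-image detection tensor to host memory, keeping None entries."""
    return [o.detach().cpu() if o is not None else None for o in outputs]


# Falls back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the NMS result: one tensor per image on the model's device,
# or None for an image without detections (assumed return format).
outputs = [
    torch.tensor([[10.0, 20.0, 50.0, 60.0, 0.9, 0.8, 2.0]], device=device),
    None,
]
# Stand-in for the targets, left on the CPU as the DataLoader delivers them.
targets = torch.tensor([[0.0, 2.0, 12.0, 18.0, 48.0, 62.0]])

outputs = detections_to_cpu(outputs)

pred_label = outputs[0][0, -1]
target_labels = targets[:, 1]
label_found = pred_label in target_labels  # both on the CPU: no device mismatch
scores = outputs[0][:, 4].numpy()          # a CPU tensor converts to NumPy directly
```

In `evaluate`, this would amount to calling the helper on `outputs` between `non_max_suppression` and `get_batch_statistics`, with `targets = targets.cuda()` left commented out. Whether that is sufficient depends on what `get_batch_statistics` does internally, which is not shown here.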


liync answered: How to resolve the conflict between "Expected all tensors to be on the same device" and "can't convert CUDA tensor to numpy"
