ValueError: could not convert string to float: 'horse'

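I'm running a Keras CIFAR10 learning experiment. The images come from Kaggle, and the labels are in a CSV file with two columns, 'id' and 'label'. I know I need to convert the labels to tensors, but I don't know how. I've searched all over the internet and can't find anything about reading a CSV file from Kaggle for this. Maybe this isn't the right way to do it. ...

Here is the link: https://www.kaggle.com/c/cifar-10, but there are no kernels to use as examples.

Thanks in advance for your help.

I'm using imports from tensorflow.keras.xxxxxx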

import pandas as pd
print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)

# Add extension to id_code to train images
train_df['id'] = train_df['id'].apply(str) + ".png"

display(train_df.head())

def preprocess_image(path,sigmaX=40):
    """
    The whole preprocessing pipeline:
    1. Read in image
    2. Convert from BGR to RGB
    3. Resize image to desired size
    """
    image = cv2.imread(path)
    image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
    image = cv2.resize(image,(IMG_WIDTH,IMG_HEIGHT))

    return image

# Add Image augmentation to our generator
train_datagen = ImageDataGenerator(rotation_range=360,horizontal_flip=True,vertical_flip=True,validation_split=0.25,rescale=1. / 255)

# Use the dataframe to define train and validation generators
train_generator = train_datagen.flow_from_dataframe(train_df,x_col='id',y_col='label',directory = TRAIN_IMG_PATH,target_size=(IMG_WIDTH,IMG_HEIGHT),batch_size=BATCH_SIZE,class_mode='other',preprocessing_function=preprocess_image,subset='training')

val_generator = train_datagen.flow_from_dataframe(train_df,subset='validation')

Batch_Size  = 64
epochs      = 25

# loop over the number of models to train
for i in np.arange(0,5):

    # initialize the optimizer and model
    print("[INFO] training model {}/{}".format(i + 1,5))
    opt = Adam(lr=1e-5)

    conv_base = ResNet50(weights='imagenet',include_top=False,input_shape=(32,32,3))

    model = models.Sequential()
    model.add(conv_base)
    model.add(layers.UpSampling2D((2,2)))
    model.add(layers.UpSampling2D((2,2)))
    model.add(layers.Flatten())
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(128,activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(64,activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(10,activation='softmax'))

    early_stop = EarlyStopping('val_loss',patience=5)
    reduce_lr = ReduceLROnPlateau('val_loss',factor=0.01,patience=3,verbose=1)

    ############################################################################
    trained_models_path = './best_model_adam/'
    model_names = trained_models_path + 'epoch_{epoch:02d}_val_acc_{val_acc:.4f}_'
    model_checkpoint = ModelCheckpoint(model_names +"model_{}.hdf5".format(i),verbose=1,save_best_only=True)
    ############################################################################

    callbacks = [model_checkpoint,early_stop,reduce_lr]

    #model.compile(optimizer=optimizers.RMSprop(lr=2e-5),loss='binary_crossentropy',metrics=['acc'])
    model.compile(optimizer=Adam(lr=1e-5),metrics=['acc'])

    # train the network
    history = model.fit_generator(
        train_generator,
        epochs=epochs,
        steps_per_epoch=train_df.shape[0] // Batch_Size,
        validation_data=val_generator,
        validation_steps=val_generator.shape[0] // Batch_Size,
        # batch_size=Batch_Size,
        callbacks=[model_checkpoint, early_stop],
    )

    # save the model to disk
    p = ["./models/model_{}.model".format(i)]
    model.save(os.path.sep.join(p))

    # evaluate the network
    predictions = model.predict(testX,batch_size=64)
    report = classification_report(testY.argmax(axis=1),predictions.argmax(axis=1),target_names=labelNames)

    # save the classification report to file
    p = ["./output/model_{}.txt".format(i)]
    f = open(os.path.sep.join(p),"w")
    f.write(report)
    f.close()

When I run fit_generator, I get this error:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value,dtype,shape,name)
    244   """
    245   return _constant_impl(value,name,verify_shape=False,
--> 246                         allow_broadcast=True)
    247 
    248 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value,verify_shape,allow_broadcast)
    252   ctx = context.context()
    253   if ctx.executing_eagerly():
--> 254     t = convert_to_eager_tensor(value,ctx,dtype)
    255     if shape is None:
    256       return t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value,dtype)
    113     return t
    114   else:
--> 115     return ops.EagerTensor(value,handle,device,dtype)
    116 
    117 

ValueError: could not convert string to float: 'horse'
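
The last line of the traceback carries the same message that Python's built-in float() raises for a non-numeric string. A minimal sketch that reproduces it in isolation, assuming (this is not confirmed by the traceback itself) that the generator is handing the raw 'label' strings to TensorFlow as numeric targets; the variable `label` is a hypothetical stand-in for one value from that column:

```python
# Hypothetical stand-in for a single value from the 'label' column.
label = "horse"

try:
    # A class name cannot be parsed as a number, so this raises ValueError.
    float(label)
except ValueError as err:
    message = str(err)
    print(message)
```

This suggests the labels need to be numeric before they reach the model.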

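You can convert the class labels to numbers and add a new column holding those numbers. scikit-learn has built-in functionality for this, but it is simple enough without it: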

import pandas as pd
df = pd.DataFrame({'label':['cat','dog','horse'],'b':[1,2,3]})
all_labels= df.label.unique().tolist()
all_labels.sort()
label_to_number={label:all_labels.index(label) for label in all_labels}
df['label_num']=df.apply(lambda r:label_to_number[r.label],axis=1)

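Now you can pass label_num to your training (y_col='label_num'). All of this assumes that integer categories are acceptable and that you don't need one-hot encoding; if you do, scikit-learn again has something for that. Judging from here, integer categories seem to be fine.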
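
For reference, the same mapping can be sketched a little more compactly with Series.map, along with a one-hot variant. This uses pandas only rather than scikit-learn; the toy frame, its values, and the names `number_to_label` and `one_hot` are assumptions for illustration and do not come from the Kaggle data:

```python
import pandas as pd

# Toy frame standing in for the label CSV (values are made up).
df = pd.DataFrame({"id": [1, 2, 3, 4], "label": ["horse", "cat", "dog", "cat"]})

# Sorting the unique labels gives a stable label -> integer mapping.
all_labels = sorted(df["label"].unique())
label_to_number = {label: idx for idx, label in enumerate(all_labels)}

# Inverse mapping, useful for turning predicted class indices back into names.
number_to_label = {idx: label for label, idx in label_to_number.items()}

# Series.map looks up each label in the dictionary.
df["label_num"] = df["label"].map(label_to_number)

# Optional one-hot encoding: one column per class, in sorted label order.
one_hot = pd.get_dummies(df["label"])

print(df)
print(one_hot.columns.tolist())
```

Keeping the inverse mapping around makes it easy to translate an argmax over the model's softmax output back into a class name.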

@jeremy_rutman, thank you! I got it working:

import pandas as pd
print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)

# Add extension to id_code to train images
train_df['id'] = train_df['id'].apply(str) + ".png"

all_labels = train_df['label'].unique().tolist() 
all_labels.sort() 
label_to_number={label:all_labels.index(label) for label in all_labels} 
train_df['label']=train_df.apply(lambda r:label_to_number[r.label],axis=1)


display(train_df.head())
print(train_df['id'])
The model is fitting now, but for some reason my two GPU cards are not kicking in... I think a lot of things got broken with TensorFlow 2.0, but that is another topic.
Thanks a lot for your help.
