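I am trying to vectorize the text in the "title" column of this DataFrame, and that step appears to complete successfully. I then extract the target column for my predictions. When I run train_test_split, I get the following error: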
TypeError: '<' not supported between instances of 'str' and 'float'
# Import dependencies
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
volunteer = pd.read_csv('https://assets.datacamp.com/production/repositories/1816/datasets/668b96955d8b252aa8439c7602d516634e3f015e/volunteer_opportunities.csv')
# Take the title text
title_text = volunteer['title']
# Create the vectorizer method
tfidf_vec = TfidfVectorizer()
# Transform the text into tf-idf vectors
text_tfidf = tfidf_vec.fit_transform(title_text)
# Split the dataset according to the class distribution of category_desc
y = volunteer["category_desc"]
#category_enc = pd.get_dummies(volunteer['category_desc'])
X_train,X_test,y_train,y_test = train_test_split(text_tfidf.toarray(),y,stratify=y)
# Create Naive Bayes model
nb = GaussianNB()
# Fit the model to the training data
nb.fit(X_train,y_train)
# Print out the model's accuracy
print(nb.score(X_test,y_test))
Does anyone have a solution, or know what causes this error?
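
One thing that may be worth checking, assuming the cause is missing values in category_desc: pandas stores a missing label as a float NaN, so a column of string labels with a few NaN entries mixes str and float, and anything that has to sort the labels (as stratify does) can fail with this kind of TypeError. The sketch below uses a small made-up DataFrame named toy in place of the real CSV; the rows and the test_size and random_state values are hypothetical and only there to make the example reproducible.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the volunteer data: the last row has no category.
toy = pd.DataFrame({
    "title": ["teach math", "plant trees", "tutor kids", "clean park",
              "read stories", "garden help", "mentor teens", "river cleanup",
              "unlabeled event"],
    "category_desc": ["Education", "Environment", "Education", "Environment",
                      "Education", "Environment", "Education", "Environment",
                      float("nan")],
})

# Count the missing labels; a non-zero count means str and float are mixed.
n_missing = int(toy["category_desc"].isna().sum())
print("missing labels:", n_missing)

# Sorting a mix of str and float values raises a TypeError of the same kind.
try:
    sorted(toy["category_desc"].tolist())
    sort_error = None
except TypeError as exc:
    sort_error = str(exc)
print("sort error:", sort_error)

# Possible fix (an assumption, not the only option): drop unlabeled rows
# before vectorizing, so that X and y stay aligned row for row.
labeled = toy.dropna(subset=["category_desc"])
X = TfidfVectorizer().fit_transform(labeled["title"])
y = labeled["category_desc"]

X_train, X_test, y_train, y_test = train_test_split(
    X.toarray(), y, stratify=y, test_size=0.25, random_state=0
)
print("train rows:", X_train.shape[0], "test rows:", X_test.shape[0])
```

If the same check on volunteer["category_desc"] reports missing values, dropping those rows before calling fit_transform keeps the feature matrix and the target the same length. Filling the gaps with a placeholder label such as "Unknown" would be an alternative if the unlabeled rows should stay in the data.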