Error when running train_test_split on TfidfVectorizer output

2024-05-19 • Q&A

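I'm trying to vectorize the text in the "title" column of this DataFrame. That seems to have worked, and I believe I've extracted the target for my predictions. But when I run train_test_split, I get the following error: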

TypeError: '<' not supported between instances of 'str' and 'float'
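A likely cause, stated here as an assumption rather than something the thread confirms: the category_desc column contains missing values, which pandas stores as the float NaN. With stratify=y, scikit-learn needs the distinct classes in y, which it finds with np.unique, and np.unique sorts the labels. Python cannot order a str against a float, so the sort fails. The sketch below reproduces that failure with NumPy alone; the helper name unique_labels_or_error and the label values are hypothetical.

```python
import numpy as np


def unique_labels_or_error(labels):
    """Return the sorted unique labels, or the TypeError message if sorting fails."""
    # Object dtype mirrors a pandas column of strings that may also hold NaN
    arr = np.asarray(labels, dtype=object)
    try:
        # np.unique sorts the values, which compares them with '<'
        return np.unique(arr)
    except TypeError as exc:
        return str(exc)


# All labels are strings: sorting works
clean = unique_labels_or_error(["Education", "Health", "Education"])
# One label is NaN (a float): comparing it with a str raises TypeError
dirty = unique_labels_or_error(["Education", np.nan, "Health"])
print(clean)
print(dirty)
```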

# Import dependencies
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

volunteer = pd.read_csv('https://assets.datacamp.com/production/repositories/1816/datasets/668b96955d8b252aa8439c7602d516634e3f015e/volunteer_opportunities.csv')

# Take the title text
title_text = volunteer['title']

# Create the vectorizer method
tfidf_vec = TfidfVectorizer()

# Transform the text into tf-idf vectors
text_tfidf = tfidf_vec.fit_transform(title_text)

# Split the dataset according to the class distribution of category_desc
y = volunteer["category_desc"]

#category_enc = pd.get_dummies(volunteer['category_desc'])
X_train,X_test,y_train,y_test = train_test_split(text_tfidf.toarray(),y,stratify=y)

# Create Naive Bayes model
nb = GaussianNB()

# Fit the model to the training data
nb.fit(X_train,y_train)

# Print out the model's accuracy
print(nb.score(X_test,y_test))

Does anyone have a solution, or know what causes this error?
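If missing values in category_desc are indeed the cause (an assumption, not confirmed in the thread), one possible fix is to drop those rows before vectorizing, so the tf-idf rows and the labels stay aligned and every label is a string. The sketch below uses a small hypothetical DataFrame in place of the volunteer CSV so it runs offline; the titles and categories are made up for illustration, and random_state=42 is an arbitrary choice for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the volunteer data; the last row has no category
volunteer = pd.DataFrame({
    "title": [
        "Tutor math students", "Reading buddy for kids",
        "After-school homework help", "Teach adults to read",
        "Park cleanup day", "Community garden volunteer",
        "Food bank sorting", "Neighborhood litter pickup",
        "Fundraising gala helper",
    ],
    "category_desc": ["Education"] * 4 + ["Strengthening Communities"] * 4 + [np.nan],
})

# Drop rows with a missing target BEFORE vectorizing, so X and y stay aligned
volunteer = volunteer.dropna(subset=["category_desc"])

tfidf_vec = TfidfVectorizer()
text_tfidf = tfidf_vec.fit_transform(volunteer["title"])
y = volunteer["category_desc"]

# stratify now sees only string labels, so finding the classes no longer fails
X_train, X_test, y_train, y_test = train_test_split(
    text_tfidf.toarray(), y, stratify=y, random_state=42
)

nb = GaussianNB()
nb.fit(X_train, y_train)
score = nb.score(X_test, y_test)
print(score)
```

On the real dataset the same dropna call goes right after pd.read_csv, before title_text is taken; a quick volunteer["category_desc"].isna().sum() shows how many rows it removes.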


Answer by a305069347: Error when running train_test_split on TfidfVectorizer output
