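I am working on an assignment that requires a basic implementation of the k-means clustering algorithm in Python. The data we are using has a numeric raw grade and a non-numeric raw letter grade. We are supposed to edit the code given in lecture to complete the assignment. I keep getting an error saying that a string could not be converted to float. I understand why that happens, but I am not sure how to proceed. We have not discussed label encoding yet, so I assume it is not allowed for this assignment, and we cannot use k-modes either because it has not been taught. Thank you.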
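import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import metrics

# Load the grades and keep the two columns of interest
grades = pd.read_csv('grades.csv')
raw_grades = pd.DataFrame(grades, columns=['raw grade', 'raw letter'])
raw_grades

np.random.seed(5)
# This call raises the ValueError: the 'raw letter' column holds strings
myKMeans = KMeans(n_clusters=5).fit(raw_grades)
# Read the labels from the fitted estimator
labels = myKMeans.labels_
labels[0:20]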
raw_grades looks like this:
   raw grade raw letter
0       56.1          F
1       60.1         D-
2       60.4         D-
3       63.1         D-
4       64.4          D
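The mix of column types can be confirmed before fitting anything. The following is a minimal sketch, assuming a small hand-built frame in place of grades.csv: the numeric values are the five rows shown above, while the letters are placeholders and the names `sample` and `numeric_only` are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the first rows of grades.csv.
# The letters are placeholders; only their being strings matters here.
sample = pd.DataFrame({
    "raw grade": [56.1, 60.1, 60.4, 63.1, 64.4],
    "raw letter": ["F", "D", "D", "D", "D"],
})

# 'raw grade' is a float column, 'raw letter' is a non-numeric column.
print(sample.dtypes)

# select_dtypes keeps only the columns that KMeans could convert to float.
numeric_only = sample.select_dtypes(include="number")
print(list(numeric_only.columns))
```

Whether dropping the letter column is acceptable depends on what the assignment expects, so this only shows where the string values come from.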
The error message I get is:
ValueError                                Traceback (most recent call last)
<ipython-input-23-f444641587d3> in <module>
      1 np.random.seed(5)
      2 X = raw_grades[['raw grade','raw letter']]
----> 3 myKMeans=KMeans(n_clusters=5).fit(X)
      4 labels = kmeans.labels_
      5 labels[0:20]

/usr/local/lib/python3.6/dist-packages/sklearn/cluster/k_means_.py in fit(self,X,y,sample_weight)
    970                 tol=self.tol,random_state=random_state,copy_x=self.copy_x,
    971                 n_jobs=self.n_jobs,algorithm=self.algorithm,
--> 972                 return_n_iter=True)
    973         return self
    974

/usr/local/lib/python3.6/dist-packages/sklearn/cluster/k_means_.py in k_means(X,n_clusters,sample_weight,init,precompute_distances,n_init,max_iter,verbose,tol,random_state,copy_x,n_jobs,algorithm,return_n_iter)
    310     order = "C" if copy_x else None
    311     X = check_array(X,accept_sparse='csr',dtype=[np.float64,np.float32],
--> 312                     order=order,copy=copy_x)
    313     # verify that the number of samples given is larger than k
    314     if _num_samples(X) < n_clusters:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array,accept_sparse,accept_large_sparse,dtype,order,copy,force_all_finite,ensure_2d,allow_nd,ensure_min_samples,ensure_min_features,warn_on_dtype,estimator)
    494             try:
    495                 warnings.simplefilter('error',ComplexWarning)
--> 496                 array = np.asarray(array,dtype=dtype,order=order)
    497             except ComplexWarning:
    498                 raise ValueError("Complex data not supported\n"

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a,order)
     83
     84     """
---> 85     return array(a,copy=False,order=order)
     86
     87

ValueError: could not convert string to float: 'A+'
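The traceback ends in a float conversion of the letter column, so one possible way forward that needs neither label encoding nor k-modes is to fit on the numeric column alone. The following is a rough sketch under stated assumptions, not the lecture's method: the frame is a hypothetical stand-in for grades.csv (the first five numeric values come from the rows shown earlier, the last three rows and all letters are made up), `n_clusters=2` is chosen only because the toy data is tiny, and `random_state=5` is used in place of `np.random.seed(5)`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for grades.csv; the last three rows are made up
# and the letters are illustrative only.
raw_grades = pd.DataFrame({
    "raw grade": [56.1, 60.1, 60.4, 63.1, 64.4, 88.0, 91.5, 95.2],
    "raw letter": ["F", "D", "D", "D", "D", "B+", "A-", "A+"],
})

# The same kind of conversion the traceback shows, on a plain list:
# "56.1" parses as a float, "A+" does not.
try:
    np.asarray(["56.1", "A+"], dtype=np.float64)
    message = ""
except ValueError as err:
    message = str(err)
print(message)

# Fit on the numeric column only; double brackets keep X two-dimensional.
X = raw_grades[["raw grade"]]
model = KMeans(n_clusters=2, n_init=10, random_state=5).fit(X)
labels = model.labels_
print(labels)
```

With the real data the same pattern would use `n_clusters=5` as in the original code. The letter column can still be kept in the frame and compared against `labels` afterwards, since it is simply not passed to `fit`.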