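I split the data into two parts and ran K-means clustering on the dataset, together with normalisation and PCA. Now I want to project the clustering plot back onto a box plot, to check which instances (rows) fall into which clusters and to look for outliers.
Description: load the data with pandas, normalise it with min_max_scaler as a preprocessing step, apply PCA, then cluster.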
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing,decomposition,cluster
# Load data from input file
X = pd.read_csv("/Users/blah blah.csv")
X.plot.scatter(x=6,y=7)
# Normalise the data
min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(X)
X_norm=pd.DataFrame(np_scaled,columns=X.columns)
# PCA
pca = decomposition.PCA(n_components=5)
pca_model = pca.fit(X_norm)
print(pca.explained_variance_ratio_)
print(pca_model.components_)
pca_array = pca_model.transform(X_norm)
X_pca = pd.DataFrame(data=pca_array,columns=['PC1','PC2','PC3','PC4','PC5'])
X_pca.plot.scatter(x=1,y=3)
# K-Means Implementation
n_clusters=5
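# Note: algorithm='full' was renamed to 'lloyd' in scikit-learn 1.1 (use 'full' on older versions)
kmeans = cluster.KMeans(n_clusters=n_clusters,init='random',n_init=1,algorithm='lloyd')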
ac = kmeans.fit(X_pca)
print('\n..........Cluster centers............\n')
print(kmeans.cluster_centers_)
print('\n.........Cluster labels.........\n')
print(kmeans.labels_)
print('\n.............Scatter Plot K-Means......... \n')
X_pca.plot.scatter(x=1,y=3,c=kmeans.labels_,cmap='rainbow',title='K-Means Clustering')
plt.show()
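
One possible way to get from the cluster labels back to the original rows is sketched below. It relies on the fact that scaling, PCA and K-means all preserve row order, so `kmeans.labels_[i]` belongs to row `i` of the original frame. The synthetic data, the column names `f0`..`f7`, the helper `plot_boxplots_by_cluster`, the 1.5 * IQR rule on the distance to the cluster centre, and the `n_init=10` / `random_state=0` settings are all assumptions made for illustration and are not part of the script above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import cluster, decomposition, preprocessing

# Synthetic stand-in for the CSV in the question (hypothetical data and column names).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 8)), columns=[f"f{i}" for i in range(8)])

# Same pipeline as in the question: min-max scaling, PCA, K-means.
X_norm = pd.DataFrame(preprocessing.MinMaxScaler().fit_transform(X), columns=X.columns)
pca_array = decomposition.PCA(n_components=5).fit_transform(X_norm)
X_pca = pd.DataFrame(pca_array, columns=["PC1", "PC2", "PC3", "PC4", "PC5"])
kmeans = cluster.KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_pca)

# Row order is unchanged, so the labels can be attached to the original frame.
X_labeled = X.copy()
X_labeled["cluster"] = kmeans.labels_

# Which rows ended up in which cluster (cluster id -> list of row indices).
rows_by_cluster = {
    int(k): list(idx) for k, idx in X_labeled.groupby("cluster").groups.items()
}

# Distance of every row to its own cluster centre, measured in PCA space.
dist_all = kmeans.transform(X_pca)  # shape: (n_rows, n_clusters)
X_labeled["dist_to_center"] = dist_all[np.arange(len(X_pca)), kmeans.labels_]

# Assumed outlier rule: beyond the upper 1.5 * IQR fence within each cluster.
grouped = X_labeled.groupby("cluster")["dist_to_center"]
q1 = grouped.transform(lambda d: d.quantile(0.25))
q3 = grouped.transform(lambda d: d.quantile(0.75))
X_labeled["outlier"] = X_labeled["dist_to_center"] > q3 + 1.5 * (q3 - q1)


def plot_boxplots_by_cluster(frame, columns):
    """Draw one box plot per listed column, with one box per cluster."""
    frame.boxplot(column=columns, by="cluster")
    plt.show()
```

Calling `plot_boxplots_by_cluster(X_labeled, ["f0", "f1"])` would then show the original (unscaled) features per cluster, and `X_labeled[X_labeled["outlier"]]` lists the rows flagged by the assumed rule. Other outlier criteria may suit the real data better.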