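The Glass Identification Database is an imbalanced dataset, and I would like to do some resampling on it.
It has 214 rows covering 6 types of glass, and each type has a different number of rows. Below, I want to perform random under-sampling so that every type is reduced to the size of the smallest one (i.e., each type contains only 9 rows).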
import pandas
dataset = pandas.read_csv("C:\\temp\\glass.csv", sep=",")
dataset['Type'] = pandas.Categorical(dataset['Type']).codes
# Class count
count_class_0,count_class_1,count_class_2,count_class_3,count_class_4,count_class_5 = dataset.Type.value_counts()
# Divide by class
df_class_0 = dataset[dataset['Type'] == 0]
df_class_1 = dataset[dataset['Type'] == 1]
df_class_2 = dataset[dataset['Type'] == 2]
df_class_3 = dataset[dataset['Type'] == 3]
df_class_4 = dataset[dataset['Type'] == 4]
df_class_5 = dataset[dataset['Type'] == 5]
class_count = dataset.Type.value_counts()
print('Class 0:',class_count[0]) # 70
print('Class 1:',class_count[1]) # 76
print('Class 2:',class_count[2]) # 13
print('Class 3:',class_count[3]) # 29
print('Class 4:',class_count[4]) # 9
print('Class 5:',class_count[5]) # 17
# Random under-sampling
df_class_0_under = df_class_0.sample(count_class_4)
df_test_under = pandas.concat([df_class_0_under,df_class_4],axis=0)
print('Random under-sampling:')
print(df_test_under.Type.value_counts())
The output shows that it was not done correctly:
Random under-sampling:
0 13
4 9
What is the correct way to do this? (Reduce every type to the size of the smallest one, i.e., each type contains only 9 rows.)
Thanks.
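A likely cause, sketched under assumptions: `value_counts()` returns counts ordered by frequency, not by class label, so unpacking it positionally into `count_class_0 ... count_class_5` would put the fifth-largest count (13) into `count_class_4` instead of the size of class 4 (9). The sketch below uses a synthetic stand-in for `glass.csv` (only the class sizes are taken from the question; the `RI` column holds placeholder values) and assumes pandas 1.1 or later, where `DataFrameGroupBy.sample` is available. It is one possible approach, not the only one.

```python
import pandas as pd

# Synthetic stand-in for glass.csv (assumption): only the class sizes come
# from the question; the "RI" values are placeholders, not real measurements.
counts = {0: 70, 1: 76, 2: 13, 3: 29, 4: 9, 5: 17}
dataset = pd.DataFrame(
    {"Type": [label for label, n in counts.items() for _ in range(n)]}
)
dataset["RI"] = range(len(dataset))

# value_counts() sorts by frequency (largest first), so position 4 is the
# fifth-largest count, which is not the size of class 4.
class_count = dataset["Type"].value_counts()
fifth_by_position = class_count.iloc[4]  # 13 with these counts
size_of_class_4 = class_count.loc[4]     # label-based lookup: 9

# Randomly under-sample every class down to the size of the smallest class.
n_min = int(class_count.min())
df_under = dataset.groupby("Type").sample(n=n_min, random_state=0)

print("Random under-sampling:")
print(df_under["Type"].value_counts().sort_index())
```

With the real file, the same two lines (`n_min = ...` and `groupby("Type").sample(...)`) should apply directly to `dataset`; `random_state=0` is only there to make the sample reproducible and can be dropped.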