train_test_split with a 75-25 split returns high accuracy, but splitting into 2 files first returns much lower accuracy

I have one dataset (called train_plus_test.csv) containing 1275 rows, with feature columns and a label, for classifying two activities: "walking" and "lying". It is a balanced dataset, with the same number of rows per class.

I run a random forest in 2 scenarios:

Scenario 1: train on train_plus_test.csv with a 75-25 train-test split; the accuracy is 91.8%.

Scenario 2: split the above file train_plus_test.csv 75%-25% into 2 files, train.csv and test.csv. I then train the model on train.csv and predict on test.csv, but the accuracy is only 52%. Where am I going wrong?

Thanks for reading!

I have included the Python code (below) and the 3 csv files above here:

[GoogleDrive] https://drive.google.com/drive/folders/1AAOOFhR1QpoPPtSNTofBnouBaYHfFbir?usp=sharing&fbclid=IwAR10SjHCu-6Sszd-okes-IneAA8pWzals9-NNtAsmrw0ql28mk3geZfmnQI

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

# Scenario 1 ==================>
dataset = pd.read_csv('train_plus_test.csv')
feature_cols = list(dataset.columns.values)
feature_cols.remove('label')
X = dataset[feature_cols] # Features
y = dataset['label'] # Target

clf_RF = RandomForestClassifier(n_estimators=100,random_state=0,max_features=8,min_samples_leaf=3)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25,random_state = 42)
clf_RF.fit(X_train,y_train)

y_pred_RF = clf_RF.predict(X_test)
print('accuracy on the 25% held-out split')
print(metrics.accuracy_score(y_test,y_pred_RF))
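A quick way to check whether Scenario 1's 91.8% is a stable estimate rather than a lucky split (a sketch, not part of the original script; the synthetic data below merely stands in for X and y loaded from train_plus_test.csv) is k-fold cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the features/labels from train_plus_test.csv.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0, min_samples_leaf=3)

# 5-fold CV: each fold is held out once, so every row is tested exactly once.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean(), scores.std())
```

If the fold scores vary widely, a single random split is not a reliable accuracy estimate for a dataset of ~1275 rows.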

# Scenario 2 ======= comment out Scenario 1 before running Scenario 2 ===========>

train_dataset = pd.read_csv('train.csv')
test_dataset = pd.read_csv('test.csv')
feature_cols = list(train_dataset.columns.values)
feature_cols.remove('label')
clf_RF = RandomForestClassifier(n_estimators=100,random_state=0,max_features=8,min_samples_leaf=3) # same hyperparameters as Scenario 1
X = train_dataset[feature_cols] # Features
y = train_dataset['label'] # Target
clf_RF.fit(X,y)

X_test_data = test_dataset[feature_cols]
y_test_data = test_dataset['label']
y_test_pred = clf_RF.predict(X_test_data)
print('accuracy on test.csv')
print(metrics.accuracy_score(y_test_data,y_test_pred))
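52% is close to chance for a balanced 2-class problem, which typically happens when train.csv and test.csv were created without shuffling (e.g., by cutting off the first 75% of rows, so the two files have different class or subject distributions). A minimal sketch of a shuffled, stratified split into the two files follows; the toy DataFrame stands in for train_plus_test.csv, and its column names are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv('train_plus_test.csv'): 40 balanced rows.
dataset = pd.DataFrame({
    'feature1': range(40),
    'label': ['walking', 'lying'] * 20,
})

train_df, test_df = train_test_split(
    dataset,
    test_size=0.25,
    random_state=42,
    shuffle=True,                  # do NOT just take the first 75% of rows
    stratify=dataset['label'],     # keep the class ratio equal in both files
)

# Both files now have the same balanced label distribution.
train_df.to_csv('train.csv', index=False)
test_df.to_csv('test.csv', index=False)
print(train_df['label'].value_counts().to_dict())
print(test_df['label'].value_counts().to_dict())
```

With files produced this way, Scenario 2 should give an accuracy close to Scenario 1's, since both now evaluate the same kind of random held-out split.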