I am using an LSTM autoencoder for anomaly detection. Since anomalous data are very rare compared to normal data, only normal instances are used for training. The test data contain both anomalous and normal instances. During training, the model loss looks fine. However, on the test set the model performs poorly: the anomalous and normal points are not well separated.
My code snippet is as follows:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)  # LSTM layers expect 3-D input
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
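(For context, the reshape above assumes the data has already been cut into fixed-length windows. A minimal sketch of building such overlapping windows from a 2-D array, with a hypothetical helper `make_windows`:)

```python
import numpy as np

# Hypothetical sketch: turn a 2-D array of shape (timesteps, n_features)
# into overlapping windows of shape (num_windows, lookback, n_features),
# the 3-D shape an LSTM layer expects.
def make_windows(data_2d, lookback):
    windows = [data_2d[i:i + lookback] for i in range(len(data_2d) - lookback + 1)]
    return np.stack(windows)

raw = np.arange(20, dtype=float).reshape(10, 2)  # 10 timesteps, 2 features
X = make_windows(raw, lookback=3)
print(X.shape)  # (8, 3, 2)
```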
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features/2)
# Encoder
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()
adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)
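(Note that the first decoder layer uses `timesteps` as its unit count, so the decoder has only 3 units. For comparison, a commonly used symmetric layout mirrors the encoder widths in the decoder; a sketch with illustrative sizes, not the exact setup above:)

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

# Illustrative sizes only; in the real model these come from the data.
timesteps, n_features, N, encoding_dim = 3, 4, 32, 2

model = Sequential([
    # Encoder: wide layer, then a narrow bottleneck
    LSTM(N, activation='relu', return_sequences=True,
         input_shape=(timesteps, n_features)),
    LSTM(encoding_dim, return_sequences=False),
    RepeatVector(timesteps),                     # repeat the code once per timestep
    # Decoder: mirrors the encoder widths (encoding_dim -> N)
    LSTM(encoding_dim, return_sequences=True),
    LSTM(N, return_sequences=True),
    TimeDistributed(Dense(n_features)),          # reconstruct all features per timestep
])
model.compile(loss='mse', optimizer='adam')
print(model.output_shape)  # (None, 3, 4)
```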
cp = ModelCheckpoint(filepath="lstm_classifier.h5",save_best_only=True,verbose=0)
tb = TensorBoard(log_dir='./logs',histogram_freq=0,write_graph=True,write_images=True)
lstm_model_history = lstm_model.fit(
    X_train, X_train,
    epochs=epochs,
    batch_size=batch,
    shuffle=False,
    verbose=1,
    validation_data=(X_valid, X_valid),
    callbacks=[cp, tb],
).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions),2),axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,'True_class': y_test})
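(`preprocess_data.flatten` is a custom helper; for 3-D windows the per-window reconstruction error can also be computed directly by averaging over both the time and feature axes. A small sketch with dummy arrays:)

```python
import numpy as np

# Sketch: per-window reconstruction MSE for arrays of shape
# (samples, timesteps, features), averaging over time and feature axes.
X = np.zeros((5, 3, 4))            # stand-in for X_test
X_hat = np.full((5, 3, 4), 0.5)    # stand-in for the model's reconstruction
mse = np.mean((X - X_hat) ** 2, axis=(1, 2))
print(mse.shape)  # (5,) -- one error value per window
```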
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class,pred_y)
plt.figure(figsize=(5,5))
sns.heatmap(conf_matrix,xticklabels=LABELS,yticklabels=LABELS,annot=True,fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
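(`threshold` is set elsewhere in the elided code. One common way to choose it is a high percentile of the reconstruction error on normal validation data, so that only a small fixed fraction of normal points is flagged. A sketch on synthetic errors:)

```python
import numpy as np

# Hypothetical sketch: pick `threshold` as the 99th percentile of the
# reconstruction error on *normal* validation data, so roughly 1% of
# normal points are flagged as anomalies.
rng = np.random.default_rng(0)
valid_errors = rng.gamma(shape=2.0, scale=0.5, size=1000)  # stand-in for MSE on X_valid
threshold = np.percentile(valid_errors, 99)
pred = (valid_errors > threshold).astype(int)
print(pred.mean())  # roughly 0.01
```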
Please suggest what changes I can make to the model to improve its accuracy.