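I am trying to train 1000 Sequential models in a loop. On every iteration my program leaks memory, until it eventually runs out and fails with an OOM exception.
I asked a similar question before
(Training multiple Sequential models in a row slows down)
and have seen other similar questions (Keras: Out of memory when doing hyper parameter grid search),
and the solution is always to add K.clear_session() to the code once you are done using the model. I did that for my previous question, and I am still leaking memory.
Here is the code to reproduce the issue.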

    import random
    import time
    import tracemalloc

    import numpy  # needed by create_training_dataset
    from keras.models import Sequential
    from keras.layers import Dense
    from keras import backend as K


    def run():
        tracemalloc.start()
        num_input_nodes = 12
        num_hidden_nodes = 8
        num_output_nodes = 1

        random_numbers = random.sample(range(1000), 50)
        train_x, train_y = create_training_dataset(random_numbers, num_input_nodes)

        for i in range(100):
            # Baseline snapshot for this batch of 10 models
            snapshot = tracemalloc.take_snapshot()
            for j in range(10):
                start_time = time.time()
                nn = Sequential()
                nn.add(Dense(num_hidden_nodes, input_dim=num_input_nodes, activation='relu'))
                nn.add(Dense(num_output_nodes))
                nn.compile(loss='mean_squared_error', optimizer='adam')
                nn.fit(train_x, train_y, nb_epoch=300, batch_size=2, verbose=0)
                # Supposed to release the graph and session once the model is done
                K.clear_session()
                print("Iteration {iter}. Current time {t}. Took {elapsed} seconds".
                      format(iter=i*10 + j + 1, t=time.strftime('%H:%M:%S'), elapsed=int(time.time() - start_time)))

            # Compare memory after the 10 models against the baseline
            top_stats = tracemalloc.take_snapshot().compare_to(snapshot, 'lineno')

            print("[ Top 5 differences ]")
            for stat in top_stats[:5]:
                print(stat)


    def create_training_dataset(dataset, input_nodes):
        """
        Outputs a training dataset (train_x, train_y) as numpy arrays.
        Each item in train_x has 'input_nodes' number of items while train_y items are of size 1
        :param dataset: list of ints
        :param input_nodes:
        :return: (numpy array, numpy array), train_x, train_y
        """
        data_x, data_y = [], []
        for i in range(len(dataset) - input_nodes - 1):
            a = dataset[i:(i + input_nodes)]
            data_x.append(a)
            data_y.append(dataset[i + input_nodes])
        return numpy.array(data_x), numpy.array(data_y)


    run()

Here is the output I get from the first memory debug print:
/tensorflow/python/framework/ops.py:121:size = 3485 KiB(3485 KiB),count = 42343(42343)
/tensorflow/python/framework/ops.py:1400:size = 998 KiB(998 KiB),count = 8413(8413)
/tensorflow/python/framework/ops.py:116:size = 888 KiB(888 KiB),count = 32468(32468)
/tensorflow/python/framework/ops.py:1185:size = 795 KiB(795 KiB),count = 3179(3179)
/tensorflow/python/framework/ops.py:2354:size = 599 KiB(599 KiB),count = 5886(5886)
System info:
> python 3.5
> keras(1.2.2)
> tensorflow(1.0.0)