Fixing 'ResourceExhaustedError' which does not occur on TF 1.14 but does on 2.0.0

I am trying out TensorFlow 2.0.0 and want to see how it performs on a current project.

On TensorFlow 1.14, with the same dimensions and the same hardware, this model runs without any problem.


# (assumes the usual imports, import numpy as np and import tensorflow as tf,
#  and that both methods below live inside a class whose statement is not shown)
def __init__(self,num_input=48959,num_hid1=2048,num_hid2=512,latent_rep=196,lr=0.00005,dtypetf=tf.float32,dtypenp=np.float32):

        self.graph = tf.Graph()
        actf=tf.nn.relu
        with self.graph.as_default():


            self.X=tf.placeholder(shape=[None,num_input],dtype=dtypetf )
            initializer=tf.variance_scaling_initializer(dtype=dtypetf)

            w1=tf.Variable(initializer([num_input,num_hid1]),dtype=dtypetf)
            w2=tf.Variable(initializer([num_hid1,num_hid2]),dtype=dtypetf)
            w3=tf.Variable(initializer([num_hid2,latent_rep]),dtype=dtypetf)
            w4=tf.Variable(initializer([latent_rep,num_hid2]),dtype=dtypetf)   # decoder: mirrors w3
            w5=tf.Variable(initializer([num_hid2,num_hid1]),dtype=dtypetf)     # decoder: mirrors w2
            w6=tf.Variable(initializer([num_hid1,num_input]),dtype=dtypetf)

            b1=tf.Variable(tf.zeros(num_hid1,dtype=dtypenp),dtype=dtypetf)
            b2=tf.Variable(tf.zeros(num_hid2,dtype=dtypetf))
            b3=tf.Variable(tf.zeros(latent_rep,dtype=dtypetf))
            b4=tf.Variable(tf.zeros(num_hid2,dtype=dtypetf))
            b5=tf.Variable(tf.zeros(num_hid1,dtype=dtypetf))
            b6=tf.Variable(tf.zeros(num_input,dtype=dtypetf))

            hid_layer1=actf(tf.matmul(self.X,w1)+b1)
            hid_layer2=actf(tf.matmul(hid_layer1,w2)+b2)
            hid_layer3=actf(tf.matmul(hid_layer2,w3)+b3)
            hid_layer4=actf(tf.matmul(hid_layer3,w4)+b4)
            hid_layer5=actf(tf.matmul(hid_layer4,w5)+b5)
            self.output=actf(tf.matmul(hid_layer5,w6)+b6)
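            # Not in the original snippet: opt() below references self.train and
            # self.init, so something like this must have existed. MSE loss and
            # Adam are assumptions here, chosen to match the Keras version below.
            loss=tf.reduce_mean(tf.square(self.output-self.X))
            self.train=tf.train.AdamOptimizer(lr).minimize(loss)
            self.init=tf.global_variables_initializer()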


def opt(self,loc='float32_mem.npy',epochs=250,batch_size=128,num_batches=100):

        imported = np.load(loc)
        with tf.Session(graph = self.graph) as sess:
            sess.run(self.init)
            for epoch in range(epochs):
                for iteration in range(num_batches):
                    # cycle through the first 1024 rows of the array
                    # (note: this slice actually yields batch_size-1 rows per step)
                    sector = (iteration*batch_size)%1024
                    sector_end = (sector+batch_size-1)%1024
                    X_batch = imported[sector:sector_end]
                    sess.run(self.train,feed_dict={self.X:X_batch})
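
For reference, this is roughly how the class would be driven (the name AutoEncoder is hypothetical, since the class statement itself is not part of the snippet):

# Hypothetical usage; AutoEncoder is a made-up name for the class above.
ae = AutoEncoder()               # default sizes: 48959 -> 2048 -> 512 -> 196 -> ...
ae.opt(loc='float32_mem.npy')    # 250 epochs of 100 batches over the saved array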

TensorFlow 2.0.0 Keras code (which does not work):

keras.backend.set_floatx('float32')
model = keras.Sequential([
    keras.layers.Dense(48959,activation='relu'),
    keras.layers.Dense(1024,activation='relu'),
    keras.layers.Dense(512,activation='relu'),
    keras.layers.Dense(196,activation='relu'),
    keras.layers.Dense(48959,activation='relu')
])
model.compile(optimizer='adam',loss='mean_squared_error',metrics=['accuracy'])

model.fit(x=training_data,y=training_data,epochs=250,batch_size=128 )
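
Note that this Sequential starts with a Dense(48959) layer on 48959-dimensional input, so its very first kernel is [48959,48959], exactly the tensor named in the error below, whereas the TF1 graph's first weight matrix is only [48959,2048]. For comparison, here is a minimal sketch of a Sequential that would mirror the TF1 layer sizes (the input_shape and the symmetric decoder half are my assumptions, reusing the keras import from above):

# Minimal sketch, assuming the intent was to mirror the TF1 sizes
# (num_input=48959, num_hid1=2048, num_hid2=512, latent_rep=196).
mirror = keras.Sequential([
    keras.layers.Dense(2048, activation='relu', input_shape=(48959,)),  # kernel: 48959x2048
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(196, activation='relu'),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(2048, activation='relu'),
    keras.layers.Dense(48959, activation='relu')   # relu output, as in the TF1 code
])
mirror.compile(optimizer='adam', loss='mean_squared_error')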

Or, alternatively, what I have already tried:

epochs=250
batch_size=128
num_batches=100

imported  = np.load('float32_mem_3.npy')
for epoch in range(epochs):
    for iteration in range(num_batches):
        sector = (iteration*batch_size)%1024
        sector_end = (sector+batch_size-1)%1024
        X_batch = imported[sector:sector_end]
        model.train_on_batch(x=X_batch,y=X_batch)

The error I get is: ResourceExhaustedError: OOM when allocating tensor with shape[48959,48959] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]

I suppose this is irrelevant, but I am using a 980 Ti with 6 GB of VRAM.
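
For what it's worth, a quick back-of-the-envelope check (my own arithmetic, not output from TensorFlow) shows that the single tensor from the error message cannot fit in that memory on its own:

# The [48959,48959] float32 tensor named in the error message:
bytes_needed = 48959 * 48959 * 4     # 4 bytes per float32
print(bytes_needed / 2**30)          # ~8.93 GiB, more than the 980 Ti's 6 GB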
