Model A:
ipt = Input(batch_shape=(32,240,4))
x1 = Conv1D(16,20,strides=200,padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16,200,strides=120,padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
Model B:
ipt = Input(batch_shape=(32,250,4))
x1 = Conv1D(16,20,strides=200,padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16,200,strides=120,padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
Both have identical weight shapes - but, since B has a different build order, A's optimizer weights cannot be loaded onto B (image and code below). This is a small excerpt of a larger model that needs to change its timesteps parameter every X epochs, and ZeroPadding1D appears to change the layer build order when used; this doesn't affect the model weights, since those are mapped via a dict - whereas the optimizer weights are mapped list-to-list, by order. Reproducible in both TF1 and TF2, and with both keras and tf.keras imports. What is the problem, and how can it be fixed? Relevant Git
Environment: Win-10 OS, CUDA 10.0.130, cuDNN 7.6.0, Python 3.7.4, GTX 1070
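To see the differing build orders directly, printing each model's layer classes suffices (a minimal sketch; it assumes model_A and model_B were already built via the reproducible code further below):

# assumes model_A, model_B built via make_model() from the code below
for model in (model_A,model_B):
    print([l.__class__.__name__ for l in model.layers])
# model_A: [...,'Conv1D','Conv1D','BatchNormalization','BatchNormalization','Concatenate',...]
# model_B: [...,'Conv1D','BatchNormalization','Conv1D','ZeroPadding1D','BatchNormalization',...]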
Observations:
- Holds when swapping any other layers, not just BatchNormalization - and for any number of layers before concatenate; the optimizer weights merely end up swapped in .get_weights() (see the sketch after this list)
- Can change strides instead of batch_shape[1]
- Works with MaxPooling1D with strides > 1
- padding='valid' results in ZeroPadding1D, but doesn't change the build order (not sure why)
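To confirm the swap noted in the first observation (a sketch; it assumes both models already ran one train_on_batch, as in the code below):

# assumes model_A, model_B were each trained on one batch (see code below)
shapes_A = [w.shape for w in model_A.optimizer.get_weights()]
shapes_B = [w.shape for w in model_B.optimizer.get_weights()]
for i,(sA,sB) in enumerate(zip(shapes_A,shapes_B)):
    if sA != sB:
        print(i,sA,sB)  # prints the swapped slots, e.g. (200, 4, 16) vs (16,)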
model_A.summary():
Layer (type)                    Output Shape       Param #   Connected to
==================================================================================
input_1 (InputLayer)            [(32, 240, 4)]     0
__________________________________________________________________________________
conv1d (Conv1D)                 (32, 2, 16)        1296      input_1[0][0]
__________________________________________________________________________________
conv1d_1 (Conv1D)               (32, 2, 16)        12816     input_1[0][0]
__________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)        64        conv1d[0][0]
__________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 2, 16)        64        conv1d_1[0][0]
__________________________________________________________________________________
concatenate (Concatenate)       (32, 2, 32)        0         bn_1[0][0]
                                                             bn_2[0][0]
__________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)           0         concatenate[0][0]
__________________________________________________________________________________
dense (Dense)                   (32, 1)            33        gap_0[0][0]
model_B.summary() (note the swapped layers):
input_2 (InputLayer)            [(32, 250, 4)]     0
_____________________________________________________________________________________
conv1d_2 (Conv1D)               (32, 2, 16)        1296      input_2[0][0]
_____________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)        64        conv1d_2[0][0]
_____________________________________________________________________________________
conv1d_3 (Conv1D)               (32, 3, 16)        12816     input_2[0][0]
_____________________________________________________________________________________
zero_padding1d (ZeroPadding1D)  (32, 3, 16)        0         bn_1[0][0]
_____________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 3, 16)        64        conv1d_3[0][0]
_____________________________________________________________________________________
concatenate_1 (Concatenate)     (32, 3, 32)        0         zero_padding1d[0][0]
                                                             bn_2[0][0]
_____________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)           0         concatenate_1[0][0]
_____________________________________________________________________________________
dense_1 (Dense)                 (32, 1)             33       gap_0[0][0]
Minimal reproducible code:
# also works with `from keras`
from tensorflow.keras.layers import Input,Conv1D,ZeroPadding1D,concatenate
from tensorflow.keras.layers import BatchNormalization,Dense,GlobalAveragePooling1D
from tensorflow.keras.models import Model
import numpy as np
def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16,20,strides=200,padding='same')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16,200,strides=120,padding='same')(ipt)
    x2 = BatchNormalization()(x2)
    x1,x2 = zero_pad(x1,x2)
    preout = concatenate([x1,x2])
    preout = GlobalAveragePooling1D()(preout)
    out = Dense(1)(preout)
    model = Model(ipt,out)
    model.compile('adam','mse')
    return model

def zero_pad(x1,x2):
    # left-pad the shorter tensor along the timesteps dim so both match
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff,0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff),0))(x2)
    return x1,x2

def make_data(batch_shape):
    # random inputs and random 0/1 labels
    return (np.random.randn(*batch_shape),
            np.random.randint(0,2,(batch_shape[0],1)))
batch_shape_A = (32,240,4) # conv outputs same length -> no ZeroPadding1D
batch_shape_B = (32,250,4) # conv outputs differ -> ZeroPadding1D inserted
batch_shape_C = (32,240,4)
model_A = make_model(batch_shape_A)
model_B = make_model(batch_shape_B)
model_C = make_model(batch_shape_C) # 'control group'
x_A,y_A = make_data(batch_shape_A)
x_B,y_B = make_data(batch_shape_B)
x_C,y_C = make_data(batch_shape_C)
model_A.train_on_batch(x_A,y_A)
model_B.train_on_batch(x_B,y_B)
model_C.train_on_batch(x_C,y_C)
optimizer_weights_A = model_A.optimizer.get_weights()
model_C.optimizer.set_weights(optimizer_weights_A)
print("model_C optimizer weights set successfully")
model_B.optimizer.set_weights(optimizer_weights_A)
print("model_B optimizer weights set successfully") # will not print
Output:
model_C optimizer weights set successfully
ValueError: Optimizer weight shape (16,) not compatible with provided
weight shape (200,4,16)