TensorFlow Hub NNLM word embedding with sentiment140 data gives an input shape error

I am using the TensorFlow Hub word embedding "https://tfhub.dev/google/nnlm-en-dim128/2" to run sentiment analysis on the Kaggle "sentiment140" dataset.

Dataset: Kaggle ("sentiment140") https://www.kaggle.com/kazanova/sentiment140
TensorFlow Hub: https://tfhub.dev/google/nnlm-en-dim128/2

I am using a Keras Sequential model here, and when I fit the model it raises a ValueError:

ValueError: Python inputs incompatible with input_signature:
      inputs: (
        Tensor("IteratorGetNext:0",shape=(None,128),dtype=float32))
      input_signature: (
        TensorSpec(shape=(None,),dtype=tf.string,name=None))

My code:

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import  train_test_split
import seaborn as sns
import tensorflow_hub as hub
from tensorflow.keras import Sequential
import keras

tweet_df = pd.read_csv("training.1600000.processed.noemoticon.csv",names=['polarity','id','date','query','user','text'],encoding='latin-1')

tweet_df.info()

tweet_df.head()

"""#### 2.) Data Visualization"""

tweet_df['polarity'] = tweet_df['polarity'].replace(to_replace=4,value=1)

### Print two movies reviews from each class

print("Movie Review Polarity Negative class 0 :\n",tweet_df[tweet_df['polarity']==0]['text'].head(2) )

print("\n\nmovie Review Polarity Positive class 1 :\n",tweet_df['text'][tweet_df['polarity']==1].head(2) )

class_dist = tweet_df['polarity'].value_counts().rename_axis('Class Label').reset_index(name='Tweets')
#class_dist = class_dist['Class Label'].replace({0:'Negative',1:'Positve'})
class_dist

## Bar graph of Distribution of Classes
class_dist['class'] = ['Positive','Negative']
sns.set_theme(style='whitegrid')
sns.barplot(x='Class Label',y='Tweets',hue='class',data= class_dist)

### Train and test split 
X = tweet_df.iloc[:,5]
y = tweet_df.iloc[:,0]
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=5,test_size=0.2)

print("Training shape of X and y : ",X_train.shape,y_train.shape)
print("Testing shape of X and y : ",X_test.shape,y_test.shape)

"""#### 3.) Data Pre-processing"""

embed = hub.load("https://tfhub.dev/google/nnlm-en-dim128/2")
X_train_embed = embed(X_train)

y_train = tf.keras.utils.to_categorical(y_train,2)

X_train_embed.shape


X_sample = X_train_embed[:1000]
y_sample = y_train[:1000]
y_sample = tf.keras.utils.to_categorical(y_sample,2)


"""#### 4.) Model Building"""

hub_layer = hub.KerasLayer('https://tfhub.dev/google/nnlm-en-dim128/2',input_shape=[],trainable=False)

model = Sequential()
model.add(hub_layer)
model.add(keras.layers.Dense(128,'relu',name ='layer_1'))
model.add(keras.layers.Dense(64,name = 'layer_2'))
model.add(keras.layers.Dense(2,activation='sigmoid',name='output'))

model.compile(optimizer='adam',loss= 'BinaryCrossentropy',#'categorical_crossentropy',
              metrics=['accuracy'] )

NN_model = model.fit(X_sample,y_sample,epochs=20,validation_split=0.1,verbose=1)

Input shapes:

X_sample.shape

TensorShape([1000,128])

y_sample.shape

(1000,2,2)

X_sample

<tf.Tensor: shape=(1000, 128), dtype=float32, numpy=
array([[ 0.10381411,0.07044576,-0.0282673,...,0.08205549,0.15822364,-0.10019408],
       [-0.03332436,-0.00529242,0.20348714,-0.14174528,0.05178985,-0.12599435],
       [ 0.2461916,-0.03084931,0.05861813,0.07956063,-0.03579932,0.07493019],
       [ 0.4102695,0.15445013,0.19045362,0.12681636,0.12362286,-0.03969387],
       [-0.0144283,-0.05236297,0.04851832,0.05562773,0.01529189,0.12605236],
       [ 0.29280087,0.05795274,-0.11779188,-0.01890504,0.02824693,-0.13629636]],dtype=float32)>

Answer:

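As stated at https://tfhub.dev/google/nnlm-en-dim128/2, the model expects a vector of strings as input. Because you are executing the following, you are essentially calling the model twice: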

embed = hub.load("https://tfhub.dev/google/nnlm-en-dim128/2")
X_train_embed = embed(X_train)  # (n,128) float matrix

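You then pass that embedding to model, which itself expects strings as input because it starts with the NNLM KerasLayer. That is why the error reports float32 inputs of shape (None, 128) against an input signature of tf.string with shape (None,).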

I suggest removing embed and X_train_embed and calling model.fit on X_train directly:

model.fit(np.array(["Lyx is cool","Lyx is not cool"]),np.array([1,0]),epochs=20,validation_split=0.1,verbose=1)
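
To make the suggestion concrete, here is a minimal sketch of the pipeline with the separate embedding step removed. It assumes TensorFlow 2.x with a tensorflow_hub version that works with tf.keras. The helper names `prepare_inputs`, `build_model` and `train` are hypothetical and not part of the question. The single sigmoid output unit with 0/1 labels is my own simplification of the two-unit output in the question, chosen so that the labels need no one-hot encoding.

```python
import numpy as np

NNLM_URL = "https://tfhub.dev/google/nnlm-en-dim128/2"


def prepare_inputs(texts, labels):
    """Return raw strings of shape (n,) and 0/1 labels of shape (n, 1).

    The texts are NOT embedded here: the hub layer inside the model does that.
    The labels are assumed to be 0/1 already (not one-hot encoded).
    """
    x = np.asarray(list(texts), dtype=str)
    y = np.asarray(list(labels), dtype="float32").reshape(-1, 1)
    return x, y


def build_model():
    """Build a classifier whose first layer maps raw strings to 128-d vectors."""
    # TensorFlow is imported here so that prepare_inputs can be used
    # and checked on its own without loading TensorFlow.
    import tensorflow as tf
    import tensorflow_hub as hub

    model = tf.keras.Sequential([
        # input_shape=[] and dtype=tf.string: one scalar string per example.
        hub.KerasLayer(NNLM_URL, input_shape=[], dtype=tf.string,
                       trainable=False),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        # Assumption: one sigmoid unit for binary 0/1 labels.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def train(texts, labels, epochs=20):
    """Fit the model on raw text; downloads the NNLM module on first use."""
    x, y = prepare_inputs(texts, labels)
    model = build_model()
    model.fit(x, y, epochs=epochs, validation_split=0.1, verbose=1)
    return model
```

With the variables from the question, this would be called as `train(X_train[:1000], y_train[:1000])`, where `y_train` is the 0/1 polarity column before any `to_categorical` call.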