From a NumPy array of sentences to an array of embeddings

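I am learning to use TensorFlow and am trying to classify texts. I have a dataset in which each text is associated with a label, 0 or 1. My goal is to use sentence embeddings for the classification. As a first step, I created an embedding from each whole text using the pretrained gnews-swivel embedding: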

embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding,input_shape=[2],dtype=tf.string,trainable=True,output_shape=[None,20])

Now I would like to try something else (similar to the approach described at http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/). I want to:

  • Split each text into sentences.
  • Create an array of embeddings for each text, one embedding per sentence.
  • Use this as the input to my model.
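
For readers following along, the first step can be sketched as below. This is only a naive illustration that assumes sentences end with '.', '!' or '?' followed by whitespace; the helper name split_sentences and the sample text are hypothetical, and a proper sentence tokenizer would handle abbreviations and quotes better.

```python
import re

import numpy as np


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments and return a 1-D array of strings.
    return np.array([p for p in parts if p])


sentences = split_sentences("This happened an hour ago. I was at a bar. Did I overreact?")
print(sentences.shape)
```

Each text then becomes a 1-D string array whose length depends on the number of sentences it contains.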

I was able to split the texts into sentences. Each text is stored as an array of sentences, like this:

[array(['AITA - Getting Hugged At The Bar .','This all happened less than an hour ago..','I was at a bar I frequent and talking to some people I know,suddenly I feel someone from behind me hugging and starting to grind against me.',"I know a lot of people at the bar,and assume it's a friend of mine,but when I look down at the shoes I do not recognize them.",'I look back and I see a dude I do not know,nor have I ever seen.',"He looks back at me,with horror in his eyes,because I'm a dude too...",'I feel an urge of rage inside me and shove him in the chest with my elbow so I can get away..','He goes to his table and I go back to mine.','I was with my roommate and his girlfriend.','They asked what happened and I told them,then I see the guy who hugged me looking around for me.','Him and two of his friends come up to us and he says:  .','"I just wanted to apologize,I thought you were someone else.".','I respond,"I understand,just check before you hug people.','Now,please fuck off".','He repeats his last statement,so do I.','This happens one more time and at this point his friends have surrounded me,my roommate is on his feet and I have left my beer at the table.','His friend goes in my face and says.','.','"He just wanted to apologize,you really shouldn\'t be yelling at us" and starts waiving his finger at me.. We are at a rock bar,it\'s loud,I was speaking louder just to be sure I am heard..','The manager knows me so he comes asking me what happened.','I explain the situation and he speaks with them then he tells me.','"They want to say sorry,can you guys shake hand?','".','"Yeah sure,I just want them to leave me alone."',"Honestly I didn't even want to touch the guy,but whatever.","We shake hands and they go away.. Me and my roommate look at their table and there's no one that looks anything like me.",'So,reddit,did I overreact?','Am I The Asshole here?'],dtype='<U190')
 array(["AITA if i don't want to pay my friend 5 dollars for a slice of pizzaSo,my friend bought herself,our other friend and I a pizza to eat for lunch.",'Me and other friend ate 1 slice of pizza from an extra large pizza.','Other friend has already paid my friend that bought the pizza 5 dollars..','I am trying to save money wherever i can,but she really wants me to pay her 5 dollars "so its fair".. AITA?'],dtype='<U146')

Now, when I try to create the embeddings from a single element of the array, it works. Here is my embedding function:

def embedding_f(test):
    print("test shape:",test.shape)
    # a = tf.constant(test)
    embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
    hub_layer = hub.KerasLayer(embedding, input_shape=[], dtype=tf.string,
                               trainable=True, output_shape=[None, 20])
    ret = hub_layer(test)
    # print(ret)
    return ret.numpy()

# Works
emb = cnn.embedding_f(train_data[0])

But if I try to feed it a batch of data (as will be done later in the pipeline), the program crashes:

# Crashes
emb = cnn.embedding_f(train_data[0:2])



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-76f4f9171cad> in <module>
----> 1 emb = cnn.embedding_f(train_data[0:2])

~/AITA/aita/cnn.py in embedding_f(test)
     22     hub_layer = hub.KerasLayer(embedding,
     23                                trainable=True,20])
---> 24     ret = hub_layer(test)
     25     # print(ret)
     26     return ret.numpy()

/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self,*args,**kwargs)
    817           return ops.convert_to_tensor_v2(x)
    818         return x
--> 819       inputs = nest.map_structure(_convert_non_tensor,inputs)
    820       input_list = nest.flatten(inputs)
    821 

/usr/lib/python3.8/site-packages/tensorflow/python/util/nest.py in map_structure(func,*structure,**kwargs)
    615 
    616   return pack_sequence_as(
--> 617       structure[0],[func(*x) for x in entries],
    618       expand_composites=expand_composites)
    619 

/usr/lib/python3.8/site-packages/tensorflow/python/util/nest.py in <listcomp>(.0)
    615 
    616   return pack_sequence_as(
--> 617       structure[0],[func(*x) for x in entries],
    618       expand_composites=expand_composites)
    619 

/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py in _convert_non_tensor(x)
    815         # `SparseTensors` can't be converted to `Tensor`.
    816         if isinstance(x,(np.ndarray,float,int)):
--> 817           return ops.convert_to_tensor_v2(x)
    818         return x
    819       inputs = nest.map_structure(_convert_non_tensor,inputs)

/usr/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor_v2(value,dtype,dtype_hint,name)
   1276     ValueError: If the `value` is a tensor not of given `dtype` in graph mode.
   1277   """
-> 1278   return convert_to_tensor(
   1279       value=value,
   1280       dtype=dtype,

/usr/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value,name,as_ref,preferred_dtype,ctx,accepted_result_types)
   1339 
   1340     if ret is None:
-> 1341       ret = conversion_func(value,dtype=dtype,name=name,as_ref=as_ref)
   1342 
   1343     if ret is NotImplemented:

/usr/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py in _default_conversion_function(***failed resolving arguments***)
     50 def _default_conversion_function(value,as_ref):
     51   del as_ref  # Unused.
---> 52   return constant_op.constant(value,name=name)
     53 
     54 

/usr/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in constant(value,shape,name)
    259     ValueError: if called on a symbolic tensor.
    260   """
--> 261   return _constant_impl(value,verify_shape=False,
    262                         allow_broadcast=True)
    263 

/usr/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value,verify_shape,allow_broadcast)
    268   ctx = context.context()
    269   if ctx.executing_eagerly():
--> 270     t = convert_to_eager_tensor(value,dtype)
    271     if shape is None:
    272       return t

/usr/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value,dtype)
     94       dtype = dtypes.as_dtype(dtype).as_datatype_enum
     95   ctx.ensure_initialized()
---> 96   return ops.EagerTensor(value,ctx.device_name,dtype)
     97 
     98 

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

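The error says that a NumPy array cannot be converted to a Tensor. I tried changing the input_shape parameter of the KerasLayer, to no avail. The only solution I see is to loop over all the texts one by one and compute the embeddings for each text before feeding the result to the rest of the network, but that seems quite inefficient (and requires too much memory for my laptop). The examples I have seen with word embeddings do it this way, however.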

What is the correct way to get a list of embeddings from multiple sentences?

Answer:

I think your output_shape should be set to [20] (from https://www.tensorflow.org/hub/api_docs/python/hub/KerasLayer):

hub.KerasLayer("/tmp/text_embedding_model",
    output_shape=[20],  # Outputs a tensor with shape [batch_size, 20].
    input_shape=[],     # Expects a tensor of shape [batch_size] as input.
    dtype=tf.string)    # Expects a tf.string input tensor.

With TF 2.4.1 and tensorflow_hub 0.11.0, this works for me:

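import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# A flat 1-D array of sentences, i.e. a batch of shape [batch_size].
data = np.array(['AITA - Getting Hugged At The Bar .','This all happened less than an hour ago..'])
model_url = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
embedding = hub.KerasLayer(model_url,input_shape=[],dtype=tf.string,trainable=True,output_shape=[20])(data)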
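
Note that data here is a flat 1-D array of strings. The slice train_data[0:2] in the question appears instead to be an object array whose elements are themselves string arrays of different lengths, which would explain the "Unsupported object type numpy.ndarray" message. A quick check with NumPy alone, using shortened stand-in sentences rather than the original data:

```python
import numpy as np

# Shortened stand-in sentences, not the original dataset.
text_a = np.array(["First sentence.", "Second sentence.", "Third sentence."])
text_b = np.array(["Only one sentence here."])

# Texts with different sentence counts can only be held in an object array.
train_data = np.empty(2, dtype=object)
train_data[0] = text_a
train_data[1] = text_b

batch = train_data[0:2]
print(batch.dtype)     # object: every element is itself an ndarray
print(batch[0].dtype)  # a plain unicode string dtype

# Concatenating the per-text arrays yields one flat string array,
# which is the kind of input that worked in the snippet above.
flat = np.concatenate(list(batch))
print(flat.shape, flat.dtype.kind)
```

So a single text can be passed as is, while a batch of texts presumably needs to be flattened (or padded to a common length) before it can become a tensor.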

If you do not want to add layers on top of the KerasLayer, you can also call the model directly:

model = hub.load(model_url)
embedding = model(data)
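
To avoid looping over the texts one by one, one option (a sketch, not something tested against the original dataset) is to flatten all sentences into one 1-D array, embed them in a single call, and then cut the result back into per-text blocks. The names embed_texts and fake_embed are hypothetical; fake_embed only stands in for the hub model so that the snippet runs without downloading anything.

```python
import numpy as np


def embed_texts(texts, embed_fn):
    """Embed every sentence of every text with a single call to embed_fn.

    texts: sequence of 1-D string arrays, one per text (lengths may differ).
    embed_fn: callable mapping a string array of shape [n] to an [n, dim] array.
    """
    lengths = [len(t) for t in texts]
    flat = np.concatenate(list(texts))    # all sentences, shape [total]
    vectors = np.asarray(embed_fn(flat))  # shape [total, dim]
    # Cut the flat result back into one [n_sentences, dim] block per text.
    return np.split(vectors, np.cumsum(lengths)[:-1])


def fake_embed(sentences, dim=20):
    """Hypothetical stand-in for the hub model: zeros of the right shape."""
    return np.zeros((len(sentences), dim), dtype=np.float32)


texts = [np.array(["One.", "Two.", "Three."]), np.array(["Four."])]
per_text = embed_texts(texts, fake_embed)
print([block.shape for block in per_text])
```

With the real model, embed_fn could presumably be something like lambda s: model(s).numpy(). The per-text blocks still have different numbers of rows, so feeding them to a convolutional model would additionally require padding or truncating them to a common number of sentences.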