Weird values in GloVe embeddings cause errors on import

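I am trying to load the GloVe embeddings, and some lines raise an exception when I convert each word's values into a vector. I am using the following code: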

import pandas as pd
import numpy as np
filename = "glove.840B.300d.txt"

embeddings_dict = {}

with open(filename,'r',encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        try:
            vector = np.asarray(values[1:],"float32")
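        except ValueError:
            # The fields after the first one did not all parse as floats; report the line and skip it
            print('Weird value detected with vector (showing first few values): ',values[:5])
            print('\n')
            # Skip this line so a stale vector from a previous line is not stored under this word
            continue
        embeddings_dict[word] = vector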

Here is a snippet of the output for the lines that fail:

Weird value detected with vector (showing first few values):  ['.','.','-0.1573','-0.29517']


Weird value detected with vector (showing first few values):  ['at','name@domain.com','0.0061218','0.39595','-0.22079']


Weird value detected with vector (showing first few values):  ['.','.']


Weird value detected with vector (showing first few values):  ['to','0.33865','0.12698','-0.16885']


Weird value detected with vector (showing first few values):  ['.','0.035974','-0.024421','0.71402']


Weird value detected with vector (showing first few values):  ['.','0.033459']


Weird value detected with vector (showing first few values):  ['email','0.33529','0.32949','0.2646']


Weird value detected with vector (showing first few values):  ['or','0.48374','0.49669','-0.25089']


Weird value detected with vector (showing first few values):  ['contact','0.016426','0.13728','0.18781']


Weird value detected with vector (showing first few values):  ['Email','0.37344','0.024573','-0.12583']


Weird value detected with vector (showing first few values):  ['on','0.037295','-0.15381','-0.045189']


Weird value detected with vector (showing first few values):  ['At','Killerseats.com','-0.13854','-0.01706','-0.13651']


Weird value detected with vector (showing first few values):  ['by','0.6882','-0.36436','0.62079']


Weird value detected with vector (showing first few values):  ['in','mylot.com','-0.18148','0.47096','0.32916']


Weird value detected with vector (showing first few values):  ['emailing','0.39173','-0.39132','-0.4266']


Weird value detected with vector (showing first few values):  ['Contact','0.14933','-0.28605','0.3444']


Weird value detected with vector (showing first few values):  ['at','0.44321','-0.40005','-0.20065']


Weird value detected with vector (showing first few values):  ['•','-0.13288','-0.31383','-0.032356']

What is going on with these lines? Am I importing GloVe incorrectly? Should I just ignore them? I really don't know how to handle these.
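
One likely explanation, judging from the output above: some tokens in glove.840B.300d.txt contain spaces themselves (for example ". ." or "at name@domain.com"), so line.split() leaves part of the token inside values[1:], and the float conversion fails. Below is a minimal sketch of a workaround, assuming every line ends with exactly 300 space-separated floats and that everything before them is the token. The names parse_glove_line and load_glove are hypothetical helpers, not part of any GloVe tooling.

```python
import numpy as np


def parse_glove_line(line, dim=300):
    """Split one GloVe text line into (token, vector).

    Assumption: the last `dim` space-separated fields are the vector,
    and everything before them is the token, which may contain spaces.
    """
    # Split from the right at most `dim` times, so any spaces inside
    # the token stay in parts[0].
    parts = line.rstrip("\r\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError(f"expected a token plus {dim} values, got {len(parts)} fields")
    word = parts[0]
    vector = np.asarray(parts[1:], dtype="float32")
    return word, vector


def load_glove(path, dim=300):
    """Read a GloVe text file into a dict mapping token -> float32 vector."""
    embeddings = {}
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            word, vector = parse_glove_line(line, dim)
            embeddings[word] = vector
    return embeddings
```

With this approach, calling load_glove("glove.840B.300d.txt") keeps the multi-word tokens as single dictionary keys instead of skipping them. If those entries are not needed, skipping the failing lines in the except branch is a reasonable alternative.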
