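I am trying to load the GloVe embeddings, and when I parse each word and its corresponding vector, some lines raise an exception. I am using the following code: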
import pandas as pd
import numpy as np

filename = "glove.840B.300d.txt"
embeddings_dict = {}

with open(filename, 'r', encoding='utf8') as f:
    for line in f:
        # Split on whitespace: first field is the word, the rest is the vector
        values = line.split()
        word = values[0]
        try:
            vector = np.asarray(values[1:], "float32")
        except ValueError:
            print('Weird value detected with vector (showing first few values): ', values[:5])
            print('\n')
            # Skip this line so a stale vector from the previous line is not stored
            continue
        embeddings_dict[word] = vector
Here is a snippet of the output showing the problem:
Weird value detected with vector (showing first few values): ['.','.','-0.1573','-0.29517']
Weird value detected with vector (showing first few values): ['at','name@domain.com','0.0061218','0.39595','-0.22079']
Weird value detected with vector (showing first few values): ['.','.']
Weird value detected with vector (showing first few values): ['to','0.33865','0.12698','-0.16885']
Weird value detected with vector (showing first few values): ['.','0.035974','-0.024421','0.71402']
Weird value detected with vector (showing first few values): ['.','0.033459']
Weird value detected with vector (showing first few values): ['email','0.33529','0.32949','0.2646']
Weird value detected with vector (showing first few values): ['or','0.48374','0.49669','-0.25089']
Weird value detected with vector (showing first few values): ['contact','0.016426','0.13728','0.18781']
Weird value detected with vector (showing first few values): ['Email','0.37344','0.024573','-0.12583']
Weird value detected with vector (showing first few values): ['on','0.037295','-0.15381','-0.045189']
Weird value detected with vector (showing first few values): ['At','Killerseats.com','-0.13854','-0.01706','-0.13651']
Weird value detected with vector (showing first few values): ['by','0.6882','-0.36436','0.62079']
Weird value detected with vector (showing first few values): ['in','mylot.com','-0.18148','0.47096','0.32916']
Weird value detected with vector (showing first few values): ['emailing','0.39173','-0.39132','-0.4266']
Weird value detected with vector (showing first few values): ['Contact','0.14933','-0.28605','0.3444']
Weird value detected with vector (showing first few values): ['at','0.44321','-0.40005','-0.20065']
Weird value detected with vector (showing first few values): ['•','-0.13288','-0.31383','-0.032356']
What is going on with these lines? Am I importing GloVe incorrectly? Should I just ignore them? I really don't know how to handle these.
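
One possible explanation, offered as an assumption rather than a confirmed diagnosis: entries such as ['at', 'name@domain.com', ...] and ['.', '.', ...] suggest that some tokens in the file themselves contain spaces, so line.split() cuts the token into pieces and the leftover pieces end up in the vector part. If that is the case, a minimal sketch of a workaround is to split from the right, taking exactly 300 numeric fields and treating everything before them as the word. The helper names parse_glove_line and load_glove and the dim parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def parse_glove_line(line, dim=300):
    """Split one line into (word, vector), assuming the LAST `dim` fields are floats.

    Assumption: fields are separated by single spaces and the token itself
    may contain spaces (e.g. "at name@domain.com").
    """
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError(f"expected {dim} vector fields, got {len(parts) - 1}")
    word = parts[0]
    # np.asarray raises ValueError if any field is not a valid float
    vector = np.asarray(parts[1:], dtype="float32")
    return word, vector


def load_glove(path, dim=300):
    """Load embeddings into a dict; return it with a count of unparseable lines."""
    embeddings = {}
    skipped = 0
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            try:
                word, vector = parse_glove_line(line, dim)
            except ValueError:
                skipped += 1
                continue
            embeddings[word] = vector
    return embeddings, skipped
```

Calling load_glove("glove.840B.300d.txt") and checking the skipped count would show whether any lines still fail. Under this approach the multi-word tokens are kept as single dictionary keys; whether to keep or drop them depends on how the text is tokenized downstream.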