The first row of the Keras word-embedding matrix is zero

2024-05-19 • Q&A

I'm looking at the Keras GloVe word-embedding example, but I don't understand why the first row of the embedding matrix is filled with zeros.

First, an embedding index is created that maps each word to its vector (a NumPy array).

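import os
import numpy as np

embeddings_index = {}
# GLOVE_DIR is defined elsewhere in the Keras example; the GloVe files are UTF-8 text.
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        # Each line is "<word> <100 space-separated floats>".
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, 'f', sep=' ')
        embeddings_index[word] = coefs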
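To make the GloVe text format concrete, here is a minimal sketch that parses two made-up lines from an in-memory stream instead of the real file. The vectors are invented 3-dimensional values (real glove.6B.100d vectors have 100 floats), the helper name `load_embeddings` is hypothetical, and it swaps `np.fromstring` for the equivalent `np.array(coefs.split(), dtype=np.float32)`.

```python
import io

import numpy as np

# Two made-up lines in the GloVe text format: "<word> <v1> <v2> ... <vN>".
# The values are invented for illustration only.
FAKE_GLOVE = io.StringIO(
    "the 0.1 0.2 0.3\n"
    "cat -0.5 0.0 0.25\n"
)


def load_embeddings(f):
    """Hypothetical helper: build a {word: vector} dict from a GloVe-style stream."""
    embeddings_index = {}
    for line in f:
        # Split off the word; everything after the first whitespace is the vector.
        word, coefs = line.split(maxsplit=1)
        embeddings_index[word] = np.array(coefs.split(), dtype=np.float32)
    return embeddings_index


embeddings_index = load_embeddings(FAKE_GLOVE)
print(sorted(embeddings_index))        # ['cat', 'the']
print(embeddings_index["cat"].shape)   # (3,)
```

Note that nothing in this dictionary has a numeric index yet; the indices come from the tokenizer in the next step.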

The embedding matrix is then built by looking up each word in the index created by the tokenizer.

# prepare embedding matrix
num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i >= MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
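The loop above can be run on a tiny, self-contained example to see which rows get filled. The `word_index`, `embeddings_index`, `MAX_NUM_WORDS` and `EMBEDDING_DIM` values below are hypothetical toy inputs; the only assumption carried over from Keras is that `word_index` starts at 1, as the Tokenizer's does.

```python
import numpy as np

# Hypothetical toy inputs. Like the Keras Tokenizer's word_index, ids start at 1.
word_index = {"the": 1, "cat": 2, "sat": 3}
embeddings_index = {
    "the": np.array([0.1, 0.2, 0.3], dtype=np.float32),
    "cat": np.array([-0.5, 0.0, 0.25], dtype=np.float32),
    # "sat" is deliberately missing, so its row stays all-zeros too.
}
MAX_NUM_WORDS = 10
EMBEDDING_DIM = 3

num_words = min(MAX_NUM_WORDS, len(word_index) + 1)  # +1 for row 0
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i >= MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

print(embedding_matrix.shape)  # (4, 3)
print(embedding_matrix[0])     # [0. 0. 0.]  (no word has id 0)
```

Row 0 is all-zeros simply because no value in `word_index` equals 0, so the loop never writes to it; row 3 is also zero, but for a different reason (the word has no pretrained vector).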

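Since the tokenizer's indices start at 1, the loop never writes to index 0, so the first row only ever contains zeros (or random numbers, if the matrix were initialized differently). Is there a reason for skipping the first row?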

zjhsir123 answered: The first row of the Keras word-embedding matrix is zero

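It all starts with the fact that the authors of the Tokenizer reserved index 0, whether for some design reason, for compatibility (some other languages index from 1), or for technical encoding reasons. The Keras Tokenizer documentation states that 0 is a reserved index that won't be assigned to any word, and 0 is also the default padding value of pad_sequences, so real word indices start at 1.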
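The convention described above (word ids start at 1, and 0 is left free for padding) can be sketched in plain Python. This is not the Keras implementation: `build_word_index` and `pad_pre` are hypothetical helpers that only imitate the behaviour, with words ordered by descending frequency (ties in first-seen order, via `Counter.most_common`) and left padding with 0, which matches the default `padding='pre'`, `value=0` of `pad_sequences`.

```python
from collections import Counter


def build_word_index(texts):
    """Hypothetical helper: map words to ids starting at 1, most frequent first.
    Id 0 is never assigned, so it stays free for padding."""
    counts = Counter(w for t in texts for w in t.lower().split())
    # most_common() sorts by count; equal counts keep first-encountered order.
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}


def pad_pre(seqs, maxlen, value=0):
    """Hypothetical helper: left-truncate and left-pad each sequence to maxlen (>= 1)."""
    out = []
    for s in seqs:
        s = s[-maxlen:]  # keep the last maxlen ids
        out.append([value] * (maxlen - len(s)) + s)
    return out


texts = ["the cat sat", "the cat"]
word_index = build_word_index(texts)
seqs = [[word_index[w] for w in t.lower().split()] for t in texts]
print(word_index)               # {'the': 1, 'cat': 2, 'sat': 3}
print(pad_pre(seqs, maxlen=4))  # [[0, 1, 2, 3], [0, 0, 1, 2]]
```

Because padding uses id 0, those positions will later be looked up in row 0 of the embedding matrix, which is why that row exists even though no word is stored there.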

However, when filling the NumPy array, they wanted to keep the indexing simple:

embedding_matrix[i] = embedding_vector

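In other words, each word's token index is used directly as the row index. Row [0] therefore stays zero, and the "random numbers if the matrix were initialized differently" case never arises, because the array was initialized with zeros. As far as word vectors are concerned, the first row is never needed, but you cannot delete it: the NumPy array would then lose the alignment between its row indices and the tokenizer's indices.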
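The alignment argument can be checked with plain NumPy, since an embedding lookup is conceptually just row indexing (id k selects row k). The 4×2 matrix and the id sequence below are made-up values for illustration; `np.delete` stands in for "removing the first row".

```python
import numpy as np

# Hypothetical matrix: row 0 is the reserved zero row, rows 1..3 hold
# made-up vectors for word ids 1..3.
embedding_matrix = np.array([
    [0.0, 0.0],  # id 0: padding, never assigned to a word
    [1.0, 1.0],  # id 1
    [2.0, 2.0],  # id 2
    [3.0, 3.0],  # id 3
])

padded = np.array([0, 1, 2, 3])  # a pre-padded id sequence

# Lookup is row indexing: id k -> embedding_matrix[k].
vectors = embedding_matrix[padded]
print(vectors[:, 0])  # [0. 1. 2. 3.]

# Dropping row 0 shifts everything: id k now lands on the vector meant for
# id k + 1, and the largest id (3) is out of bounds.
trimmed = np.delete(embedding_matrix, 0, axis=0)
print(trimmed[[1, 2]][:, 0])  # [2. 3.]  (wrong vectors for ids 1 and 2)
try:
    trimmed[padded]
except IndexError as exc:
    print("IndexError:", exc)
```

Keeping the zero row is the cheapest way to preserve the id-to-row mapping; the alternative of subtracting 1 from every id would leave padding id 0 with no row to map to.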

Permalink: https://www.f2er.com/2839025.html