Sentence embeddings with BERT

2024-05-20 • Q&A

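I need some information. I followed the method in https://towardsdatascience.com/improving-sentence-embeddings-with-bert-and-representation-learning-dfba6b444f6b to extract features, but what I get are word embeddings.

If I want to train BERT on my own data to embed whole sentences, what should I do?

Example: the sentence "I want to run" -> result: an embedding array of shape [1, 768]. Thanks.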

I would suggest a few approaches. If you are using HuggingFace, try the following code:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)
# The last hidden state is the first element of the output; shape (1, seq_len, 768), one vector per token
last_hidden_states = outputs[0]
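Note that the last hidden state holds one 768-dimensional vector per token, which is the word-level output the question describes. One common way to reduce it to a single [1, 768] sentence vector is to average the token vectors while ignoring padding. The following is a minimal sketch of that idea, not something from the linked article: `mean_pool` is a hypothetical helper name, and a random tensor stands in for the BERT output so the snippet runs without downloading a model.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors into one vector per sentence, skipping padded positions.

    mean_pool is a hypothetical helper, not part of the transformers library.
    """
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)   # sum of the real token vectors
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per sentence
    return summed / counts


# Stand-in for BERT output: 1 sentence, 7 tokens, hidden size 768 (assumed values)
torch.manual_seed(0)
token_embeddings = torch.randn(1, 7, 768)
attention_mask = torch.ones(1, 7, dtype=torch.long)

sentence_embedding = mean_pool(token_embeddings, attention_mask)
print(sentence_embedding.shape)
```

With the real model you would pass `outputs[0]` as the first argument and, for a single unpadded sentence, `torch.ones_like(input_ids)` as the mask. Taking the first ([CLS]) token vector, `outputs[0][:, 0]`, is another option, though without fine-tuning neither is guaranteed to give semantically meaningful sentence vectors.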

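I would also recommend sentence-transformers. That project fine-tunes BERT / RoBERTa / DistilBERT / ALBERT / XLNet with a siamese or triplet network structure to produce semantically meaningful sentence embeddings. You can also use Flair to test it.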
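To make the triplet structure concrete: the same encoder (shared weights) embeds an anchor sentence, a similar sentence, and a dissimilar one, and the loss pulls the first pair together while pushing the third away. The sketch below only illustrates that training signal. The `encoder` here is a hypothetical linear layer standing in for BERT plus pooling, the inputs are random features rather than real sentences, and the margin of 1.0 is an assumed value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for "BERT + pooling": maps 32-d input features to 768-d sentence vectors
encoder = torch.nn.Linear(32, 768)

# Three batches of 4 fake "sentences": anchors, similar sentences, dissimilar sentences
anchor_in, positive_in, negative_in = torch.randn(3, 4, 32)

# The same encoder embeds all three inputs; that weight sharing is the siamese/triplet part
anchor = encoder(anchor_in)
positive = encoder(positive_in)
negative = encoder(negative_in)

# Penalize triplets where the anchor is not closer to the positive than to the negative by the margin
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
loss.backward()  # gradients flow into the shared encoder weights

print(anchor.shape, loss.item())
```

In real fine-tuning the linear layer would be replaced by the transformer itself and the triplets would come from labeled or mined sentence data.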

Alternatively, you can try Flair's TransformerDocumentEmbeddings. See the linked suggestion.

zgw_cq5948 answered: Sentence embeddings with BERT