FastText pretrained sentence similarity

2024-06-02 • Q&A

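I want to use a pretrained FastText model to compute the similarity between one sentence and a set of sentences. Can anyone help me? What is the best approach?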

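I currently compute the similarity between sentences by training a TF-IDF model, with the code below. Can it be changed to use a pretrained FastText model instead? For example, by using the vectors to train the TF-IDF model?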

import gensim

def generate_tfidf_model(sentences):
    print("generating TfIdf model")
    # Tokenize each sentence on whitespace
    texts = [doc.split() for doc in sentences]
    dictionary = gensim.corpora.Dictionary(texts)
    feature_cnt = len(dictionary.token2id)
    mycorpus = [dictionary.doc2bow(doc, allow_update=True) for doc in texts]
    # The gensim class is TfidfModel (capital M)
    tfidf_model = gensim.models.TfidfModel(mycorpus)
    index = gensim.similarities.SparseMatrixSimilarity(tfidf_model[mycorpus], num_features=feature_cnt)
    return tfidf_model, index, dictionary

# index is passed in explicitly; the original version relied on a global name
def query_search(query, tfidf_model, index, dictionary):
    # normal_stemmer_sentence is the asker's own preprocessing helper (not shown)
    query = normal_stemmer_sentence(query)
    query_vector = dictionary.doc2bow(query.split())
    # Similarity of the query against every indexed sentence
    similarity = index[tfidf_model[query_vector]]
    return similarity

jaychpk answered: FastText pretrained sentence similarity

I think that if you can use word embeddings, you do not need to compute TfIdf.

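A simple and effective approach consists of:

Use pretrained word embeddings for your language (for example FastText's get_sentence_vector, https://fasttext.cc/docs/en/python-module.html#model-object) to compute the two vectors that represent the two strings
Compute the cosine similarity between the two vectors (1: equal strings; 0: really different strings; see https://masongallo.github.io/machine/learning,/python/2016/07/29/cosine-similarity.html).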
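
The two steps above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names (cosine_similarity, rank_by_similarity, load_fasttext_embedder) are hypothetical, and the model path is an assumed local file that you would have to download from fasttext.cc yourself. The fasttext import is kept inside the loader so the ranking logic can be tried with any embedding function.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query: str,
    sentences: List[str],
    embed: Callable[[str], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every sentence against the query, most similar first."""
    query_vec = embed(query)
    scored = [(s, cosine_similarity(query_vec, embed(s))) for s in sentences]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def load_fasttext_embedder(model_path: str) -> Callable[[str], Sequence[float]]:
    """Wrap a pretrained FastText model as a sentence-embedding function.

    model_path is an assumption: a pretrained .bin file on local disk.
    """
    import fasttext  # imported lazily so the helpers above work without it

    model = fasttext.load_model(model_path)
    # get_sentence_vector expects a single line, so newlines are replaced
    return lambda text: model.get_sentence_vector(text.replace("\n", " "))


# Example usage (assumes a model file such as cc.en.300.bin has been downloaded):
# embed = load_fasttext_embedder("cc.en.300.bin")
# for sentence, score in rank_by_similarity("my query", ["first", "second"], embed):
#     print(f"{score:.3f}  {sentence}")
```

Passing the embedding function as a parameter means the same ranking code works whether the vectors come from FastText or from another embedding source, with no TF-IDF index involved.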

fasttext information-retrieval nlp sentence-similarity

Article link: https://www.f2er.com/2974964.html
