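When I check the similarity between two strings using synonyms, the score often comes out much higher than it should, which looks like a false positive. For example,
if I compare two words through their WordNet synsets using the wup_similarity method,
word1 = car
word2 = horse
I get a score close to 1.0.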
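How can I overcome this problem?
What I want is: when I compare "car", only the vehicle senses in WordNet should be considered, not the animal senses.
Any suggestions or advice would be appreciated. Thanks.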
from itertools import product

import numpy
from nltk.corpus import wordnet

# pre_process_text_MARKING_SCHEME and pre_process_text_ANSWER_SHEET are
# lists of preprocessed words built earlier (not shown here).
final = []  # best score found for each marking-scheme word

for word1 in pre_process_text_MARKING_SCHEME:
    simi = []
    syns1 = wordnet.synsets(word1)
    for word2 in pre_process_text_ANSWER_SHEET:
        sims = []
        syns2 = wordnet.synsets(word2)
        # Compare every sense of word1 with every sense of word2.
        for sense1, sense2 in product(syns1, syns2):
            d = wordnet.wup_similarity(sense1, sense2)
            if d is not None:
                #print(d,"similarity between word ",sense1," and ",sense2)
                sims.append(d)
        if sims:
            max_sim = max(sims)
            #print("maximum similarity between",word1,word2," is ",max_sim )
            simi.append(max_sim)
    if simi:
        max_final = max(simi)
        final.append(max_final)
        #print(max_final,"max_final for ",word1)

##---------------Final Output---------------##
similarity_index = numpy.mean(final)
similarity_index = round(similarity_index, 2)
print("Similarity index value : ", similarity_index)
if similarity_index > 0.8:
    print("Similar")
elif similarity_index >= 0.6:
    print("Somewhat Similar")
else:
    print("Not Similar")
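
To make clearer what I am after, here is a rough sketch on a made-up toy taxonomy. It is not real WordNet data and does not use NLTK: `PARENT`, `SENSES`, `max_similarity` and the `domain` parameter are all hypothetical names invented for this illustration. It computes a Wu-Palmer-style score, 2 * depth(lowest common ancestor) / (depth(a) + depth(b)), takes the maximum over all sense pairs as my code above does, and optionally keeps only the senses that sit under a chosen ancestor such as `vehicle`.

```python
# Toy taxonomy (child -> parent). Purely illustrative, NOT real WordNet data.
PARENT = {
    "entity": None,
    "artifact": "entity",
    "animal": "entity",
    "vehicle": "artifact",
    "equipment": "artifact",
    "car.vehicle": "vehicle",
    "truck.vehicle": "vehicle",
    "horse.equipment": "equipment",  # e.g. a gymnastic horse
    "horse.animal": "animal",
}

# Hypothetical sense inventory: word -> sense nodes in the toy taxonomy.
SENSES = {
    "car": ["car.vehicle"],
    "truck": ["truck.vehicle"],
    "horse": ["horse.animal", "horse.equipment"],
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Depth counted from the root, with the root at depth 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer-style similarity on the toy tree."""
    ancestors_b = set(path_to_root(b))
    # Walking up from a, the first node shared with b is the lowest common ancestor.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def max_similarity(word1, word2, domain=None):
    """Best score over sense pairs; if domain is given, keep only senses under it."""
    def senses(word):
        return [s for s in SENSES.get(word, [])
                if domain is None or domain in path_to_root(s)]

    scores = [wup(s1, s2) for s1 in senses(word1) for s2 in senses(word2)]
    return max(scores) if scores else None


print(max_similarity("car", "horse"))                    # all senses compared
print(max_similarity("car", "horse", domain="vehicle"))  # no vehicle sense of "horse"
print(max_similarity("car", "truck", domain="vehicle"))
```

In this toy setup the unrestricted maximum for car/horse comes from the equipment sense of "horse" rather than the animal sense, while restricting to `vehicle` leaves no sense pair to compare at all. This is the behaviour I would like to get with the real WordNet synsets, but I do not know the right way to do it there.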