Counter with a different score in Python

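I am using a Counter to track how many times each substring occurs in a text, but what I am really looking for is the substring with the highest score. The score is defined as len(substring) * (occurrences - 1).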
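To make the definition concrete, here is a tiny sketch of the score on its own. The helper name `score` is only illustrative and is not part of the code in this post.

```python
def score(substring: str, occurrences: int) -> int:
    """Length of the substring times the number of repeats beyond the first."""
    return len(substring) * (occurrences - 1)


# A long substring that repeats a few times can beat a short, frequent one.
print(score(". Kaas is lekker! ", 3))  # 18 * (3 - 1) = 36
print(score("string", 5))              # 6 * (5 - 1) = 24
```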

This is what I am currently doing:

from collections import Counter
from operator import itemgetter

input_string = "My amazing string with all sorts of values in it. " + \
"Kaas is lekker! I want to know how many times a certain substring" + \
"of a minumum size appears in it,so I can so some value encodings" + \
". Kaas is lekker! Performance is a problem when a string become l" + \
"arger. Kaas is lekker! So many strings to replace,what string is" + \
" best?"

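# Only substrings with more than this many characters are counted.
larger_then = 5
length = len(input_string)
# Every substring input_string[i:j+1] that is longer than larger_then.
subs = [input_string[i:j+1]
  for i in range(0, length-larger_then)
  for j in range(i+larger_then, length)]

countr = Counter()
countr.update(subs)

# Score each substring as len(substring) * (occurrences - 1) and keep the best.
scores = map(lambda kv: (kv[0], len(kv[0]) * (kv[1]-1)), countr.most_common())
max_key, max_score = max(scores, key=itemgetter(1))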

print("")
print("Max key is '{}' with score {}".format(max_key,max_score))
print("")
top_20 = list(countr.most_common(20))
print("Top 20 commons:",*top_20,sep="\n- ")

Which returns:

Max key is '. Kaas is lekker! ' with score 36
Top 20 commons:
- ('string', 5)
- (' strin', 4)
- (' string', 4)
- (' string ', 3)
- ('string ', 3)
- ('tring ', 3)
- ('. Kaas', 3)
- ('. Kaas ', 3)
- ('. Kaas i', 3)
- ('. Kaas is', 3)
- ('. Kaas is ', 3)
- ('. Kaas is l', 3)
- ('. Kaas is le', 3)
- ('. Kaas is lek', 3)
- ('. Kaas is lekk', 3)
- ('. Kaas is lekke', 3)
- ('. Kaas is lekker', 3)
- ('. Kaas is lekker!', 3)
- ('. Kaas is lekker! ', 3)
- (' Kaas ', 3)

Is there a faster way to count substrings when the ranking uses a different score?

Here is a link to a Repl.it with the running code: https://repl.it/@keestalkstech/CavernousKindheartedScripts
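One possible direction, offered as a sketch rather than a confirmed answer: the Counter approach materialises every substring, which is quadratic in the number of substrings. A suffix automaton can count the occurrences of all distinct substrings in roughly linear time, and within each automaton state the longest string has the best score, because all strings in a state share the same occurrence count. The names `best_scoring_substring` and `min_length` are hypothetical and introduced only for this example; `min_length=6` mirrors `larger_then = 5`. Overlapping occurrences are counted, just as in the Counter version, but ties may be broken differently.

```python
def best_scoring_substring(text, min_length):
    """Return (substring, score) maximising len(substring) * (occurrences - 1).

    Only substrings with at least min_length characters are considered.
    Returns ("", 0) when nothing of that length repeats.
    """
    # Parallel lists describing the suffix automaton states; state 0 is the root.
    nxt = [{}]         # outgoing transitions per state
    link = [-1]        # suffix link per state
    longest = [0]      # length of the longest string in the state
    count = [0]        # number of end positions (occurrences)
    first_end = [-1]   # index where the state's strings first end
    last = 0

    for pos, ch in enumerate(text):
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        longest.append(longest[last] + 1)
        count.append(1)
        first_end.append(pos)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if longest[p] + 1 == longest[q]:
                link[cur] = q
            else:
                # Split state q: the clone takes over its shorter strings.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                longest.append(longest[p] + 1)
                count.append(0)
                first_end.append(first_end[q])
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    # Push occurrence counts along suffix links, longest states first.
    order = sorted(range(1, len(nxt)), key=longest.__getitem__, reverse=True)
    for state in order:
        count[link[state]] += count[state]

    best = ("", 0)
    for state in range(1, len(nxt)):
        if longest[state] >= min_length:
            state_score = longest[state] * (count[state] - 1)
            if state_score > best[1]:
                end = first_end[state]
                best = (text[end - longest[state] + 1:end + 1], state_score)
    return best


input_string = ("My amazing string with all sorts of values in it. "
                "Kaas is lekker! I want to know how many times a certain substring"
                "of a minumum size appears in it,so I can so some value encodings"
                ". Kaas is lekker! Performance is a problem when a string become l"
                "arger. Kaas is lekker! So many strings to replace,what string is"
                " best?")

max_key, max_score = best_scoring_substring(input_string, min_length=6)
print("Max key is '{}' with score {}".format(max_key, max_score))
```

This only finds the single best substring. If the full top 20 by count is still needed, the Counter stays the simpler tool.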
