Counter with a different score in Python

2024-05-19 • Q&A

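I'm using a Counter to track how many times each substring occurs in a text, but what I'm actually looking for is the substring with the highest score. The score is defined as len(substring) * (occurrences - 1).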
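To make the scoring rule concrete, here is a minimal sketch. The function name `score` is hypothetical (it does not appear in the question's code), and the two occurrence counts are taken from the output shown in the question.

```python
def score(substring: str, occurrences: int) -> int:
    """Score as defined in the question: length times the number of repeats."""
    return len(substring) * (occurrences - 1)


# Counts below come from the question's own output.
print(score("string", 5))              # 6 * 4 = 24
print(score(". Kaas is lekker! ", 3))  # 18 * 2 = 36
```

A substring that occurs only once always scores 0, so only repeated substrings can win.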

This is what I'm doing at the moment:

from collections import Counter
from operator import itemgetter

input_string = "My amazing string with all sorts of values in it. " + \
"Kaas is lekker! I want to know how many times a certain substring" + \
"of a minumum size appears in it,so I can so some value encodings" + \
". Kaas is lekker! Performance is a problem when a string become l" + \
"arger. Kaas is lekker! So many strings to replace,what string is" + \
" best?"

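larger_then = 5  # substrings must be longer than this many characters
length = len(input_string)
# Collect every substring of at least larger_then + 1 characters.
subs = [input_string[i:j+1]
  for i in range(0,length-larger_then)
  for j in range(i+larger_then,length)]

# Count how often each substring occurs.
countr = Counter(subs)

# Score each substring as len(substring) * (occurrences - 1) and keep the best.
scores = ((sub,len(sub) * (count-1)) for sub,count in countr.most_common())
max_key,max_score = max(scores,key=itemgetter(1))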

print("")
print("Max key is '{}' with score {}".format(max_key,max_score))
print("")
top_20 = list(countr.most_common(20))
print("Top 20 commons:",*top_20,sep="\n- ")

Which returns:

Max key is '. Kaas is lekker! ' with score 36
Top 20 commons:
- ('string',5)
- (' strin',4)
- (' string',4)
- (' string ',3)
- ('string ',3)
- ('tring ',3)
- ('. Kaas',3)
- ('. Kaas ',3)
- ('. Kaas i',3)
- ('. Kaas is',3)
- ('. Kaas is ',3)
- ('. Kaas is l',3)
- ('. Kaas is le',3)
- ('. Kaas is lek',3)
- ('. Kaas is lekk',3)
- ('. Kaas is lekke',3)
- ('. Kaas is lekker',3)
- ('. Kaas is lekker!',3)
- ('. Kaas is lekker! ',3)
- (' Kaas ',3)

Is there a faster way to count the substrings when a different score is used?

Here is a link to a Repl.it with the running code: https://repl.it/@keestalkstech/CavernousKindheartedScripts
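One possible direction, offered as a sketch rather than a definitive answer: a substring that occurs only once scores 0, and a substring can only repeat if its prefix (one character shorter) also repeats. That means substrings can be grown one character at a time, extending only the start positions whose current substring still occurs at least twice. The function name `best_repeated_substring` and the parameter `min_len` are assumptions made for this illustration. `min_len=6` corresponds to `larger_then = 5` in the code above, and overlapping occurrences are counted, as in the original.

```python
from collections import Counter


def best_repeated_substring(text, min_len=6):
    """Return (substring, score) maximizing len(substring) * (occurrences - 1).

    Returns (None, 0) when no substring of at least min_len characters repeats.
    """
    best_sub, best_score = None, 0
    # Start positions still worth extending; initially every valid position.
    starts = list(range(len(text) - min_len + 1))
    length = min_len
    while starts:
        # Count the current-length substrings at the surviving positions only.
        counts = Counter(text[i:i + length] for i in starts)
        for sub, n in counts.items():
            current = length * (n - 1)
            if current > best_score:
                best_sub, best_score = sub, current
        # Keep positions whose substring repeats and that have room to grow.
        starts = [i for i in starts
                  if counts[text[i:i + length]] > 1 and i + length < len(text)]
        length += 1
    return best_sub, best_score


print(best_repeated_substring("xx Kaas is lekker! yy Kaas is lekker! zz"))
```

On text with few long repeats, the candidate set shrinks quickly, so far fewer substrings are built than in the all-pairs list comprehension. On ties this sketch keeps the shortest winning substring, which may differ from the tie-breaking of `max` over `most_common()`.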

hca0728 answered: Counter with a different score in Python