You can use this method:
import spacy

nlp = spacy.load("en_core_web_sm")  # any installed model works here
tokens = ["This", "is", "a", "sentence"]
sentence = nlp.tokenizer.tokens_from_list(tokens)
print(sentence)  # This is a sentence
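Note that tokens_from_list is deprecated in newer spaCy releases; the documented replacement is the Doc constructor. A minimal sketch, assuming spaCy v2+:

from spacy.tokens import Doc

doc = Doc(nlp.vocab, words=tokens)  # builds a Doc from a pre-tokenized word list
print(doc)  # This is a sentence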
However, if you use:
sentence = nlp.tokenizer.tokens_from_list(tokens)
together with spacy.matcher / Matcher, you will get an error:

Try using nlp() instead of nlp.make_doc(), or list(nlp.pipe()) instead of list(nlp.tokenizer.pipe()).
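The error means the pattern matches on LEMMA and POS, but those attributes are only set by pipeline components (tagger/lemmatizer), which the bare tokenizer skips. A minimal sketch of the difference, assuming the en_core_web_sm model:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("Searched Word", [[{"LEMMA": "sentence", "POS": "NOUN"}]])

doc = nlp("This is a sentence")            # full pipeline: POS and LEMMA are set
print(matcher(doc))                        # finds the match

bare = nlp.make_doc("This is a sentence")  # tokenizer only: no POS or LEMMA
# matcher(bare) would raise the error shown above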
My workaround: iterate over each item in a for loop:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")  # any model with a tagger and lemmatizer
matcher = Matcher(nlp.vocab)
pattern = [{'LEMMA': 'sentence', 'POS': 'NOUN'}]
# spaCy v3 API; in v2 this was matcher.add('Searched Word', None, pattern)
matcher.add('Searched Word', [pattern])
X = ["Sentence one", "Sentence two", "Sentence three", "sentence last !"]
for text in X:  # X is a plain list, so iterate over it directly
    doc = nlp(text)
    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        print(matched_span.text)
A better approach is to use nlp.pipe:
for doc in nlp.pipe(X):
    print([token.text for token in doc])
It also runs faster and processes text more efficiently, since nlp.pipe handles the texts as a batched stream.
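For example, the matching loop above can be rewritten with nlp.pipe (a sketch reusing the matcher and X defined earlier):

for doc in nlp.pipe(X):
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)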
Hope this helps. Thanks.