实体标签是令牌的属性(请参见here)
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
s = "His friend Nicolas is here."
doc = nlp(s)
print([t.text if not t.ent_type_ else t.ent_type_ for t in doc])
# ['His','friend','PERSON','is','here','.']
print(" ".join([t.text if not t.ent_type_ else t.ent_type_ for t in doc]) )
# His friend PERSON is here .
编辑:
为了处理实体可以跨越多个单词的情况,可以使用以下代码代替:
s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
newString = s
for e in reversed(doc.ents): #reversed to not modify the offsets of other entities when substituting
start = e.start_char
end = start + len(e.text)
newString = newString[:start] + e.label_ + newString[end:]
print(newString)
#His friend PERSON is here with PERSON and PERSON.
,
@DBaker 答案的略短版本,它使用 end_char
而不是计算它:
for ent in reversed(doc.ents):
text = text[:ent.start_char] + ent.label_ + text[ent.end_char:]
,
当实体可以跨越多个单词时,对上面@DBaker 的解决方案进行了更优雅的修改:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
nlp.add_pipe("merge_entities")
s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
print([t.text if not t.ent_type_ else t.ent_type_ for t in doc])
# ['His','with','and','.']
print(" ".join([t.text if not t.ent_type_ else t.ent_type_ for t in doc]) )
# His friend PERSON is here with PERSON and PERSON .
您可以查看有关 Spacy here 的文档。它使用内置的流水线来完成这项工作,并且对多处理有很好的支持。我相信这是官方支持的用标签替换实体的方式。
本文链接:https://www.f2er.com/3159248.html