spaCy: given a string that occurs in a document, how do I find its start and end character indices in the document?

2024-05-03 • Q&A

import spacy
# The 'en' shortcut only works in spaCy v2; spaCy v3 needs the full package name.
nlp = spacy.load('en_core_web_sm')
doc = nlp('An example sentence in the city of london')
str1 = 'in the city'
str2 = 'example sentence'

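I want to find the start and end character indices in the document for every string in a list such as str1 and str2. How can I do this with spaCy?

What I have so far is a set of complicated loops that match character by character, which obviously does not scale well.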

Answer from tszgcs:

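If you have a spaCy Doc instance, the full text of the document is available as an attribute, doc.text (see the spaCy documentation for Doc). You can then search that string with regular expressions: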

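import re

# nlp is the spaCy pipeline loaded in the question
doc = nlp('An example sentence in the city of london')
listOfStrings = ['in the city', 'example sentence']

for s in listOfStrings:
    # re.escape treats s as literal text rather than as a regex pattern
    res = re.search(re.escape(s), doc.text)
    if res:
        print(s, res.start(), res.end())

# in the city 20 31
# example sentence 3 19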
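
Note that re.search reports only the first match for each string, while the question asks for all of them. Below is a minimal sketch of one way to collect every occurrence, using re.finditer together with re.escape. The helper name find_all_spans and the longer sample sentence are made up for illustration, and a plain string stands in for doc.text so the snippet runs without a spaCy model installed.

```python
import re

def find_all_spans(text, phrases):
    """Return (phrase, start, end) for every occurrence of each phrase in text."""
    results = []
    for phrase in phrases:
        # re.escape makes the phrase match literally, even if it contains
        # regex metacharacters such as '.', '(' or '+'.
        for match in re.finditer(re.escape(phrase), text):
            results.append((phrase, match.start(), match.end()))
    return results

# Hypothetical stand-in for doc.text; with spaCy you would pass doc.text here.
text = 'An example sentence in the city of london, in the city (UK)'
spans = find_all_spans(text, ['in the city', 'example sentence', '(UK)'])
for phrase, start, end in spans:
    print(phrase, start, end)
```

If you need spaCy tokens rather than raw offsets, each (start, end) pair can be passed to doc.char_span(start, end), which returns None when the offsets do not line up with token boundaries.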

python-3.x spacy
