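Copied from my comment:
After splitting with the double quote (") as the delimiter, you can simply take all the odd-indexed elements of the resulting list. Then split each of those elements normally (on whitespace) and join the resulting lists together.
Example: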
text = """Lorem "ipsum dolor sit amet, consectetur adipiscing elit.". Praesent non sem urna. Pellentesque elementum "turpi'" est, "in fermentum diam auctor aliquam!". Morbi rhoncus erat ipsum, eu "tristique" """
text_split_by_quotes = text.split('"')
# get the odd-indexed elements (here's one way to do it):
text_in_quotes = text_split_by_quotes[1::2]
# split each normally (by whitespace) and flatten the list (here's one way to do it):
ans = []
for chunk in text_in_quotes:
    ans.extend(chunk.split())
# print answer
print(ans)
>>> ['ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'adipiscing', 'elit.', "turpi'", 'in', 'fermentum', 'diam', 'auctor', 'aliquam!', 'tristique']
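To see why the odd indices are the right ones, here is a minimal sketch on a shorter string. The `sample` string and the variable names are made up for illustration and are not part of the original answer.

```python
sample = 'one "two three" four "five"'

# Splitting on the quote character alternates between text outside
# the quotes (even indices) and text inside the quotes (odd indices).
parts = sample.split('"')
outside = parts[0::2]
inside = parts[1::2]

print(parts)    # all segments, in order
print(outside)  # segments that were not quoted
print(inside)   # segments that were quoted
```

Note that the closing quote at the very end of `sample` produces a trailing empty string at an even index, so it does not disturb the odd-indexed selection.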
Here are two possible approaches:
desired = [
    'ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'adipiscing', 'elit.',
    'turpi\'', 'in', 'fermentum', 'diam', 'auctor', 'aliquam!', 'tristique',
]
text = """
Lorem "ipsum dolor sit amet, consectetur adipiscing elit.". Praesent non sem
urna. Pellentesque elementum "turpi'" est, "in fermentum diam auctor aliquam!".
Morbi rhoncus erat ipsum, eu "tristique"
"""


def extract_quoted(text):
    words = []
    next_pos = -1
    while True:
        # find the next opening quote; stop when there are none left
        try:
            pos = text.index('"', next_pos + 1)
        except ValueError:
            break
        # find the matching closing quote; a missing one is an error
        try:
            next_pos = text.index('"', pos + 1)
        except ValueError as e:
            raise ValueError("mismatched quotes") from e
        quoted_segment = text[pos + 1:next_pos]
        words.extend(quoted_segment.split())
    return words


def split_only(text):
    return [word for chunk in text.split('"')[1::2] for word in chunk.split()]


if __name__ == "__main__":
    print(extract_quoted(text) == desired)
    print(split_only(text) == desired)
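The first is more explicit about how it "parses" the form of the text, while the second is probably closer to the flashy one-liner split-based approach you are looking for.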
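One practical difference between the two shows up when the quotes are unbalanced. The sketch below repeats both functions so it runs on its own; the `unbalanced` string is a made-up input, not something from the question.

```python
def extract_quoted(text):
    words = []
    next_pos = -1
    while True:
        try:
            pos = text.index('"', next_pos + 1)
        except ValueError:
            break
        try:
            next_pos = text.index('"', pos + 1)
        except ValueError as e:
            raise ValueError("mismatched quotes") from e
        words.extend(text[pos + 1:next_pos].split())
    return words


def split_only(text):
    return [word for chunk in text.split('"')[1::2] for word in chunk.split()]


# Hypothetical input with an opening quote that is never closed.
unbalanced = 'alpha "beta gamma" delta "epsilon'

# split_only does not notice the problem: everything after the last
# quote is treated as if it were quoted.
split_result = split_only(unbalanced)
print(split_result)

# extract_quoted reports the problem instead.
raised = False
try:
    extract_quoted(unbalanced)
except ValueError as exc:
    raised = True
    print("extract_quoted raised:", exc)
```

Which behavior is preferable probably depends on whether malformed input should be rejected or tolerated.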
I tried:
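a = """Lorem "ipsum dolor sit amet, consectetur adipiscing elit.". Praesent non sem urna. Pellentesque elementum "turpi'" est, "in fermentum diam auctor aliquam!". Morbi rhoncus erat ipsum, eu "tristique" """
in_quote = 0  # 1 while we are between an opening and a closing quote
res = []
word = ''
for i in a:
    if i == '"':
        # toggle the state and flush any word collected so far
        in_quote = 1 - in_quote
        if word:
            res += [word]
            word = ''
    elif in_quote:
        if i == ' ':
            # a space ends the current word; skip empty words
            if word:
                res += [word]
                word = ''
        else:
            word += i
print(res)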
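Check this logic: after splitting on the double quote, the quoted segments sit at the odd indices, so you can take every second element starting from index 1. Your text does not begin with a double quote, and even if it did, split would return an empty string as the first element, so the rule would still hold.
text = 'Lorem "ipsum dolor sit amet, consectetur adipiscing elit.". Praesent non sem urna. Pellentesque elementum "turpi" est, "in fermentum diam auctor aliquam!". Morbi rhoncus erat ipsum, eu "tristique"'
print(text)
split_text = text.split('"')
print(split_text)
new_split_text = [elem for i, elem in enumerate(split_text) if i % 2 == 1]
print(new_split_text)
If you want a one-liner:
new_split_text = [elem for i, elem in enumerate(text.split('"')) if i % 2 == 1]
Output:
['ipsum dolor sit amet, consectetur adipiscing elit.', 'turpi', 'in fermentum diam auctor aliquam!', 'tristique']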
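As a side check on the case where the text does start with a double quote, here is a minimal sketch. The helper name `odd_chunks` and the `leading` string are hypothetical and only used for illustration.

```python
def odd_chunks(text):
    """Return the segments found between pairs of double quotes."""
    return text.split('"')[1::2]


# Hypothetical input that begins with a quoted segment.
leading = '"first" middle "second" end'

# str.split yields an empty string before the leading quote, so the
# quoted segments still land on the odd indices.
leading_parts = leading.split('"')
leading_quoted = odd_chunks(leading)

print(leading_parts)
print(leading_quoted)
```

The slice `[1::2]` used here selects the same elements as the `enumerate` filter with `i % 2 == 1`.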