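If you want to match the whole string, you can match a decimal number and then repeat the same pattern preceded by a comma.
Then use the same pattern once more and repeat it preceded by |.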
^[+-]?\d+\.\d+(?:,[+-]?\d+\.\d+)*(?:\|[+-]?\d+\.\d+(?:,[+-]?\d+\.\d+)*)*$
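- ^  Start of string
- [+-]?\d+\.\d+  Match an optional + or - followed by a decimal number
- (?:  Non-capturing group
- ,[+-]?\d+\.\d+  Match the same pattern preceded by a comma
- )*  Close the group and repeat 0+ times
- (?:  Non-capturing group
- \|  Match |
- [+-]?\d+\.\d+  Match an optional + or - followed by a decimal number
- (?:  Non-capturing group
- ,[+-]?\d+\.\d+  Match the same pattern preceded by a comma
- )*  Close the group and repeat 0+ times
- )*  Close the group and repeat 0+ times
- $  End of string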
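As a rough illustration, the pattern above could be used from Python as follows. This is only a sketch: the names NUMBER, LINE_RE and is_valid are made up for this example, and re.fullmatch is used in place of the ^ and $ anchors.

```python
import re

# Building block from the explanation above: optional sign, digits, dot, digits.
# NUMBER, LINE_RE and is_valid are illustrative names, not part of the answer.
NUMBER = r"[+-]?\d+\.\d+"
LINE_RE = re.compile(
    rf"{NUMBER}(?:,{NUMBER})*(?:\|{NUMBER}(?:,{NUMBER})*)*"
)

def is_valid(line):
    """Return True if the whole line matches the pattern."""
    # fullmatch anchors at both ends, so ^ and $ are not needed here.
    return LINE_RE.fullmatch(line) is not None

samples = [
    "37.1000,-88.1000",
    "37.1000,-88.1000|37.1450,-88.1060",
    "37.1000,",
    "37,-88",
]
for sample in samples:
    print(sample, is_valid(sample))
```

Note that the pattern accepts any number of comma-separated decimals between pipes, so it does not by itself enforce that the values come in pairs.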
This is what parsers are for (checking for the correct format):
from parsimonious.grammar import Grammar
from parsimonious.exceptions import ParseError

data = """
37.1000,-88.1000
37.1000,-88.1000|37.1450,-88.1060
37.1000,-88.1060|35.1450,-83.1060
"""

grammar = Grammar(
    r"""
    line    = pair (pipe pair)*
    pair    = point ws? comma ws? point
    point   = ~"-?\d+(?:\.\d+)?"
    comma   = ","
    pipe    = "|"
    ws      = ~"\s+"
    """
)

# Try to parse every line; a ParseError means the format is wrong.
for line in data.split("\n"):
    try:
        grammar.parse(line)
        print("Correct format: {}".format(line))
    except ParseError:
        print("Not correct: {}".format(line))
This yields
Not correct: 
Correct format: 37.1000,-88.1000
Correct format: 37.1000,-88.1000|37.1450,-88.1060
Correct format: 37.1000,-88.1060|35.1450,-83.1060
Not correct: 
Both "Not correct:" statements come from the empty lines.
If you actually want to retrieve the values, you need to write an additional Visitor class:
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor
from parsimonious.exceptions import ParseError

class Points(NodeVisitor):
    grammar = Grammar(
        r"""
        line    = pair (pipe pair)*
        pair    = point ws? comma ws? point
        point   = ~"-?\d+(?:\.\d+)?"
        comma   = ","
        pipe    = "|"
        ws      = ~"\s+"
        """
    )

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_pair(self, node, visited_children):
        # First and last children are the two point nodes.
        x, *_, y = visited_children
        return (x.text, y.text)

    def visit_line(self, node, visited_children):
        pairs = [visited_children[0]]
        # Each repeated item is [pipe, pair]; keep only the pair.
        for potential_pair in [item[1] for item in visited_children[1]]:
            pairs.append(potential_pair)
        return pairs

point = Points()
# data is the same multi-line string as in the previous snippet
for line in data.split("\n"):
    try:
        pairs = point.parse(line)
        print(pairs)
    except ParseError:
        print("Not correct: {}".format(line))
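You don't even need regex. Keep it simple.
Step 1
Split on ,.
    s.split(',')
Step 2
Split on | and make sure each result is a float (or rather, that it can be converted to one without error). If you don't need it, you can drop the second part of this step (the validation).
    r = s.split('|')
    for v in r:
        try:
            float(v)
        except ValueError:
            print(v + ' is not a float')
Step 3
Combine.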
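# Sample inputs (the same lines as the data used earlier)
strings = [
    '37.1000,-88.1000',
    '37.1000,-88.1000|37.1450,-88.1060',
    '37.1000,-88.1060|35.1450,-83.1060'
]

def split_on_comma(s):
    return s.split(',')

def split_on_bar(s):
    r = s.split('|')
    # Report every piece that cannot be converted to a float.
    for v in r:
        try:
            float(v)
        except ValueError:
            print(v + ' is not a float')
    return r

for s in strings:
    for c in split_on_comma(s):
        print(split_on_bar(c))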
Without the validation and the functions, your code becomes:
for s in strings:
    for c in s.split(','):
        for b in c.split('|'):
            print(b)
You can change the output to your liking, but this shows each step needed to split and validate your data.
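If a yes/no answer is wanted instead of printed messages, the same split-and-convert idea could be wrapped up roughly like this. It is a sketch under assumptions: is_valid_line is a hypothetical helper name, and it splits on | first so that each piece can be checked to be exactly one x,y pair.

```python
def is_valid_line(line):
    """Return True if line is one or more 'x,y' pairs separated by '|'."""
    for pair in line.split("|"):
        parts = pair.split(",")
        # Each pipe-separated piece must hold exactly two values.
        if len(parts) != 2:
            return False
        try:
            for part in parts:
                float(part)
        except ValueError:
            return False
    return True

print(is_valid_line("37.1000,-88.1000|37.1450,-88.1060"))
print(is_valid_line("37.1000,-88.1000|37.1450"))
```

Keep in mind that float() is more permissive than the regex approach: it also accepts inputs such as "1e3", "inf", "nan" and values with surrounding whitespace.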
If you want to retrieve the values as pairs, you can use a simple regex or just split():
import re

# values is assumed to be an iterable of input strings
for value in values:
    pairs = re.findall(r"([\d.,-]+)\|?", value)
    for pair in pairs:
        v1, v2 = pair.strip().split(",")

# or
for value in values:
    pairs = value.split("|")
    for pair in pairs:
        v1, v2 = pair.split(",")
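A variation on the regex idea, sketched here with hypothetical names (PAIR_RE, extract_pairs), is to capture both numbers of each pair with two groups so that re.findall returns tuples directly, which can then be converted to floats.

```python
import re

# Two capture groups: one per number in an "x,y" pair.
# PAIR_RE and extract_pairs are illustrative names.
PAIR_RE = re.compile(r"([+-]?\d+(?:\.\d+)?),([+-]?\d+(?:\.\d+)?)")

def extract_pairs(value):
    """Return a list of (float, float) tuples found in value."""
    # With two groups, findall yields one (x, y) tuple of strings per match.
    return [(float(x), float(y)) for x, y in PAIR_RE.findall(value)]

print(extract_pairs("37.1000,-88.1000|37.1450,-88.1060"))
# → [(37.1, -88.1), (37.145, -88.106)]
```

This only extracts pairs; findall silently skips text that does not match, so it should be combined with one of the validation approaches if malformed input has to be rejected.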