Replacing an unknown number of named groups

I am working with patterns like this:

<type>"<prefix>"<format>"<suffix>";<neg_type>"<prefix>"<format>"<suffix>"

Here are two examples, one with a prefix and one without:

n"prefix"#,##0"suffix";-"prefix"#,##0"suffix"
n#,##0"suffix";-#,##0"suffix"

I wrote the following regular expression to capture my groups:

raw = r"(?P<type>^.)(?:\"(?P<prefix>[^\"]*)\"){0,1}(?P<format>[^\"]*)(?:\"(?P<suffix>[^\"]*)\"){0,1};(?P<negformat>.)(?:\"(?P=prefix)\"){0,1}(?P=format)(?:\"(?P=suffix)\"){0,1}"

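Now I am parsing a large text containing this kind of structure, and I want to replace the prefix or the suffix (only when they are present!). Since the number of captured groups is unknown (some may be empty), I do not see an easy way to do the replacement with re.sub.

Moreover, because of some implementation constraints, I process prefixes and suffixes sequentially (so even when they belong to the same sentence, I do not have the replacement for the suffix at the time I replace the prefix).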

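First, we can simplify your regex by using single quotes for the string literal. That removes the need to escape the " characters. Second, {0,1} can be replaced by ?: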

raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'

Note that I have added the named groups (?P<prefix2>) and (?P<suffix2>) above to capture the second occurrences of the prefix and suffix.
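As a quick check, the sketch below runs the simplified pattern against the two sample strings from the question. It only illustrates how the optional groups behave: a group that did not participate in the match returns None from group() and (-1, -1) from span(). The variable names with_prefix, without_prefix, m1 and m2 are introduced here for illustration.

```python
import re

# Pattern copied from the answer above.
raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'

# The two sample strings from the question.
with_prefix = 'n"prefix"#,##0"suffix";-"prefix"#,##0"suffix"'
without_prefix = 'n#,##0"suffix";-#,##0"suffix"'

m1 = re.match(raw, with_prefix)
m2 = re.match(raw, without_prefix)

# Both prefix groups participate in the first string.
print(m1.group('prefix'), m1.group('prefix2'))   # prefix prefix

# Neither prefix group participates in the second string.
print(m2.group('prefix'), m2.span('prefix2'))    # None (-1, -1)

# The suffix is present in both strings.
print(m2.group('suffix'), m2.group('suffix2'))   # suffix suffix
```

This is why the replacement code has to test each group before using its offsets.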

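I am assuming that your pattern can be repeated in the text (if it occurs only once, this code still works). In that case, the replacements must be made from the last occurrence to the first, so that the start and end character offsets returned by the regex engine remain correct even after a replacement has changed the length of the text. Likewise, within each occurrence of the pattern, the replacements must be made in the order suffix2, prefix2, suffix, prefix.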
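To see why the order matters, here is a minimal sketch on a toy string (the text 'AAA-BBB', the pattern [A-Z]+ and the replacement 'longer' are made up for illustration and are not part of the question). The spans are computed once against the original text, so going left to right invalidates the later spans as soon as a replacement changes the length.

```python
import re

text = 'AAA-BBB'
# Offsets are computed once, against the original text.
spans = [m.span() for m in re.finditer(r'[A-Z]+', text)]  # [(0, 3), (4, 7)]

# Left to right: the first replacement lengthens the text,
# so the second span no longer points at 'BBB'.
wrong = text
for start, end in spans:
    wrong = wrong[:start] + 'longer' + wrong[end:]

# Right to left: everything before the span being replaced is still
# untouched, so the remaining offsets stay valid.
right = text
for start, end in reversed(spans):
    right = right[:start] + 'longer' + right[end:]

print(wrong)  # longlongerBBB
print(right)  # longer-longer
```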

We use re.finditer to iterate over the text, collect the returned match objects into a list, and reverse that list so that the last match is processed first:

import re

raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'

s = """a"prefix"format"suffix";b"prefix"format"suffix"
x"prefix_2"format_2"suffix_2";y"prefix_2"format_2"suffix_2"
"""
new_string = s

# Collect all matches first, then process them from last to first so that
# the offsets of earlier matches stay valid while new_string changes length.
matches = list(re.finditer(raw, s, flags=re.MULTILINE))
matches.reverse()
for match in matches:
    # Within one match, replace the rightmost group first for the same reason.
    # A group that did not participate returns None and is skipped.
    if match.group('suffix2'):
        new_string = new_string[0:match.start('suffix2')] + 'new_suffix' + new_string[match.end('suffix2'):]
    if match.group('prefix2'):
        new_string = new_string[0:match.start('prefix2')] + 'new_prefix' + new_string[match.end('prefix2'):]
    if match.group('suffix'):
        new_string = new_string[0:match.start('suffix')] + 'new_suffix' + new_string[match.end('suffix'):]
    if match.group('prefix'):
        new_string = new_string[0:match.start('prefix')] + 'new_prefix' + new_string[match.end('prefix'):]
print(new_string)

Prints:

a"new_prefix"format"new_suffix";b"new_prefix"format"new_suffix"
x"new_prefix"format_2"new_suffix";y"new_prefix"format_2"new_suffix"

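For demonstration purposes, the code above makes the same replacement for every occurrence of the pattern.

On your second concern:

Nothing prevents you from making two passes over the text, one to replace the prefixes and one to replace the suffixes, as these become known. Obviously, you only need to check certain groups on each pass, but you can still use the same regular expression. And of course you can make a different replacement for each occurrence of the pattern. The code above shows how to find the groups and make the replacements.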
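A two-pass version might look like the sketch below. The helper replace_groups and its parameters are hypothetical names introduced here, not part of the original answer; the regular expression is the one shown earlier, and the single-line sample text is an assumption chosen to keep the output short.

```python
import re

raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?:"(?P<prefix2>(?P=prefix))")?(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'

def replace_groups(text, group_names, replacement):
    """Replace the listed named groups, wherever they participated in a match."""
    result = text
    # Last match first, so earlier offsets stay valid.
    for match in reversed(list(re.finditer(raw, text, flags=re.MULTILINE))):
        # Within a match, handle the rightmost group first.
        # match.start(name) is -1 for a group that did not participate.
        for name in sorted(group_names, key=match.start, reverse=True):
            if match.group(name):
                result = result[:match.start(name)] + replacement + result[match.end(name):]
    return result

s = 'a"prefix"format"suffix";b"prefix"format"suffix"\n'

# Pass 1: prefixes only. Pass 2: suffixes only, on the output of pass 1.
after_prefix_pass = replace_groups(s, ('prefix', 'prefix2'), 'new_prefix')
after_both = replace_groups(after_prefix_pass, ('suffix', 'suffix2'), 'new_suffix')

print(after_prefix_pass, end='')  # a"new_prefix"format"suffix";b"new_prefix"format"suffix"
print(after_both, end='')         # a"new_prefix"format"new_suffix";b"new_prefix"format"new_suffix"
```

One caveat with this approach: the second pass only finds the pattern again if the text still matches it, so both occurrences of the prefix need to receive the same replacement in the first pass, otherwise the backreference (?P=prefix) no longer matches.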

Allowing for 0 to 9 instances of the prefix

import re

raw = r'(?P<type>^.)(?:"(?P<prefix>[^"]*)")?(?P<format>[^"]*)(?:"(?P<suffix>[^"]*)")?;(?P<negformat>.)(?P<prefix2>(?:"(?P=prefix)"){0,9})(?P=format)(?:"(?P<suffix2>(?P=suffix))")?'

s = """a"prefix"format"suffix";b"prefix""prefix""prefix"format"suffix"
x"prefix_2"format_2"suffix_2";y"prefix_2"format_2"suffix_2"
"""
new_string = s

matches = list(re.finditer(raw, s, flags=re.MULTILINE))
matches.reverse()
if matches:
    for match in matches:
        if match.group('suffix2'):
            new_string = new_string[0:match.start('suffix2')] + 'new_suffix' + new_string[match.end('suffix2'):]
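        if match.group('prefix2'):
            # prefix2 now spans every repeated "prefix", quotes included,
            # so count the quote pairs and emit that many replacements.
            start = match.start('prefix2')
            end = match.end('prefix2')
            repl = s[start:end]
            n = repl.count('"') // 2
            new_string = new_string[0:start] + (n * '"new_prefix"') + new_string[end:]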
        if match.group('suffix'):
            new_string = new_string[0:match.start('suffix')] + 'new_suffix' + new_string[match.end('suffix'):]
        if match.group('prefix'):
            new_string = new_string[0:match.start('prefix')] + 'new_prefix' + new_string[match.end('prefix'):]
print(new_string)

Prints:

a"new_prefix"format"new_suffix";b"new_prefix""new_prefix""new_prefix"format"new_suffix"
x"new_prefix"format_2"new_suffix";y"new_prefix"format_2"new_suffix"
