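Decoding a text file using lists in Python

2024-04-29 • Q&A

I encoded this sentence:

This is an amazing "abstract" AND this: is the end of this amazing abstract.

into this:

1 2 3 4 "5" 6 7: 2 8 9 10 7 4 5.

The corresponding index table (as a text file) is: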

word,index
This,1
is,2
an,3
amazing,4
abstract,5
AND,6
this,7
the,8
end,9
of,10

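Now I want to go from these numbers, '1 2 3 4 "5" 6 7: 2 8 9 10 7 4 5.', back to the corresponding words using the index table.

I used the following code to open the index table text file and split it into a list: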

index_file = open("decompress.txt","r")

content_index = index_file.read().split()
print(content_index)

Output:

['word,index','This,1','is,2','an,3','amazing,4','abstract,5','AND,6','this,7','the,8','end,9','of,10']

Then I used the following code to split each element into a new list:

for line in content_index:
    fields = line.split(",")
    print(fields)

Output:

['word','index']
['This','1']
['is','2']
['an','3']
['amazing','4']
['abstract','5']
['AND','6']
['this','7']
['the','8']
['end','9']
['of','10']

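I tried to decode the numbers using fields[0] and fields[1] in a for loop, but without success. Any help would be greatly appreciated!

First, it is better to use a dict. Replace this part of your code: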

for line in content_index:
    fields = line.split(",")

with:

fields = {}
for line in content_index:
    word,number = line.split(',')
    fields[number] = word
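
Note that this loop also stores the header row as fields['index'] = 'word'. That is harmless here, because only digit strings are ever looked up. If you would rather skip the header explicitly, a minimal sketch with the standard csv module could look like the following. The in-memory table_text and the helper name load_index are assumptions made for illustration; in practice you would pass the open file object for decompress.txt instead.

```python
import csv
import io

# Stand-in for decompress.txt (assumption: same layout as the index table in the question).
table_text = "word,index\nThis,1\nis,2\nan,3\n"

def load_index(file_obj):
    """Return a dict mapping the index (as a string) to its word, skipping the header row."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the "word,index" header
    return {number: word for word, number in reader}

fields = load_index(io.StringIO(table_text))
print(fields)  # {'1': 'This', '2': 'is', '3': 'an'}
```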

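Then you can use a regular expression to easily replace a specific pattern (numbers, in your case) with any other string. The regular expression for finding a number is \d+, where \d means a digit and + means one or more. So: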

import re

original_string = ' 1 2 3 4 "5" 6 7: 2 8 9 10 7 4 5. '

def replacement(match):
    """
    This function accepts regular expression match and returns corresponding replacement if it's found in `fields`
    """
    return fields.get(match.group(0),'')  # Learn more about match groups at `re` documentation.

result = re.sub(r'\d+',replacement,original_string)  # This line will iterate through original string,calling `replacement` for each number in this string,substituting return value to string.
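
To see what the pattern picks up before any substitution happens, a quick check with re.findall can help. This is only an illustrative sketch; the variable names encoded and numbers are not part of the answer's code.

```python
import re

encoded = '1 2 3 4 "5" 6 7: 2 8 9 10 7 4 5.'

# \d+ is greedy, so "10" is matched as one number rather than as "1" and "0".
numbers = re.findall(r'\d+', encoded)
print(numbers)
# ['1', '2', '3', '4', '5', '6', '7', '2', '8', '9', '10', '7', '4', '5']
```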

So the final code will be:

import re

fields = {}

with open('decompress.txt') as f:
    for line in f.readlines():
        word, number = line.strip().split(',')
        fields[number] = word

original_string = ' 1 2 3 4 "5" 6 7: 2 8 9 10 7 4 5. '

def replacement(match):
    """
    This function accepts regular expression match and returns corresponding replacement if it's found in `fields`
    """
    return fields.get(match.group(0),'')

result = re.sub(r'\d+', replacement, original_string)
print(result)

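You can learn more about regular expressions in the Python documentation for the re library. It is a very powerful tool for text processing and parsing.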
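
If you want to try the approach without a file on disk, here is a self-contained sketch. It assumes an in-memory io.StringIO object in place of decompress.txt, and the helper name decode is hypothetical. Unlike the code above, it leaves a number unchanged when it is missing from the table, instead of replacing it with an empty string; which behaviour is preferable depends on your data.

```python
import io
import re

# Stand-in for decompress.txt so the example runs without a file on disk.
index_file = io.StringIO(
    "word,index\nThis,1\nis,2\nan,3\namazing,4\nabstract,5\n"
    "AND,6\nthis,7\nthe,8\nend,9\nof,10\n"
)

fields = {}
for line in index_file:
    word, number = line.strip().split(',')
    fields[number] = word

def decode(encoded, table):
    """Replace every run of digits with its word; unknown numbers are kept as they are."""
    return re.sub(r'\d+', lambda m: table.get(m.group(0), m.group(0)), encoded)

decoded = decode('1 2 3 4 "5" 6 7: 2 8 9 10 7 4 5.', fields)
print(decoded)
# This is an amazing "abstract" AND this: is the end of this amazing abstract.
```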

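For this case you can use regular expressions from the re module together with a few comprehensions.

First, import re and read all the lines of the file into a list named lines.

After that, use re.search with a pattern in which (.*) matches any text before the comma and (\d+) matches the number after the comma. The header line of the file is skipped:

parsed_lines = [re.search(r'(.*),(\d+)', line) for line in lines if 'index' not in line]

Finally, create a dictionary with the index as the key and the text as the value:

fields = {int(line_match.group(2)): line_match.group(1) for line_match in parsed_lines}
# {1: 'This', 2: 'is', 3: 'an', 4: 'amazing', 5: 'abstract',
#  6: 'AND', 7: 'this', 8: 'the', 9: 'end', 10: 'of'}

UPD: Or build a plain list of the words in the second step:

parsed_lines = [re.search(r'(.*),\d+', line).group(1) for line in lines if 'index' not in line]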
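
The comprehension-based approach can be tried end to end with a sketch like the one below. The in-memory lines list (a shortened index table) and the final re.sub step are assumptions added for illustration. Because the dictionary keys are integers in this variant, the matched digits are converted with int before the lookup.

```python
import re

# Stand-in for the lines read from decompress.txt (shortened table).
lines = ["word,index\n", "This,1\n", "is,2\n", "an,3\n", "amazing,4\n"]

# Parse each "word,number" line, skipping the header.
parsed_lines = [re.search(r'(.*),(\d+)', line) for line in lines if 'index' not in line]
fields = {int(m.group(2)): m.group(1) for m in parsed_lines}

# Keys are ints here, so the matched digits are converted before the lookup.
decoded = re.sub(r'\d+', lambda m: fields.get(int(m.group(0)), m.group(0)), '1 2 3 4')
print(decoded)  # This is an amazing
```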
