Python: read a file using \n as the newline character when the file also contains \r\n

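Python's line-splitting algorithm can't do what you want: a line that ends in \r\n also ends in \n. At most, you can set the newline argument to '\n' or '' and rejoin the lines yourself whenever they end in \r\n rather than \n. A generator function can do this for you:

def collapse_CRLF(fileobject):
    buffer = []
    for line in fileobject:
        if line.endswith('\r\n'):
            # Unfinished record: hold it back until a bare \n arrives.
            buffer.append(line)
        else:
            yield ''.join(buffer) + line
            buffer = []
    if buffer:
        # The file ended on \r\n; emit whatever is left.
        yield ''.join(buffer)

Then use it like this:

with open(outputfile, encoding="ISO-8859-15", newline='') as csvfile:
    for line in collapse_CRLF(csvfile):
        print(repr(line))

However, if this is a CSV file, you really want to use the csv module. It already handles files with a mix of \r\n and \n endings for you, because it knows how to preserve bare newlines in RFC 4180 CSV files:

import csv

with open(outputfile, newline='') as inputfile:
    reader = csv.reader(inputfile)

Note that in a valid CSV file, \r\n is the delimiter between rows, while \n is valid inside column values. So if for whatever reason you don't want to use the csv module here, you would still want to use newline='\r\n'.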
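As a small illustration of the csv behaviour described in this answer, the sketch below parses an in-memory sample in which rows end in \r\n and one quoted field contains a bare \n. The sample text and the use of io.StringIO in place of a real file are assumptions made for the demo.

```python
import csv
import io

# Hypothetical sample: rows are terminated by \r\n, and the quoted
# second field of the data row contains a bare \n.
sample = 'name,greeting\r\nAlex,"Hello\nmy name is Alex"\r\n'

# newline='' hands the line endings to the csv module untranslated.
stream = io.StringIO(sample, newline='')
rows = list(csv.reader(stream))

for row in rows:
    print(row)
```

This only works because the embedded line break sits inside a quoted field. Outside quotes, csv.reader treats both \r and \n as the end of a row.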

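From the standard library documentation for the built-in function open:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.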
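To make the quoted rules concrete, here is a minimal sketch that reads the same bytes with four different newline values. The sample bytes and the helper name lines_with are made up for this illustration; a temporary file is used so that the real open() is exercised.

```python
import os
import tempfile

# Hypothetical sample containing all three kinds of line ending.
RAW = b"a\r\nb\nc\rd\n"

def lines_with(newline):
    """Write RAW to a temporary file and read it back with the given newline mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(RAW)
        with open(path, encoding="ascii", newline=newline) as f:
            return list(f)
    finally:
        os.remove(path)

for mode in (None, "", "\n", "\r\n"):
    print(repr(mode), lines_with(mode))
```

None of these modes splits on a bare \n while leaving \r\n intact, which is why the records have to be rejoined or split by hand.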

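The file object itself cannot tell the data characters '\r\n' (in your case) apart from the separator '\n'; that is the job of the bytes decoder. So one option would be to write your own decoder and pass the associated encoding as the encoding of the text file. That is somewhat tedious, though, and for small files a simpler approach based on the re module is much easier. For iterating over large files, use the solution proposed by @Martijn Pieters.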

import re

with open('data.csv','tr',newline='') as f:
    file_data = f.read()

# Approach 1:
lines1 = re.split(r'(?<!\r)\n',file_data)
if not lines1[-1]:
    lines1.pop()
# Approach 2:
lines2 = re.findall(r'(?:.+?(?:\r\n)?)+',file_data)
# Approach 3:
iterator_lines3 = map(re.Match.group,re.finditer(r'(?:.+?(?:\r\n)?)+',file_data))

assert lines1 == lines2 == list(iterator_lines3)
print(lines1)

If we need to keep the '\n' at the end of each line:

# Approach 1:
nlines1 = re.split(r'(?<!\r\n)(?<=\n)',file_data)
if not nlines1[-1]:
    nlines1.pop()
# Approach 2:
nlines2 = re.findall(r'(?:.+?(?:\r\n)?)+\n?',file_data)
# Approach 3:
iterator_nlines3 = map(re.Match.group,re.finditer(r'(?:.+?(?:\r\n)?)+\n',file_data))

assert nlines1 == nlines2 == list(iterator_nlines3)
print(nlines1)

Result:

['Hello\r\nmy name is Alex','Hello\r\nmy name is John']
['Hello\r\nmy name is Alex\n','Hello\r\nmy name is John\n']
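For files too large to read in one piece, the same records can be produced lazily by gluing every line that ends in '\r\n' onto the lines that follow it. The sketch below is one way to do that; the helper name join_crlf_lines is hypothetical, and an io.StringIO stream opened with newline='' stands in for the real file.

```python
import io

def join_crlf_lines(lines):
    # Hypothetical helper: a line ending in '\r\n' is treated as unfinished
    # and is joined with the following lines, up to one ending in a bare '\n'.
    pending = []
    for line in lines:
        pending.append(line)
        if not line.endswith('\r\n'):
            yield ''.join(pending)
            pending = []
    if pending:
        # The input ended on '\r\n'; emit what is left.
        yield ''.join(pending)

sample = 'Hello\r\nmy name is Alex\nHello\r\nmy name is John\n'
stream = io.StringIO(sample, newline='')  # behaves like open(..., newline='')
records = list(join_crlf_lines(stream))
print(records)
```

Only one record is held in memory at a time, so the approach scales to large inputs as long as individual records stay small.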

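I'm sure your answers are entirely correct and technically sophisticated. Unfortunately, the CSV file does not conform to RFC 4180 at all.

So I will go with the following solution and fix up my temporary "||" characters afterwards: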

with open(outputfile_corrected, 'w', newline='') as correctedfile_handle:
    with open(outputfile,newline='') as csvfile:
        csvfile_content = csvfile.read()
        csvfile_content_new = csvfile_content.replace('\r\n','||')
    correctedfile_handle.write(csvfile_content_new)
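The placeholder idea can also be applied entirely in memory: hide each '\r\n' behind the placeholder, split on the remaining '\n', then restore the original characters. This is a sketch under the assumption that the placeholder never occurs in the data; the function name split_records and the sample text are made up for illustration.

```python
def split_records(text, placeholder='||'):
    # Refuse to continue if the placeholder already occurs in the data,
    # because restoring it afterwards would then corrupt the content.
    if placeholder in text:
        raise ValueError('placeholder already present in the data')
    protected = text.replace('\r\n', placeholder)
    records = protected.split('\n')
    if records and not records[-1]:
        records.pop()  # drop the empty piece after the final '\n'
    return [record.replace(placeholder, '\r\n') for record in records]

sample = 'Hello\r\nmy name is Alex\nHello\r\nmy name is John\n'
records = split_records(sample)
print(records)
```

Checking for the placeholder up front is a deliberate choice: a silent collision would be much harder to notice later than an exception raised at this point.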

(Someone commented on this, but the answer has since been deleted.)

Would it be feasible for you to normalize the file beforehand, replacing every CRLF with LF before you read it?
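One possible way to do such a normalization in Python is sketched below. It is an illustration only, not the snippet this answer referred to; the function name normalize_crlf_file, the file names, and the replacement parameter are assumptions.

```python
import os
import tempfile

def normalize_crlf_file(src_path, dst_path, replacement=b'\n'):
    # Binary mode, so Python performs no newline translation of its own.
    with open(src_path, 'rb') as src:
        data = src.read()
    with open(dst_path, 'wb') as dst:
        dst.write(data.replace(b'\r\n', replacement))

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'data.csv')
    dst_path = os.path.join(tmp, 'data_lf.csv')
    with open(src_path, 'wb') as f:
        f.write(b'Hello\r\nmy name is Alex\nHello\r\nmy name is John\n')
    normalize_crlf_file(src_path, dst_path)
    with open(dst_path, 'rb') as f:
        normalized = f.read()

print(normalized)
```

Keep in mind that once every CRLF has become LF, an embedded line break can no longer be told apart from a record separator. If the records must stay distinguishable, pass a different replacement, for example b' '.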
