Reading every Nth group of bytes from a file in Python

2024-05-17 • Q&A

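I have a large Fortran binary file containing many records (each record corresponds to one call to Fortran's write statement). Each record holds 4 floats: the first three are the x, y, z coordinates of a particle, and the last is the time. Fortran wraps each record with two integers, one before the data and one after it, so each record takes 24 bytes in total, wrapper included. I am trying to write a Python script that reads only the z coordinate of each record. To do that, I need to read bytes 12-16, 36-40, and so on.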
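To make the byte ranges above concrete, here is a minimal sketch of the offset arithmetic. It assumes big-endian data and 4-byte record markers, which is what the 24-byte record size implies; `RECORD_FORMAT`, `Z_OFFSET`, and `z_byte_range` are illustrative names, not part of the original script.

```python
import struct

# Assumed layout of one record: marker, x, y, z, t, marker.
# '>' means big-endian with no padding (an assumption about the file).
RECORD_FORMAT = ">i4fi"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 + 4*4 + 4 bytes

# z comes after the leading marker and the x and y floats.
Z_OFFSET = struct.calcsize(">i2f")
FLOAT_SIZE = struct.calcsize(">f")


def z_byte_range(record_index):
    """Return the (start, stop) file offsets of z in the given record."""
    start = record_index * RECORD_SIZE + Z_OFFSET
    return start, start + FLOAT_SIZE


print(RECORD_SIZE)       # record size including the wrapper
print(z_byte_range(0))   # z of the first record
print(z_byte_range(1))   # z of the second record
```

The stop offset is exclusive, so the ranges line up with the "12-16, 36-40" pattern described above.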

So I tried the following:

import struct
import numpy as np

def read_field(fin):
    field_offset = 8 #Offset of the first byte of the z coordinate WITHIN a record
    field_type = 'f' #A float
    field_dtype = np.dtype('>'+field_type) #Numpy dtype (big-endian float)
    field_data = np.empty(num_of_records,dtype=field_dtype) #num_of_records is a global variable,equal to file size divided by record size
    field_size = 4 #Bytes in a float
    record_size = 24 #Number of bytes in a record INCLUDING the wrapper
    
    fin.seek(4+field_offset) #Skip the 4-byte wrapper of the first record, then go to the beginning of its z coordinate
    for i in range(num_of_records):
        field_data[i],= struct.unpack('>'+field_type,fin.read(field_size)) #Unpack the bytes of the float
        fin.seek(record_size-field_size,1) #Go to the beginning of the next z coordinate
    return field_data

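This seems to do the job, but it is very slow compared with reading the whole file with np.fromfile. I could live with the latter, but my file is large, and loading all of it into memory is extremely inefficient.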

Is there a faster way to do this?
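One possible approach, sketched here under assumptions rather than taken from an answer in the thread, is to describe the record as a NumPy structured dtype and map the file with np.memmap, then copy out only the z field. The dtype assumes big-endian values and 4-byte record markers, and the file size must be an exact multiple of 24 bytes; `RECORD_DTYPE` and `read_z` are hypothetical names.

```python
import numpy as np

# Assumed record layout: big-endian, 4-byte markers around four floats.
RECORD_DTYPE = np.dtype([
    ("head", ">i4"),  # leading record marker
    ("x", ">f4"),
    ("y", ">f4"),
    ("z", ">f4"),
    ("t", ">f4"),
    ("tail", ">i4"),  # trailing record marker
])


def read_z(path):
    """Return the z coordinate of every record as a native float32 array."""
    # The mapping does not load the file; the number of records is
    # inferred from the file size and the 24-byte item size.
    records = np.memmap(path, dtype=RECORD_DTYPE, mode="r")
    # records["z"] is a strided view into the mapping. np.array copies
    # only the z values into memory and converts them to native byte order.
    return np.array(records["z"], dtype=np.float32)
```

This replaces the per-record read and seek calls with a single strided copy, and only the z values (4 of every 24 bytes) end up in a regular array. The operating system still has to page through the file, since consecutive z values are only 24 bytes apart, so the gain should come mainly from avoiding the Python-level loop. If memory mapping is not an option, reading the file in chunks with np.fromfile and its count argument, using the same dtype, is another way to bound memory use.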


