read_table handling of missing strings

2024-05-05 • Q&A

I ran into an interesting problem while trying to import data from a .txt file with pandas read_table().

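One column in the file (OrigVol) is only partly filled with strings. On import, the empty cells are filled with the values from the following columns, and NaN is appended at the end of the row, so the empty cells end up in the wrong column. This is the result of pd.read_table('file.txt', sep=r'\s+') after import pandas as pd.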

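The OrigVol column should contain the empty strings or NaN, not the prim_pos_eutags column. I am new to pandas and find it hard to come up with an MWE that accurately reproduces the problem. One would need the file, which I cannot upload.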
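The shift described above can be reproduced without the original file. The following is a minimal sketch, assuming a cut-down, four-column version of the data (the text in `sample` is made up for illustration and is not the real file). Because `sep=r"\s+"` treats any run of whitespace as a single delimiter, an empty cell simply disappears and the remaining values slide one column to the left.

```python
import io
import pandas as pd

# Hypothetical, shortened stand-in for the real file: the second data row
# has an empty OrigVol cell.
sample = (
    "Creator     OrigVol trklen  prim_pos_eutags\n"
    "undefined   top     0       prim_pos_eudata\n"
    "undefined           3.5     prim_pos_eudata\n"
)

# Any run of whitespace counts as one separator, so the second row
# is seen as three fields instead of four.
df = pd.read_table(io.StringIO(sample), sep=r"\s+")
print(df)

# The trklen value has moved into OrigVol, the tag has moved into trklen,
# and pandas has padded the last column of the short row with NaN.
shifted_row = df.iloc[1]
```

Printing `df` shows the second row with its values one column too far to the left, which matches the behaviour described in the question.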

Maybe I missed an import option that I should have specified. Let me know if more information is needed.

Any hints are much appreciated!

As suggested, here are the first 8 lines of the sample file:

Name            Subtype     ProcName            Material    Creator     OrigVol trklen                  prim_pos_eutags
top             0           initStep            undefined   undefined   top     0                       prim_pos_eudata
top             91          Transportation      Vacuum      undefined           2.27009877562523E-12    prim_pos_eudata
QC5L_2_v        91          Transportation      Vacuum      undefined           3.50000000000227        prim_pos_eudata
DRIFT_8609_v    91          Transportation      Vacuum      undefined           3.80000000000518        prim_pos_eudata
BC1L_2_v        23          SynRad              Vacuum      undefined           68.1607816456518        prim_pos_eudata
BC1L_2_v        91          Transportation      Vacuum      undefined           79.0910350747966        prim_pos_eudata
DRIFT_8610_v    91          Transportation      Vacuum      undefined   QC2_v   79.3910346856657        prim_pos_eudata

One approach is to read the file line by line and pad the short rows yourself:

import pandas as pd

data = []
with open('test.txt', 'r') as input_file:
    for line in input_file:
        splitted_list = line.split()
        if not splitted_list:
            continue  # skip blank lines
        if len(splitted_list) == 8:  # 8 is the number of columns; change it if needed
            data.append(splitted_list)
        elif len(splitted_list) < 8:
            # OrigVol is missing: insert an empty string at index 5,
            # i.e. the sixth column (change the index if needed)
            splitted_list.insert(5, "")
            data.append(splitted_list)

test = pd.DataFrame.from_records(data)
header = test.iloc[0]   # the first row holds the column names
test = test[1:]         # drop that row from the data
test.columns = header   # use it as the header
test

A more compact variant of the same approach:

import pandas as pd

data = []
with open('test.txt', 'r') as input_file:  # text mode, so the fields are str rather than bytes
    for line in input_file:
        splitted_list = line.split()
        if len(splitted_list) < 8:  # 8 is the number of columns; change it if needed
            splitted_list.insert(5, "")  # empty string at index 5, the sixth column
        data.append(splitted_list)

df = pd.DataFrame.from_records(data)
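As an alternative to padding the rows by hand, the sample lines in the question look like they are aligned in fixed-width columns, and pandas has `read_fwf` for that layout. The sketch below is not from the original answers. It assumes that the column widths in `WIDTHS`, which were read off the eight sample lines, hold for the whole file. `ROWS` and `sample_text` only rebuild those sample lines so the example runs without the real file.

```python
import io
import pandas as pd

# Column widths read off the sample lines in the question. This is an
# assumption about the real file, which may be laid out differently.
WIDTHS = [16, 12, 20, 12, 12, 8, 24, 15]

# The eight sample lines, cell by cell ("" marks an empty OrigVol cell).
ROWS = [
    ["Name", "Subtype", "ProcName", "Material", "Creator", "OrigVol", "trklen", "prim_pos_eutags"],
    ["top", "0", "initStep", "undefined", "undefined", "top", "0", "prim_pos_eudata"],
    ["top", "91", "Transportation", "Vacuum", "undefined", "", "2.27009877562523E-12", "prim_pos_eudata"],
    ["QC5L_2_v", "91", "Transportation", "Vacuum", "undefined", "", "3.50000000000227", "prim_pos_eudata"],
    ["DRIFT_8609_v", "91", "Transportation", "Vacuum", "undefined", "", "3.80000000000518", "prim_pos_eudata"],
    ["BC1L_2_v", "23", "SynRad", "Vacuum", "undefined", "", "68.1607816456518", "prim_pos_eudata"],
    ["BC1L_2_v", "91", "Transportation", "Vacuum", "undefined", "", "79.0910350747966", "prim_pos_eudata"],
    ["DRIFT_8610_v", "91", "Transportation", "Vacuum", "undefined", "QC2_v", "79.3910346856657", "prim_pos_eudata"],
]

# Rebuild the fixed-width text so the example is self-contained.
sample_text = "\n".join(
    "".join(cell.ljust(width) for cell, width in zip(row, WIDTHS))
    for row in ROWS
)

# read_fwf slices each line by character position, so an empty cell stays
# in its own column and is read as NaN instead of shifting the row.
df = pd.read_fwf(io.StringIO(sample_text), widths=WIDTHS)
print(df[["Name", "OrigVol", "trklen", "prim_pos_eutags"]])
```

For the real file, the call would presumably be `pd.read_fwf('file.txt', widths=WIDTHS)`. Leaving out `widths` lets pandas try to infer the column boundaries from the first rows, which may or may not work depending on how consistently the file is aligned.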

wstsxaiyun answered: read_table handling of missing strings
