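I am trying to write to an existing Parquet file stored on the local filesystem. But when I write to it more than once, the previous data is overwritten instead of being appended to.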
from datetime import datetime
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def append_to_parquet_table(dataframe, filename):
    full_path = os.path.join('.', filename)
    table = pa.Table.from_pandas(dataframe)
    writer = pq.ParquetWriter(full_path, table.schema)
    writer.write_table(table=table)


def save(passed):
    data = {'number': [1234], 'verified': [passed], 'date': datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
    data_df = pd.DataFrame(data)
    append_to_parquet_table(data_df, 'results.parquet')


save(True)
save(False)
Why is the first dataset "updated" (replaced by the second write) instead of a new one being added after it?