How do I write pyarrow parquet data to an S3 bucket?

I have created a dataframe and converted that df to a parquet file using pyarrow (also mentioned here):

import pyarrow as pa
import pyarrow.parquet as pq

def convert_df_to_parquet(self, df):
    # Serialize the dataframe into an in-memory parquet buffer.
    table = pa.Table.from_pandas(df)
    buf = pa.BufferOutputStream()
    pq.write_table(table, buf)
    return buf

Now I want to upload this to an S3 bucket. I have tried different input parameters for upload_file(), but nothing I tried works:

s3_client.upload_file(parquet_file, bucket_name, destination_key)                 # 1st
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file)  # 2nd
s3_client.put_object(Bucket=bucket_name, Body=parquet_file.getvalue())            # 3rd
s3_client.put_object(Bucket=bucket_name, Body=parquet_file.read1())               # 4th

Error:

 s3_client.put_object(Bucket=bucket_name, Body=parquet_file.read1())
  File "pyarrow/io.pxi", line 376, in pyarrow.lib.NativeFile.read1
  File "pyarrow/io.pxi", line 310, in pyarrow.lib.NativeFile.read
  File "pyarrow/io.pxi", line 320, line 155, in pyarrow.lib.NativeFile.get_input_stream
  File "pyarrow/io.pxi", line 170, in pyarrow.lib.NativeFile._assert_readable
OSError: only valid on readonly files
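
For context on the OSError: pa.BufferOutputStream() is a write-only sink, so calling read1() on it is invalid; the bytes that were written are retrieved with getvalue(), which returns a readable pyarrow Buffer (note also that the 3rd attempt above omits the required Key argument). Below is a minimal sketch of the in-memory route, reusing the question's df, bucket_name and destination_key; the helper name upload_df_as_parquet is hypothetical:

import boto3
import pyarrow as pa
import pyarrow.parquet as pq

def upload_df_as_parquet(df, bucket_name, destination_key):
    # Serialize the dataframe to an in-memory parquet buffer.
    table = pa.Table.from_pandas(df)
    buf = pa.BufferOutputStream()
    pq.write_table(table, buf)

    # getvalue() returns a readable pyarrow Buffer; to_pybytes() copies it
    # into a plain bytes object that boto3 accepts as the Body parameter.
    body = buf.getvalue().to_pybytes()

    s3_client = boto3.client("s3")
    s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=body)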
lisa813328 answered:

Why not do it directly (Pandas -> S3) and speed things up?

import awswrangler as wr

wr.pandas.to_parquet(
    dataframe=df,
    path="s3://...",
    dataset=True,
    mode="overwrite",  # Could be append, overwrite or overwrite_partitions
    database="my_database",  # Optional, only if you want it available in the Athena/Glue Catalog
    table="my_table",
    partition_cols=["PARTITION_COL_NAME"])

Reference
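
Note that wr.pandas.to_parquet is the awswrangler 0.x API; in awswrangler 1.x and later the same write lives under the s3 module. A hedged sketch of the equivalent call with the newer API, assuming the same dataframe and settings as above:

import awswrangler as wr

wr.s3.to_parquet(
    df=df,
    path="s3://...",
    dataset=True,
    mode="overwrite",
    database="my_database",  # optional, registers the table in the Glue Catalog
    table="my_table",
    partition_cols=["PARTITION_COL_NAME"])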

Another answer:

From the docs:

You should do something similar to this:

import boto3

s3 = boto3.resource('s3')
s3.meta.client.upload_file('/tmp/' + parquet_file, bucket_name, parquet_file)
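
Tying that docs example back to the question: the pyarrow table can be written to a local temporary file first and then handed to upload_file(). A minimal sketch under the question's assumptions (df, bucket_name and destination_key already defined):

import os
import tempfile

import boto3
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_pandas(df)

# Write the parquet file into a temporary directory, then upload the local
# path with upload_file(), mirroring the docs example above.
with tempfile.TemporaryDirectory() as tmpdir:
    local_path = os.path.join(tmpdir, "data.parquet")
    pq.write_table(table, local_path)

    s3 = boto3.resource('s3')
    s3.meta.client.upload_file(local_path, bucket_name, destination_key)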
