Spark writing to Hive with wrong table_name as partition spec, throwing "Partition spec contains non-partition columns" error

My Hive table is defined with:

PARTITIONED BY (ds STRING, model STRING)
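
For reference, the table was created with something like the following sketch (assuming a Hive-enabled SparkSession; the data columns here are placeholders, only the PARTITIONED BY clause and the S3 location match my setup):

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS table_schema.drv_projection_table (
        -- hypothetical data columns; only the partitioning matches the question
        id STRING,
        value DOUBLE
    )
    PARTITIONED BY (ds STRING, model STRING)
    STORED AS PARQUET
    LOCATION 's3://some_path/qubole/table_name=drv_projection_table'
""")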

When writing to the table in PySpark, I did:

output_df \
  .repartition(250) \
  .write \
  .mode('overwrite') \
  .format('parquet') \
  .partitionBy('ds', 'model') \
  .saveAsTable('{table_schema}.{table_name}'.format(table_schema=table_schema, table_name=table_name))

But I got the following error:

org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: Partition spec {ds=2019-10-06, model=p1kr, table_name=drv_projection_table} contains non-partition columns

It seems that Spark or Hive mistakes table_name for a partition column. My table's S3 path contains table_name (s3://some_path/qubole/table_name=drv_projection_table), but table_name was never specified as part of the partitioning.
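
This looks consistent with Spark's Hive-style partition discovery, which treats every key=value directory segment under the scanned path as a partition column. A minimal sketch of that behavior (the local path here is made up for illustration):

# Write a small partitioned dataset under a directory whose own name
# looks like a key=value partition segment.
df = spark.range(1).selectExpr("id", "'2019-10-06' AS ds", "'p1kr' AS model")
df.write.mode('overwrite').partitionBy('ds', 'model') \
    .parquet('/tmp/qubole/table_name=drv_projection_table')

# Reading from the parent directory makes partition discovery pick up
# table_name as an extra inferred partition column, alongside ds and model.
spark.read.parquet('/tmp/qubole').printSchema()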
