Pyspark on EMR: loading a Postgres JAR dependency in a Jupyter notebook

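I am trying to write a DataFrame to a Postgres database, and I am following the question here. Like that user, I have confirmed that the code runs directly from the command line, but I need to run it from a Jupyter notebook. This is what I have: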

from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClasspath','/home/hadoop/postgresql-42.2.5.jar').getOrCreate()


df = spark.read.format("parquet").option("inferSchema","false").option("header","true").load(input).select(fields)

mode = "append"
url = "jdbc:postgresql://host/db"
properties = {"user": "user","password": "password","driver": "org.postgresql.Driver"}
df.write.jdbc(url=url,table="test_result",mode=mode,properties=properties)

But I keep getting this error:

An error occurred while calling o188.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver

Could someone please point out what I am doing wrong? I am out of ideas.

Edit:

I have updated the code to this:

from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClasspath','/home/hadoop/postgresql-42.2.5.jar').getOrCreate()

mode = "append"
url = "jdbc:postgresql://host/db"
properties = {"user": "user","password": "password"}
df.write.jdbc(url=url,properties=properties)

and now I get a new error:

An error occurred while calling o236.jdbc.
: java.sql.SQLException: No suitable driver

Edit 2:

The linked post is for a dockerized MySQL setup that does not use a Jupyter notebook.
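
One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the documented Spark key is spelled spark.driver.extraClassPath (capital "P"), while the code above uses extraClasspath. Spark configuration keys are matched exactly, so the misspelled key would likely be ignored. The helpers below are hypothetical names introduced for illustration. They collect the settings in one place, reuse the JAR path and placeholder credentials from the question, and also set spark.jars so that the JAR reaches the executors. The pyspark import is deferred so that the helpers can be inspected without Spark installed.

```python
# Sketch only: the path, host, and credentials are the placeholders from the question.
JAR_PATH = "/home/hadoop/postgresql-42.2.5.jar"


def build_spark_conf(jar_path):
    """Return Spark settings that put the JDBC driver JAR on the classpath."""
    return {
        # Comma-separated list of JARs added to driver and executor classpaths.
        "spark.jars": jar_path,
        # Note the capital "P": the key is spark.driver.extraClassPath.
        "spark.driver.extraClassPath": jar_path,
    }


def build_jdbc_args(host, db, user, password):
    """Return the JDBC URL and the properties dict for DataFrame.write.jdbc."""
    url = "jdbc:postgresql://{}/{}".format(host, db)
    properties = {
        "user": user,
        "password": password,
        # Naming the driver class explicitly avoids relying on auto-detection.
        "driver": "org.postgresql.Driver",
    }
    return url, properties


def get_session(conf):
    """Create a SparkSession with the given settings (hypothetical helper)."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Intended use in the notebook, assuming no Spark session is running yet:
# spark = get_session(build_spark_conf(JAR_PATH))
# url, properties = build_jdbc_args("host", "db", "user", "password")
# df.write.jdbc(url=url, table="test_result", mode="append", properties=properties)
```

Two caveats. First, getOrCreate() returns an already running session if there is one, and classpath settings only take effect when the driver JVM starts, so in a notebook the kernel may need to be restarted (or the existing session stopped) before these settings apply. Second, if the EMR notebook runs its kernel through sparkmagic and Livy, which is an assumption about this setup, the session is created by the kernel itself and the settings would probably have to go into a %%configure -f cell instead of the builder.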
