I'm using Databricks on Azure and want to push queries down to Azure SQL with PySpark. I've tried many approaches and found a working solution in Scala (code below), but to use it I have to convert part of my code to Scala and then bring the results back into PySpark.
%scala
import java.util.Properties
import java.sql.DriverManager
val jdbcUsername = username
val jdbcPassword = password
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = "entire-string-connection-to-Azure-SQL"
// Create a Properties() object to hold the parameters.
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)
// Open a plain JDBC connection and execute the DML statement.
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
val stmt = connection.createStatement()
val sql = "TRUNCATE TABLE dbo.table"
stmt.execute(sql)
connection.close()
Is it possible to push down DML statements like this using PySpark instead of Scala?
I found some related material, but it only covers reading data and DDL commands:
jdbcUrl = "jdbc:mysql://{0}:{1}/{2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
  "user": jdbcUsername,
  "password": jdbcPassword,
  "driver": "com.mysql.jdbc.Driver"
}
pushdown_query = "(select * from employees where emp_no < 10008) emp_alias"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
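For what it's worth, here is a sketch of what I've been experimenting with: every SparkSession exposes the JVM through the py4j gateway, so in principle the same `java.sql.DriverManager` calls from the Scala cell can be made directly from PySpark. This is an assumption on my part, not a documented Databricks API; the `jdbc_url`, `username`, and `password` values are placeholders.

```python
# Hedged sketch: push a DML statement (e.g. TRUNCATE) to Azure SQL from
# PySpark by reaching java.sql.DriverManager through the py4j gateway that
# the SparkSession already holds. Assumes the JDBC driver is on the cluster.

def execute_dml(spark, jdbc_url, username, password, sql):
    """Open a raw JDBC connection via the JVM gateway and run one statement."""
    driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
    connection = driver_manager.getConnection(jdbc_url, username, password)
    try:
        stmt = connection.createStatement()
        stmt.execute(sql)  # DML/DDL runs on the server, nothing is pulled back
    finally:
        connection.close()

# usage (inside a Databricks notebook, where `spark` already exists):
# execute_dml(spark, jdbcUrl, jdbcUsername, jdbcPassword,
#             "TRUNCATE TABLE dbo.table")
```

The `_sc._gateway` attributes are internal to PySpark, so this may break across versions, but it avoids the round-trip through a `%scala` cell.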