We have the following code:
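import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sparkSession = SparkSession.builder
  .master("local")
  .appName("example")
  .getOrCreate()
// The second row has no values for f3 and f4, so they are null.
val data = Seq(
  Row(1, "a", "b", "c", "d"),
  Row(5, "z", null, null, "d")
)
val schema = StructType(
  List(
    StructField("id", IntegerType, true),
    StructField("f2", StringType, true),
    StructField("f3", StringType, true),
    StructField("f4", StringType, true),
    StructField("f5", StringType, true)
  )
)
val df1 = sparkSession.createDataFrame(
  sparkSession.sparkContext.parallelize(data),
  schema
)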
The goal is to write this DataFrame out as a single JSON array:
[{"id":1,"f2":"a","f3":"b","f4":"c","f5":"d"},{"id":5,"f2":"z","f5":"d"}]
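So we need the two enclosing square brackets, but the DataFrame is larger than 50 GB, so the solution df1.toJSON.collect.mkString("[", ",", "]") does not work: collect pulls every row into driver memory at once. Is there a way to do this in Spark with good performance?
Thanks in advance.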
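For reference, here is a minimal sketch of one direction that avoids collect, not a definitive solution. It assumes the output may be a single file on the driver's local disk and that one partition at a time fits in driver memory. The object JsonArrayWriter, its two methods, and the output path are hypothetical names introduced only for this illustration; the only Spark calls involved are toJSON and toLocalIterator on the Dataset.

```scala
import java.io.{BufferedWriter, Writer}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical helper object, not part of Spark.
object JsonArrayWriter {

  // Writes the JSON strings from `rows` to `out` as one JSON array.
  // Only the current element is held in memory, never the whole array.
  def writeJsonArray(rows: java.util.Iterator[String], out: Writer): Unit = {
    out.write("[")
    var first = true
    while (rows.hasNext) {
      if (!first) out.write(",") // separator before every element except the first
      out.write(rows.next())
      first = false
    }
    out.write("]")
  }

  // Opens a local file (assumed to be on the driver) and streams the array into it.
  def writeToFile(rows: java.util.Iterator[String], path: String): Unit = {
    val writer: BufferedWriter =
      Files.newBufferedWriter(Paths.get(path), StandardCharsets.UTF_8)
    try writeJsonArray(rows, writer)
    finally writer.close()
  }
}
```

With the DataFrame above it could be called as JsonArrayWriter.writeToFile(df1.toJSON.toLocalIterator(), "/tmp/out.json"), where the path is only a placeholder. toLocalIterator fetches one partition at a time, so memory use is bounded by the largest partition, but the write itself is sequential on the driver and may still be slow for 50 GB.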