I am trying to create a Spark Dataset and then, using mapPartitions, access each of its elements and store them in variables. I am using the code below to do this:
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val df = spark.sql("select col1, col2, col3 from table limit 10")

val schema = StructType(Seq(
  StructField("col1", StringType),
  StructField("col2", StringType),
  StructField("col3", StringType)))

val encoder = RowEncoder(schema)

df.mapPartitions { iterator =>
  val myList = iterator.toList
  myList.map(x => {
    val value1 = x.getString(0)
    val value2 = x.getString(1)
    val value3 = x.getString(2)
  }).iterator
}(encoder)
The error I am getting with this code is:
<console>:39: error: type mismatch;
found : org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row]
required: org.apache.spark.sql.Encoder[Unit]
val value3 = x.getString(2)}).iterator}} (encoder)
Ultimately, my goal is to store the row elements in variables and perform some operations on them. I am not sure what I am missing here. Any help on this would be greatly appreciated!
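Based on the error, I suspect the problem is that the body of my map ends with a val definition, so the closure returns Unit while the encoder I pass expects Row. A minimal sketch of what I think should type-check is below (the column names are just my table's columns; whether this is the idiomatic approach, I am not sure):

```scala
df.mapPartitions { iterator =>
  iterator.map { x =>
    val value1 = x.getString(0)
    val value2 = x.getString(1)
    val value3 = x.getString(2)
    // The last expression must be a Row so the result
    // matches the Encoder[Row] passed to mapPartitions.
    Row(value1, value2, value3)
  }
}(encoder)
```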