Error when trying zipWithIndex in Java Spark

I'm trying to add a column of row numbers to a DataFrame in Spark using zipWithIndex, as follows (Scala):

val df = sc.parallelize(Seq((1.0,2.0),(0.0,-1.0),(3.0,4.0),(6.0,-2.3))).toDF("x","y")
val rddzip = df.rdd.zipWithIndex
val newSchema = StructType(df.schema.fields ++ Array(StructField("rowid",LongType,false)))
val dfZippedWithId = spark.createDataFrame(rddzip.map{ case (row,index) => Row.fromSeq(row.toSeq ++ Array(index))},newSchema)

But when I try to do the same thing in Java, as below, I get an error:

JavaRDD<Row> rdd = (JavaRDD) df.toJavaRDD().zipWithIndex().map(t -> {
    Row r = t._1;
    Long index = t._2 + 1;
    ArrayList<Object> list = new ArrayList<>();
    for(Object item: JavaConverters.seqAsJavaListConverter(r.toSeq()).asJava()) {
        list.add(item);
    }
    return RowFactory.create(JavaConverters.seqAsJavaListConverter(t._1.toSeq()).asJava().add(t._2));
});
StructType newSchema = df.schema()
        .add(new StructField(name,DataTypes.LongType,true,Metadata.empty()));
return df.sparkSession().createDataFrame(rdd,newSchema);

Any help?

zjytianlang answered: Error when trying zipWithIndex in Java Spark

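In the Scala version, you pass an RDD[Row] to spark.createDataFrame. In Java, zipWithIndex returns a JavaPairRDD<Row,Long>, so you need to map it to a JavaRDD<Row> before passing it to createDataFrame: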

        Dataset<Row> df = ss.range(10).toDF();
        df.show();

        JavaPairRDD<Row,Long> rddzip = df.toJavaRDD().zipWithIndex();
        JavaRDD<Row> rdd = rddzip.map(s->{
            Row r = s._1;
            Object[] arr = new Object[r.size()+1];
            for (int i = 0; i < arr.length-1; i++) {
                arr[i] = r.get(i);
            }
            arr[arr.length-1] = s._2;
            return RowFactory.create(arr);
        });

        StructType newSchema = df.schema().add(new StructField("rowid",DataTypes.LongType,false,Metadata.empty()));

        Dataset<Row> df2 = ss.createDataFrame(rdd,newSchema);

        df2.show();
    +---+
    | id|
    +---+
    |  0|
    |  1|
    |  2|
    |  3|
    |  4|
    |  5|
    |  6|
    |  7|
    |  8|
    |  9|
    +---+

    +---+-----+
    | id|rowid|
    +---+-----+
    |  0|    0|
    |  1|    1|
    |  2|    2|
    |  3|    3|
    |  4|    4|
    |  5|    5|
    |  6|    6|
    |  7|    7|
    |  8|    8|
    |  9|    9|
    +---+-----+
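
As a side note on the Java code in the question: its last line passes the result of `List.add(...)` to `RowFactory.create`, and `List.add` returns a boolean rather than the list, so even if the call succeeded the row would not contain the original values plus the index. The sketch below illustrates this with plain Java collections and no Spark dependency. The class `RowAppendSketch` and the method `appendIndex` are hypothetical names used only for illustration; the copy loop mirrors the answer's approach of building an `Object[]` one slot longer than the row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Spark.
final class RowAppendSketch {

    // Copies the row values into a new array that is one slot longer and
    // stores the index in the last slot, like the map step in the answer.
    static Object[] appendIndex(List<Object> values, long index) {
        Object[] arr = new Object[values.size() + 1];
        for (int i = 0; i < values.size(); i++) {
            arr[i] = values.get(i);
        }
        arr[arr.length - 1] = index; // autoboxed to Long
        return arr;
    }

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(Arrays.<Object>asList(1.0, 2.0));

        // List.add returns a boolean, not the list, so handing its result
        // to a varargs factory would pass a single Boolean value.
        boolean added = values.add(99L);
        System.out.println(added);

        // Building the full array first keeps every original value.
        Object[] row = appendIndex(Arrays.<Object>asList(3.0, 4.0), 2L);
        System.out.println(Arrays.toString(row));
    }
}
```

In the Spark version, the array returned by a helper like this would be what gets passed to `RowFactory.create`, as the answer's code does with `arr`.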


Link to this article: https://www.f2er.com/3125132.html
