I'm confused about why the following code throws the exception from the title:

    class Scratch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession
                    .builder()
                    .master("local")
                    .getOrCreate();
            JavaSparkContext javaSparkContext = new JavaSparkContext(spark.sparkContext());
            JavaRDD<String> javaRdd = javaSparkContext.parallelize(Arrays.asList("a", "b", "c"));
            JavaRDD<Foo> simpleRdd = javaRdd.map(Foo::new);
            Dataset fooDs = spark.createDataset(simpleRdd.rdd(), Encoders.bean(Foo.class)); // works perfectly
            UDF1<String, Bar> newBar = Bar::new;
            UserDefinedFunction newBarUdf = udf(newBar, Encoders.bean(Bar.class).schema());
            Dataset fooBarDs = fooDs.withColumn("test", newBarUdf.apply(col("name"))); // doesn't work
            fooBarDs.show(false);
        }

        public static class Bar implements Serializable {
            private String alias;

            public Bar(String alias) {
                this.alias = alias;
            }

            public Bar() {
            }

            public String getalias() {
                return alias;
            }

            public void setalias(String alias) {
                this.alias = alias;
            }
        }

        public static class Foo implements Serializable {
            private String name;

            public String getName() {
                return name;
            }

            public void setName(String name) {
                this.name = name;
            }

            public Foo(String name) {
                this.name = name;
            }

            public Foo() {
            }
        }
    }

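Does anyone know why this isn't supported? It feels like it should be. What exactly am I doing wrong, and is there a workaround?
Note: the real example has several columns with primitive fields, and I need to be able to convert Java classes into nested struct columns.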
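
One direction I am considering, as a rough sketch rather than a confirmed fix: as far as I understand, a UDF declared with a struct return type is expected to hand back a `Row` rather than a bean, so the bean would first have to be flattened into its field values. The plain-Java helper below does only that flattening step and runs without Spark. The class name `BeanToRowValues` and the method `beanToValues` are hypothetical names of mine, and the `Bar` here uses conventional `getAlias`/`setAlias` accessors. The field-name array stands in for what `Encoders.bean(Bar.class).schema().fieldNames()` would supply.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class BeanToRowValues {

    // Stand-in for the Bar bean from the question (hypothetical copy for this sketch).
    public static class Bar {
        private String alias;

        public Bar() {
        }

        public Bar(String alias) {
            this.alias = alias;
        }

        public String getAlias() {
            return alias;
        }

        public void setAlias(String alias) {
            this.alias = alias;
        }
    }

    // Reads the bean's properties in the order given by fieldNames.
    // Assumption: fieldNames would come from the struct schema used for the UDF.
    public static Object[] beanToValues(Object bean, String[] fieldNames) {
        try {
            // Stop at Object.class so the synthetic "class" property is excluded.
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            Map<String, PropertyDescriptor> byName = new HashMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    byName.put(pd.getName(), pd);
                }
            }
            Object[] values = new Object[fieldNames.length];
            for (int i = 0; i < fieldNames.length; i++) {
                PropertyDescriptor pd = byName.get(fieldNames[i]);
                if (pd == null) {
                    throw new IllegalArgumentException("No readable bean property: " + fieldNames[i]);
                }
                values[i] = pd.getReadMethod().invoke(bean);
            }
            return values;
        } catch (ReflectiveOperationException | java.beans.IntrospectionException e) {
            throw new IllegalStateException("Could not read bean properties", e);
        }
    }

    public static void main(String[] args) {
        Object[] values = beanToValues(new Bar("a"), new String[] {"alias"});
        // In Spark, these values would presumably be wrapped with RowFactory.create(values).
        System.out.println(Arrays.toString(values)); // prints [a]
    }
}
```

If that assumption about `Row` holds, the UDF would be typed as `UDF1<String, Row>` and return `RowFactory.create(beanToValues(new Bar(s), fieldNames))`, keeping the same schema argument as in my code above. I have not verified this against the real job with its nested columns.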