The following code produces a DataFrame in which every column holds three values, as shown below.
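import org.graphframes._
import org.apache.spark.sql.DataFrame

// Vertices: (id, name)
val v = sqlContext.createDataFrame(List(
  ("1", "Al"), ("2", "B"), ("3", "C"), ("4", "D"), ("5", "E")
)).toDF("id", "name")

// Edges: (src, dst, property), where property is the distance.
// The edge list was partly garbled; only the edges recoverable from it and
// from the output below are kept. One more edge with property 6 could not be recovered.
val e = sqlContext.createDataFrame(List(
  ("1", "3", 5), ("1", "2", 8), ("2", "4", 7), ("4", "5", 8)
)).toDF("src", "dst", "property")

val g = GraphFrame(v, e)

// Breadth-first search from vertex 1 to vertex 5
val paths: DataFrame = g.bfs.fromExpr("id = '1'").toExpr("id = '5'").run()
paths.show()

// Keep only the edge columns (e0, e1, ...)
val df = paths
df.select(df.columns.filter(_.startsWith("e")).map(df(_)) : _*).show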
The output of the code above is:
+-------+-------+-------+
|     e0|     e1|     e2|
+-------+-------+-------+
|[1,2,8]|[2,4,7]|[4,5,8]|
+-------+-------+-------+
In the output above, each column holds three values, which can be read as follows:

e0: source 1, destination 2, distance 8
e1: source 2, destination 4, distance 7
e2: source 4, destination 5, distance 8
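Basically, e0, e1 and e2 are edges. I want to sum the third element of each column, that is, add up the distance of every edge to get the total distance. How can I do that?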
Solution
It can be done like this:
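import org.apache.spark.sql.functions.col

// Build one Column expression that adds up the "property" field of every edge column
val total = df.columns.filter(_.startsWith("e"))
  .map(c => col(s"$c.property")) // or col(c).getItem("property")
  .reduce(_ + _)

df.withColumn("total", total)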
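To try the same idea without GraphFrames on the classpath, here is a minimal standalone sketch. It assumes a local Spark session. The `Edge` case class, the `TotalDistance` object and the `withTotal` helper are hypothetical names introduced only for illustration; they stand in for the struct columns that `bfs` returns.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical stand-in for one edge struct (src, dst, property) from the BFS result.
case class Edge(src: String, dst: String, property: Int)

object TotalDistance {
  // Adds a "total" column: the sum of the "property" field of every column
  // whose name starts with "e". Assumes at least one such column exists,
  // because reduce throws on an empty collection.
  def withTotal(df: DataFrame): DataFrame = {
    val total: Column = df.columns
      .filter(_.startsWith("e"))
      .map(c => col(s"$c.property"))
      .reduce(_ + _)
    df.withColumn("total", total)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("total-distance")
      .getOrCreate()
    import spark.implicits._

    // One row with three struct columns, mirroring the path 1 -> 2 -> 4 -> 5.
    val df = Seq((Edge("1", "2", 8), Edge("2", "4", 7), Edge("4", "5", 8)))
      .toDF("e0", "e1", "e2")

    withTotal(df).show()
    spark.stop()
  }
}
```

For the sample row, the `total` column is 8 + 7 + 8 = 23. Because the column list is computed from `df.columns`, the same expression works for paths of any length. Note that the filter would also pick up any other column whose name happens to start with "e".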