Consider the following two DataFrames, df1 and df2:
+---+
|id |
+---+
|1  |
|2  |
|3  |
+---+
+---+-----+
|idz|word |
+---+-----+
|1  |bat  |
|1  |mouse|
|2  |horse|
+---+-----+
I am doing a left join on id = idz:
import org.apache.spark.sql.functions.{col, when}

val r = df1.join(df2, df1("id") === df2("idz"), "left_outer").
  withColumn("ID_EMPLOYE_VENDEUR", when(col("word") =!= "null", col("word")).otherwise(null)).
  drop("word")
r.show(false)
+---+----+------------------+
|id |idz |ID_EMPLOYE_VENDEUR|
+---+----+------------------+
|1  |1   |mouse             |
|1  |1   |bat               |
|2  |2   |horse             |
|3  |null|null              |
+---+----+------------------+
But what if I only want to keep a value when an id matches exactly one idz row? Otherwise, I want null in ID_EMPLOYE_VENDEUR. The desired output is:
+---+----+------------------+
|id |idz |ID_EMPLOYE_VENDEUR|
+---+----+------------------+
|1  |1   |null              | -- because the join matched two different rows
|2  |2   |horse             |
|3  |null|null              |
+---+----+------------------+
I should point out that I am working with large DataFrames, so the solution should not be expensive in terms of run time.
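To make the requirement concrete, here is a minimal sketch of one possible approach, not a tuned or benchmarked solution: collapse df2 to one row per idz before joining, and keep the word only when that idz occurs exactly once. The names df2Unique and n are introduced here purely for illustration, and the local SparkSession and sample data are assumptions rebuilt from the tables above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, first, lit, when}

// Assumption: a local session, only so the sketch is self-contained.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("unique-match-join")
  .getOrCreate()
import spark.implicits._

// Sample data rebuilt from the tables in the question.
val df1 = Seq(1, 2, 3).toDF("id")
val df2 = Seq((1, "bat"), (1, "mouse"), (2, "horse")).toDF("idz", "word")

// One row per idz. The word is kept only when the idz appears exactly once;
// `when` without `otherwise` yields null for every other case.
// `first` is only relied upon when n == 1, so its ordering does not matter.
val df2Unique = df2
  .groupBy("idz")
  .agg(count(lit(1)).as("n"), first("word").as("word"))
  .withColumn("ID_EMPLOYE_VENDEUR", when(col("n") === 1, col("word")))
  .drop("n", "word")

// The left join now sees at most one candidate row per id.
val r = df1.join(df2Unique, df1("id") === df2Unique("idz"), "left_outer")
r.orderBy("id").show(false)
```

Aggregating before the join means the join output has at most one row per id, but the groupBy still shuffles df2, so whether this is cheap enough for large DataFrames would need to be measured.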
Thanks.