Here is one way to do it:
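from pyspark.sql import Row, SparkSession

# Reuse the active session (already available as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

# Row factory with four named fields; every row must supply all four values
Demography = Row("City","Country","Population","Government")
demo1 = Demography("a","AD",1.2,"Democratic")
demo2 = Demography("b","AD",1.2,"Democratic")
demo3 = Demography("c","AD",1.2,"Democratic")
demo4 = Demography("m","XX",1.2,"Democratic")
demo5 = Demography("n","XX",1.2,"Democratic")
demo6 = Demography("o","XX",1.2,"Democratic")
demo7 = Demography("q","XX",1.2,"Democratic")
demographic_data = [demo1,demo2,demo3,demo4,demo5,demo6,demo7]
demographic_data_df = spark.createDataFrame(demographic_data)
demographic_data_df.show(10)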
+----+-------+----------+----------+
|City|Country|Population|Government|
+----+-------+----------+----------+
|   a|     AD|       1.2|Democratic|
|   b|     AD|       1.2|Democratic|
|   c|     AD|       1.2|Democratic|
|   m|     XX|       1.2|Democratic|
|   n|     XX|       1.2|Democratic|
|   o|     XX|       1.2|Democratic|
|   q|     XX|       1.2|Democratic|
+----+-------+----------+----------+
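import pyspark.sql.functions as f
# Count rows per country, then find the highest count (max_count avoids shadowing the built-in max)
country_counts = demographic_data_df.groupBy('Country').count().select('Country',f.col('count').alias('n'))
max_count = country_counts.agg(f.max('n').alias('n'))
# The inner join on n keeps only the country (or tied countries) with that count
country_counts.join(max_count,on = "n",how = "inner").show()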
+---+-------+
|  n|Country|
+---+-------+
|  4|     XX|
+---+-------+
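Another option is to register the DataFrame as a temporary table and run a regular SQL query against it. To register it, you can do the following:
# createOrReplaceTempView replaces registerTempTable, which has been deprecated since Spark 2.0
demographic_data_df.createOrReplaceTempView("demographic_data_table")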
Hope this helps.