I have a PySpark dataframe with 3 columns: Violation_Location, Violation_Code and Ticket_Frequency. However, the Violation_Code and Violation_Location columns each contain many categories (over 100 each).
I want to get the top 10 Violation_Location and Violation_Code entries by Ticket_Frequency.
Precint = spark.sql("""
    SELECT Violation_Location, Violation_Code, COUNT(*) AS Ticket_Frequency
    FROM table_view2
    GROUP BY Violation_Location, Violation_Code
    ORDER BY Ticket_Frequency DESC
""")
Precint.show()
So far I have only been able to plot the top 10 Violation_Location values by Ticket_Frequency. Any kind of help would be much appreciated, thanks!
+------------------+--------------+----------------+
|Violation_Location|Violation_Code|Ticket_Frequency|
+------------------+--------------+----------------+
|              null|            36|         1098296|
|              null|             7|          471754|
|              null|             5|          248774|
|                18|            14|          132123|
|               114|            21|           84051|
|                14|            14|           83664|
|                19|            46|           82640|
|                14|            69|           69006|
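Since the SQL above already orders every (Violation_Location, Violation_Code) pair by count, capping it at the top 10 should just be a matter of appending `LIMIT 10` to the query (or calling `.limit(10)` on the dataframe). The underlying group-count-rank logic can be sketched in plain Python; the ticket data below is made up for illustration:

```python
from collections import Counter

# Hypothetical raw tickets as (Violation_Location, Violation_Code) pairs.
# In the real job these rows come from table_view2.
tickets = [
    ("18", "14"), ("18", "14"), ("18", "14"),
    ("114", "21"), ("114", "21"),
    ("14", "69"),
]

# Mirrors: SELECT ..., COUNT(*) AS Ticket_Frequency
#          GROUP BY Violation_Location, Violation_Code
#          ORDER BY Ticket_Frequency DESC
freq = Counter(tickets)

# Mirrors: LIMIT 10 — Counter.most_common returns pairs sorted by count, descending.
top10 = freq.most_common(10)

for (loc, code), n in top10:
    print(loc, code, n)
```

With a Spark dataframe the equivalent would be `Precint.limit(10)`, since the rows are already sorted descending by Ticket_Frequency.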