How do I transpose a DataFrame with two rows and many columns into two columns and many rows?

I have a Spark DataFrame like this:

+---+---+---+---+---+---+---+
| f1| f2| f3| f4| f5| f6| f7|
+---+---+---+---+---+---+---+
|  5|  4|  5|  2|  5|  5|  5|
+---+---+---+---+---+---+---+

How can I turn it into this:

+---+---+
| f1|  5|
+---+---+
| f2|  4|
+---+---+
| f3|  5|
+---+---+
| f4|  2|
+---+---+
| f5|  5|
+---+---+
| f6|  5|
+---+---+
| f7|  5|
+---+---+

Is there a simple way in Spark Scala to do this transposition?

Answers:

With Spark 2.4+, use map_from_arrays:
scala> import org.apache.spark.sql.functions._

scala> val df = Seq((5, 4, 5, 2, 5, 5, 5)).toDF("f1", "f2", "f3", "f4", "f5", "f6", "f7")

scala> df.select(array('*).as("v"), lit(df.columns).as("k")).select(map_from_arrays('k, 'v).as("map")).select(explode('map)).show(false)
+---+-----+
|key|value|
+---+-----+
|f1 |5    |
|f2 |4    |
|f3 |5    |
|f4 |2    |
|f5 |5    |
|f6 |5    |
|f7 |5    |
+---+-----+

Hope it helps.
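To make the map_from_arrays pipeline easier to follow, here is a plain-Scala sketch (no Spark involved) of what happens to a single row. array('*) collects the row's values, lit(df.columns) supplies the column names, map_from_arrays pairs them position by position, and explode emits one key/value record per pair. The object and method names are hypothetical. The length check is an assumption of this sketch: without it, zip would silently drop unmatched elements.

```scala
// Plain-Scala sketch (no Spark) of the per-row effect of
// array('*) + lit(df.columns) + map_from_arrays + explode.
object MapFromArraysSketch {
  // Hypothetical helper: `keys` plays the role of lit(df.columns),
  // `values` the role of array('*) for one row.
  def explodeRow[V](keys: Seq[String], values: Seq[V]): Seq[(String, V)] = {
    // zip would silently truncate on a length mismatch, so fail loudly instead.
    require(keys.length == values.length, "keys and values must have the same length")
    // Pair names and values by position; each pair becomes one output record.
    keys.zip(values)
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq("f1", "f2", "f3", "f4", "f5", "f6", "f7")
    val values = Seq(5, 4, 5, 2, 5, 5, 5)
    println("key|value")
    explodeRow(keys, values).foreach { case (k, v) => println(s"$k|$v") }
  }
}
```

As in the Spark output, the records come out in column order, because the pairing is positional.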


I wrote a function:

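import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DataTypes}

object DT {
  val KEY_COL_NAME = "dt_key"
  val VALUE_COL_NAME = "dt_value"

  // Emits one (column name, value) row per column in `cols` for every input row.
  // Values are stored as strings so columns of different types fit in one column,
  // then cast to `valueDataType`.
  def pivot(df: DataFrame, valueDataType: DataType, cols: Array[String], keyColName: String, valueColName: String): DataFrame = {
    val tempData: RDD[Row] = df.rdd.flatMap { row =>
      row.getValuesMap[Any](cols).map { case (name, value) =>
        Row(name, if (value == null) null else value.toString)
      }
    }
    val keyStructField = DataTypes.createStructField(keyColName, DataTypes.StringType, false)
    val valueStructField = DataTypes.createStructField(valueColName, DataTypes.StringType, true)
    val structType = DataTypes.createStructType(Array(keyStructField, valueStructField))
    df.sparkSession.createDataFrame(tempData, structType).select(col(keyColName), col(valueColName).cast(valueDataType))
  }

  // Unpivots all columns, using the default key/value column names.
  def pivot(df: DataFrame, valueDataType: DataType): DataFrame = {
    pivot(df, valueDataType, df.columns, KEY_COL_NAME, VALUE_COL_NAME)
  }
}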
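For a look at the reshaping DT.pivot performs without starting Spark, here is a plain-Scala sketch over ordinary collections. It is an analogue, not the function itself: rows are modeled as Seq[Any] instead of Spark Rows, and the names UnpivotSketch, unpivot and castToDouble are hypothetical. Each input row with N columns yields N key/value pairs. Values are carried as strings and parsed as doubles at the end, analogous to the cast to DoubleType, with None standing in for a SQL null.

```scala
// Plain-Scala analogue (no Spark) of the row-to-key/value reshaping.
object UnpivotSketch {
  // Every row with N values becomes N (column name, value-as-string) pairs.
  def unpivot(cols: Seq[String], rows: Seq[Seq[Any]]): Seq[(String, String)] =
    rows.flatMap { row =>
      cols.zip(row).map { case (name, value) =>
        // Null-safe conversion so mixed Int/Double columns share one string column.
        (name, if (value == null) null else value.toString)
      }
    }

  // Counterpart of casting the value column to double; None models a null value.
  def castToDouble(pairs: Seq[(String, String)]): Seq[(String, Option[Double])] =
    pairs.map { case (k, v) => (k, Option(v).map(_.toDouble)) }

  def main(args: Array[String]): Unit = {
    val cols = Seq("f1", "f2", "f3", "f4", "f5")
    val rows: Seq[Seq[Any]] = Seq(
      Seq[Any](100, 1, 0.355072464, 0, 31),
      Seq[Any](63, 2, 0.622775801, 0.685809375, 16)
    )
    castToDouble(unpivot(cols, rows)).foreach { case (k, v) => println(s"$k\t${v.getOrElse("null")}") }
  }
}
```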

It works:

import org.apache.spark.sql.types.DoubleType

df.show()
DT.pivot(df, DoubleType).show()

For example:

+---+---+-----------+---+---+       +------+-----------+
| f1| f2|         f3| f4| f5|       |dt_key|   dt_value|
+---+---+-----------+---+---+  to   +------+-----------+
|100|  1|0.355072464|  0| 31|       |    f1|      100.0|
+---+---+-----------+---+---+       |    f5|       31.0|
                                    |    f3|0.355072464|
                                    |    f4|        0.0|
                                    |    f2|        1.0|
                                    +------+-----------+

+---+---+-----------+-----------+---+        +------+-----------+
| f1| f2|         f3|         f4| f5|        |dt_key|   dt_value|
+---+---+-----------+-----------+---+  to    +------+-----------+
|100|  1|0.355072464|          0| 31|        |    f1|      100.0|
| 63|  2|0.622775801|0.685809375| 16|        |    f5|       31.0|
+---+---+-----------+-----------+---+        |    f3|0.355072464|
                                             |    f4|        0.0|
                                             |    f2|        1.0|
                                             |    f1|       63.0|
                                             |    f5|       16.0|
                                             |    f3|0.622775801|
                                             |    f4|0.685809375|
                                             |    f2|        2.0|
                                             +------+-----------+
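The keys in the DT.pivot output above come out as f1, f5, f3, f4, f2 rather than in column order. The most likely reason is that Row.getValuesMap returns an immutable Scala Map. In current Scala versions, an immutable Map built from more than four entries is a HashMap, which iterates in hash order. If column order matters, one option is to keep the pairs as a sequence, for example building the rows with cols.map(c => Row(c, row.getAs[Any](c))) instead of getValuesMap. The plain-Scala sketch below contrasts the two. It uses no Spark, and its names are hypothetical.

```scala
// Plain-Scala sketch (no Spark) contrasting a Map-based and an order-preserving pairing.
object ValuesMapOrderSketch {
  // Roughly what a name -> value Map looks like: pairs collected with .toMap.
  def asMap(cols: Seq[String], values: Seq[Any]): Map[String, Any] =
    cols.zip(values).toMap

  // Order-preserving alternative: keep the pairs as a Seq in column order.
  def asOrderedPairs(cols: Seq[String], values: Seq[Any]): Seq[(String, Any)] =
    cols.zip(values)

  def main(args: Array[String]): Unit = {
    val cols = Seq("f1", "f2", "f3", "f4", "f5")
    val values = Seq[Any](100, 1, 0.355072464, 0, 31)
    // Iteration order of a 5-entry immutable Map is not guaranteed to match `cols`.
    println(asMap(cols, values).keys.mkString(", "))
    // Prints f1, f2, f3, f4, f5
    println(asOrderedPairs(cols, values).map(_._1).mkString(", "))
  }
}
```

Both versions contain the same pairs. Only the iteration order differs.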

Very nice!

Original link: https://www.f2er.com/3153387.html
