Pandas sort - keeping the original order

I have a df

    param       per     per_date    per_num             
0   XYZ         1.0     2018-10-01  11.0                
1   XYZ         2.0     2017-08-01  15.25               
2   XYZ         1.0     2019-10-01  11.25
3   XYZ         2.0     2019-08-01  15.71 
4   XYZ         3.0     2020-10-01  NaN 
5   MMG         1.0     2021-10-01  12.50                          
6   MMG         2.0     2021-10-01  11.75               
7   MMG         3.0     2011-01-01  NaN                
8   ZZZ         4.0     2023-01-01  19.00 
9   ZZZ         3.0     2014-01-01  13.00
10  MMM         1.0     2016-03-01  12.01
11  MMM         2.0     2019-01-01  16.00
12  ZZZ         1.0     2009-06-01  12.50
13  ZZZ         2.0     2018-01-01  19.00

I need output like this:

        param       per     per_date    per_num 
    0   MMG         1.0     2021-10-01  12.50                          
    1   MMG         2.0     2021-10-01  11.75               
    2   MMG         3.0     2011-01-01  NaN  
    3   MMM         1.0     2016-03-01  12.01
    4   MMM         2.0     2019-01-01  16.00
    5   XYZ         1.0     2018-10-01  11.0                
    6   XYZ         2.0     2017-08-01  15.25               
    7   XYZ         1.0     2019-10-01  11.25
    8   XYZ         2.0     2019-08-01  15.71 
    9   XYZ         3.0     2020-10-01  NaN 
    10  ZZZ         1.0     2009-06-01  12.50
    11  ZZZ         2.0     2018-01-01  19.00              
    12  ZZZ         4.0     2023-01-01  19.00 
    13  ZZZ         3.0     2014-01-01  13.00

But when I sort with

df = df.sort_values(['param','per']).reset_index(drop=True)
df

I get this (not what I want):

   param  per   per_date    per_num
0   MMG   1.0   2021-10-01  12.50
1   MMG   2.0   2021-10-01  11.75
2   MMG   3.0   2011-01-01  NaN
3   MMM   1.0   2016-03-01  12.01
4   MMM   2.0   2019-01-01  16.00
5   XYZ   1.0   2018-10-01  11.00
6   XYZ   1.0   2019-10-01  11.25
7   XYZ   2.0   2017-08-01  15.25
8   XYZ   2.0   2019-08-01  15.71
9   XYZ   3.0   2020-10-01  NaN
10  ZZZ   1.0   2009-06-01  12.50
11  ZZZ   2.0   2018-01-01  19.00
12  ZZZ   3.0   2014-01-01  13.00
13  ZZZ   4.0   2023-01-01  19.00

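If you look at the original df above, XYZ's per values go 1, 2 and then start again with 1, 2, 3, so they are two different categories, and I want to keep them as they are. ZZZ, however, is a single sequence and therefore one category, but its rows are not in order, so it needs to be sorted. How can I do this in pandas?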
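One way to make this distinction concrete: a param whose per values restart contains at least one repeated (param, per) pair. The sketch below is one possible check (not part of the original question); it rebuilds the frame from the table above and flags such params with DataFrame.duplicated.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "param": ["XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "MMG", "MMG",
              "MMG", "ZZZ", "ZZZ", "MMM", "MMM", "ZZZ", "ZZZ"],
    "per": [1.0, 2.0, 1.0, 2.0, 3.0, 1.0, 2.0,
            3.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0],
    "per_date": pd.to_datetime([
        "2018-10-01", "2017-08-01", "2019-10-01", "2019-08-01", "2020-10-01",
        "2021-10-01", "2021-10-01", "2011-01-01", "2023-01-01", "2014-01-01",
        "2016-03-01", "2019-01-01", "2009-06-01", "2018-01-01"]),
    "per_num": [11.0, 15.25, 11.25, 15.71, np.nan, 12.50, 11.75,
                np.nan, 19.00, 13.00, 12.01, 16.00, 12.50, 19.00],
})

# duplicated() marks the 2nd, 3rd, ... occurrence of each (param, per) pair
repeats = df.duplicated(["param", "per"])
# A param "restarts" if it has at least one repeated pair
restarts = repeats.groupby(df["param"]).any()
print(restarts)
```

Only XYZ is flagged; ZZZ has four distinct per values and is merely out of order.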

Any suggestions appreciated.

youyou_1125 answered:

We can use Categorical:

df.param = pd.Categorical(df.param,categories = df.param.unique())
df = df.sort_values(['param','per']).reset_index(drop = True)
df
Out[348]: 
   param  per    per_date  per_num
0    XYZ  1.0  2018-10-01    11.00
1    XYZ  1.0  2019-10-01    11.25
2    XYZ  2.0  2017-08-01    15.25
3    XYZ  2.0  2019-08-01    15.71
4    XYZ  3.0  2020-10-01      NaN
5    MMG  1.0  2021-10-01    12.50
6    MMG  2.0  2021-10-01    11.75
7    MMG  3.0  2011-01-01      NaN
8    ZZZ  1.0  2009-06-01    12.50
9    ZZZ  2.0  2018-01-01    19.00
10   ZZZ  3.0  2014-01-01    13.00
11   ZZZ  4.0  2023-01-01    19.00
12   MMM  1.0  2016-03-01    12.01
13   MMM  2.0  2019-01-01    16.00
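The Categorical works because sort_values orders a categorical column by its category order rather than alphabetically, and df.param.unique() lists the params in order of first appearance. If you would rather not change the column's dtype, a similar ordering can likely be obtained with the key argument of sort_values (pandas 1.1+). This is a swapped-in alternative, not the answer's code, and order is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "param": ["XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "MMG", "MMG",
              "MMG", "ZZZ", "ZZZ", "MMM", "MMM", "ZZZ", "ZZZ"],
    "per": [1.0, 2.0, 1.0, 2.0, 3.0, 1.0, 2.0,
            3.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0],
    "per_date": pd.to_datetime([
        "2018-10-01", "2017-08-01", "2019-10-01", "2019-08-01", "2020-10-01",
        "2021-10-01", "2021-10-01", "2011-01-01", "2023-01-01", "2014-01-01",
        "2016-03-01", "2019-01-01", "2009-06-01", "2018-01-01"]),
    "per_num": [11.0, 15.25, 11.25, 15.71, np.nan, 12.50, 11.75,
                np.nan, 19.00, 13.00, 12.01, 16.00, 12.50, 19.00],
})

# Hypothetical helper: position of each param's first appearance (XYZ=0, MMG=1, ...)
order = {p: i for i, p in enumerate(df["param"].unique())}

out = (
    df.sort_values(
        ["param", "per"],
        # key is called once per sort column; only "param" is remapped
        key=lambda s: s.map(order) if s.name == "param" else s,
    )
    .reset_index(drop=True)
)
print(out)
```

Like the Categorical version, this keeps the params in first-appearance order (XYZ, MMG, ZZZ, MMM) rather than the alphabetical order shown in the desired output.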

Update

df.param = pd.Categorical(df.param,categories = df.param.unique())

df['Key']=df.groupby(['param','per']).cumcount()
df = df.sort_values(['param','Key','per']).reset_index(drop = True).drop(columns='Key')
df
Out[375]: 
   param  per    per_date  per_num
0    XYZ  1.0  2018-10-01    11.00
1    XYZ  2.0  2017-08-01    15.25
2    XYZ  3.0  2020-10-01      NaN
3    XYZ  1.0  2019-10-01    11.25
4    XYZ  2.0  2019-08-01    15.71
5    MMG  1.0  2021-10-01    12.50
6    MMG  2.0  2021-10-01    11.75
7    MMG  3.0  2011-01-01      NaN
8    ZZZ  1.0  2009-06-01    12.50
9    ZZZ  2.0  2018-01-01    19.00
10   ZZZ  3.0  2014-01-01    13.00
11   ZZZ  4.0  2023-01-01    19.00
12   MMM  1.0  2016-03-01    12.01
13   MMM  2.0  2019-01-01    16.00
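To see what the Key helper holds: groupby(['param','per']).cumcount() numbers each occurrence of a (param, per) pair, 0 for the first, 1 for the second, and so on. The sketch below keeps param as a plain string column (no Categorical, so params come out alphabetically) and prints XYZ's keys and the order they produce.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "param": ["XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "MMG", "MMG",
              "MMG", "ZZZ", "ZZZ", "MMM", "MMM", "ZZZ", "ZZZ"],
    "per": [1.0, 2.0, 1.0, 2.0, 3.0, 1.0, 2.0,
            3.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0],
    "per_date": pd.to_datetime([
        "2018-10-01", "2017-08-01", "2019-10-01", "2019-08-01", "2020-10-01",
        "2021-10-01", "2021-10-01", "2011-01-01", "2023-01-01", "2014-01-01",
        "2016-03-01", "2019-01-01", "2009-06-01", "2018-01-01"]),
    "per_num": [11.0, 15.25, 11.25, 15.71, np.nan, 12.50, 11.75,
                np.nan, 19.00, 13.00, 12.01, 16.00, 12.50, 19.00],
})

# 0 for the first occurrence of each (param, per) pair, 1 for the second, ...
df["Key"] = df.groupby(["param", "per"]).cumcount()
print(df.loc[df["param"] == "XYZ", ["param", "per", "Key"]])

out = df.sort_values(["param", "Key", "per"]).reset_index(drop=True)
xyz_per = out.loc[out["param"] == "XYZ", "per"].tolist()
print(xyz_per)
```

Because XYZ's per = 3 row is the first occurrence of that value, it gets Key 0 and is sorted into the first run, so XYZ comes out as 1, 2, 3, 1, 2 rather than the 1, 2, 1, 2, 3 in the desired output.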
Another answer:

Update:

df.assign(sortkey=df.groupby('param')
                    .apply(lambda x:x.duplicated(['param','per']).cumsum())
                    .reset_index(level=0,drop=True))\
  .sort_values(['param','sortkey','per'])

Output:

   param  per    per_date  per_num  sortkey
5    MMG  1.0  2021-10-01    12.50        0
6    MMG  2.0  2021-10-01    11.75        0
7    MMG  3.0  2011-01-01      NaN        0
10   MMM  1.0  2016-03-01    12.01        0
11   MMM  2.0  2019-01-01    16.00        0
0    XYZ  1.0  2018-10-01    11.00        0
1    XYZ  2.0  2017-08-01    15.25        0
2    XYZ  1.0  2019-10-01    11.25        1
3    XYZ  2.0  2019-08-01    15.71        2
4    XYZ  3.0  2020-10-01      NaN        2
12   ZZZ  1.0  2009-06-01    12.50        0
13   ZZZ  2.0  2018-01-01    19.00        0
9    ZZZ  3.0  2014-01-01    13.00        0
8    ZZZ  4.0  2023-01-01    19.00        0
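The same sortkey can probably be computed without groupby.apply, which pandas 2.2 and later flag with a DeprecationWarning when the applied function operates on the grouping columns. A sketch, assuming the question's frame: df.duplicated(['param','per']) already flags repeats across the whole frame (param is part of the subset, so repeats never cross params), and a grouped cumsum turns those flags into a per-param run counter.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "param": ["XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "MMG", "MMG",
              "MMG", "ZZZ", "ZZZ", "MMM", "MMM", "ZZZ", "ZZZ"],
    "per": [1.0, 2.0, 1.0, 2.0, 3.0, 1.0, 2.0,
            3.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0],
    "per_date": pd.to_datetime([
        "2018-10-01", "2017-08-01", "2019-10-01", "2019-08-01", "2020-10-01",
        "2021-10-01", "2021-10-01", "2011-01-01", "2023-01-01", "2014-01-01",
        "2016-03-01", "2019-01-01", "2009-06-01", "2018-01-01"]),
    "per_num": [11.0, 15.25, 11.25, 15.71, np.nan, 12.50, 11.75,
                np.nan, 19.00, 13.00, 12.01, 16.00, 12.50, 19.00],
})

# True for the 2nd, 3rd, ... occurrence of a (param, per) pair
dup = df.duplicated(["param", "per"])
# Running count of repeats within each param (cast to int before summing)
sortkey = dup.astype(int).groupby(df["param"]).cumsum()

out = df.assign(sortkey=sortkey).sort_values(["param", "sortkey", "per"])
print(out)
```

The resulting row order (by original index) matches the output above, including ZZZ sorted as 1, 2, 3, 4.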
Another answer:

How about this?

df.assign(sortkey=-df.groupby(['param','per']).cumcount()).sort_values(['param','per']).reset_index(drop=True)
Another answer:

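You need to group by param and per first, so that each per value is assigned a grouper number (its occurrence count from cumcount). Then group by param and grouper, and sort each group by param and per:
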
import pandas as pd
from pandas import Timestamp

# Sample data from the question, minus the two rows whose per_num is NaN
df = pd.DataFrame([['XYZ',1.0,Timestamp('2018-10-01'),11.0],['XYZ',2.0,Timestamp('2017-08-01'),15.25],
                   ['XYZ',1.0,Timestamp('2019-10-01'),11.25],['XYZ',2.0,Timestamp('2019-08-01'),15.71],
                   ['MMG',1.0,Timestamp('2021-10-01'),12.5],['MMG',2.0,Timestamp('2021-10-01'),11.75],
                   ['ZZZ',4.0,Timestamp('2023-01-01'),19.0],['ZZZ',3.0,Timestamp('2014-01-01'),13.0],
                   ['MMM',1.0,Timestamp('2016-03-01'),12.01],['MMM',2.0,Timestamp('2019-01-01'),16.0],
                   ['ZZZ',1.0,Timestamp('2009-06-01'),12.5],['ZZZ',2.0,Timestamp('2018-01-01'),19.0]],
                  columns=('param','per','per_date','per_num'))

df["grouper"] = df.groupby(["param","per"]).cumcount()

df.groupby(["param","grouper"])\
.apply(lambda g: g.sort_values(["param","per"]))\
.reset_index(drop=True)

Result:

    param  per   per_date  per_num  grouper
0    MMG  1.0 2021-10-01    12.50        0
1    MMG  2.0 2021-10-01    11.75        0
2    MMM  1.0 2016-03-01    12.01        0
3    MMM  2.0 2019-01-01    16.00        0
4    XYZ  1.0 2018-10-01    11.00        0
5    XYZ  2.0 2017-08-01    15.25        0
6    XYZ  1.0 2019-10-01    11.25        1
7    XYZ  2.0 2019-08-01    15.71        1
8    ZZZ  1.0 2009-06-01    12.50        0
9    ZZZ  2.0 2018-01-01    19.00        0
10   ZZZ  3.0 2014-01-01    13.00        0
11   ZZZ  4.0 2023-01-01    19.00        0
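Since each (param, grouper) group is only sorted by per, the groupby/apply step should give the same order as a single sort_values call on all three columns, which also sidesteps the pandas 2.2+ deprecation warning for apply operating on grouping columns. A sketch using the same 12 rows as this answer (a swapped-in simplification, not the answerer's code):

```python
import pandas as pd

# Same 12 rows as the answer above (the NaN rows are left out)
df = pd.DataFrame({
    "param": ["XYZ", "XYZ", "XYZ", "XYZ", "MMG", "MMG",
              "ZZZ", "ZZZ", "MMM", "MMM", "ZZZ", "ZZZ"],
    "per": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0],
    "per_date": pd.to_datetime([
        "2018-10-01", "2017-08-01", "2019-10-01", "2019-08-01",
        "2021-10-01", "2021-10-01", "2023-01-01", "2014-01-01",
        "2016-03-01", "2019-01-01", "2009-06-01", "2018-01-01"]),
    "per_num": [11.0, 15.25, 11.25, 15.71, 12.5, 11.75,
                19.0, 13.0, 12.01, 16.0, 12.5, 19.0],
})

# Occurrence number of each (param, per) pair: 0, 1, ...
df["grouper"] = df.groupby(["param", "per"]).cumcount()
# One lexicographic sort instead of groupby(["param", "grouper"]).apply(...)
out = df.sort_values(["param", "grouper", "per"]).reset_index(drop=True)
print(out)
```

Within a (param, grouper) group every per value is unique, so there are no ties and the single sort reproduces the Result table above.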