Convert a DataFrame column to percentile ranks - Python 3.x

I have a pandas DataFrame that looks like this:

| id | name |         time        |
|:--:|:----:|:-------------------:|
|  1 | eric | 2014-05-16 15:15:11 |
|  2 | eric |  2014-05-27 3:43:43 |
|  3 | eric | 2014-04-24 13:25:20 |
|  4 | tony | 2014-04-19 20:18:58 |
|  5 | tony |  2014-05-08 17:8:5  |
|  6 | tony | 2014-05-21 16:55:44 |
|  7 | eric |  2014-05-18 11:26:3 |
|  8 | eric | 2014-04-05 17:51:53 |
|  9 | tony | 2014-04-06 14:21:39 |
| 10 | tony | 2014-05-08 22:24:27 |
| 11 | tony |  2014-04-10 23:11:2 |
| 12 |  zac | 2014-05-04 13:13:44 |
| 13 | eric |  2014-04-03 6:50:1  |
| 14 | eric |  2014-04-25 6:22:39 |
| 15 | tony |  2014-04-14 0:23:55 |
| 16 |  zac | 2014-04-19 12:12:54 |
| 17 |  zac |  2014-05-30 1:36:15 |

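I want to replace the values in the time column with a percentile rank based on the time of day only. In other words, I need to convert each datetime value into a percentile rank.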

In Oracle SQL, I can do it like this:

SELECT id,name,FLOOR( (RANK() OVER (ORDER BY TO_CHAR(time,'hh24:mi:ss')) -1) * 10 / COUNT(*) OVER ()) AS "Rank"
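The following is a rough pandas counterpart of that SQL expression, offered as a sketch rather than a definitive translation. `Series.rank(method='min')` gives tied values the same lowest rank, which is how SQL `RANK()` behaves. Integer floor division stands in for `FLOOR(... * 10 / COUNT(*))`. The helper name `decile_rank` and the four sample rows are illustrative assumptions and do not come from the question.

```python
import pandas as pd

def decile_rank(times: pd.Series) -> pd.Series:
    """Hypothetical helper: bucket each timestamp's time of day into 0..9, mimicking
    FLOOR((RANK() OVER (ORDER BY time_of_day) - 1) * 10 / COUNT(*) OVER ())."""
    # Drop the date part; zero-padded 'HH:MM:SS' strings sort chronologically.
    time_of_day = times.dt.strftime('%H:%M:%S')
    # method='min' gives tied values the same (lowest) rank, like SQL RANK().
    rank = time_of_day.rank(method='min').astype(int)
    # Integer floor division plays the role of FLOOR(... * 10 / COUNT(*)).
    return (rank - 1) * 10 // len(times)

# Illustrative subset of the question's rows
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['eric', 'eric', 'eric', 'tony'],
    'time': pd.to_datetime(['2014-05-16 15:15:11', '2014-05-27 03:43:43',
                            '2014-04-24 13:25:20', '2014-04-19 20:18:58']),
})
df['Rank'] = decile_rank(df['time'])
print(df.sort_values('Rank'))
```

Ranking the extracted time-of-day strings, instead of the full datetimes, is what makes the calendar date irrelevant to the order.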

The desired output looks something like this:

| ID | THE_NAME | Rank |
|:--:|:--------:|:----:|
| 15 |   tony   |   0  |
| 17 |    zac   |   0  |
|  2 |   eric   |   1  |
| 13 |   eric   |   1  |
| 14 |   eric   |   2  |
|  7 |   eric   |   2  |
| 16 |    zac   |   3  |
|  3 |   eric   |   4  |
| 12 |    zac   |   4  |
|  9 |   tony   |   5  |
|  1 |   eric   |   5  |
|  6 |   tony   |   6  |
|  8 |   eric   |   7  |
|  5 |   tony   |   7  |
|  4 |   tony   |   8  |
| 10 |   tony   |   8  |
| 11 |   tony   |   9  |

There is also a SQL Fiddle.

I couldn't find any reference for this kind of problem on Stack Overflow, which is why I don't have a failed attempt to show yet.

Note: I see that pandas has a rank function, but I don't understand how to use it on a datetime column when I only need the 24-hour time of day.
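One way to rank a datetime column by time of day alone is sketched below. It assumes that seconds since midnight are an acceptable sort key. Each timestamp has its own midnight subtracted (`Series.dt.normalize()`), and the resulting seconds are ranked.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2014-05-16 15:15:11',  # afternoon
    '2014-04-03 06:50:01',  # morning
    '2014-05-30 01:36:15',  # latest date, but earliest time of day
]))

# Timestamp minus its own midnight = time elapsed within that day.
seconds_since_midnight = (times - times.dt.normalize()).dt.total_seconds()

# rank() now orders by time of day only; the calendar date no longer matters.
ranks = seconds_since_midnight.rank(method='min')
print(ranks.tolist())
```

The last row has the latest date but receives rank 1, because only its time of day (01:36:15) is compared.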

Trying @PrinceFrancis's solution:

df['time'] = df['time'].dt.strftime('%H:%M:%S')
df = df.sort_values(['time']).reset_index().drop('index',axis=1)
total_size = len(df.index)
df['Rank'] = df.index * 10 / total_size
print(df)

yields:

    name      time      Rank
0   tony  00:23:55  0.000000
1    zac  01:36:15  0.588235
2   eric  03:43:43  1.176471
3   eric  06:22:39  1.764706
4   eric  06:50:01  2.352941
5   eric  11:26:03  2.941176
6    zac  12:12:54  3.529412
7    zac  13:13:44  4.117647
8   eric  13:25:20  4.705882
9   tony  14:21:39  5.294118
10  eric  15:15:11  5.882353
11  tony  16:55:44  6.470588
12  tony  17:08:05  7.058824
13  eric  17:51:53  7.647059
14  tony  20:18:58  8.235294
15  tony  22:24:27  8.823529
16  tony  23:11:02  9.411765

When I try to convert it to int with:

df['Rank'] = int(df.index * 10 / total_size)

it raises an error.
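The error comes from calling Python's built-in int() on the whole index. int() converts a single scalar, not every element of an array. Two element-wise alternatives are sketched below, assuming a 17-row frame as in the question. The first uses integer floor division so the result is integral from the start. The second converts an existing float column with astype(int). The column names RankFloat and RankCast are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'name': ['tony'] * 17})  # 17 rows, as in the question
total_size = len(df.index)

# Vectorised floor division: every element of the index becomes an int.
df['Rank'] = df.index * 10 // total_size

# Alternatively, compute floats first and convert the whole column at once.
df['RankFloat'] = df.index * 10 / total_size
df['RankCast'] = df['RankFloat'].astype(int)

print(df['Rank'].tolist())
```

Both approaches operate on every row at once. For these non-negative values they agree with the integer Rank column in the desired output.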
servicesp417 answered: Convert a DataFrame column to percentile ranks - Python 3.x

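I'm not sure about the rank function, but you can get the result by sorting the DataFrame by time of day, resetting its index, and then applying the rank formula to the new index. Floor division (//) keeps the result an integer, matching FLOOR in the SQL:

df['Rank'] = df.index * 10 // total_size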

Full example:

import pandas as pd
from datetime import datetime
# Same 17 rows as in the question, in id order
df = pd.DataFrame({
    'name': ('eric','eric','eric','tony','tony','tony','eric','eric','tony',
             'tony','tony','zac','eric','eric','tony','zac','zac'),
    'time': [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in (
        '2014-05-16 15:15:11','2014-05-27 3:43:43','2014-04-24 13:25:20','2014-04-19 20:18:58',
        '2014-05-08 17:08:05','2014-05-21 16:55:44','2014-05-18 11:26:03','2014-04-05 17:51:53',
        '2014-04-06 14:21:39','2014-05-08 22:24:27','2014-04-10 23:11:02','2014-05-04 13:13:44',
        '2014-04-03 6:50:01','2014-04-25 6:22:39','2014-04-14 0:23:55','2014-04-19 12:12:54',
        '2014-05-30 1:36:15')]
})
# Keep only the time of day; zero-padded HH:MM:SS strings sort chronologically
df['time'] = df['time'].dt.strftime('%H:%M:%S')
# Sort by time of day and renumber the rows 0..n-1
df = df.sort_values(['time']).reset_index(drop=True)
total_size = len(df.index)
# floor(position * 10 / n), like FLOOR((RANK() - 1) * 10 / COUNT(*)) when there are no ties
df['Rank'] = df.index * 10 // total_size
print(df)

The result is:

    name    time    Rank
0   tony    00:23:55    0
1   zac     01:36:15    0
2   eric    03:43:43    1
3   eric    06:22:39    1
4   eric    06:50:01    2
5   eric    11:26:03    2
6   zac     12:12:54    3
7   zac     13:13:44    4
8   eric    13:25:20    4
9   tony    14:21:39    5
10  eric    15:15:11    5
11  tony    16:55:44    6
12  tony    17:08:05    7
13  eric    17:51:53    7
14  tony    20:18:58    8
15  tony    22:24:27    8
16  tony    23:11:02    9
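The positional formula (row index * 10 // n) gives every row its own position. As a result, two rows with the same time of day can land in different buckets, whereas SQL RANK() would give them the same rank. The small sketch below contrasts the two. It uses hypothetical data with a deliberately duplicated time and uses rank(method='min') as the RANK()-style alternative.

```python
import pandas as pd

# Hypothetical data: two rows share the same time of day (12:00:00).
df = pd.DataFrame({'time': ['09:00:00', '12:00:00', '12:00:00', '18:00:00']})
df = df.sort_values('time').reset_index(drop=True)
n = len(df)

# Position-based bucket, as in the answer: tied rows get different positions.
df['by_position'] = df.index * 10 // n

# RANK()-style bucket: tied rows share the lowest rank, so they share a bucket.
df['by_rank'] = ((df['time'].rank(method='min') - 1) * 10 // n).astype(int)

print(df)
```

The question's data has no repeated times, so both variants produce the same output there.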

Edit: this solution builds on the one Prince Francis posted.

Solution:

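import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'name': ('eric','eric','eric','tony','tony','tony','eric','eric','tony','tony','tony','zac','eric','eric','tony','zac','zac'),
    'time': [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in ('2014-05-16 15:15:11','2014-05-27 3:43:43','2014-04-24 13:25:20','2014-04-19 20:18:58','2014-05-08 17:08:05','2014-05-21 16:55:44','2014-05-18 11:26:03','2014-04-05 17:51:53','2014-04-06 14:21:39','2014-05-08 22:24:27','2014-04-10 23:11:02','2014-05-04 13:13:44','2014-04-03 6:50:01','2014-04-25 6:22:39','2014-04-14 0:23:55','2014-04-19 12:12:54','2014-05-30 1:36:15')]
})
df['time'] = df['time'].dt.strftime('%H:%M:%S')
df = df.sort_values(['time']).reset_index(drop=True)
total_size = len(df.index)
# Float ranks here; they are cast to int in the next step
df['Rank'] = df.index * 10 / total_size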

You can then cast the values to int with pandas' apply function:

def casting(value):
    return int(value)

df['Rank'] = df['Rank'].apply(casting)
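For comparison, here is a sketch of the same cast without a Python-level helper: Series.astype(int) converts the whole column in one vectorised step. Both int() and astype(int) truncate toward zero. That equals FLOOR only because these ranks are never negative. If negative values could appear, numpy.floor is the safer choice. The sample values below are taken from the float ranks shown earlier.

```python
import numpy as np
import pandas as pd

rank = pd.Series([0.0, 0.588235, 1.176471, 9.411765])

via_apply = rank.apply(int)     # element-by-element Python int(), like casting()
via_astype = rank.astype(int)   # vectorised conversion of the whole column

# Truncation and floor differ only for negative non-integer values.
negative = pd.Series([-0.5, -1.2])
print(negative.astype(int).tolist(), np.floor(negative).astype(int).tolist())
```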

df
Out[1]: 
    name      time  Rank
0   tony  00:23:55     0
1    zac  01:36:15     0
2   eric  03:43:43     1
3   eric  06:22:39     1
4   eric  06:50:01     2
5   eric  11:26:03     2
6    zac  12:12:54     3
7    zac  13:13:44     4
8   eric  13:25:20     4
9   tony  14:21:39     5
10  eric  15:15:11     5
11  tony  16:55:44     6
12  tony  17:08:05     7
13  eric  17:51:53     7
14  tony  20:18:58     8
15  tony  22:24:27     8
16  tony  23:11:02     9

Using the rank function

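pandas' rank function assigns each value its position in sorted order, starting at 1. By default, tied values receive the average of their positions. For example: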

df['Rank'] = df['time'].rank()
df = df.sort_values('Rank')
df
Out[2]: 
    name      time  Rank
0   tony  00:23:55   1.0
1    zac  01:36:15   2.0
2   eric  03:43:43   3.0
3   eric  06:22:39   4.0
4   eric  06:50:01   5.0
5   eric  11:26:03   6.0
6    zac  12:12:54   7.0
7    zac  13:13:44   8.0
8   eric  13:25:20   9.0
9   tony  14:21:39  10.0
10  eric  15:15:11  11.0
11  tony  16:55:44  12.0
12  tony  17:08:05  13.0
13  eric  17:51:53  14.0
14  tony  20:18:58  15.0
15  tony  22:24:27  16.0
16  tony  23:11:02  17.0
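Series.rank also accepts pct=True, which divides each rank by the number of values and so yields a percentile rank between 0 and 1. Here is a short sketch on a few of the time strings from the question:

```python
import pandas as pd

times = pd.Series(['06:22:39', '00:23:55', '13:25:20', '03:43:43'])

ranks = times.rank()          # 1-based positions in sorted order
pct = times.rank(pct=True)    # the same ranks divided by len(times)

print(ranks.tolist())
print(pct.tolist())
```

Note that pct reaches 1.0 for the latest time. Multiplying it by 10 and flooring would therefore put that row in bucket 10, not 9. Reproducing the SQL's 0 to 9 buckets still needs the (rank - 1) * 10 // n form.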
Article link: https://www.f2er.com/3085691.html
