Discretization: converting continuous values into a fixed number of categories

1   Create a column Usage_Per_Year from Miles_Driven_Per_Year by discretizing the values into three equally sized categories. The names of the categories should be Low, Medium, and High.

2   Group by Usage_Per_Year and print the group sizes as well as the ranges of each.

3   Do the same as in #1, but instead of equally sized categories, create categories that have the same number of points per category.

4   Group by Usage_Per_Year and print the group sizes as well as the ranges of each.

My code is as follows:

df["Usage_Per_Year "],bins = pd.cut(df["Miles_Driven_Per_Year"],3,precision=2,retbins=True)
group_label = pd.Series(["Low","Medium","High"])
#3.3.2
group_size = df.groupby("Usage_Per_Year").size()
#print(group_size)
print(group_size.reset_index().set_index(group_label))
#3.3.3
Year2 = pd.cut(df["Miles_Driven_Per_Year"],precision=2)
group_label = pd.Series(["Low","High"])
#3.3.4
group_size = df.groupby("Usage_Per_Year").size()
#print(group_size)
print(group_size.reset_index().set_index(group_label))

The output is:

                   Usage_Per_Year     0
Low        (-1925.883,663476.235]  6018
Medium   (663476.235,1326888.118]     0
High      (1326888.118,1990300.0]     1
                 Usage_Per_Year  0
Low       (-1925.883,1990300.0]  1

But -1925 is wrong...

The correct answer should look like this.

[Image: expected output]

What should I do...

iCMS answer: Discretization: converting continuous values into a fixed number of categories

Maybe there is a typo on line 1: df["Usage_Per_Year "]? The column name ends with a space.
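A minimal sketch of why the trailing space matters, using a hypothetical three-row frame rather than the asker's data: the column is created under the name with the space, so a later lookup by the clean name fails. Stripping whitespace from the column names is one possible guard, not the only one.

```python
import pandas as pd

# Hypothetical three-row frame, used only to reproduce the typo.
df = pd.DataFrame({"Miles_Driven_Per_Year": [1_000, 20_000, 60_000]})

# Note the trailing space in the new column name, as in the question.
df["Usage_Per_Year "] = pd.cut(df["Miles_Driven_Per_Year"], 3)
print(list(df.columns))

# Looking up the name without the space raises KeyError.
try:
    df.groupby("Usage_Per_Year")
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# One way to guard against this: strip whitespace from all column names.
df.columns = df.columns.str.strip()
sizes = df.groupby("Usage_Per_Year", observed=False).size()
print(sizes)
```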

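pd.cut splits the value range into bins of equal width. That is why all your bins span the same size, and why the printed intervals are bin edges rather than the actual ranges of your data. It looks like you should compute the minimum and maximum of each group after binning.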
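The negative lower edge can be reproduced with a small sketch. The data below is made up for illustration (it is not the asker's), and it assumes the default right=True: when pd.cut is given an integer number of bins, it pushes the lowest edge slightly below the minimum so that the minimum itself lands inside the first left-open interval. Taking min and max per group gives the real ranges.

```python
import pandas as pd

# Hypothetical data: five mileage values with one large outlier.
miles = pd.Series([100, 5_000, 12_000, 30_000, 1_000_000],
                  name="Miles_Driven_Per_Year")

# retbins=True also returns the bin edges that pd.cut computed.
usage, edges = pd.cut(miles, bins=3, labels=["Low", "Medium", "High"],
                      retbins=True)
usage = usage.rename("Usage_Per_Year")

# The first edge lies below the smallest value, which is why an
# interval can start at a negative number even if no value is negative.
print(edges)

# The real range of each group comes from the data, not from the edges.
summary = miles.groupby(usage, observed=False).agg(["count", "min", "max"])
print(summary)
```

An empty bin (Medium here) shows a count of 0 and missing min and max, which matches the empty middle bin in the question's output.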

Also, to bin values into groups of equal frequency, you should use pd.qcut.
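To make the difference concrete, here is a small sketch on made-up data (nine distinct values with one outlier, chosen only for illustration). pd.cut puts almost everything in the first bin, while pd.qcut places the edges at quantiles so each category receives the same number of points.

```python
import pandas as pd

# Hypothetical data: nine distinct mileage values, skewed by one outlier.
miles = pd.Series(
    [100, 2_000, 4_000, 8_000, 9_000, 12_000, 15_000, 30_000, 1_000_000],
    name="Miles_Driven_Per_Year",
)
labels = ["Low", "Medium", "High"]

equal_width = pd.cut(miles, bins=3, labels=labels)  # equal-width bins
equal_freq = pd.qcut(miles, q=3, labels=labels)     # equal-frequency bins

# Count the points per category for each method.
width_counts = equal_width.value_counts(sort=False)
freq_counts = equal_freq.value_counts(sort=False)
print(width_counts)
print(freq_counts)
```

With heavily repeated values, pd.qcut may not be able to produce exactly equal groups, so the counts can differ slightly on real data.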


Example input:

import numpy as np
import pandas as pd

# Synthetic, right-skewed mileage data with a fixed seed
rng = np.random.default_rng(20210514)
df = pd.DataFrame({
    'Miles_Driven_Per_Year': rng.gamma(1.05, 10000, (1000,)).astype(int)
})

# 1: three equal-width bins
group_label = ['Low', 'Medium', 'High']
df['Usage_Per_Year'] = pd.cut(df['Miles_Driven_Per_Year'], bins=3, labels=group_label)

# 2: group sizes and actual value ranges
print(df.groupby('Usage_Per_Year').agg(['count', 'min', 'max']))

# 3: three equal-frequency bins
df['Usage_Per_Year'] = pd.qcut(df['Miles_Driven_Per_Year'], q=3, labels=group_label)

# 4: group sizes and actual value ranges
print(df.groupby('Usage_Per_Year').agg(['count', 'min', 'max']))

Example output:

               Miles_Driven_Per_Year              
                               count    min    max
Usage_Per_Year                                    
Low                              878     31  20905
Medium                           107  20955  41196
High                              15  41991  62668
               Miles_Driven_Per_Year              
                               count    min    max
Usage_Per_Year                                    
Low                              334     31   4378
Medium                           333   4449  11424
High                             333  11442  62668
Link to this article: https://www.f2er.com/418124.html
