1 Create a column Usage_Per_Year from Miles_Driven_Per_Year by discretizing the values into three equally sized categories. The names of the categories should be Low,Medium,and High.
2 Group by Usage_Per_Year and print the group sizes as well as the ranges of each.
3 Do the same as in #1,but instead of equally sized categories,create categories that have the same number of points per category.
4 Group by Usage_Per_Year and print the group sizes as well as the ranges of each.
我的代码如下
df["Usage_Per_Year "],bins = pd.cut(df["Miles_Driven_Per_Year"],3,precision=2,retbins=True)
group_label = pd.Series(["Low","Medium","High"])
#3.3.2
group_size = df.groupby("Usage_Per_Year").size()
#print(group_size)
print(group_size.reset_index().set_index(group_label))
#3.3.3
Year2 = pd.cut(df["Miles_Driven_Per_Year"],precision=2)
group_label = pd.Series(["Low","High"])
#3.3.4
group_size = df.groupby("Usage_Per_Year").size()
#print(group_size)
print(group_size.reset_index().set_index(group_label))
输出如下:
Usage_Per_Year 0 Low (-1925.883,663476.235] 6018 Medium (663476.235,1326888.118] 0 High (1326888.118,1990300.0] 1
Usage_Per_Year 0 Low (-1925.883,1990300.0] 1
但是 -1925 是错误的...
我该怎么办...