How to get the frequency of values within a specific range

2024-04-29 • Q&A

My dataframe has 2 columns ("timestamp", "ip") and 400 rows:

timestamp     ip
1000          ip1
1500          ip2
 .             .
 .             .
11000         ip1
 .             .
 .             .
50000         ip5

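What I want to do is check, for every block of 10 rows, whether the difference between the 10th row and the 1st row in column 1 is less than or equal to 10000 (the threshold used in my code). If it is, I want to get the most repeated value in column 2 and remove it from the dataframe, but only in the rows of the block where the test is true.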

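So my idea is to write a loop with a step of 10 that walks down the first column and tests the difference between the 10th and the 1st row of each block. If the test is true, I run a second loop to collect those 10 rows in a list, then find the most repeated ip in that list and try to remove it from the original dataframe. I am not sure this is the right approach. This is what my code looks like: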

import pandas as pd

# create a dataframe from the csv file
read = pd.read_csv(r"example.csv", delimiter=';')
# number of data rows (the header line is not counted)
lines = len(read)
# first loop: walk the rows in blocks of 10 and test the 10000 threshold
for i in range(0, lines - 9, 10):
    # difference between the 10th and the 1st timestamp of the block
    if int(read['timestamp'][i + 9]) - int(read['timestamp'][i]) <= 10000:
        # second loop: collect the ips of this block in a list
        l = []
        for j in range(i, i + 10):
            l.append(read['ip'][j])
        df = pd.DataFrame({'ip': l})
        print(df)

So the result I get from this code looks like this:

                           IP
0                          ip1
1                          ip2
2                          ip3
3                          ip4
4                          ip5
5                          ip5
6                          ip5
7                          ip1
8                          ip5
9                          ip6

All I want to do now is find which ip is the most frequent and how many times it occurs, that is:
ip5: 4
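
One way to get this count is `Series.value_counts`, which sorts values by descending count, so its first entry is the most frequent ip. A minimal sketch, assuming the ten ips from the printed block above sit in a column named `ip`; the helper name `most_frequent` is made up for illustration:

```python
import pandas as pd

def most_frequent(values):
    """Return (value, count) for the most common entry in `values`."""
    counts = pd.Series(values).value_counts()
    # value_counts sorts by count, descending, so position 0 is the winner
    return counts.index[0], int(counts.iloc[0])

# the ten ips from the printed block above
df = pd.DataFrame({'ip': ['ip1', 'ip2', 'ip3', 'ip4', 'ip5',
                          'ip5', 'ip5', 'ip1', 'ip5', 'ip6']})
ip, n = most_frequent(df['ip'])
print(f"{ip}: {n}")  # ip5: 4
```

If two ips are tied for first place, this returns only one of them.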

# `read` is the dataframe loaded in the question; this answer names the columns 'Time' and 'IP'
out_df = pd.DataFrame()
while len(read) > 0:
    # split off the next block of up to 10 rows
    sub_df, read = read.iloc[:10], read.iloc[10:]
    # full block spanning at most 10000: drop its most frequent IP
    if len(sub_df) == 10 and sub_df['Time'].iloc[-1] - sub_df['Time'].iloc[0] <= 10000:
        sub_df = sub_df[sub_df['IP'] != sub_df['IP'].value_counts().index[0]]
    out_df = pd.concat([out_df, sub_df])
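
The same idea can also be written without consuming `read`, by grouping rows into blocks by position. This is only a sketch under stated assumptions and not part of the original answer: the columns are named `timestamp` and `ip` as in the question, blocks are formed by row position, and the function name `drop_top_ip` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_top_ip(df, block=10, max_span=10000, time_col='timestamp', ip_col='ip'):
    """Drop the most frequent ip from every full block whose time span is small."""
    kept = []
    block_ids = np.arange(len(df)) // block  # 0,...,0,1,...,1,... by row position
    for _, sub in df.groupby(block_ids):
        # difference between the last and the first timestamp of the block
        span = sub[time_col].iloc[-1] - sub[time_col].iloc[0]
        if len(sub) == block and span <= max_span:
            top_ip = sub[ip_col].value_counts().index[0]
            sub = sub[sub[ip_col] != top_ip]
        kept.append(sub)
    return pd.concat(kept) if kept else df.copy()

# made-up sample: the first block spans 9000, the second spans 45000
sample = pd.DataFrame({
    'timestamp': list(range(1000, 11000, 1000)) + list(range(20000, 70000, 5000)),
    'ip': ['ip1', 'ip2', 'ip3', 'ip4', 'ip5', 'ip5', 'ip5', 'ip1', 'ip5', 'ip6'] * 2,
})
result = drop_top_ip(sample)
print(len(result))  # 16
```

Only the first block passes the test, so its four `ip5` rows are removed and the second block is kept unchanged.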

jita2009 answered: How to get the frequency of values within a specific range