Slicing a dataset into series by condition

I have a dataset:

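import pandas as pd

data = {'host': ['A','A','A','A','A','A','B','B','B','B','B','B'],
        'TS': ['1','2','3','7','9','11','7','8','9','14','16','18'],
        'Predict': ['None','None','134','None','None','127','None','None','121','None','None','124']}
df = pd.DataFrame(data)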

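I want to split the dataset into series, each one ending at a non-None Predict value, and get the time differences within each series.

I have a time-difference function and tried to extract the index of each series, but I don't know how to use it: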

def timediffs(series):
    # difference between consecutive timestamps; the first row gets 0
    series['tdiff'] = series['TS'].diff().fillna(0.0)
    return series

predict_index = df.index.where(df['Predict'].notna()).to_series().bfill()

In the end, I would like to get a dataset like this:

new_data = {'host': ['A','19'],'124'],'Time_diff' : ['0','1','0',],'New_predict' : ['134','124',]
       }

new_df = pd.DataFrame(new_data)
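zyr137583910 answered: Slicing a dataset into series by condition

First, we replace 'None' with NaN. Then we use backfill (bfill) to create the New_predict column, and finally GroupBy.diff to get Time_diff. TS has to be numeric for diff to work, so it is cast to int first.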

import numpy as np

df['TS'] = df['TS'].astype(int)  # diff() needs numeric timestamps
df['New_predict'] = df['Predict'].replace('None', np.nan).bfill()
df['Time_diff'] = df.groupby('New_predict')['TS'].diff().fillna(0)

   host  TS Predict New_predict  Time_diff
0     A   1    None         134        0.0
1     A   2    None         134        1.0
2     A   3     134         134        1.0
3     A   7    None         127        0.0
4     A   9    None         127        2.0
5     A  11     127         127        2.0
6     B   7    None         121        0.0
7     B   8    None         121        1.0
8     B   9     121         121        1.0
9     B  14    None         124        0.0
10    B  16    None         124        2.0
11    B  18     124         124        2.0
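
For anyone who wants to reproduce the table above, here is a minimal self-contained sketch of the same steps. The sample frame is an assumption rebuilt from the printed output (TS and Predict stored as strings, as in the question), so adjust it to your real data.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output (an assumption about the real data).
df = pd.DataFrame({
    "host": ["A"] * 6 + ["B"] * 6,
    "TS": ["1", "2", "3", "7", "9", "11", "7", "8", "9", "14", "16", "18"],
    "Predict": ["None", "None", "134", "None", "None", "127",
                "None", "None", "121", "None", "None", "124"],
})

# Timestamps must be numeric before diff() can subtract them.
df["TS"] = df["TS"].astype(int)

# Turn the 'None' strings into real missing values, then pull each
# prediction backwards over the rows that lead up to it.
df["New_predict"] = df["Predict"].replace("None", np.nan).bfill()

# Difference between consecutive timestamps inside each series;
# the first row of every series has no predecessor, so it becomes 0.
df["Time_diff"] = df.groupby("New_predict")["TS"].diff().fillna(0)

print(df)
```

Time_diff comes out as float here because diff() produces NaN for the first row of each group before fillna(0) is applied.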

With the sample data, some preprocessing is needed first: convert TS to numbers, and convert the 'None' strings in Predict to NaN:

df['TS'] = df['TS'].astype(int)
df['Predict'] = pd.to_numeric(df['Predict'],errors='coerce')
# alternative: keep Predict as strings and only turn the 'None' strings into NaN
#df['Predict'] = df['Predict'].mask(df['Predict'] == 'None')
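
To see what the two preprocessing options do, here is a small sketch on a stand-alone Series. The variable names (predict, as_numbers, as_strings) are made up for illustration.

```python
import pandas as pd

# A short stand-in for the Predict column (illustrative values only).
predict = pd.Series(["None", "None", "134", "None", "None", "127"])

# Option 1: parse everything as numbers; anything unparseable
# (here the 'None' strings) is coerced to NaN.
as_numbers = pd.to_numeric(predict, errors="coerce")

# Option 2: keep the values as strings and blank out only the
# entries that are exactly the string 'None'.
as_strings = predict.mask(predict == "None")

print(as_numbers)
print(as_strings)
```

Both variants leave missing values that bfill() can fill. The first gives numeric predictions, the second keeps them as strings.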

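Then backfill the missing values in the Predict column only, use DataFrameGroupBy.diff for Time_diff, and replace the first value of each group (which is NaN) with 0: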

df['New_predict'] = df['Predict'].bfill()
df['Time_diff'] = df.groupby('New_predict')['TS'].diff().fillna(0).astype(int)
print(df)
   host  TS  Predict  New_predict  Time_diff
0     A   1      NaN        134.0          0
1     A   2      NaN        134.0          1
2     A   3    134.0        134.0          1
3     A   7      NaN        127.0          0
4     A   9      NaN        127.0          2
5     A  11    127.0        127.0          2
6     B   7      NaN        121.0          0
7     B   8      NaN        121.0          1
8     B   9    121.0        121.0          1
9     B  14      NaN        124.0          0
10    B  16      NaN        124.0          2
11    B  18    124.0        124.0          2
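
One caveat: grouping by New_predict works for the sample because every prediction value is different. If the same value can occur in two separate series, those series would merge into one group. A possible workaround, sketched below under that assumption, is to group by a running block number instead. The name block and the small frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which the same prediction (134) ends two different series.
df = pd.DataFrame({
    "host": ["A", "A", "A", "B", "B", "B"],
    "TS": [1, 2, 3, 7, 9, 11],
    "Predict": [np.nan, np.nan, 134.0, np.nan, np.nan, 134.0],
})

# A new block starts on the row right after each non-missing prediction:
# shift the "has a prediction" flag down by one row and count the flags so far.
block = df["Predict"].notna().shift(fill_value=False).cumsum()

df["New_predict"] = df["Predict"].bfill()

# Grouping by the value itself lumps both series together ...
naive = df.groupby("New_predict")["TS"].diff().fillna(0).astype(int)

# ... while grouping by the block number keeps them apart.
df["Time_diff"] = df.groupby(block)["TS"].diff().fillna(0).astype(int)

print(naive.tolist())            # [0, 1, 1, 4, 2, 2]
print(df["Time_diff"].tolist())  # [0, 1, 1, 0, 2, 2]
```

If a series should never span two hosts, passing both keys, for example df.groupby(["host", block]), is another option.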
Link to this article: https://www.f2er.com/3166852.html
