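I'm working in R. I have a dataset of more than 3 million rows with the columns Vendor_ID, Bank_account_no and Date. For each Vendor_ID, I want to fetch the rows where Bank_account_no changes in a specific pattern within three months: from X to X to X (at least three times in a row, possibly more) to Y (only once) and back to X. The changes in the data are random, so a fixed window size does not work, because the number of rows differs for each Vendor_ID. I used the rle function to get the run lengths of the different Bank_account_no values, but since this logic has to run for every Vendor_ID, I'm not sure how to build it in R for this many rows. Maybe data.table can help. The input looks like this: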
Vendor_ID Bank_account_no Date
dddd X 24-12-2018
dddd X 24-12-2018
dddd X 26-12-2018
dddd Y 27-12-2018
dddd X 28-12-2018
dddd X 29-12-2018
dddd X 29-12-2018
dddd X 31-12-2018
dddd X 24-01-2019
dddd Z 25-01-2019
dddd X 28-01-2019
dddd G 28-01-2019
dddd G 28-01-2019
eeee A 30-01-2019
eeee A 31-01-2019
eeee A 31-01-2019
eeee B 31-01-2019
eeee A 31-01-2019
The output should be:
Vendor_ID Bank_account_no Date Case
dddd X 24-12-2018 Case1
dddd X 24-12-2018 Case1
dddd X 26-12-2018 Case1
dddd Y 27-12-2018 Case1
dddd X 28-12-2018 Case1
dddd X 29-12-2018 Case2
dddd X 29-12-2018 Case2
dddd X 31-12-2018 Case2
dddd X 24-01-2019 Case2
dddd Z 25-01-2019 Case2
dddd X 28-01-2019 Case2
eeee A 30-01-2019 Case3
eeee A 31-01-2019 Case3
eeee A 31-01-2019 Case3
eeee B 31-01-2019 Case3
eeee A 31-01-2019 Case3
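
To make the rule concrete, here is a minimal sketch of one way the rle idea could be applied per vendor with data.table. It is only an illustration under some assumptions, not a definitive solution: the names `label_cases`, `min_lead` and `local_case` are made up for this example, the rows are assumed to already be in chronological order within each Vendor_ID, and the three-month window is not enforced (that would need an extra date comparison). A case is taken to be a run of at least three identical accounts, then a run of exactly one different account, then the first row of the following run if it returns to the original account. A row that closes one case is not reused as the lead of the next.

```r
library(data.table)

# Label the rows of ONE vendor with a local case number (NA = no case).
# `label_cases` and `min_lead` are hypothetical names for this sketch.
label_cases <- function(acct, min_lead = 3L) {
  runs   <- rle(as.character(acct))
  n_runs <- length(runs$lengths)
  last   <- cumsum(runs$lengths)        # last row index of each run
  first  <- last - runs$lengths + 1L    # first row index of each run
  out    <- rep(NA_integer_, length(acct))
  tail_used <- rep(FALSE, n_runs)       # TRUE if a run's first row closed a case
  case_id   <- 0L
  if (n_runs >= 3L) for (m in 2:(n_runs - 1L)) {
    # Lead run = run before the middle one, minus a row already used as a tail
    lead_start <- first[m - 1L] + tail_used[m - 1L]
    lead_len   <- last[m - 1L] - lead_start + 1L
    if (runs$lengths[m] == 1L && lead_len >= min_lead &&
        isTRUE(runs$values[m - 1L] == runs$values[m + 1L])) {
      case_id <- case_id + 1L
      out[lead_start:first[m + 1L]] <- case_id  # lead + middle + first tail row
      tail_used[m + 1L] <- TRUE
    }
  }
  out
}

# Sample input from the question (assumed already ordered by date per vendor)
dt <- data.table(
  Vendor_ID = rep(c("dddd", "eeee"), c(13L, 5L)),
  Bank_account_no = c("X", "X", "X", "Y", "X", "X", "X", "X", "X", "Z", "X",
                      "G", "G", "A", "A", "A", "B", "A"),
  Date = as.Date(c("24-12-2018", "24-12-2018", "26-12-2018", "27-12-2018",
                   "28-12-2018", "29-12-2018", "29-12-2018", "31-12-2018",
                   "24-01-2019", "25-01-2019", "28-01-2019", "28-01-2019",
                   "28-01-2019", "30-01-2019", "31-01-2019", "31-01-2019",
                   "31-01-2019", "31-01-2019"), format = "%d-%m-%Y")
)

# Run the labelling once per vendor, keep only rows that belong to a case,
# then number the cases globally in order of first appearance.
dt[, local_case := label_cases(Bank_account_no), by = Vendor_ID]
res <- dt[!is.na(local_case)]
res[, Case := paste0("Case", .GRP), by = .(Vendor_ID, local_case)]
res[, local_case := NULL]
print(res)
```

The loop runs over runs rather than rows, so the work per vendor is proportional to the number of account changes, and the grouping by Vendor_ID is left to data.table.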