For the data frame below, I am trying to conditionally create eight additional columns, Last1Col7active through Last10Col7inactive, from FactorCol1:
library(tidyverse)
Data_Frame <- data.frame(Col1 = c("A1","A1","A2","A3","A3"),Col2 = c("2011-03-11","2014-08-21","2016-01-17","2017-06-30","2018-07-11","2018-11-28","2019-09-04","2020-02-29","2020-07-12"),Col3 = c("2018-10-22","2019-05-24","2020-12-25","2018-10-12","2019-09-24","2020-12-19","2018-10-22","2019-06-14","2020-12-20"),Col4 = c(4,2,1,4,4),Col5 = c(7,6,3,5,2),FactorCol1 = c("active","inactive","active","inactive"),FactorCol2 = c("Level2","Level2","Level3","Level1","Level3"))
Data_Frame$Col1 <- as.factor(Data_Frame$Col1)
Data_Frame$Col2 <- as.Date(Data_Frame$Col2)
Data_Frame$Col3 <- as.Date(Data_Frame$Col3)
Data_Frame$FactorCol1 <- as.factor(Data_Frame$FactorCol1)
Data_Frame$FactorCol2 <- as.factor(Data_Frame$FactorCol2)
Data_Frame <- Data_Frame %>% group_by(Col1) %>% mutate(Col6 = lubridate::time_length(lubridate::interval(Col2,max(Col3)),"years"))
Data_Frame <- Data_Frame %>% group_by(Col1) %>% dplyr::mutate(Col7 = ifelse(Col6 <= 1, 1, ifelse(Col6 > 1 & Col6 <= 2, 2, ifelse(Col6 > 2 & Col6 <= 5, 5, ifelse(Col6 > 5 & Col6 <= 10, 10, 11)))))
Data_Frame <- Data_Frame %>% group_by(Col1) %>% dplyr::mutate(Col8 = ifelse(FactorCol1 == 'active', 1, 0))
Data_Frame <- Data_Frame %>% group_by(Col1) %>% dplyr::mutate(Col9 = ifelse(FactorCol1 == 'inactive', 1, 0))
Data_Frame <- as.data.frame(Data_Frame)
Data_Frame <- map_dfc(c(1,10),~ Data_Frame %>%
group_by(Col1) %>%
transmute(!! sprintf("Last%dCol7active", .x) := ifelse(Col7 <= .x, sum(Col8 == 1), 0),
          !! sprintf("Last%dCol7inactive", .x) := ifelse(Col7 <= .x, sum(Col9 == 1), 0)) %>%
ungroup %>%
select(-Col1)) %>%
bind_cols(Data_Frame,.)
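In the code above:
Col6: the time difference, in years, between max(Col3) within each group and Col2
Col7: the range that each Col6 value falls into, as assigned by the nested ifelse()
Col8: set to 1 for the active elements of FactorCol1, 0 otherwise
Col9: set to 1 for the inactive elements of FactorCol1, 0 otherwise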
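Last1Col7active: within each group (A1 to A3 in Col1), the number of active elements in FactorCol1 with Col7 <= 1
Last1Col7inactive: within each group, the number of inactive elements in FactorCol1 with Col7 <= 1
Last5Col7active: within each group (A1 to A3 in Col1), the number of active elements in FactorCol1 with Col7 <= 5
Last5Col7inactive: within each group, the number of inactive elements in FactorCol1 with Col7 <= 5
Last10Col7active: within each group (A1 to A3 in Col1), the number of active elements in FactorCol1 with Col7 <= 10
Last10Col7inactive: within each group, the number of inactive elements in FactorCol1 with Col7 <= 10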
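As a side note on Col7, the nested ifelse() could probably be written more compactly with base R's cut(). The sketch below assumes the ranges are labelled by their upper bound (1, 2, 5, 10) with 11 for anything above 10; the `col6` values are made up for illustration and are not from my data.

```r
# Hypothetical Col6 values in years (illustrative only)
col6 <- c(0.4, 1.0, 1.7, 4.2, 9.9, 12.5)

# Interval boundaries; right = TRUE makes each interval (lower, upper]
breaks <- c(-Inf, 1, 2, 5, 10, Inf)

# Assumed labels: the upper bound of each range, 11 for "more than 10"
labels <- c(1, 2, 5, 10, 11)

# cut(..., labels = FALSE) returns the integer index of the interval,
# which is then used to look up the numeric label
col7 <- labels[cut(col6, breaks = breaks, labels = FALSE, right = TRUE)]
col7
```

Because the intervals are closed on the right, a value of exactly 1.0 lands in the first range, which matches the `Col6 <= 1` test in the ifelse() chain.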
What is going wrong? Also, is there a way to generate these columns automatically, without typing the level names inside transmute?
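To make the second part concrete, here is a minimal base R sketch of the kind of result I am after. It assumes that each new column should hold the per-group count of rows whose Col7 is at most the threshold and whose FactorCol1 equals the given level. The toy data frame `df` and the helper names `thresholds`, `status_levels`, `hit` and `new_name` are hypothetical and only serve the illustration.

```r
# Hypothetical toy data (illustrative values, not my real data)
df <- data.frame(
  Col1       = c("A1", "A1", "A1", "A2", "A2"),
  Col7       = c(1, 5, 10, 1, 2),
  FactorCol1 = factor(c("active", "inactive", "active", "inactive", "inactive"))
)

thresholds    <- c(1, 10)
status_levels <- levels(df$FactorCol1)  # read from the data, not typed by hand

for (k in thresholds) {
  for (lv in status_levels) {
    new_name <- sprintf("Last%dCol7%s", k, lv)
    # 1 where the row meets both conditions, 0 otherwise
    hit <- as.integer(df$Col7 <= k & df$FactorCol1 == lv)
    # ave() computes the sum per Col1 group and repeats it on every row of that group
    df[[new_name]] <- ave(hit, df$Col1, FUN = sum)
  }
}
df
```

Here the column names come from sprintf() combined with levels(), so adding a level to FactorCol1 or a value to `thresholds` would add columns without further edits. Whether a per-group count repeated on every row is the right shape, as opposed to one row per group, is part of what I am unsure about.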