library(dplyr)
# Within each y group, replace NAs in every other column by that group's mean
train %>%
  group_by(y) %>%
  mutate_at(vars(-y), function(v) {
    if_else(is.na(v), mean(v, na.rm = TRUE), v)
  }) %>%
  ungroup()
# A tibble: 4 x 3
#       y    x1    x2
#   <dbl> <dbl> <dbl>
# 1     1   2       8
# 2     2   4     NaN
# 3     1   3.5     6
# 4     1   5      12
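For reference, here is a minimal sketch of the same group-wise mean imputation written with across(), which supersedes mutate_at() in dplyr 1.0 and later. The train data frame is a hypothetical reconstruction chosen to match the printed outputs (the question's actual data is not shown): group y = 1 has one missing x1, and group y = 2 has only a missing x2.

```r
library(dplyr)

# Hypothetical input, reconstructed to match the printed outputs
train <- data.frame(
  y  = c(1, 2, 1, 1),
  x1 = c(2, 4, NA, 5),
  x2 = c(8, NA, 6, 12)
)

res <- train %>%
  group_by(y) %>%
  # across() never selects the grouping column y, so everything() means x1, x2
  mutate(across(everything(), ~ if_else(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()

res
```

As with the mutate_at() version, the y = 2 group has no observed x2, so mean(..., na.rm = TRUE) yields NaN there.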
,
After grouping by the 'y' column, we can use na.aggregate from zoo:
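library(dplyr)
library(zoo)
train %>%
  group_by(y) %>%
  # na.aggregate() fills NAs with the group mean; a group with no observed
  # values is returned as NA_real_ instead of an aggregate of nothing
  mutate_at(vars(-one_of(group_vars(.))),
            ~ if (all(is.na(.))) NA_real_ else na.aggregate(.))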
# A tibble: 4 x 3
# Groups:   y [2]
#       y    x1    x2
#   <dbl> <dbl> <dbl>
# 1     1   2       8
# 2     2   4      NA
# 3     1   3.5     6
# 4     1   5      12
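To make the all-NA guard in the mutate_at() lambda concrete, here is a base-R sketch of what it does to one column within one group, assuming na.aggregate's default FUN = mean. fill_with_mean() is a hypothetical helper for illustration only; it is not part of zoo and not how na.aggregate is implemented.

```r
# Hypothetical helper: mimics `if (all(is.na(v))) NA_real_ else na.aggregate(v)`
# for a single group's column, using the mean as the aggregate
fill_with_mean <- function(v) {
  if (all(is.na(v))) {
    # No observed values: keep NA rather than the NaN that mean() would give
    return(rep(NA_real_, length(v)))
  }
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

fill_with_mean(c(2, NA, 5))  # group y = 1, column x1
# [1] 2.0 3.5 5.0
fill_with_mean(NA_real_)     # group y = 2, column x2
# [1] NA
```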
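Or split the dataset into a list of data.frames based on the 'y' column, apply na.aggregate to each piece, and put the pieces back together with unsplit:
library(zoo)
train[-1] <- unsplit(lapply(split(train[-1], train$y), na.aggregate), train$y)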
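The split/lapply/unsplit pattern can also be written without zoo. This sketch swaps na.aggregate for an inline base-R function, fill_group() (a hypothetical name), that fills each column's NAs with that group's mean. The train data is a hypothetical reconstruction chosen to match the outputs shown.

```r
# Hypothetical input, reconstructed to match the printed outputs
train <- data.frame(
  y  = c(1, 2, 1, 1),
  x1 = c(2, 4, NA, 5),
  x2 = c(8, NA, 6, 12)
)

# Hypothetical helper: replace NAs in every column of one group's data.frame
# by that column's mean within the group
fill_group <- function(df) {
  df[] <- lapply(df, function(v) {
    v[is.na(v)] <- mean(v, na.rm = TRUE)  # NaN if the column is all NA
    v
  })
  df
}

pieces <- split(train[-1], train$y)  # list of data.frames, one per y value
train[-1] <- unsplit(lapply(pieces, fill_group), train$y)  # original row order
train
```

unsplit() reassembles the rows in their original order, which is why the result can be assigned straight back into train[-1].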
,
Consider using ave to compute the group-wise mean, wrapped in an ifelse that is conditional on NA:
# ITERATE THROUGH ALL COLUMNS BUT THE FIRST (the grouping column y)
for (i in names(train)[-1]) {
  train[[i]] <- ifelse(test = is.na(train[[i]]),
                       yes  = ave(train[[i]], train$y, FUN = function(x) mean(x, na.rm = TRUE)),
                       no   = train[[i]])
}
train
#   y  x1  x2
# 1 1 2.0   8
# 2 2 4.0 NaN
# 3 1 3.5   6
# 4 1 5.0  12
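The same ave() logic can be applied to every column except the first without an explicit for loop, by running lapply() over train[-1]. A minimal sketch on a hypothetical train reconstructed to match the output above; group_mean() is an illustrative helper name.

```r
# Hypothetical input, reconstructed to match the printed output
train <- data.frame(
  y  = c(1, 2, 1, 1),
  x1 = c(2, 4, NA, 5),
  x2 = c(8, NA, 6, 12)
)

# Hypothetical helper: mean that ignores NAs (NaN when a group has no values)
group_mean <- function(x) mean(x, na.rm = TRUE)

# For each non-grouping column, swap NAs for the mean of that row's y group
train[-1] <- lapply(train[-1], function(col) {
  ifelse(is.na(col), ave(col, train$y, FUN = group_mean), col)
})
train
```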
Article link: https://www.f2er.com/3156507.html