Subtract rows with numeric values and ignore NAs

2024-05-19 • Q&A

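I have several data frames with about 18 columns and roughly 50,000 rows. Each row entry represents a measurement at a specific site (= column), and the data contain NA values.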

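I need to subtract consecutive rows within each column (e.g. row(i + 1) - row(i)) to detect threshold values, but I need to ignore (and retain) the NAs, so that only entries with numeric values are subtracted from each other.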

I found data.table solutions very helpful, both for a single column (Iterate over a column ignoring but retaining NA values in R) and for operations on multiple columns (e.g. Summarizing multiple columns with dplyr?).

However, I have not managed to combine the approaches suggested on SO (i.e. applying diff over multiple columns while ignoring the NAs).

Here is an example df for illustration, together with the solutions I tried:

library(data.table)

df <- data.frame(x=c(1:3,NA,NA,9:7),y=c(NA,4:6,NA,15:13),z=c(6,2,7,14,20,NA,NA,2))

This is how it works for a single column:

diff_x <- setDT(df)[!is.na(x), lag_diff := x - shift(x)]  # actually what I want, but for more columns at once

And this is how I apply the diff function over several columns with lapply:

diff_all <- setDT(df)[, lapply(.SD, diff)]  # not exactly what I want, because the NAs are not ignored and the difference between the numeric values is not calculated

Any suggestions (base, data.table, dplyr, ... solutions) on how to implement a valid !is.na or similar statement in the second line of code are highly appreciated.
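For context on why the lapply(.SD, diff) attempt is not enough, here is a minimal base R sketch that compares plain diff() with a difference taken over the non-NA values only. The vector v and the names plain and skipped are illustrative and do not come from the question.

```r
# Illustrative vector with a gap of two NAs
v <- c(1, 2, 3, NA, NA, 9, 8, 7)

# diff() returns length(v) - 1 values, and every pair that touches an NA
# yields NA, so the jump from 3 to 9 across the gap is lost
plain <- diff(v)

# Differencing only the non-NA values keeps that jump,
# but the result no longer lines up with the original rows
skipped <- diff(v[!is.na(v)])
```

The remaining task is therefore to write the NA-skipping differences back to the positions of the non-NA values, which is what the answers do.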

lag_diff <- function(x) {
  which_nna <- which(!is.na(x))
  out <- rep(NA_integer_, length(x))
  out[which_nna] <- x[which_nna] - shift(x[which_nna])
  out
}

cols <- c("x", "y", "z")
setDT(df)
df[, paste0("lag_diff_", cols) := lapply(.SD, lag_diff), .SDcols = cols]

#     x  y  z lag_diff_x lag_diff_y lag_diff_z
# 1:  1 NA  6         NA         NA         NA
# 2:  2  4  2          1         NA         -4
# 3:  3  5  7          1          1          5
# 4: NA  6 14         NA          1          7
# 5: NA NA 20         NA         NA          6
# 6:  9 15 NA          6          9         NA
# 7:  8 14 NA         -1         -1         NA
# 8:  7 13  2         -1         -1        -18
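Since the stated goal is to detect threshold values, one possible follow-up is to flag the rows whose lagged difference exceeds a cutoff in magnitude. This is only a sketch: the cutoff of 5 is a hypothetical value (the question does not give one), and threshold, diff_cols, dt and flags are illustrative names. The lag_diff function follows the answer above, with NA_real_ used for the placeholder vector.

```r
library(data.table)

# NA-skipping lagged difference, following the answer above
lag_diff <- function(x) {
  which_nna <- which(!is.na(x))
  out <- rep(NA_real_, length(x))
  out[which_nna] <- x[which_nna] - shift(x[which_nna])
  out
}

# Eight-row example data matching the printed output
dt <- data.table(
  x = c(1:3, NA, NA, 9:7),
  y = c(NA, 4:6, NA, 15:13),
  z = c(6, 2, 7, 14, 20, NA, NA, 2)
)
cols <- c("x", "y", "z")
diff_cols <- paste0("lag_diff_", cols)
dt[, (diff_cols) := lapply(.SD, lag_diff), .SDcols = cols]

threshold <- 5  # hypothetical cutoff, not taken from the question

# TRUE where |difference| exceeds the cutoff, NA where no difference exists
flags <- dt[, lapply(.SD, function(d) abs(d) > threshold), .SDcols = diff_cols]
```

Keeping NA in flags (rather than turning it into FALSE) preserves the information that no comparison was possible for that row.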

library("data.table")

df <- data.frame(x=c(1:3,NA,NA,9:7), y=c(NA,4:6,NA,15:13), z=c(6,2,7,14,20,NA,NA,2))
setDT(df)

# diff_x <- df[!is.na(x), lag_diff := x - shift(x)]  # actually what I want, but

lag_d <- function(x) {
  y <- x[!is.na(x)]
  x[!is.na(x)] <- y - shift(y)
  x
}

df[, lapply(.SD, lag_d)]
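The question also asks about base R. The same idea can be sketched without data.table, assuming the eight-row example data implied by the printed output above. lag_diff_base and res are hypothetical names, and this is one possible variant rather than part of either answer.

```r
# NA-skipping lagged difference using only base R
lag_diff_base <- function(x) {
  keep <- !is.na(x)
  out <- rep(NA_real_, length(x))
  # c(NA, diff(...)) aligns each difference with the later of the two values
  out[keep] <- c(NA, diff(x[keep]))
  out
}

df <- data.frame(
  x = c(1:3, NA, NA, 9:7),
  y = c(NA, 4:6, NA, 15:13),
  z = c(6, 2, 7, 14, 20, NA, NA, 2)
)

# Apply to every column; lapply() returns a list that data.frame() reassembles
res <- data.frame(lapply(df, lag_diff_base))
```

Here res has the same column names as df and holds the differences in place of the original values, with NA wherever the input was NA or had no earlier numeric value.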


