获取通过重采样计算的多个回归系数值

2024-05-19 • 问答

我正在使用重采样来计算多个线性模型的系数。在使用boot函数之前，但将来需要在分析中包括新的统计信息，因此我认为这种方法更好。可复制的示例：

iris <- iris[,1:4]

nboots <- 100
ncol = ncol(iris)
boot.r.squared <- numeric(nboots)      
boot.p_model <- numeric(nboots)
boot.coef_p <- numeric(nboots)
boot.coef_estimate <- matrix(nrow= nboots,ncol=ncol)
boot.coef_error <- matrix(nrow= nboots,ncol=ncol)
boot.coef_t <- matrix(nrow= nboots,ncol=ncol)
boot.coef_p <- matrix(nrow= nboots,ncol=ncol)

for(i in 1:nboots){
  boot.samp <- iris[sample(nrow(iris),size = 100,replace = TRUE,),] 
  model <- lm(boot.samp$Sepal.Length ~ .,boot.samp)
  model.sum <- summary(model)

  boot.r.squared[i] <- model.sum$r.squared
  stat <- model.sum$fstatistic
  boot.p_model[i] <- pf(stat[1],stat[2],stat[3],lower.tail = FALSE)

  boot.coef_estimate[i,1:length(model$coefficients)] <- model$coefficients[1]
  boot.coef_error[i,1:length(model$coefficients)] <- model$coefficients[2]
  boot.coef_t[i,1:length(model$coefficients)] <- model$coefficients[3]
  boot.coef_p[i,1:length(model$coefficients)] <- model$coefficients[4] 
}

但是我不能正确地以矩阵形式保存系数。我想保存4个矩阵，每个矩阵中包含：参数，错误，统计t和p值。

使用此代码，所有列均相同。我尝试将put [，1]保存到第一列，但是会发生此错误。我该如何解决该错误？

模型$系数[，1]中的错误：维数错误

您的代码对我有用，没有错误。您可能尝试了不同的重复次数，却忘记了在循环之前更新定义的空对象的大小，因为当我更改为nboots <- 1000时，我可能会再次出错。

您可以在没有for循环的情况下执行此操作，而应使用replicate()（基本上是lapply()的包装器）。这样做的好处是它可能更直接一些，并且您无需事先定义空对象。

该过程是 first ，用于定义引导程序功能，例如bootFun()， second ，即使用replicate()的实际引导程序，它会产生一个数组，最后是 third ，以将数组中的数据处理为所需的矩阵。

# Step 1 bootstrap function
bootFun <- function(X) {
  fit <- lm(Sepal.Length ~ .,data=X)
  s <- summary(fit)
  r2 <- s$r.squared
  f <- s$fstatistic
  p.f <- pf(f[1],f[2],f[3],lower.tail = FALSE)
  ms <- s$coefficients
  return(cbind(ms,r2,p.f))
}

# Step 2 bootstrap
nboots <- 1e2
set.seed(42)  # for sake of reproducibility
A <- replicate(nboots,bootFun(iris[sample(1:nrow(iris),size=nrow(iris),replace=TRUE),]))
# Note: I used here `size=nrow(iris)`,but you also could use size=100 if this is your method

# Step 3 process array ("de-transpose" and transform to list)
Ap <- aperm(A,c(3,1,2))  # aperm() is similar to t() for matrices
Aps <- asplit(Ap,3)  # split the array into a handy list

# or in one step
Aps <- asplit(aperm(A,2)),3)

结果

现在您有了一个包含所有矩阵的列表，请查看列表对象的名称。

names(Aps)
# [1] "Estimate"   "Std. Error" "t value"    "Pr(>|t|)"   "r2"         "p.f"

因此，如果我们要我们的估计矩阵，我们只需：

estimates <- Aps$Estimate

estimates[1:3,]
#      (Intercept) Sepal.Width Petal.Length Petal.Width
# [1,]    1.353531   0.7655760    0.8322749  -0.7775090
# [2,]    1.777431   0.6653308    0.7353491  -0.6024095
# [3,]    2.029428   0.5825554    0.6941457  -0.4795787

请注意， F 统计量和 R ²也是矩阵，并且在每个行中针对每个系数重复。由于您只需要一个，因此只需提取例如第一列：

Aps$p.f[1:3,]
#       (Intercept)  Sepal.Width Petal.Length  Petal.Width
# [1,] 2.759019e-65 2.759019e-65 2.759019e-65 2.759019e-65
# [2,] 5.451912e-66 5.451912e-66 5.451912e-66 5.451912e-66
# [3,] 3.288712e-54 3.288712e-54 3.288712e-54 3.288712e-54

Aps$p.f[1:3,1]
# [1] 2.759019e-65 5.451912e-66 3.288712e-54

基准

两种方法的速度实际上是相同的，但对replicate()来说却有一个小的优势。这里是一个基准，比较了我使用nboot=1,000复制的两种方法。

# Unit: seconds
#      expr      min       lq     mean   median       uq      max neval cld
#   forloop 2.215259 2.235797 2.332234 2.381035 2.401933 2.700622   100   a
# replicate 2.218291 2.240570 2.313526 2.257905 2.400532 2.601958   100   a

关于原始代码，您将model和model.sum混合了。插入系数，误差等时，需要获取summary（model）的系数。

参见下文：

for(i in 1:nboots){
  boot.samp <- iris[sample(nrow(iris),size = 100,replace = TRUE,),] 
  model <- lm(boot.samp$Sepal.Length ~ .,boot.samp)
  model.sum <- summary(model)

  boot.r.squared[i] <- model.sum$r.squared
  stat <- model.sum$fstatistic
  boot.p_model[i] <- pf(stat[1],stat[2],stat[3],lower.tail = FALSE)

  # this is the part where you need to use model.sum
  boot.coef_estimate[i,1:length(model$coefficients)] <- model.sum$coefficients[,1]
  boot.coef_error[i,2]
  boot.coef_t[i,3]
  boot.coef_p[i,4] 
}

作为附带的注释，您想知道使用boot进行系数估计的变量如何。对于其他统计数据（例如r平方，p值），在引导程序之后对其进行解释并不是那么有用。因此，请考虑一下您想从引导中了解的内容。

也请查看@ jay.sf的答案，这是包装计算函数的一种更简洁的方法。

获取通过重采样计算的多个回归系数值

y4328978 回答：获取通过重采样计算的多个回归系数值

结果

基准

大家都在问