In this case, you can use the "olsrr" package in R for stepwise regression. Here is some example code for running a stepwise regression in R:
library("olsrr")
#Load the data
d <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/v.csv",header = TRUE)
# stepwise regression
vv <- lm(dint ~ Age + genre + Length + cf.training + error.type + cf.scope + cf.type + cf.revision,data = d)
summary(vv)
k <- ols_step_both_p(vv,pent = 0.05,prem = 0.1)
# stepwise regression plot
plot(k)
# final model
k$model
It will give you exactly the same output as SPSS.
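As others have pointed out, one problem is that you appear to have multicollinearity. Another is that your dataset has missing values. The rows with missing values should probably just be removed. As for the correlated variables, you should inspect your data to identify the collinearity and then remove it. Deciding which variables to remove and which to keep is a very domain-specific topic. However, if you wish, you could instead use regularisation and fit a model that retains all the variables. This also allows you to fit a model when n (the number of samples) is less than p (the number of predictors).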
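To make the n < p point concrete, here is a minimal sketch on simulated data (the data and the names `X_sim`, `y_sim`, `ols`, `ridge` are my own illustration, not the dataset from the question). Ordinary least squares cannot estimate every coefficient when there are more predictors than rows, whereas a ridge fit with `glmnet` returns an estimate for all of them. The lambda value used at the end is arbitrary.

```r
library("glmnet")

set.seed(42)
## Simulated stand-in data (an assumption): 20 rows, 50 predictors
n <- 20
p <- 50
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
y_sim <- X_sim[, 1] - X_sim[, 2] + rnorm(n)

## OLS: with more columns than rows, lm() reports NA for the
## coefficients it cannot estimate
ols <- lm(y_sim ~ X_sim)
sum(is.na(coef(ols)))

## Ridge (alpha = 0): every predictor gets a finite, shrunken estimate
ridge <- glmnet(x = X_sim, y = y_sim, alpha = 0)
b <- as.vector(as.matrix(coef(ridge, s = 0.5)))  # s = 0.5 is an arbitrary lambda
length(b)       # intercept plus one coefficient per predictor
anyNA(b)
```

The ridge coefficients are biased towards zero, so this trades some bias for being able to fit the model at all.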
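I've shown code below that demonstrates how to examine the correlation structure in your data and how to identify which variables are most strongly correlated (thanks to this answer, linked in the code comments). I've also included an example of fitting such a model using L2 regularisation (commonly known as ridge regression).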
d <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/v.csv",header = TRUE) # Data
nms <- c("Age","genre","Length","cf.training","error.type","cf.scope","cf.type","cf.revision")
d[nms] <- lapply(d[nms],as.factor) # make factor
vv <- lm(dint~Age+genre+Length+cf.training+error.type+cf.scope+cf.type+cf.revision,data = d)
df <- d
df[] <- lapply(df,as.numeric)
cor_mat <- cor(as.matrix(df),use = "complete.obs")
library("gplots")
heatmap.2(cor_mat,trace = "none")
## https://stackoverflow.com/questions/22282531/how-to-compute-correlations-between-all-columns-in-r-and-detect-highly-correlate
library("tibble")
library("dplyr")
library("tidyr")
d2 <- df %>%
  as.matrix() %>%
  cor(use = "complete.obs") %>%
  ## Set diag (a vs a) to NA, then remove
  (function(x) {
    diag(x) <- NA
    x
  }) %>%
  as.data.frame() %>%
  rownames_to_column(var = 'var1') %>%
  gather(var2,value,-var1) %>%
  filter(!is.na(value)) %>%
  ## Sort by decreasing absolute correlation
  arrange(-abs(value))
## 2 pairs of variables are almost exactly correlated!
head(d2)
#>         var1       var2     value
#> 1         id study.name 0.9999430
#> 2 study.name         id 0.9999430
#> 3   Location      timed 0.9994082
#> 4      timed   Location 0.9994082
#> 5        Age   ed.level 0.7425026
#> 6   ed.level        Age 0.7425026
## Remove some variables here,or maybe try regularized regression (see below)
library("glmnet")
## glmnet requires matrix input
X <- d[,c("Age","cf.revision")]
X[] <- lapply(X,as.numeric)
X <- as.matrix(X)
ind_na <- apply(X,1,function(row) any(is.na(row)))
X <- X[!ind_na,]
y <- d[!ind_na,"dint"]
## alpha = 0 is ridge regression
## (the result is named `fit` so it does not mask the glmnet() function)
fit <- glmnet(x = X,y = y,alpha = 0)
plot(fit)
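The ridge fit above traces the whole regularisation path but does not pick a penalty. One common way to choose lambda is cross-validation with `cv.glmnet`, which ships with the glmnet package. The sketch below uses simulated, nearly collinear data (the names `X_sim`, `y_sim` and `cv_fit` are assumptions for illustration, not part of the original analysis), so it runs without downloading anything.

```r
library("glmnet")

set.seed(1)
## Simulated stand-in data (an assumption, not the v.csv data)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # nearly collinear with x1
X_sim <- cbind(x1 = x1, x2 = x2)
y_sim <- 1 + 2 * x1 + rnorm(n)

## Cross-validate over a grid of lambda values; alpha = 0 is ridge
cv_fit <- cv.glmnet(x = X_sim, y = y_sim, alpha = 0)

cv_fit$lambda.min               # lambda with the lowest cross-validated error
coef(cv_fit, s = "lambda.min")  # intercept, x1 and x2 at that lambda
plot(cv_fit)                    # CV error against log(lambda)
```

With the real data you would pass the `X` and `y` built above instead of the simulated ones. `lambda.1se` is a more conservative alternative to `lambda.min` if you prefer stronger shrinkage.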
Created on 2019-11-08 by the reprex package (v0.3.0)