Creating lead and lag year dummy variables for regression in R

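Here is a sample data frame, where PRE5_id1, POST5_id1, PRE5_id2 and POST5_id2 are the variables I want to obtain. I am looking for lead and lag dummies that equal 1 in the five years before the year of a natural death (PRE5) and in the five years after it (POST5). I am not sure how to stay within a country when creating these PRE and POST variables, so that the -5 and +5 windows only apply inside the same country.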

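I intend to run a separate regression for each ID (there are 69 natural deaths in my dataset, so the IDs go up to id69) and include the matching PRE5 and POST5 in each one, for example lm(gdp.growth.rate ~ country + year + PRE5_id1 + POST5_id1). A way of creating these PRE and POST dummies inside the regression call itself would also work.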

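> df <- data.frame(country = rep("Angola", 20), year = 1940:1959,
+                  leader = c("David", rep("NA", 3), "Henry", "NA", "Tom", "NA", "Chris", rep("NA", 4), "Alia", rep("NA", 6)),
+                  natural.death = c(0, rep(NA, 3), 0, NA, 1, NA, 0, rep(NA, 4), 1, rep(NA, 6)),
+                  gdp.growth.rate = 1:20,
+                  id1 = c(rep(0, 6), 1, rep(0, 13)),
+                  id2 = c(rep(0, 13), 1, rep(0, 6)),
+                  PRE5_id1 = c(0, rep(1, 5), rep(0, 14)),
+                  PRE5_id2 = c(rep(0, 8), rep(1, 5), rep(0, 7)),
+                  POST5_id1 = c(rep(0, 7), rep(1, 5), rep(0, 8)),
+                  POST5_id2 = c(rep(0, 14), rep(1, 5), 0))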
> df
   country year leader natural.death gdp.growth.rate id1 id2 PRE5_id1 PRE5_id2 POST5_id1 POST5_id2
1   Angola 1940  David             0               1   0   0        0        0        0        0
2   Angola 1941     NA            NA               2   0   0        1        0        0        0
3   Angola 1942     NA            NA               3   0   0        1        0        0        0
4   Angola 1943     NA            NA               4   0   0        1        0        0        0
5   Angola 1944  Henry             0               5   0   0        1        0        0        0
6   Angola 1945     NA            NA               6   0   0        1        0        0        0
7   Angola 1946    Tom             1               7   1   0        0        0        0        0
8   Angola 1947     NA            NA               8   0   0        0        0        1        0
9   Angola 1948  Chris             0               9   0   0        0        1        1        0
10  Angola 1949     NA            NA              10   0   0        0        1        1        0
11  Angola 1950     NA            NA              11   0   0        0        1        1        0
12  Angola 1951     NA            NA              12   0   0        0        1        1        0
13  Angola 1952     NA            NA              13   0   0        0        1        0        0
14  Angola 1953   Alia             1              14   0   1        0        0        0        0
15  Angola 1954     NA            NA              15   0   0        0        0        0        1
16  Angola 1955     NA            NA              16   0   0        0        0        0        1
17  Angola 1956     NA            NA              17   0   0        0        0        0        1
18  Angola 1957     NA            NA              18   0   0        0        0        0        1
19  Angola 1958     NA            NA              19   0   0        0        0        0        1
20  Angola 1959     NA            NA              20   0   0        0        0        0        0

Any help would be greatly appreciated. Thanks!

After trying one of the answers below on a modified version of the original df (shown first), I get the following output.df:

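> df <- data.frame(country = rep(c("Angola", "US"), each = 10),
+                  year = c(1940:1949, 1940:1949),
+                  leader = c("David", rep("NA", 3), "Henry", "NA", "Tom", "NA", "Chris", rep("NA", 4), "Alia", rep("NA", 6)),
+                  natural.death = c(0, rep(NA, 3), 0, NA, 1, NA, 0, rep(NA, 4), 1, rep(NA, 6)),
+                  gdp.growth.rate = 1:20,
+                  id1 = c(rep(0, 6), 1, rep(0, 13)),
+                  id2 = c(rep(0, 13), 1, rep(0, 6)))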

> output.df
          country year leader natural.death gdp.growth.rate id1 id2 id1.PRE
Angola.1   Angola 1940  David             0               1   0   0       0
Angola.2   Angola 1941     NA            NA               2   0   0       1
Angola.3   Angola 1942     NA            NA               3   0   0       1
Angola.4   Angola 1943     NA            NA               4   0   0       1
Angola.5   Angola 1944  Henry             0               5   0   0       1
Angola.6   Angola 1945     NA            NA               6   0   0       1
Angola.7   Angola 1946    Tom             1               7   1   0       0
Angola.8   Angola 1947     NA            NA               8   0   0       0
Angola.9   Angola 1948  Chris             0               9   0   0       0
Angola.10  Angola 1949     NA            NA              10   0   0       0
US.1           US 1940     NA            NA              11   0   0       0
US.2           US 1941     NA            NA              12   0   0       0
US.3           US 1942     NA            NA              13   0   0       0
US.4           US 1943   Alia             1              14   0   1       0
US.5           US 1944     NA            NA              15   0   0       0
US.6           US 1945     NA            NA              16   0   0       0
US.7           US 1946     NA            NA              17   0   0       0
US.8           US 1947     NA            NA              18   0   0       0
US.9           US 1948     NA            NA              19   0   0       0
US.10          US 1949     NA            NA              20   0   0       0
          id1.POST id2.PRE id2.POST
Angola.1         0       0        0
Angola.2         0       0        1
Angola.3         0       0        1
Angola.4         0       0        1
Angola.5         0       0        1
Angola.6         0       0        1
Angola.7         0       0        0
Angola.8         1       0        0
Angola.9         1       0        0
Angola.10        1       0        0
US.1             0       1        0
US.2             1       1        0
US.3             1       1        0
US.4             1       0        0
US.5             1       0        1
US.6             1       0        1
US.7             0       0        1
US.8             0       0        1
US.9             0       0        1
US.10            0       0        0
felixwoo80 answered: Creating lead and lag year dummy variables for regression in R

Here is one way using base R. We create a function generate_dummy which, for each "id" column, returns two columns holding the PRE and POST dummies.

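generate_dummy <- function(x) {
  # Position of the event (the row where this id column is 1); NAs are skipped
  inds <- which(x == 1)
  if (length(inds) == 1) {
    vec <- seq_along(x)
    # Unary + converts the logical windows to 0/1
    data.frame(PRE  = +(vec > (inds - 6) & vec < inds),    # 5 rows before the event
               POST = +(vec > inds & vec < (inds + 6)))    # 5 rows after the event
  } else {
    # No event (or more than one) in this group: all zeros
    data.frame(PRE = rep(0, length(x)), POST = rep(0, length(x)))
  }
}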
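
As a quick check of what generate_dummy returns, the sketch below calls it on a toy vector (made up for illustration, not taken from the question) with a single event at position 7. The function is repeated so the snippet runs on its own. Note that the windows are simply cut off at the ends of the vector.

```r
# generate_dummy as defined in the answer, repeated so the example is self-contained
generate_dummy <- function(x) {
  inds <- which(x == 1)
  if (length(inds) == 1) {
    vec <- seq_along(x)
    data.frame(PRE  = +(vec > (inds - 6) & vec < inds),
               POST = +(vec > inds & vec < (inds + 6)))
  } else {
    data.frame(PRE = rep(0, length(x)), POST = rep(0, length(x)))
  }
}

# Toy input (hypothetical): one event at position 7 of 10
toy <- c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
res <- generate_dummy(toy)

res$PRE   # 0 1 1 1 1 1 0 0 0 0  (positions 2 to 6)
res$POST  # 0 0 0 0 0 0 0 1 1 1  (positions 8 to 10; the window is truncated at the end)
```

Because the window is defined on row positions, this assumes one row per year with no gaps inside each group.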


#Columns which start with id
cols <- grep("^id",names(df),value = TRUE)

To apply this per country, we split the data by country, apply generate_dummy to each country's subset, and combine the results.

output <- cbind(df,do.call(rbind,lapply(split(df,df$country),function(x) 
                       do.call(cbind,lapply(x[cols],generate_dummy)))))
row.names(output) <- NULL  

output
#   country year leader natural.death gdp.growth.rate id1 id2 id1.PRE id1.POST id2.PRE id2.POST
#1   Angola 1940  David             0               1   0   0       0        0       0        0
#2   Angola 1941     NA            NA               2   0   0       1        0       0        0
#3   Angola 1942     NA            NA               3   0   0       1        0       0        0
#4   Angola 1943     NA            NA               4   0   0       1        0       0        0
#5   Angola 1944  Henry             0               5   0   0       1        0       0        0
#6   Angola 1945     NA            NA               6   0   0       1        0       0        0
#7   Angola 1946    Tom             1               7   1   0       0        0       0        0
#8   Angola 1947     NA            NA               8   0   0       0        1       0        0
#9   Angola 1948  Chris             0               9   0   0       0        1       1        0
#10  Angola 1949     NA            NA              10   0   0       0        1       1        0
#11  Angola 1950     NA            NA              11   0   0       0        1       1        0
#12  Angola 1951     NA            NA              12   0   0       0        1       1        0
#13  Angola 1952     NA            NA              13   0   0       0        0       1        0
#14  Angola 1953   Alia             1              14   0   1       0        0       0        0
#15  Angola 1954     NA            NA              15   0   0       0        0       0        1
#16  Angola 1955     NA            NA              16   0   0       0        0       0        1
#17  Angola 1956     NA            NA              17   0   0       0        0       0        1
#18  Angola 1957     NA            NA              18   0   0       0        0       0        1
#19  Angola 1958     NA            NA              19   0   0       0        0       0        1
#20  Angola 1959     NA            NA              20   0   0       0        0       0        0
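
One caveat with the split/cbind approach: split() returns the groups in the order of the factor levels (alphabetical by default), so binding the result back onto df only lines up if df is already sorted by country. A minimal sketch of an order-preserving alternative, assuming base R's ave() and a hypothetical helper window_dummy (not part of the original answer), could look like this. The two-country data is made up for illustration.

```r
# Hypothetical helper: 0/1 dummy for the k positions before ("pre") or after ("post")
# the single event in x; all zeros if x does not contain exactly one 1.
window_dummy <- function(x, side = c("pre", "post"), k = 5) {
  side <- match.arg(side)
  ind <- which(x == 1)              # NA entries are ignored by which()
  out <- numeric(length(x))
  if (length(ind) != 1) return(out)
  pos <- seq_along(x)
  if (side == "pre") {
    hit <- pos >= ind - k & pos < ind
  } else {
    hit <- pos > ind & pos <= ind + k
  }
  out[hit] <- 1
  out
}

# Made-up example data, deliberately NOT sorted by country
toy <- data.frame(country = rep(c("US", "Angola"), each = 10),
                  year    = rep(1940:1949, 2),
                  id1     = c(rep(0, 10), rep(0, 6), 1, rep(0, 3)),  # event: Angola 1946
                  id2     = c(rep(0, 3), 1, rep(0, 6), rep(0, 10)))  # event: US 1943

# ave() applies FUN within each country and returns values in the original row order
for (col in grep("^id[0-9]+$", names(toy), value = TRUE)) {
  toy[[paste0(col, ".PRE")]]  <- ave(toy[[col]], toy$country,
                                     FUN = function(x) window_dummy(x, "pre"))
  toy[[paste0(col, ".POST")]] <- ave(toy[[col]], toy$country,
                                     FUN = function(x) window_dummy(x, "post"))
}
```

Here id1.PRE and id1.POST are non-zero only in Angola rows and id2.PRE and id2.POST only in US rows, regardless of how the rows are ordered.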

Data

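df <- data.frame(country = rep("Angola", 20), year = 1940:1959,
                 leader = c("David", rep("NA", 3), "Henry", "NA", "Tom", "NA", "Chris", rep("NA", 4), "Alia", rep("NA", 6)),
                 natural.death = c(0, rep(NA, 3), 0, NA, 1, NA, 0, rep(NA, 4), 1, rep(NA, 6)),
                 gdp.growth.rate = 1:20,
                 id1 = c(rep(0, 6), 1, rep(0, 13)),
                 id2 = c(rep(0, 13), 1, rep(0, 6)))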
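
Coming back to the regressions mentioned in the question: once the dummies exist as columns, one model per id can be fitted in a loop. This is a minimal sketch, assuming the columns are named id1.PRE / id1.POST as in the output above. The panel is made up (two countries, since a country term cannot be estimated from a single country) and the outcome is random noise, so the coefficients mean nothing.

```r
set.seed(1)  # reproducible made-up outcome

# Made-up panel: two countries, dummies filled in by hand for illustration
panel <- data.frame(
  country         = rep(c("Angola", "US"), each = 10),
  year            = rep(1940:1949, 2),
  gdp.growth.rate = rnorm(20),
  id1.PRE  = c(0, 1, 1, 1, 1, 1, 0, 0, 0, 0, rep(0, 10)),  # Angola event in 1946
  id1.POST = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, rep(0, 10)),
  id2.PRE  = c(rep(0, 10), 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),  # US event in 1943
  id2.POST = c(rep(0, 10), 0, 0, 0, 0, 1, 1, 1, 1, 1, 0)
)

ids <- c("id1", "id2")

# One lm() per id; reformulate() builds the formula from character names
fits <- lapply(ids, function(id) {
  rhs <- c("country", "year", paste0(id, ".PRE"), paste0(id, ".POST"))
  lm(reformulate(rhs, response = "gdp.growth.rate"), data = panel)
})
names(fits) <- ids

coef(fits$id1)  # named vector: (Intercept), countryUS, year, id1.PRE, id1.POST
```

With 69 ids, ids could be built as paste0("id", 1:69), provided the matching .PRE and .POST columns exist for each.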