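How can I use R to find the intervals between successive dates in a single column, grouped by ID?

I have a data frame called df. It has several columns, but the two I care about are DT (a POSIXct variable) and CITY (a character variable). Each city has a different number of entries in the DT column. I want to create a third column holding, for each city, the interval between each DT and the chronologically previous DT for that same city. In other words, each city has its own independent set of dates and date intervals.

I created a subset of df to make it easier to see what I am working with.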

DT <- as.POSIXct(c("2019-11-02 20:00:00 CET","2019-11-02 19:00:00 CET","2019-11-02 20:00:00 CET","2019-11-03 19:30:00 CET","2019-11-04 19:00:00 CET","2019-11-05 19:30:00 CET","2019-11-05 19:00:00 CET","2019-11-05 20:00:00 CET","2019-11-06 19:30:00 CET","2019-11-06 20:30:00 CET","2019-11-06 19:00:00 CET","2019-11-08 19:30:00 CET","2019-11-08 20:30:00 CET","2019-11-08 20:00:00 CET","2019-11-08 19:00:00 CET","2019-11-09 20:00:00 CET","2019-11-10 21:30:00 CET","2019-11-10 19:30:00 CET","2019-11-10 18:00:00 CET","2019-11-10 21:00:00 CET","2019-11-11 19:30:00 CET","2019-11-11 22:30:00 CET","2019-11-12 21:00:00 CET","2019-11-12 19:00:00 CET" ))
CITY <- c("TOR","ORL","WAS","DAL","CLE","ATL","TOR","CLE")
df <- data.frame(DT,CITY)
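library(dplyr)  # provides %>% and arrange()
df <- df %>% arrange(CITY, DT)  # sort by city, then chronologically within each city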
df

The third column created below is the result I want; the first two columns are what I currently have.

# Desired third column (illustrative values only; it needs one value per row of df)
days <- c(NA,1,2,NA,3,4,2)
df <- data.frame(df,days)
df

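Any help would be greatly appreciated.

Shuzhenabc123 answered: How can I use R to find the intervals between successive dates in a single column, grouped by ID?

You can do this with data.table: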

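library(data.table)
setDT(df)  # convert df to a data.table by reference
# Within each CITY, difference between each DT and the previous row's DT, in days
df[,Diff := difftime(DT,shift(DT),units = 'days'),keyby = CITY]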

Result:

> df
                     DT CITY           Diff
 1: 2019-11-05 19:30:00  ATL        NA days
 2: 2019-11-06 19:30:00  ATL 1.0000000 days
 3: 2019-11-08 19:30:00  ATL 2.0000000 days
 4: 2019-11-10 21:00:00  ATL 2.0625000 days
 5: 2019-11-12 21:00:00  ATL 2.0000000 days
 6: 2019-11-03 19:30:00  CLE        NA days
 7: 2019-11-05 19:00:00  CLE 1.9791667 days
 8: 2019-11-08 19:00:00  CLE 3.0000000 days
 9: 2019-11-10 19:30:00  CLE 2.0208333 days
10: 2019-11-12 19:00:00  CLE 1.9791667 days
11: 2019-11-03 19:30:00  DAL        NA days
12: 2019-11-06 20:30:00  DAL 3.0416667 days
13: 2019-11-08 20:30:00  DAL 2.0000000 days
14: 2019-11-09 20:00:00  DAL 0.9791667 days
15: 2019-11-11 19:30:00  DAL 1.9791667 days
16: 2019-11-02 19:00:00  ORL        NA days
17: 2019-11-05 20:00:00  ORL 3.0416667 days
18: 2019-11-06 20:30:00  ORL 1.0208333 days
19: 2019-11-08 19:00:00  ORL 1.9375000 days
20: 2019-11-10 18:00:00  ORL 1.9583333 days
21: 2019-11-02 20:00:00  TOR        NA days
22: 2019-11-06 19:30:00  TOR 3.9791667 days
23: 2019-11-08 20:00:00  TOR 2.0208333 days
24: 2019-11-10 21:30:00  TOR 2.0625000 days
25: 2019-11-11 22:30:00  TOR 1.0416667 days
26: 2019-11-02 20:00:00  WAS        NA days
27: 2019-11-04 19:00:00  WAS 1.9583333 days
28: 2019-11-06 19:00:00  WAS 2.0000000 days
29: 2019-11-08 19:00:00  WAS 2.0000000 days
                     DT CITY           Diff

# Verifying against provided expected output
> df[,all.equal(round(Diff),days)]
[1] TRUE

If you want the values in Diff to be rounded and/or plain numeric, just wrap the difftime() call in round(as.numeric()).
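The wrapping described above could look roughly like the following. This is a minimal sketch on a small made-up table (the name toy and its five rows are assumptions for illustration, not the asker's data), and it sorts by CITY and DT first so that shift() always sees the chronologically previous entry.

```r
library(data.table)

# Hypothetical toy data, not the asker's full data set
toy <- data.table(
  CITY = c("ATL", "ATL", "ATL", "CLE", "CLE"),
  DT   = as.POSIXct(c("2019-11-05 19:30:00", "2019-11-06 19:30:00",
                      "2019-11-08 19:30:00", "2019-11-03 19:30:00",
                      "2019-11-05 19:00:00"), tz = "UTC")
)

# Sort within each city so shift() returns the previous entry in time
setorder(toy, CITY, DT)

# difftime() returns a "difftime" object; as.numeric() drops the units
# attribute and round() rounds to whole days
toy[, Diff := round(as.numeric(difftime(DT, shift(DT), units = "days"))), by = CITY]

toy
```

The first row of each city has no previous entry, so its Diff is NA.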


I am not sure what your expected output is. Here is my solution:

# For each city: date of the last row minus date of the first row, in days
DF <- data.frame(
  days = sapply(
    split(df,df$CITY),
    function(v) Reduce("-",Map(as.Date,c(tail(v["DT"],1)[[1]],head(v["DT"],1)[[1]])))
  )
)

This gives:

> DF
        days
ATL        7
CLE        9
DAL        8
ORL        8
TOR        9
WAS        6
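
Note that this second answer returns one number per city (the span in calendar days between the first and last row of that city), not a gap for every row. A more explicit way to write the same idea might be the sketch below; the toy data and the helper name span_days are hypothetical, and the rows are assumed to be in chronological order within each city.

```r
# Hypothetical toy data, already in chronological order within each city
toy <- data.frame(
  CITY = c("ATL", "ATL", "ATL", "CLE", "CLE"),
  DT   = as.POSIXct(c("2019-11-05 19:30:00", "2019-11-06 19:30:00",
                      "2019-11-08 19:30:00", "2019-11-03 19:30:00",
                      "2019-11-05 19:00:00"), tz = "UTC")
)

# Calendar days between the first and last element of one city's DT values
span_days <- function(dt) {
  as.numeric(as.Date(dt[length(dt)]) - as.Date(dt[1]))
}

# One value per city, unlike the per-row gaps in the data.table answer
spans <- sapply(split(toy$DT, toy$CITY), span_days)
spans
```

Because the times are converted with as.Date() before subtracting, the time of day is discarded and the result counts whole calendar days.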
