Group by hour and count

2024-05-05 • Q&A

I want to get the hourly count for each type (version1 and version2).

Sample data:

type <- c('version1','version1','version1','version2','version2')

startdate <- as.POSIXct(c('2017-11-1 02:11:02.000','2018-3-25 02:13:02.000','2019-3-14 03:45:02.000','2017-3-14 02:55:02.000','2018-3-14 03:45:02.000'))


df <- data.frame(type,startdate)

df

      type           startdate
1 version1 2017-11-01 02:11:02
2 version1 2018-03-25 02:13:02
3 version1 2019-03-14 03:45:02
4 version2 2017-03-14 02:55:02
5 version2 2018-03-14 03:45:02

In this df, version1 has two counts in hour 02 and one count in hour 03.

version2 has one count in hour 02 and one count in hour 03.

Desired output:

   hour version1 version2
1 00:00        0        0
2 01:00        0        0
3 02:00        2        1
4 03:00        1        1

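We can first extract the hour from startdate, then use count to get the number of rows for each hour and type. complete adds the missing hours and fills their counts with 0, and pivot_wider then reshapes the data to wide format.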

library(dplyr)
library(tidyr)

df %>%
  mutate(hr = lubridate::hour(startdate)) %>%
  count(hr,type) %>%
  complete(type,hr = seq(0,max(hr)),fill = list(n = 0)) %>%
  pivot_wider(names_from = type,values_from = n)

# A tibble: 4 x 3
#     hr version1 version2
#  <int>    <dbl>    <dbl>
#1     0        0        0
#2     1        0        0
#3     2        2        1
#4     3        1        1
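The result above has an integer hr column, while the desired output uses labels such as 02:00. A minimal sketch of one way to get those labels and to cover all 24 hours of the day, assuming the timestamps are read as UTC (the tz = "UTC" argument, the 0:23 range and the result name are assumptions, not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Sample data from the question; tz = "UTC" is an assumption made here
# so that the extracted hour does not depend on the local time zone.
df <- data.frame(
  type = c("version1", "version1", "version1", "version2", "version2"),
  startdate = as.POSIXct(
    c("2017-11-01 02:11:02", "2018-03-25 02:13:02", "2019-03-14 03:45:02",
      "2017-03-14 02:55:02", "2018-03-14 03:45:02"),
    tz = "UTC"
  )
)

result <- df %>%
  mutate(hr = as.integer(format(startdate, "%H"))) %>%  # hour as an integer 0-23
  count(hr, type) %>%                                   # rows per hour and type
  complete(type, hr = 0:23, fill = list(n = 0L)) %>%    # add empty hours with count 0
  mutate(hour = sprintf("%02d:00", hr)) %>%             # label such as "02:00"
  select(hour, type, n) %>%
  pivot_wider(names_from = type, values_from = n)

result
```

Replacing 0:23 with seq(0, max(hr)) would stop at the last observed hour, as in the answer above.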

Something was off with the startdate variable, so I parsed it with the lubridate package:

library(dplyr)
library(tidyr)

type = c('version1','version1','version1','version2','version2')

startdate = lubridate::ymd_hms(c('2017-11-1T02:11:02.000','2018-3-25T02:13:02.000','2019-3-14T03:45:02.000','2017-3-14T02:55:02.000','2018-3-14T03:45:02.000'))

tibble(type = type,startdate = startdate) %>%
  count(type,hour = lubridate::hour(startdate)) %>%
  spread(type,n)

# A tibble: 2 x 3
   hour version1 version2
  <int>    <int>    <int>
1     2        2        1
2     3        1        1
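spread still works, but the tidyr documentation marks it as superseded by pivot_wider. A rough equivalent of the pipeline above is sketched here, assuming UTC timestamps; the sixth row (version1 at 05:10) is a hypothetical observation added only to show how values_fill handles an hour that one type never hits:

```r
library(dplyr)
library(tidyr)

# Question data plus ONE hypothetical extra row (the last one),
# added only to produce an hour with no version2 observations.
obs <- tibble(
  type = c("version1", "version1", "version1", "version2", "version2",
           "version1"),
  startdate = as.POSIXct(
    c("2017-11-01 02:11:02", "2018-03-25 02:13:02", "2019-03-14 03:45:02",
      "2017-03-14 02:55:02", "2018-03-14 03:45:02",
      "2019-03-14 05:10:00"),
    tz = "UTC"
  )
)

result <- obs %>%
  count(type, hour = as.integer(format(startdate, "%H"))) %>%
  # values_fill puts 0 where a type/hour combination has no rows
  pivot_wider(names_from = type, values_from = n,
              values_fill = list(n = 0L))

result
```

Without values_fill, the missing version2 cell for hour 5 would be NA instead of 0.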

Base R solution:

# Extract the hour of each timestamp as an "HH:00" label:

df$hour <- format(df$startdate, "%H:00")

# Count the number of observations of each type in each hour:

counts <- xtabs(~ hour + type, data = df)

# Reshape the contingency table into a wide data frame:

df_wide <- as.data.frame.matrix(counts)

# Extract rownames and store them as "hour" vector and then delete row.names:

df_wide <- data.frame(hour = row.names(df_wide), df_wide, row.names = NULL)
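The base R approach only reports hours that actually occur in the data. One way to also get the zero rows from the desired output is to turn the hour into a factor whose levels cover every hour from 00:00 up to the last observed one, since table keeps empty factor levels. This is a sketch under the assumption that the timestamps are UTC; hr, hour_levels, counts and result are names introduced here for illustration:

```r
# Sample data from the question; tz = "UTC" is an assumption.
type <- c("version1", "version1", "version1", "version2", "version2")
startdate <- as.POSIXct(
  c("2017-11-01 02:11:02", "2018-03-25 02:13:02", "2019-03-14 03:45:02",
    "2017-03-14 02:55:02", "2018-03-14 03:45:02"),
  tz = "UTC"
)
df <- data.frame(type, startdate)

# Hour of day as an integer, then as a factor with one level per hour
# from 0 up to the latest hour seen in the data.
hr <- as.integer(format(df$startdate, "%H"))
hour_levels <- sprintf("%02d:00", 0:max(hr))
df$hour <- factor(sprintf("%02d:00", hr), levels = hour_levels)

# table() keeps unused factor levels, so empty hours appear with count 0.
counts <- as.data.frame.matrix(table(df$hour, df$type))
result <- data.frame(hour = row.names(counts), counts, row.names = NULL)

result
```

Using 0:23 instead of 0:max(hr) would list every hour of the day.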
