R: When I have a data frame containing movie start and end times, how do I count the number of movies being watched each hour?

I have a data frame that looks like this:

start_timestamp        end_timestamp
2012-11-18 05:53:36.0  2012-11-18 07:46:40.0
2012-11-18 06:34:23.0  2012-11-18 09:21:57.0
I want the output to look like this:

hour                   movies_being_played
2012-11-18 05:00:00.0  1
2012-11-18 06:00:00.0  2
2012-11-18 07:00:00.0  2
2012-11-18 08:00:00.0  1
2012-11-18 09:00:00.0  1
The only approach I can think of is to create a table like this:

hour                   movies_being_played
2012-11-18 05:00:00.0  NA
2012-11-18 06:00:00.0  NA
2012-11-18 07:00:00.0  NA
2012-11-18 08:00:00.0  NA
2012-11-18 09:00:00.0  NA

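and then use a for loop that iterates over every hour in the given period and counts how many rows have a start_timestamp that is lower paired with an end_timestamp that is greater, but that seems very inefficient.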

@alistaire's comment is a concise, efficient solution, and it should probably be both an actual answer and the accepted one.

Setting that aside, here is a general idiom for similar but more complex situations (there are not enough do() examples around):

library(dplyr)

# Sample data matching the question
df <- tibble(
  start_timestamp=as.POSIXct(c("2012-11-18 05:53:36.0", "2012-11-18 06:34:23.0")),
  end_timestamp=as.POSIXct(c("2012-11-18 07:46:40.0", "2012-11-18 09:21:57.0"))
)

# For one row (one movie), return a row for every hour during which it plays
hourly_count <- function(x) {

  # Truncate the movie's start and end to the top of the hour
  rng <- range(x$start_timestamp, x$end_timestamp) %>%
    format("%Y-%m-%d %H:00:00") %>%
    as.POSIXct()

  hrs <- seq(from=rng[1], to=rng[2], by="1 hour")

  tibble(hour=hrs, is_playing=TRUE)

}

# Expand each movie into its hours, then count the movies per hour
rowwise(df) %>%
  do(hourly_count(.)) %>%
  count(hour, is_playing) %>%
  select(-is_playing, movies_being_played=n)
## Source: local data frame [5 x 2]
## Groups: hour [5]
## 
##                  hour movies_being_played
##                <dttm>               <int>
## 1 2012-11-18 05:00:00                   1
## 2 2012-11-18 06:00:00                   2
## 3 2012-11-18 07:00:00                   2
## 4 2012-11-18 08:00:00                   1
## 5 2012-11-18 09:00:00                   1

Not that bad:
sapply(seq(trunc(min(df$start_timestamp), 'hour'), max(df$end_timestamp), by = 'hour'), function(x) {sum(x >= df$start_timestamp & x …

Much more elegant than I imagined!
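
Since the one-liner above is cut off in this copy, here is a minimal base R sketch of the same general idea. It is not necessarily the commenter's exact expression. It assumes a movie counts as "being played" in an hour if any part of it overlaps that hour, and it pins the time zone to UTC purely to make the example reproducible. The names first_hour, hours and result are made up for illustration.

```r
# Sample data from the question (time zone fixed to UTC as an assumption)
df <- data.frame(
  start_timestamp = as.POSIXct(c("2012-11-18 05:53:36", "2012-11-18 06:34:23"), tz = "UTC"),
  end_timestamp   = as.POSIXct(c("2012-11-18 07:46:40", "2012-11-18 09:21:57"), tz = "UTC")
)

# Hourly buckets, from the hour containing the earliest start up to the latest end
first_hour <- as.POSIXct(trunc(min(df$start_timestamp), "hours"))
hours <- seq(first_hour, max(df$end_timestamp), by = "hour")

# A movie overlaps the bucket [h, h + 1 hour) if it starts before the bucket
# ends and finishes at or after the bucket begins
movies_being_played <- vapply(
  seq_along(hours),
  function(i) sum(df$start_timestamp < hours[i] + 3600 & df$end_timestamp >= hours[i]),
  integer(1)
)

result <- data.frame(hour = hours, movies_being_played = movies_being_played)
print(result)
```

Each bucket is compared against every row in a single vectorised expression, so there is no explicit for loop over rows. For very long periods with many movies, the hours-times-rows comparison may still get expensive.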