R: How do I filter dates in one table, get a count, and return it for each row of another table? Is this possible with dplyr and lubridate?


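This is my first question, so please let me know if I've left out anything needed for a complete picture.

Background: I have two tables. One is a table of tech support tickets, with the time each ticket was opened and the time it was resolved (closed). I want to create a timeline that counts how many tickets are open on each day.

Here is what I have done so far: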

# load in data
tickets <- read.csv("tickets.csv",header=TRUE)


#packages
library(tidyr)
library(dplyr)
library(lubridate)

tickets <- tbl_df(tickets)
tickets

## A tibble: 10 × 3
#ID Date.Time.Opened Date.Time.Closed
#<int>           <fctr>           <fctr>
#1      1    1/19/17 11:51    1/30/17 14:44
#2      2    1/22/16 12:27    1/30/17 13:36
#3      3    1/20/17 17:07     1/27/17 7:24
#4      4    1/20/17 18:23     1/27/17 7:24
#5      5     1/20/17 8:54    1/26/17 12:09
#6      6    1/24/17 18:54    1/26/17 12:09
#7      7    1/25/17 11:33    1/26/17 12:08
#8      8    1/23/17 11:22    1/25/17 16:31
#9      9    1/20/17 16:48    1/25/17 15:06
#10    10    1/9/17 8:57    1/25/17 13:46


#dates are currently factors; change to dates. 
tickets2 <- 
tickets %>%
mutate(Date.Time.Opened = mdy_hm(Date.Time.Opened)) %>%
mutate(Date.Time.Closed = mdy_hm(Date.Time.Closed))
head(tickets2)


# A tibble: 6 × 3
#ID    Date.Time.Opened    Date.Time.Closed
#<int>              <dttm>              <dttm>
#1     1 2017-01-19 11:51:00 2017-01-30 14:44:00
#2     2 2016-01-22 12:27:00 2017-01-30 13:36:00
#3     3 2017-01-20 17:07:00 2017-01-27 07:24:00
#4     4 2017-01-20 18:23:00 2017-01-27 07:24:00
#5     5 2017-01-20 08:54:00 2017-01-26 12:09:00
#6     6 2017-01-24 18:54:00 2017-01-26 12:09:00
Here is what I wrote after setting up the data:

# write a function which takes a date, searches the tickets table and
# returns the number of tickets that are open

nOpenTickets <- function(x){
nrow(filter(tickets,
x > mdy_hm(Date.Time.Opened) &
x < mdy_hm(Date.Time.Closed)))
}


# Add a column to the timeline with the number returned by the function
# (the number of open tickets on that date)

timeline <- mutate(timeline,ticketsOpen = nOpenTickets(tDates))
timeline


# my results: 
## A tibble: 10 × 2
#tDates ticketsOpen
#<date>       <int>
#1  2017-01-20           0
#2  2017-01-21           0
#3  2017-01-22           0
#4  2017-01-23           0
#5  2017-01-24           0
#6  2017-01-25           0
#7  2017-01-26           0
#8  2017-01-27           0
#9  2017-01-28           0
#10 2017-01-29           0
The timeline:

> dput(timeline)
structure(list(tDates = structure(c(17186, 17187, 17188, 17189, 
17190, 17191, 17192, 17193, 17194, 17195), class = "Date")), .Names =       
"tDates", class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))
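
A possible reason every row gets the same value, offered as a guess rather than something confirmed in this thread: nOpenTickets() returns a single number via nrow(), so mutate() receives one value and recycles it down the whole column. The whole tDates vector is also compared element by element against the ticket rows instead of one date at a time, and comparing a Date directly with a date-time may not behave as intended. The sketch below shows only the recycling part, in base R, on made-up toy dates; n_open() is a hypothetical stand-in for nOpenTickets().

```r
# Toy data (hypothetical; NOT the tickets from this post).
opened <- as.Date(c("2017-01-19", "2017-01-20", "2017-01-24"))
closed <- as.Date(c("2017-01-30", "2017-01-27", "2017-01-26"))
days   <- as.Date(c("2017-01-21", "2017-01-25", "2017-01-28"))

# Hypothetical stand-in for nOpenTickets(): it returns ONE number,
# no matter how many dates are passed in.
n_open <- function(x) sum(x > opened & x < closed)

# Passing the whole vector compares days[i] with ticket i only and
# collapses everything to a single value.
recycled <- n_open(days)

# Calling the function once per day gives one count per day.
per_day <- vapply(seq_along(days), function(i) n_open(days[i]), integer(1))

print(recycled)  # 2
print(per_day)   # 2 3 1
```

Calling the function once per row is what rowwise() does in the version that eventually worked.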
================================================================

UPDATE

================================================================

Here is what finally worked (thanks, alistaire!). The complete script is reproduced near the end of this post, in the block that reads both tickets.csv and timeline.csv.

Here's a tidyverse approach, though it returns slightly different numbers than you posted:

library(tidyverse)
tickets %>%
  mutate_at(-1, as.Date, '%m/%d/%y') %>%
  mutate(tDates = map2(Date.Time.Opened, Date.Time.Closed, seq, by = 'day')) %>%
  unnest(tDates) %>%
  count(tDates) %>%
  right_join(timeline)

If you speak data.table syntax, I imagine there's a nice way to handle it there, too.

@alistaire, thanks, I'll try this! My numbers may well be a little off, since I worked them out in Excel for this post. I'll comment as soon as I've tried it!

A more direct alternative is to use lubridate::`%within%`:

tickets2 <- tickets %>%
  mutate_at(-1, mdy_hm) %>%
  mutate(int = interval(Date.Time.Opened, Date.Time.Closed))
timeline %>%
  rowwise() %>%
  mutate(n = sum(tDates %within% tickets2$int))

Update: I'm getting an error:
Error: other arguments should be named
Could it be coming from the mutate_at line? (I'm not very familiar with the mutate_* variants.) I'll try your direct alternative!

Ah! For some reason, lubridate::`%within%` is triggering an error:
Error in lubridate ..... : object 'lubridate' not found
(It still says this even though I have loaded the package.)

I ran it starting from "tickets2" and I got the result! Thanks. ;-) Now I need to go through your code carefully so I know exactly what it is doing! :-)
> dput(tickets)
structure(list(ID = 1:10, Date.Time.Opened = structure(c(1L, 
6L, 3L, 4L, 5L, 8L, 9L, 7L, 2L, 10L), .Label = c("1/19/17 11:51", 
"1/20/17 16:48", "1/20/17 17:07", "1/20/17 18:23", "1/20/17 8:54", 
"1/22/16 12:27", "1/23/17 11:22", "1/24/17 18:54", "1/25/17 11:33", 
"1/9/17 8:57"), class = "factor"), Date.Time.Closed = structure(c(8L, 
7L, 6L, 6L, 5L, 5L, 4L, 3L, 2L, 1L), .Label = c("1/25/17 13:46", 
"1/25/17 15:06", "1/25/17 16:31", "1/26/17 12:08", "1/26/17 12:09", 
"1/27/17 7:24", "1/30/17 13:36", "1/30/17 14:44"), class = "factor")),     
.Names = c("ID", 
"Date.Time.Opened", "Date.Time.Closed"), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

# packages
library(lubridate)
library(tidyverse)

# load in data
tickets <- read.csv("tickets.csv", header = TRUE)
timeline <- read.csv("timeline.csv", header = TRUE)

# change from factor to date
timeline <- mutate(timeline, tDates = mdy(tDates))

# add an interval column (opened to closed) for each ticket
tickets2 <-
  tickets %>%
  mutate_at(-1, mdy_hm) %>%
  mutate(int = interval(Date.Time.Opened, Date.Time.Closed))

# for each timeline date, count the ticket intervals that contain it
timeline %>%
  rowwise() %>%
  mutate(n = sum(tDates %within% tickets2$int))
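
As a cross-check on the two approaches suggested in the comments, here is a minimal base-R sketch with the ticket data from this post typed in by hand, so it runs without tickets.csv or any packages. It is not the code from the thread: count_open_at(), n_instant and n_days are hypothetical names, and treating each timeline date as midnight UTC is an assumption about how the Date-versus-interval test behaves.

```r
# Ticket data copied from the dput(tickets) output in this post.
tickets <- data.frame(
  ID = 1:10,
  Date.Time.Opened = c("1/19/17 11:51", "1/22/16 12:27", "1/20/17 17:07",
                       "1/20/17 18:23", "1/20/17 8:54", "1/24/17 18:54",
                       "1/25/17 11:33", "1/23/17 11:22", "1/20/17 16:48",
                       "1/9/17 8:57"),
  Date.Time.Closed = c("1/30/17 14:44", "1/30/17 13:36", "1/27/17 7:24",
                       "1/27/17 7:24", "1/26/17 12:09", "1/26/17 12:09",
                       "1/26/17 12:08", "1/25/17 16:31", "1/25/17 15:06",
                       "1/25/17 13:46"),
  stringsAsFactors = FALSE
)
timeline <- data.frame(
  tDates = seq(as.Date("2017-01-20"), by = "day", length.out = 10)
)

# Parse the m/d/y h:m strings as UTC date-times.
fmt <- "%m/%d/%y %H:%M"
opened <- as.POSIXct(tickets$Date.Time.Opened, format = fmt, tz = "UTC")
closed <- as.POSIXct(tickets$Date.Time.Closed, format = fmt, tz = "UTC")

# Approach 1 (instant-based, in the spirit of %within%): a ticket counts
# if it was open at midnight UTC at the start of the given day (assumption).
count_open_at <- function(day) {
  instant <- as.POSIXct(format(day), tz = "UTC")
  sum(opened <= instant & instant <= closed)
}
timeline$n_instant <- vapply(
  seq_along(timeline$tDates),
  function(i) count_open_at(timeline$tDates[i]),
  integer(1)
)

# Approach 2 (day expansion, in the spirit of the map2/unnest/count comment):
# list every calendar day each ticket touches, then count matches per day.
open_day  <- as.Date(opened)
close_day <- as.Date(closed)
covered <- do.call(c, lapply(
  seq_along(open_day),
  function(i) seq(open_day[i], close_day[i], by = "day")
))
timeline$n_days <- vapply(
  seq_along(timeline$tDates),
  function(i) sum(covered == timeline$tDates[i]),
  integer(1)
)

print(timeline)
```

In this sketch the two columns agree except on days when a ticket was opened or closed partway through the day, which may be one reason the two suggestions in the comments do not return identical counts.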