R: How to filter dates in one table, get a count, and return it for each row of another table? Is it possible with dplyr and lubridate?
First question, so please let me know if I have left out anything needed for a complete picture.
Background: I have two tables. One is a table of tech tickets, with the time each was opened and the time it was resolved (closed). I would like to create a timeline that counts how many tickets are open on each day.
Here is what I have done so far:
# load in data
tickets <- read.csv("tickets.csv",header=TRUE)
#packages
library(tidyr)
library(dplyr)
library(lubridate)
tickets <- as_tibble(tickets) # tbl_df() is deprecated in current dplyr
tickets
## A tibble: 10 × 3
#ID Date.Time.Opened Date.Time.Closed
#<int> <fctr> <fctr>
#1 1 1/19/17 11:51 1/30/17 14:44
#2 2 1/22/16 12:27 1/30/17 13:36
#3 3 1/20/17 17:07 1/27/17 7:24
#4 4 1/20/17 18:23 1/27/17 7:24
#5 5 1/20/17 8:54 1/26/17 12:09
#6 6 1/24/17 18:54 1/26/17 12:09
#7 7 1/25/17 11:33 1/26/17 12:08
#8 8 1/23/17 11:22 1/25/17 16:31
#9 9 1/20/17 16:48 1/25/17 15:06
#10 10 1/9/17 8:57 1/25/17 13:46
# dates are currently factors; change to dates.
tickets2 <-
  tickets %>%
  mutate(Date.Time.Opened = mdy_hm(Date.Time.Opened)) %>%
  mutate(Date.Time.Closed = mdy_hm(Date.Time.Closed))
head(tickets2)
# A tibble: 6 × 3
#ID Date.Time.Opened Date.Time.Closed
#<int> <dttm> <dttm>
#1 1 2017-01-19 11:51:00 2017-01-30 14:44:00
#2 2 2016-01-22 12:27:00 2017-01-30 13:36:00
#3 3 2017-01-20 17:07:00 2017-01-27 07:24:00
#4 4 2017-01-20 18:23:00 2017-01-27 07:24:00
#5 5 2017-01-20 08:54:00 2017-01-26 12:09:00
#6 6 2017-01-24 18:54:00 2017-01-26 12:09:00
Here is what I wrote after setting up the data:
# write a function which takes a date, searches the tickets table and
# returns the number of tickets that are open
nOpenTickets <- function(x){
  nrow(filter(tickets,
              x > mdy_hm(Date.Time.Opened) &
              x < mdy_hm(Date.Time.Closed)))
}
# Add a column to the timeline with the number returned by the function
# (the number of open tickets on that date)
timeline <- mutate(timeline, ticketsOpen = nOpenTickets(tDates))
timeline
# my results:
## A tibble: 10 × 2
#tDates ticketsOpen
#<date> <int>
#1 2017-01-20 0
#2 2017-01-21 0
#3 2017-01-22 0
#4 2017-01-23 0
#5 2017-01-24 0
#6 2017-01-25 0
#7 2017-01-26 0
#8 2017-01-27 0
#9 2017-01-28 0
#10 2017-01-29 0
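One likely contributor to the all-zero column: mutate() calls nOpenTickets(tDates) once with the whole tDates column rather than once per date, and x is a Date while the parsed columns are date-times, so the two sides of the comparison do not share a type (how R handles that comparison may depend on the R version). The following is only a minimal sketch of a per-date count. The helper name n_open_on is hypothetical, the data are copied from the question, and each day is treated as midnight UTC.

```r
library(dplyr)
library(lubridate)

# Ticket data copied from the question; timestamps are parsed up front.
tickets2 <- data.frame(
  ID = 1:10,
  Date.Time.Opened = mdy_hm(c(
    "1/19/17 11:51", "1/22/16 12:27", "1/20/17 17:07", "1/20/17 18:23",
    "1/20/17 8:54", "1/24/17 18:54", "1/25/17 11:33", "1/23/17 11:22",
    "1/20/17 16:48", "1/9/17 8:57"
  )),
  Date.Time.Closed = mdy_hm(c(
    "1/30/17 14:44", "1/30/17 13:36", "1/27/17 7:24", "1/27/17 7:24",
    "1/26/17 12:09", "1/26/17 12:09", "1/26/17 12:08", "1/25/17 16:31",
    "1/25/17 15:06", "1/25/17 13:46"
  ))
)
timeline <- data.frame(tDates = as.Date("2017-01-20") + 0:9)

# Hypothetical helper (not from the post): number of tickets open at
# midnight UTC at the start of a single day `d`.
n_open_on <- function(d) {
  instant <- as_datetime(d)  # Date -> POSIXct so both sides are comparable
  sum(instant > tickets2$Date.Time.Opened &
        instant < tickets2$Date.Time.Closed)
}

# Call the helper once per date instead of once for the whole column.
timeline <- timeline %>%
  mutate(ticketsOpen = vapply(seq_along(tDates),
                              function(i) n_open_on(tDates[i]),
                              integer(1)))
timeline
```

Looping over the index keeps each element a Date, so as_datetime() receives a proper date rather than a bare number.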
Timeline:
> dput(timeline)
structure(list(tDates = structure(c(17186, 17187, 17188, 17189,
17190, 17191, 17192, 17193, 17194, 17195), class = "Date")), .Names =
"tDates", class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
================================================================
UPDATE
================================================================
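Here is what eventually worked (thanks, alistaire!):
# packages
library(lubridate)
library(tidyverse)
# load in data
tickets <- read.csv("tickets.csv", header = TRUE)
timeline <- read.csv("timeline.csv", header = TRUE)
# change from factor to date
timeline <- mutate(timeline, tDates = mdy(tDates))
# parse the timestamps and build one interval per ticket
tickets2 <-
  tickets %>%
  mutate_at(-1, mdy_hm) %>%
  mutate(int = interval(Date.Time.Opened, Date.Time.Closed))
# create new df that shows how many are open each day
timeline %>%
  rowwise() %>%
  mutate(n = sum(tDates %within% tickets2$int))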
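The working code relies on mutate_at(), which still runs but is superseded in current dplyr. As a hedged sketch only, the same idea can be written with across(). The data are typed in from the question instead of read from the CSV files, and the result name open_per_day is an illustrative choice, not part of the original post.

```r
library(dplyr)
library(lubridate)

# Ticket data copied from the question, timestamps still as text.
tickets <- data.frame(
  ID = 1:10,
  Date.Time.Opened = c(
    "1/19/17 11:51", "1/22/16 12:27", "1/20/17 17:07", "1/20/17 18:23",
    "1/20/17 8:54", "1/24/17 18:54", "1/25/17 11:33", "1/23/17 11:22",
    "1/20/17 16:48", "1/9/17 8:57"
  ),
  Date.Time.Closed = c(
    "1/30/17 14:44", "1/30/17 13:36", "1/27/17 7:24", "1/27/17 7:24",
    "1/26/17 12:09", "1/26/17 12:09", "1/26/17 12:08", "1/25/17 16:31",
    "1/25/17 15:06", "1/25/17 13:46"
  ),
  stringsAsFactors = FALSE
)
timeline <- data.frame(tDates = as.Date("2017-01-20") + 0:9)

# Parse both timestamp columns, then store one interval per ticket.
tickets2 <- tickets %>%
  mutate(across(c(Date.Time.Opened, Date.Time.Closed), mdy_hm)) %>%
  mutate(int = interval(Date.Time.Opened, Date.Time.Closed))

# For each timeline date, count the intervals that contain it.
open_per_day <- timeline %>%
  rowwise() %>%
  mutate(n = sum(tDates %within% tickets2$int)) %>%
  ungroup()
open_per_day
```

rowwise() matters here: it makes tDates a single date inside mutate(), so %within% tests that one date against all ten intervals.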
Here is a tidyverse approach, but it returns slightly different numbers from the ones you posted:
library(tidyverse); tickets %>% mutate_at(-1, as.Date, '%m/%d/%y') %>% mutate(tDates = map2(Date.Time.Opened, Date.Time.Closed, seq, by = 'day')) %>% unnest(tDates) %>% count(tDates) %>% right_join(timeline)
I imagine there is also a nice way to handle it, if you speak data.table syntax.

@alistaire, thanks, I'll try this! My numbers may well be a bit off, since I worked them out in Excel for this post. I'll comment as soon as I have tried it!

A more direct alternative is to use lubridate::`%within%`:
tickets2 <- tickets %>% mutate_at(-1, mdy_hm) %>% mutate(int = interval(Date.Time.Opened, Date.Time.Closed)); timeline %>% rowwise() %>% mutate(n = sum(tDates %within% tickets2$int))

Update: I am getting an error: Error: additional arguments should be named
It might be coming from the mutate_at line? (That is a variant I am not very familiar with.) I'll try your direct alternative!

Ah! For some reason, lubridate::%within% is triggering an error: Error in lubridate….. object 'lubridate' not found
(Even though I have loaded the package and run it, it still reports this.)

I ran it starting from "tickets2" and I got results! Thank you. ;-) Now I need to go through your code carefully so that I know exactly what it is doing! :-)
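The first approach in the comments (expand each ticket into the days it spans, then count rows per day) can be sketched as below. This is an illustration under assumptions, not the commenter's exact code: the data are typed in from the question, across() stands in for mutate_at(), and the format argument is passed by name, which may or may not be what the "additional arguments should be named" error was about. The result name counts is hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Ticket data copied from the question, timestamps still as text.
tickets <- data.frame(
  ID = 1:10,
  Date.Time.Opened = c(
    "1/19/17 11:51", "1/22/16 12:27", "1/20/17 17:07", "1/20/17 18:23",
    "1/20/17 8:54", "1/24/17 18:54", "1/25/17 11:33", "1/23/17 11:22",
    "1/20/17 16:48", "1/9/17 8:57"
  ),
  Date.Time.Closed = c(
    "1/30/17 14:44", "1/30/17 13:36", "1/27/17 7:24", "1/27/17 7:24",
    "1/26/17 12:09", "1/26/17 12:09", "1/26/17 12:08", "1/25/17 16:31",
    "1/25/17 15:06", "1/25/17 13:46"
  ),
  stringsAsFactors = FALSE
)
timeline <- data.frame(tDates = as.Date("2017-01-20") + 0:9)

counts <- tickets %>%
  # Keep only the calendar date; the time of day is dropped.
  mutate(across(c(Date.Time.Opened, Date.Time.Closed),
                ~ as.Date(.x, format = "%m/%d/%y"))) %>%
  # One list element per ticket: every day from opening to closing.
  mutate(tDates = map2(Date.Time.Opened, Date.Time.Closed, seq, by = "day")) %>%
  unnest(tDates) %>%
  # Rows per day = tickets open on that day.
  count(tDates) %>%
  # Keep only the days listed in the timeline.
  right_join(timeline, by = "tDates") %>%
  arrange(tDates)
counts
```

Because the time of day is discarded, a ticket counts as open on the day it was opened and on the day it was closed, so these counts can differ from a check against midnight with %within%.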
Tickets:
> dput(tickets)
structure(list(ID = 1:10, Date.Time.Opened = structure(c(1L,
6L, 3L, 4L, 5L, 8L, 9L, 7L, 2L, 10L), .Label = c("1/19/17 11:51",
"1/20/17 16:48", "1/20/17 17:07", "1/20/17 18:23", "1/20/17 8:54",
"1/22/16 12:27", "1/23/17 11:22", "1/24/17 18:54", "1/25/17 11:33",
"1/9/17 8:57"), class = "factor"), Date.Time.Closed = structure(c(8L,
7L, 6L, 6L, 5L, 5L, 4L, 3L, 2L, 1L), .Label = c("1/25/17 13:46",
"1/25/17 15:06", "1/25/17 16:31", "1/26/17 12:08", "1/26/17 12:09",
"1/27/17 7:24", "1/30/17 13:36", "1/30/17 14:44"), class = "factor")),
.Names = c("ID",
"Date.Time.Opened", "Date.Time.Closed"), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))