Filtering dates in R

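Is there a way or a function to subset or filter data by the same ID, conditional on the date range of the observations? I have looked through a lot of material that uses dplyr and lubridate and ...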

Something similar maybe?
DF %>% 
 group_by(ID) %>% 
  filter_if(for i %in% Date, between("Date 1 & Date 2 is at least 6 months"))

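Specifically, I want to subset the observations of an ID if at least 3 of them fall within any 6-month date range. Using Cohort_month is fine (it is extracted from the Date column).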

My DF is:

str(DF)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    25 obs. of  8 variables:
 $ ID          : chr  "AbDu" "AbDu" "AbDu" "AbDu" ...
 $ Reg         : num  29179 32039 35151 38359 41509 ...
 $ Date        : POSIXct, format: "2017-08-18" ...
 $ Year        : num  2017 2017 2017 2017 2017 ...
 $ Vol1        : num  2.5 2.5 2.5 2.5 2.5 2.5 4.9 2.5 2.5 4.9 ...
 $ Vol2        : num  2.5 2.5 2.5 2.5 2.5 2.5 4.9 2.5 2.5 4.9 ...
 $ VolT        : num  10 20 20 20 20 ...
 $ Cohort_month: num  8 9 10 11 12 1 1 3 4 11 ...

DF
# A tibble: 25 x 8
ID     Reg   Date                 Year  Vol1  Vol2  VolT
<chr> <dbl> <dttm>               <dbl> <dbl> <dbl> <dbl>
AbDu  29179 2017-08-18 00:00:00   2017   2.5   2.5  10
AbDu  32039 2017-09-15 00:00:00   2017   2.5   2.5  20
AbDu  35151 2017-10-13 00:00:00   2017   2.5   2.5  20
AbDu  38359 2017-11-10 00:00:00   2017   2.5   2.5  20
AbDu  41509 2017-12-08 00:00:00   2017   2.5   2.5  20
AbDu  44732 2018-01-08 00:00:00   2018   2.5   2.5  20
AbDu  47487 2018-01-31 00:00:00   2018   4.9   4.9   9.8
AbDu  52537 2018-03-14 00:00:00   2018   2.5   2.5  30
AbDu  57713 2018-05-23 00:00:00   2018   2.5   2.5  30

Try the following solution:

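library(tidyverse)
library(lubridate)

# df is the data frame from the question (named DF there).
df %>%
  group_by(ID) %>%
  nest() %>%
  mutate(
    # Per ID: keep the rows that close a run of three observations
    # spanning less than six 30-day months.
    data_filter = map(
      data,
      ~arrange(.x, Date) %>%
        mutate(
          Date2 = lag(Date, 2),  # date two observations earlier
          # Fix the unit; difftime() otherwise picks one automatically.
          MDiff = as.numeric(difftime(Date, Date2, units = "days")) / 30
        ) %>%
        filter(MDiff < 6)
    ),
    n_row = map_dbl(
      data_filter,
      nrow
    )
  ) %>%
  filter(n_row > 0) %>%
  select(ID, data_filter) %>%
  unnest(data_filter) %>%
  # Keep only the window bounds so that ..1, ..2, ..3 below line up with
  # ID, Date (window end) and Date2 (window start).
  select(ID, Date, Date2) %>%
  pmap_df(
    ~filter(df, ID == ..1 & Date <= ..2 & Date >= ..3)
  ) %>%
  # Overlapping windows return the same row more than once.
  distinct()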
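
The core of the solution above is the comparison of each date with the date two rows earlier: if those two dates are less than six months apart, the three rows between them form a qualifying window. A minimal base R sketch of just that step, on hypothetical toy dates for a single ID (not the asker's data) and with the same 30-day approximation of a month:

```r
# Hypothetical toy dates for one ID, already sorted in ascending order.
dates <- as.Date(c("2017-08-18", "2017-09-15", "2017-10-13", "2018-09-01"))

# Date two positions earlier (NA for the first two rows),
# which is what dplyr's lag(Date, 2) produces.
two_back <- c(rep(as.Date(NA), 2), head(dates, -2))

# Span of each three-row window, in 30-day "months".
span_months <- as.numeric(difftime(dates, two_back, units = "days")) / 30

# TRUE where a row closes a window of three observations shorter than 6 months.
closes_window <- !is.na(span_months) & span_months < 6

print(data.frame(dates, two_back, span_months, closes_window))
```

Only the third row closes a qualifying window here; the fourth is almost a year after the second. Whether 30-day months are precise enough depends on the data, and calendar months would need a different comparison.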

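I think you are looking for a rolling-window operation, where you want to know how many times something occurred in the n months before the current event. In that case, take a look at the zoo package, which provides rolling functions.
Could you provide one?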
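
The rolling-window idea from this comment can be sketched as follows. This version uses plain base R rather than zoo, runs on hypothetical toy data, and approximates six months as 183 days; count_in_window() is an illustrative helper, not a function from any package.

```r
# Hypothetical toy data, not the asker's DF.
toy <- data.frame(
  ID   = c("A", "A", "A", "A", "B", "B", "B"),
  Date = as.Date(c("2017-08-18", "2017-09-15", "2018-01-08", "2018-09-01",
                   "2017-01-01", "2017-08-01", "2018-03-01"))
)

# Illustrative helper: for each date (given as a number of days), count the
# dates lying in the `days`-day window that ends at that date, inclusive.
count_in_window <- function(dates, days = 183) {
  vapply(seq_along(dates), function(i) {
    sum(dates <= dates[i] & dates >= dates[i] - days)
  }, integer(1))
}

# Apply the helper within each ID; ave() returns the counts in row order.
toy$n_prev <- ave(as.numeric(toy$Date), toy$ID,
                  FUN = function(d) count_in_window(d))

# Rows that close a window holding at least 3 observations.
print(toy[toy$n_prev >= 3, ])
```

With these toy dates only the third row of ID "A" reaches a count of 3. The same counts could be produced with a rolling function from zoo, which the comment recommends.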