Creating a conditional sum (based on dates) as a new column of a data frame in R (r, dplyr, tidyverse)

I am trying to do some feature engineering in R. Suppose I have the following data frame:

events = data.frame(patient = c("A","A","A","A","B","B","B"), 
                    date = as.Date(c("2017-12-15", "2018-01-09", "2018-01-31", "2018-02-05", 
                                     "2017-12-12", "2017-12-12", "2018-02-01")), 
                    type = c("AnE","Inpatient","Inpatient","Inpatient","AnE","AnE",
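                             "Inpatient"))

Now I want to add a column containing the number of "Inpatient" events that the same patient had in the past 30 days.

Is there a straightforward way (one that doesn't involve a for loop)?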

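Given your dataset, I would create some helper variables and use a data.table approach.

First, I add the date of the previous event for each patient. Then I count how many times "Inpatient" has appeared, grouping by patient and by whether the previous event's date is later than the current date minus 30 days.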

library(data.table)
events = data.table(patient = c("A","A","A","A","B","B","B"), 
                    date = as.Date(c("2017-12-15", "2018-01-09", "2018-01-31", "2018-02-05", 
                                     "2017-12-12", "2017-12-12", "2018-02-01")), 
                    type = c("AnE","Inpatient","Inpatient","Inpatient","AnE","AnE",
                             "Inpatient"))
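# sort by date, then group by patient (patients appear in order of first event)
events = events[order(date), .SD, by = patient]
# date of the previous event for the same patient
# (data.table::shift; stats::lag does not shift the values of a plain vector)
events[, date_t1 := shift(date), by = patient]
# cumulative count of "Inpatient" events, grouped by patient and by whether
# the previous event is less than 30 days before the current one
events[, timesInpatient := cumsum(type == "Inpatient"), by = .(patient, date_t1 > date - 30)]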
The result looks like this:

   patient       date      type    date_t1 timesInpatient
1:       B 2017-12-12       AnE       <NA>              0
2:       B 2017-12-12       AnE 2017-12-12              0
3:       B 2018-02-01 Inpatient 2017-12-12              1
4:       A 2017-12-15       AnE       <NA>              0
5:       A 2018-01-09 Inpatient 2017-12-15              1
6:       A 2018-01-31 Inpatient 2018-01-09              2
7:       A 2018-02-05 Inpatient 2018-01-31              3
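
The grouping condition above compares each event only with the event immediately before it. As a rough alternative, the sketch below checks the 30-day window explicitly for every row. It assumes the window is inclusive on both ends and counts the current event itself; the column name inpatient_30d is made up for illustration. On this sample it gives the same counts as the output above.

```r
library(data.table)

events <- data.table(
  patient = c("A", "A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2017-12-15", "2018-01-09", "2018-01-31", "2018-02-05",
                   "2017-12-12", "2017-12-12", "2018-02-01")),
  type = c("AnE", "Inpatient", "Inpatient", "Inpatient", "AnE", "AnE",
           "Inpatient")
)

# order rows by patient, then by date
setorder(events, patient, date)

# For each row i within a patient, count "Inpatient" rows whose date lies in
# [date[i] - 30, date[i]]. Assumption: the current event is included.
events[, inpatient_30d := vapply(
  seq_len(.N),
  function(i) sum(type == "Inpatient" & date <= date[i] & date >= date[i] - 30),
  integer(1)
), by = patient]

events
```

This compares every pair of rows within a patient, so it is quadratic in the number of events per patient; that is fine for small groups but may be slow for long histories.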

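This may be slightly less concise than the data.table approach, but you can use an interval (stored as span below) and %within% from the lubridate package.

Here is an example of how they work: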

library(lubridate)

# creating a span object and a vector of dates
span <- lubridate::interval("2018-01-01", "2018-01-30")
dates <- as.Date(c("2018-01-01", "2018-01-30", "2018-01-03", "2018-02-01"))
dates %within% span
[1]  TRUE  TRUE  TRUE FALSE
# adding a vector indicating inpatient visits
inpatient_visit <- c(TRUE, FALSE, TRUE, FALSE)
# counting dates that both fall within the span and are inpatient visits
sum(dates %within% span & inpatient_visit)
[1] 2

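What should the output look like? What is your expected result?

It should look like this: events$SumpreVinjective=c(0,0,1,2,0,0,0)

Thanks, this looks good. I hadn't considered using data.table.

Oh, just one small thing: I actually only want the sum of previous events. I assume the easiest way would be to add date_t1 in the last line.
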
library(dplyr)
library(lubridate)
library(purrr)  # provides map_int() and map_df() used below

events = data.frame(patient = c("A","A","A","A","B","B","B"), 
                    date = as.Date(c("2017-12-15", "2018-01-09", "2018-01-31", "2018-02-05", 
                                     "2017-12-12", "2017-12-12", "2018-02-01")), 
                    type = c("AnE","Inpatient","Inpatient","Inpatient","AnE","AnE",
                             "Inpatient"))

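# for one patient's rows: for each row's span, count the inpatient
# events whose date falls within that span
count_visits <- function(df) {
  res <- map_int(df$span, ~ sum(df$date %within% .x & df$inpatient))
  df$count <- res
  return(df)
}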

events <- events %>%
  mutate(inpatient = type == "Inpatient",
         span = interval(date - days(30), date)) %>%
  split(.$patient) %>%
  map_df(count_visits) %>%
  select(-inpatient, -span) %>%
  arrange(date)

events
  patient       date      type count
1       B 2017-12-12       AnE     0
2       B 2017-12-12       AnE     0
3       A 2017-12-15       AnE     0
4       A 2018-01-09 Inpatient     1
5       A 2018-01-31 Inpatient     2
6       B 2018-02-01 Inpatient     1
7       A 2018-02-05 Inpatient     3
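
The comments ask for a count of previous events only, with an expected result of c(0,0,1,2,0,0,0) in the original row order. Below is a minimal base R sketch of one way to get that. It assumes "previous" means dated strictly before the current event, so same-day events are not counted; the helper count_prev_inpatient and the column prev_inpatient_30d are hypothetical names used only for illustration.

```r
events <- data.frame(
  patient = c("A", "A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2017-12-15", "2018-01-09", "2018-01-31", "2018-02-05",
                   "2017-12-12", "2017-12-12", "2018-02-01")),
  type = c("AnE", "Inpatient", "Inpatient", "Inpatient", "AnE", "AnE",
           "Inpatient"),
  stringsAsFactors = FALSE
)

# For row i, count "Inpatient" rows of the same patient that are dated
# strictly before row i and at most 30 days earlier.
# Assumption: events on the same day do not count as "previous".
count_prev_inpatient <- function(i, df) {
  sum(df$patient == df$patient[i] &
        df$type == "Inpatient" &
        df$date < df$date[i] &
        df$date >= df$date[i] - 30)
}

events$prev_inpatient_30d <- vapply(seq_len(nrow(events)),
                                    count_prev_inpatient,
                                    integer(1), df = events)

events$prev_inpatient_30d
# [1] 0 0 1 2 0 0 0
```

No explicit for loop is needed, but each row is still compared against every other row, so this is best suited to modest data sizes.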