R: Expand rows by start and end date, and calculate the hours between two days


My question:

I have a dataset of hospital admissions, discharges, and lengths of stay in days. It looks like this:

ID Admission           Discharge             Stay_in_days 
1    2020-08-20 15:25:03 2020-08-21 21:09:49 1.239
2    2020-10-04 21:53:43 2020-10-09 11:02:57 4.548
... 
The dates are currently stored as POSIXct.

My goal is:

ID   Date                 Stay_in_days 
1    2020-08-20 15:25:03  0.357 
1    2020-08-21 21:09:49  1.239
2    2020-10-04 21:53:43  0.087
2    2020-10-05 00:00:00  1.087
2    2020-10-06 00:00:00  2.087
2    2020-10-07 00:00:00  3.087
2    2020-10-08 00:00:00  4.087
2    2020-10-09 11:02:57  4.548
...
What I have done so far:

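library(dplyr)

# One daily sequence per patient, running from admission to discharge
M <- Map(seq, patients$Admission, patients$Discharge, by = "day")
patients2 <- data.frame(
  ID = rep.int(patients$ID, vapply(M, length, 1L)), 
  Date = do.call(c, M)
) 

# The Date column only exists in patients2, so the mutate runs on patients2
patients2 <- patients2 %>%
  mutate(
    Date2 = as.Date(Date, format = "%Y-%m-%d"),
    Dat2 = Date2 + 1,
    Diff = difftime(Date2, Date, units = "days")
  )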


Strangely, this adds two hours to the admission date but still calculates the correct length of stay. Can anyone explain this?
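One plausible explanation, offered as an assumption rather than a confirmed diagnosis: in older R versions (before 4.1.0, as far as I know) combining POSIXct values with c() drops the "tzone" attribute, so do.call(c, M) returns the same instants but prints them in the session's local time zone. If that zone is two hours ahead of UTC, the displayed admission time shifts by two hours while every difference stays correct. The sketch below imitates this by changing only the display zone; "Europe/Berlin" is a stand-in for the unknown local zone.

```r
# Assumption: the local zone is two hours ahead of UTC in August;
# "Europe/Berlin" is used here only as an illustrative stand-in.
adm_utc <- as.POSIXct("2020-08-20 15:25:03", tz = "UTC")

# Same instant, but with a different display zone attached
adm_local <- adm_utc
attr(adm_local, "tzone") <- "Europe/Berlin"

shown_utc   <- format(adm_utc,   "%Y-%m-%d %H:%M:%S")
shown_local <- format(adm_local, "%Y-%m-%d %H:%M:%S")

# The underlying number of seconds since the epoch is unchanged,
# which is why durations computed from it remain correct
same_instant <- as.numeric(adm_utc) == as.numeric(adm_local)

print(shown_utc)     # "2020-08-20 15:25:03"
print(shown_local)   # "2020-08-20 17:25:03"
print(same_instant)  # TRUE
```

If this is indeed the cause, setting the attribute back after combining, for example attr(patients2$Date, "tzone") <- "UTC", should restore the expected display.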

Here is some data:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20), Admission = structure(c(1597937103.872, 
1598717768.704, 1599060521.984, 1599758087.168, 1599815496.704, 
1600702198.784, 1600719631.36, 1601065923.584, 1601119400.96, 
1601215476.736, 1601236710.4, 1601416934.4, 1601499640.832, 1601545647.104, 
1601587328, 1601644868.608, 1601741206.528, 1601848423.424, 1601901245.44, 
1601913828.352), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Discharge = structure(c(1598044189.696, 1598897337.344, 1599144670.208, 
    1599845118.976, 1599842366.464, 1602733683.712, 1603372135.424, 
    1601125168.128, 1601314173.952, 1605193905.152, 1602190259.2, 
    1601560720.384, 1601737143.296, 1602705634.304, 1602410460.16, 
    1602698425.344, 1601770566.656, 1602241377.28, 1602780476.416, 
    1602612048.896), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    Stay_in_days = c(1.239, 2.078, 0.974, 1.007, 0.311, 23.513, 
    30.7, 0.686, 2.254, 46.047, 11.036, 1.664, 2.749, 13.426, 
    9.527, 12.194, 0.34, 4.548, 10.176, 8.081)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))


Thanks in advance for your help.

You can achieve this with pivot_longer from tidyr. Edit: now with comments:

library(dplyr)
library(tidyr)

df1 <- df %>% 
  select(ID = ID, date1 = Admission, date2 = Discharge, Stay_in_days) %>% # prepare for pivoting
  pivot_longer(
    cols = starts_with("date"),
    names_to = "Date1",
    values_to = "Date"
  ) %>% # pivot to long format
  select(-Date1) %>% # remove temporary Date1
  relocate(Stay_in_days, .after = Date) %>% # change column order
  group_by(ID) %>%
  mutate(idgroup = row_number()) %>% # 1 = admission row, 2 = discharge row
  mutate(Stay_in_days = replace(Stay_in_days, row_number() == 1, 0)) %>%  # set Admission to zero
  ungroup() 


It is a bit crude, but it works.

library(tidyverse)
library(lubridate)

df %>% 
  pivot_longer(cols = -c(ID, Stay_in_days), names_to = "Event", values_to = "DATE") %>%
  group_by(ID) %>%
  mutate(dummy = case_when(Event == "Admission" ~ 0,
                           Event == "Discharge" ~ max(floor(Stay_in_days),1))) %>%
  complete(dummy = seq(min(dummy), max(dummy), 1)) %>%
  mutate(Event = ifelse(is.na(Event), "Dummy", Event),
         DATE = if_else(is.na(DATE), first(DATE)+dummy*24*60*60, DATE),
         Stay_in_days = case_when(Event == "Admission" ~ as.numeric(difftime(ceiling_date(DATE, "day"), DATE, units = "days")),
                                   Event == "Discharge" ~ Stay_in_days,
                                   TRUE ~ dummy + as.numeric(difftime(ceiling_date(first(DATE), "day"), first(DATE), units = "days")))) %>%
  select(ID, DATE, Stay_in_days)

# A tibble: 199 x 3
# Groups:   ID [20]
      ID DATE                Stay_in_days
   <dbl> <dttm>                     <dbl>
 1     1 2020-08-20 15:25:03        0.358
 2     1 2020-08-21 21:09:49        1.24 
 3     2 2020-08-29 16:16:08        0.322
 4     2 2020-08-30 16:16:08        1.32 
 5     2 2020-08-31 18:08:57        2.08 
 6     3 2020-09-02 15:28:41        0.355
 7     3 2020-09-03 14:51:10        0.974
 8     4 2020-09-10 17:14:47        0.281
 9     4 2020-09-11 17:25:18        1.01 
10     5 2020-09-11 09:11:36        0.617
# ... with 189 more rows
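Explanation of the logic: for the first date within each ID, Stay_in_days is the duration from the admission time to the end of that calendar day (the code uses ceiling_date, so this is the time until the following midnight). For each intermediate date it simply adds 1 to the previous value. For the discharge date it keeps the Stay_in_days value that was calculated before pivoting. I hope this is what you were after.

Explanation of the code: after pivot_longer, I use a dummy column to create the intermediate date-time objects. After that, I just mutate the columns as described above to produce the output.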
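The three cases described above (admission row, intermediate rows, discharge row) can also be sketched in base R for a single patient. This is only an illustration under stated assumptions, not the answer's own method: expand_stay is a hypothetical helper, all times are assumed to be in UTC, and rows are generated per calendar day rather than per full 24-hour period.

```r
# Hypothetical helper (not part of the original answers).
# Assumes admission and discharge are single POSIXct values in UTC.
expand_stay <- function(id, admission, discharge) {
  tz <- "UTC"
  # Every calendar day touched by the stay
  days <- seq(as.Date(admission, tz = tz), as.Date(discharge, tz = tz), by = "day")
  day_start <- as.POSIXct(format(days), tz = tz)   # midnight at the start of each day

  # Row timestamps: admission time first, midnights in between, discharge time last.
  # If both fall on the same day, the single row carries the discharge time.
  date <- day_start
  date[1] <- admission
  date[length(date)] <- discharge

  # Cumulative stay up to the end of each day, capped at the discharge time
  day_end <- pmin(day_start + 86400, discharge)
  stay <- as.numeric(difftime(day_end, admission, units = "days"))

  data.frame(ID = id, Date = date, Stay_in_days = round(stay, 3))
}

res <- expand_stay(
  id = 2,
  admission = as.POSIXct("2020-10-04 21:53:43", tz = "UTC"),
  discharge = as.POSIXct("2020-10-09 11:02:57", tz = "UTC")
)
print(res)
```

For this patient the sketch yields six rows, one per calendar day from 2020-10-04 to 2020-10-09. The values are rounded rather than truncated, so the first row comes out as 0.088 instead of the 0.087 shown in the question's goal table. To apply it to the whole data set, something like do.call(rbind, Map(expand_stay, df$ID, df$Admission, df$Discharge)) should work, although Map can drop the time zone attribute on some R versions, so the result is worth checking.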

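Thanks a lot, @TarJae, this is very useful. However, I need every date between admission and discharge together with the corresponding length of stay; see the table under "My goal". Do you have any idea how to solve this?

Please see my suggested answer, which takes this requirement into account. Calculating the length of stay took a while to work out, though.

Thanks, AnilGoyal. It is indeed a bit crude, but I have no better solution. I will try to simplify the code.

Dear @AnilGoyal, thanks again for your help with this. I have been using the code and have now noticed a major problem: if the discharge time of day is earlier than the admission time of day, the code skips the last full day the patient spent in hospital. I hope you understand what I mean. Is there an easy way to fix this?

Let me see. By the way, can you give an example of this case?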
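A small sketch of the case raised in the last comments, using the second patient from the question's example table. It assumes the skipped day comes from floor(Stay_in_days), which counts full 24-hour periods rather than calendar-day boundaries; the variable names are illustrative only.

```r
# Example from the question: discharge time of day (11:02) is earlier
# than admission time of day (21:53)
admission <- as.POSIXct("2020-10-04 21:53:43", tz = "UTC")
discharge <- as.POSIXct("2020-10-09 11:02:57", tz = "UTC")

stay_in_days <- as.numeric(difftime(discharge, admission, units = "days"))

# Full 24-hour periods, as used for the dummy column in the answer
full_periods <- floor(stay_in_days)

# Midnights crossed between admission and discharge
calendar_days <- as.numeric(as.Date(discharge, tz = "UTC") - as.Date(admission, tz = "UTC"))

print(full_periods)   # 4
print(calendar_days)  # 5
```

With four full periods the sequence 0..4 gives five rows, while the goal table needs six (admission, four midnights, discharge). Basing the sequence on calendar days instead of floor(Stay_in_days) would be one way to close that gap.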