Looping over ids within a key in R or SQL

Suppose I have the following data:

  key  id   value     Class        duration     Cond         Start        End
----- ---- --------   --------    ---------    ---------    -----------   -----------
   30  1    A         A,B          NA           NA          2018-02-27     2018-03-07     
   30  2    B         B            19           20          2018-02-27     2018-03-26

   40  1    C         C,D          NA           NA          2018-12-17     2018-12-25
   40  2    D         D            168          30          2018-12-17     2019-06-11

   50  1    A         A,C,D        NA           NA          2018-04-10     2018-06-21
   50  2    C         C,D          16           30          2018-04-10     2018-07-07
   50  3    D         D            28           20          2018-04-10     2018-08-04

   60  1    B         B,C,D        NA           NA          2016-05-13     2016-05-18
   60  2    C         C,D          49           20          2016-05-13     2016-07-06
   60  3    D         D            47           30          2016-05-13     2016-08-22

   70  1    A         A,C,D        NA           NA          2017-01-09     2017-11-01
   70  2    C         C,D          60            5          2017-01-09     2017-12-31
   70  3    D         D            17           28          2017-01-09     2018-01-17

   80  1    A         A,C,D        NA           NA          2019-09-18     2020-01-07
   80  2    C         C,D           2           20          2019-09-18     2020-01-09
   80  3    D         D             2           30          2019-09-18     2020-01-11

   90  1    A         A,B,C,D      NA           NA          2017-01-17     2017-02-15
   90  2    B         B,C,D        21           30          2017-01-17     2017-03-08
   90  3    C         C,D          23           20          2017-01-17     2017-03-31
   90  4    D         D           299           28          2017-01-17     2018-01-24
The data can be generated with the following code:

df <- as.data.frame(cbind(key = c(30, 30, 40, 40, 50, 50, 50, 60, 60, 60, 
                             70, 70, 70, 80, 80, 80, 90, 90, 90, 90), 
                      id = c(1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4), 
                      value = c("A", "B", "C", "D", "A", "C", "D", "B", "C", "D", "A", "C", "D","A", "C", "D", 
                                "A", "B", "C", "D"),
                      Class = c("A,B", "B", "C,D", "D", "A,C,D", "C,D", "D", "B,C,D", "C,D", "D", "A,C,D", "C,D", "D",
                                "A,C,D", "C,D", "D", "A,B,C,D", "B,C,D", "C,D", "D"),
                      duration = c(NA, 19, NA, 168, NA, 16, 28, NA, 49, 47, 
                                   NA, 60, 17, NA, 2, 2, NA, 21, 23, 299),
                      Cond = c(NA, 20, NA, 30, NA, 30, 20, NA, 20, 30, 
                                   NA, 5, 28, NA, 20, 30, NA, 30, 20, 28),
                      Start = c("2018-02-27", "2018-02-27", "2018-12-17", "2018-12-17", "2018-04-10", "2018-04-10", "2018-04-10", 
                                "2016-05-13", "2016-05-13", "2016-05-13", "2017-01-09", "2017-01-09", "2017-01-09",
                                "2020-09-08", "2019-09-18", "2019-09-18", "2017-01-17", "2017-01-17", "2017-01-17", "2017-01-17"),
                      End =   c("2018-03-07", "2018-03-26", "2018-12-25", "2019-06-11", "2018-06-21", "2018-07-07", "2018-08-04", 
                                "2016-05-18", "2016-07-06", "2016-08-22", "2017-11-01", "2017-12-31", "2018-01-17",
                                "2020-01-07", "2020-01-09", "2020-01-11", "2017-02-15", "2017-03-08", "2017-03-31", "2018-01-24")
                      ))
Based on this logic, the following new data should be generated:

  key  id   value     Class        duration     Cond         Start        End
----- ---- --------   --------    ---------    ---------    -----------   -----------
   30  1    A         A,B          NA           NA          2018-02-27     2018-03-26

   40  1    C         C,D          NA           NA          2018-12-17     2018-12-25
   40  2    D         D            168          30          2018-12-26     2019-06-11

   50  1    A         A,C,D        NA           NA          2018-04-10     2018-07-07
   50  3    D         D            28           20          2018-07-08     2018-08-04

   60  1    B         B,C,D        NA           NA          2016-05-13     2016-05-18
   60  2    C         C,D          49           20          2016-05-19     2016-07-06
   60  3    D         D            47           30          2016-07-07     2016-08-22

   70  1    A         A,C,D        NA           NA          2017-01-09     2017-11-01
   70  2    C         C,D          60            5          2017-11-02     2018-01-17

   80  1    A         A,C,D        NA           NA          2019-09-18     2020-01-11

   90  1    A         A,B,C,D      NA           NA          2017-01-17     2017-03-08
   90  3    C         C,D          23           20          2017-03-09     2017-03-31
   90  4    D         D           299           28          2017-04-01     2018-01-24
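
One detail may be worth checking before running anything on this data. In the generating code, the first Start value for key 80 is "2020-09-08", while the printed table shows 2019-09-18 for the same row. The sketch below is only a sanity check under the assumption that Start should never fall after End. The names `chk` and `bad` are made up for illustration, and the three rows are copied from the generating code rather than from the table.

```r
# Key 80 rows exactly as they appear in the generating code
chk <- data.frame(
  key   = c(80, 80, 80),
  id    = c(1, 2, 3),
  Start = as.Date(c("2020-09-08", "2019-09-18", "2019-09-18")),
  End   = as.Date(c("2020-01-07", "2020-01-09", "2020-01-11"))
)

# Assumption: a row whose Start falls after its End is suspicious
bad <- chk[chk$Start > chk$End, ]
print(bad)
```

Only the id 1 row is flagged, which suggests a typo in the generating code rather than in the table.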
You can try:

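library(dplyr)

df %>%
  # cbind() made every column character (factor before R 4.0), so restore the types
  mutate(across(duration:Cond, ~ as.integer(as.character(.))),
         across(Start:End, ~ as.Date(as.character(.)))) %>%
  # idx increases whenever a row opens a new block: the NA/NA row or duration >= Cond
  group_by(key, idx = cumsum((is.na(duration) & is.na(Cond)) | duration >= Cond)) %>%
  # keep the first row of each block, but take End from its last row
  summarise(across(id:Start, first), End = last(End), .groups = "drop_last") %>%
  # still grouped by key: later blocks start the day after the previous block ends
  mutate(Start = case_when(row_number() == 1 ~ Start, TRUE ~ lag(End) + 1L)) %>%
  ungroup() %>%
  select(-idx)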
Output:

# A tibble: 14 x 8
   key   id    value Class   duration  Cond Start      End       
   <fct> <fct> <fct> <fct>      <int> <int> <date>     <date>    
 1 30    1     A     A,B           NA    NA 2018-02-27 2018-03-26
 2 40    1     C     C,D           NA    NA 2018-12-17 2018-12-25
 3 40    2     D     D            168    30 2018-12-26 2019-06-11
 4 50    1     A     A,C,D         NA    NA 2018-04-10 2018-07-07
 5 50    3     D     D             28    20 2018-07-08 2018-08-04
 6 60    1     B     B,C,D         NA    NA 2016-05-13 2016-05-18
 7 60    2     C     C,D           49    20 2016-05-19 2016-07-06
 8 60    3     D     D             47    30 2016-07-07 2016-08-22
 9 70    1     A     A,C,D         NA    NA 2017-01-09 2017-11-01
10 70    2     C     C,D           60     5 2017-11-02 2018-01-17
11 80    1     A     A,C,D         NA    NA 2020-09-08 2020-01-11
12 90    1     A     A,B,C,D       NA    NA 2017-01-17 2017-03-08
13 90    3     C     C,D           23    20 2017-03-09 2017-03-31
14 90    4     D     D            299    28 2017-04-01 2018-01-24

However, note that for key 90 there is one more row, since 23 > 20. If this is not correct, you will need to provide some additional explanation.
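
To see where the extra row for key 90 comes from, the grouping step can be traced by hand in base R. This is a minimal sketch rather than a replacement for the dplyr pipeline. The variable names (`new_block`, `idx`, `block_start`, `block_end`, `res`) are invented for illustration, and the four rows are copied from the question's data for key 90.

```r
# Rows for key 90, copied from the question's data
duration <- c(NA, 21, 23, 299)
cond     <- c(NA, 30, 20, 28)
start    <- as.Date(rep("2017-01-17", 4))
end      <- as.Date(c("2017-02-15", "2017-03-08", "2017-03-31", "2018-01-24"))

# A row opens a new block when it is the NA/NA header row or when duration >= Cond
new_block <- (is.na(duration) & is.na(cond)) | duration >= cond
idx <- cumsum(new_block)

# The first row of each block supplies Start, the last row supplies End
first_row <- !duplicated(idx)
last_row  <- !duplicated(idx, fromLast = TRUE)
block_end <- end[last_row]

# Every block after the first starts one day after the previous block ends
block_start <- c(start[first_row][1], head(block_end, -1) + 1)

# which(first_row) equals the id here because ids run 1..4 within the key
res <- data.frame(id = which(first_row), Start = block_start, End = block_end)
print(res)
```

Row 2 (21 < 30) is absorbed into the block opened by row 1, while row 3 (23 >= 20) and row 4 (299 >= 28) each open their own block, so key 90 ends up with three rows.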

Thanks! You are right, the last one was a mistake on my side. This solution is awesome. Thank you very much!!!