R: Update a data frame's columns based on other columns

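I have a data frame containing each person's stages, as shown below (this is only a sample of a very large data frame):

The five columns on the right hold a person's stage per year, but they do not yet reflect all the information. I need to incorporate the information from the first two columns, whose values are in years, as follows:

  • If the value in column 1 is less than one year, then FirstYStage should be "Deceased", and so should all the following columns (the person is still dead...); if the value is between 1 and 2, then SecondYStage should be "Deceased", and so on.

  • If the value in column 2 is less than one year, then SecondYStage should be "EndOfEvents"; if the value is between 1 and 2, then ThirdYStage should be "EndOfEvents", and so on.

So the expected output in this case should be: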

    df_updated = structure(list(
      DeceasedDate = c(0.283219178082192, 1.12678843226788, 2.02865296803653,
                       0.892465753424658, NA, 0.88013698630137, NA),
      LastClinicalEventMonthEnd = c(0.244862981988838, 1.03637744165398,
                                    10.9464611555048, 0.763698598427194,
                                    3.35011412354135, 0.677397228564181,
                                    3.83687211440893),
      FirstYStage = c("Deceased", "2", "2", "Deceased", "2", "Deceased", "3.1"),
      SecondYStage = c("Deceased", "Deceased", "2", "Deceased", "2", "Deceased", "3.1"),
      ThirdYStage = c("Deceased", "Deceased", "Deceased", "Deceased", "2", "Deceased", "3.1"),
      FourthYStage = c("Deceased", "Deceased", "Deceased", "Deceased", "2", "Deceased", "3.1"),
      FifthYStage = c("Deceased", "Deceased", "Deceased", "Deceased", "LastEvent", "Deceased", "LastEvent")
    ), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
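
One important point: "Deceased" takes priority. In other words, if there is a conflict, where a stage number is present but "Deceased" contradicts it, we should choose "Deceased".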


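How can I do this in the most efficient way? At the moment I am using a series of if statements, but I don't think that is the best course of action.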

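I would do it like this:

  • reshape from wide to long format
  • compute the year from the column names
  • selectively update the values
  • reshape back to wide format

data.table

As I am more fluent in data.table than in dplyr, here is the approach implemented in data.table syntax. (Sorry, I will add a dplyr solution if time permits.)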
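The following is a minimal sketch of the four steps in data.table syntax. It is a reconstruction and not necessarily the answerer's exact code. The input `df` is hypothetical: its two date columns are copied from `df_updated` in the question, while the stage values before the update ("2" and "3.1") are an assumption.

```r
library(data.table)

# Hypothetical input (assumption): every stage column holds the person's
# stage number before any "Deceased" / "EndOfEvents" update is applied.
base_stage <- c("2", "2", "2", "2", "2", "2", "3.1")
df <- data.frame(
  DeceasedDate = c(0.283219178082192, 1.12678843226788, 2.02865296803653,
                   0.892465753424658, NA, 0.88013698630137, NA),
  LastClinicalEventMonthEnd = c(0.244862981988838, 1.03637744165398,
                                10.9464611555048, 0.763698598427194,
                                3.35011412354135, 0.677397228564181,
                                3.83687211440893),
  FirstYStage = base_stage, SecondYStage = base_stage,
  ThirdYStage = base_stage, FourthYStage = base_stage,
  FifthYStage = base_stage,
  stringsAsFactors = FALSE
)

stage_cols <- grep("Stage$", names(df), value = TRUE)

dt <- as.data.table(df)
dt[, rn := .I]  # row id, needed to reshape back to wide format

# 1. Reshape from wide to long; `key` becomes a factor whose levels follow
#    the order of stage_cols.
long <- melt(dt, measure.vars = stage_cols,
             variable.name = "key", value.name = "val")

# 2. Compute the year (1 to 5) from the position of the column name.
long[, year := as.integer(key)]

# 3. Update selectively. "EndOfEvents" is only written for people without a
#    DeceasedDate, so "Deceased" always takes priority.
long[!is.na(DeceasedDate) & floor(DeceasedDate) < year, val := "Deceased"]
long[is.na(DeceasedDate) & floor(LastClinicalEventMonthEnd) + 1 < year,
     val := "EndOfEvents"]

# 4. Reshape back to wide format, one row per rn.
result <- dcast(long, rn + DeceasedDate + LastClinicalEventMonthEnd ~ key,
                value.var = "val")
print(result)
```

The updates in step 3 modify `long` by reference, which avoids copying the data and tends to matter for very large tables.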

dplyr/tidyr

As promised, here is a dplyr/tidyr implementation of the same approach:

    library(tidyr)
    library(dplyr)
    df %>% 
      mutate(rn = row_number()) %>% 
      gather(key, val, ends_with("Stage"), factor_key = TRUE) %>% 
      mutate(year = as.integer(key)) %>% 
      mutate(val = if_else(!is.na(DeceasedDate) & floor(DeceasedDate) < year, "Deceased", val)) %>% 
      mutate(val = if_else(is.na(DeceasedDate) & floor(LastClinicalEventMonthEnd) + 1 < year, "EndOfEvents", val)) %>% 
      select(-year) %>% 
      spread(key, val) %>% 
      arrange(rn) 
    
Or, without creating the year column:

    df %>% 
      mutate(rn = row_number()) %>% 
      gather(key, val, ends_with("Stage"), factor_key = TRUE) %>% 
      mutate(val = if_else(!is.na(DeceasedDate) & floor(DeceasedDate) < as.integer(key), 
                           "Deceased", val)) %>% 
      mutate(val = if_else(is.na(DeceasedDate) & floor(LastClinicalEventMonthEnd) + 1 < as.integer(key), 
                           "EndOfEvents", val)) %>% 
      spread(key, val) %>% 
      arrange(rn) 
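In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The sketch below shows how the same pipeline might look with the newer functions. It is a variant, not part of the original answer, and it uses a hypothetical input `df` whose stage values before the update ("2" and "3.1") are assumed.

```r
library(dplyr)
library(tidyr)

# Hypothetical input (assumption): date columns copied from df_updated in the
# question, stage columns filled with the stage number before the update.
base_stage <- c("2", "2", "2", "2", "2", "2", "3.1")
df <- data.frame(
  DeceasedDate = c(0.283219178082192, 1.12678843226788, 2.02865296803653,
                   0.892465753424658, NA, 0.88013698630137, NA),
  LastClinicalEventMonthEnd = c(0.244862981988838, 1.03637744165398,
                                10.9464611555048, 0.763698598427194,
                                3.35011412354135, 0.677397228564181,
                                3.83687211440893),
  FirstYStage = base_stage, SecondYStage = base_stage,
  ThirdYStage = base_stage, FourthYStage = base_stage,
  FifthYStage = base_stage,
  stringsAsFactors = FALSE
)

stage_cols <- grep("Stage$", names(df), value = TRUE)

result <- df %>%
  mutate(rn = row_number()) %>%
  # wide -> long: one row per person and stage column
  pivot_longer(ends_with("Stage"), names_to = "key", values_to = "val") %>%
  # year = position of the column name among the stage columns (1 to 5)
  mutate(year = match(key, stage_cols)) %>%
  mutate(val = if_else(!is.na(DeceasedDate) & floor(DeceasedDate) < year,
                       "Deceased", val)) %>%
  mutate(val = if_else(is.na(DeceasedDate) &
                         floor(LastClinicalEventMonthEnd) + 1 < year,
                       "EndOfEvents", val)) %>%
  # drop year first, otherwise pivot_wider() would treat it as an id column
  select(-year) %>%
  # long -> wide: restore one column per stage
  pivot_wider(names_from = key, values_from = val) %>%
  arrange(rn)
print(result)
```

Because `pivot_longer()` returns `key` as a character column rather than a factor, the year is derived with `match()` instead of `as.integer(key)`.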
    
Can you provide a reproducible example of what you are currently doing?

@AdamWheeler For example, df$FirstYStage=if_else(df$deceseddate
    
Result of the data.table approach:

       rn DeceasedDate LastClinicalEventMonthEnd FirstYStage SecondYStage ThirdYStage FourthYStage FifthYStage
    1:  1    0.2832192                 0.2448630    Deceased     Deceased    Deceased     Deceased    Deceased
    2:  2    1.1267884                 1.0363774           2     Deceased    Deceased     Deceased    Deceased
    3:  3    2.0286530                10.9464612           2            2    Deceased     Deceased    Deceased
    4:  4    0.8924658                 0.7636986    Deceased     Deceased    Deceased     Deceased    Deceased
    5:  5           NA                 3.3501141           2            2           2            2 EndOfEvents
    6:  6    0.8801370                 0.6773972    Deceased     Deceased    Deceased     Deceased    Deceased
    7:  7           NA                 3.8368721         3.1          3.1         3.1          3.1 EndOfEvents
    
Result of the dplyr/tidyr pipeline:

      DeceasedDate LastClinicalEventMonthEnd rn FirstYStage SecondYStage ThirdYStage FourthYStage FifthYStage
    1    0.2832192                 0.2448630  1    Deceased     Deceased    Deceased     Deceased    Deceased
    2    1.1267884                 1.0363774  2           2     Deceased    Deceased     Deceased    Deceased
    3    2.0286530                10.9464612  3           2            2    Deceased     Deceased    Deceased
    4    0.8924658                 0.7636986  4    Deceased     Deceased    Deceased     Deceased    Deceased
    5           NA                 3.3501141  5           2            2           2            2 EndOfEvents
    6    0.8801370                 0.6773972  6    Deceased     Deceased    Deceased     Deceased    Deceased
    7           NA                 3.8368721  7         3.1          3.1         3.1          3.1 EndOfEvents
    