R: skip NA using case_when (r, dplyr)

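My code uses `arrange` on a column, say `col1`, but if a row has no data available in that column, I want it to use `col2`; if `col2` is not available either, I want it to use `col3`, and so on up to `col6`.

So, currently: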

df <- data.frame(col1 = c("NA", "1999-07-01", "NA"), 
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251,16121,1209))

I was thinking of using `case_when` inside `arrange`, but I'm not sure how to translate that into `mutate` terms.

Alternatively, I was thinking of creating another column, i.e.:

    df <- df %>%
      mutate(earliestDate = case_when(
        !is.na(col1) ~ col1,
        is.na(col1) ~ col2,
        is.na(col2) ~ col3,
        is.na(col3) ~ col4, 
        is.na(col4) ~ col5))

But the above doesn't fill the new `earliestDate` column with the earliest date; it just takes the first column.
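Why the attempt only ever returns `col1`: `case_when()` checks its conditions in order and, for each row, uses the first one that is `TRUE`. Every row satisfies either `!is.na(col1)` or `is.na(col1)`, so the later branches can never fire. On top of that, the sample data holds the string `"NA"` rather than R's missing value, so `!is.na(col1)` is `TRUE` for every row. A minimal sketch of a corrected version, assuming the `"NA"` strings are first converted to real `NA` (the column name `firstAvailable` is hypothetical), tests each column for availability before returning it:

```r
library(dplyr)

# Question data; stringsAsFactors = FALSE keeps the columns character on any R version
df <- data.frame(col1 = c("NA", "1999-07-01", "NA"),
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251, 16121, 1209),
                 stringsAsFactors = FALSE)

# Missing values are stored as the string "NA"; turn them into real NA
df[df == "NA"] <- NA

# Each branch tests the same column it returns; the final TRUE branch is the fallback
df <- df %>%
  mutate(firstAvailable = case_when(
    !is.na(col1) ~ col1,
    !is.na(col2) ~ col2,
    !is.na(col3) ~ col3,
    !is.na(col4) ~ col4,
    !is.na(col5) ~ col5,
    TRUE ~ col6
  ))

df$firstAvailable
#> [1] "04-01-2015" "1999-07-01" "01-12-2009"
```

The same "first non-missing, left to right" rule is what `dplyr::coalesce(col1, col2, col3, col4, col5, col6)` computes, so that call is a shorter equivalent of this `case_when()`.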

I assume you want to sort the rows by the earliest date; why not do this:

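library(dplyr)
library(tidyr)

df %>%
    # reshape to long format: one row per (id, column) pair
    gather(key, date, starts_with("col")) %>%
    group_by(id) %>%
    # earliest parseable mm-dd-yyyy date across all columns of each id
    mutate(earliestDate = min(as.Date(date, format = "%m-%d-%Y"), na.rm = TRUE)) %>%
    spread(key, date)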
## A tibble: 3 x 8
## Groups:   id [3]
#      id earliestDate col1       col2       col3       col4       col5   col6
#   <dbl> <date>       <chr>      <chr>      <chr>      <chr>      <chr>  <chr>
#1  1209. 2009-01-12   NA         01-12-2009 01-12-2009 01-12-2009 01-12… NA
#2  1251. 2015-04-01   NA         NA         04-01-2015 04-01-2015 NA     04-01…
#3 16121. 1999-07-01   07-01-1999 09-22-2011 09-22-2011 NA         09-22… 09-22…
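In current tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. Below is a sketch of the same per-`id` earliest date written with `pivot_longer()`, which then joins the result back and sorts by it. One caveat: `as.Date(date, format = "%m-%d-%Y")` returns `NA` for `yyyy-mm-dd` strings such as the question's `"1999-07-01"`, so this sketch falls back to `"%Y-%m-%d"` via `coalesce()`. That two-format fallback is an assumption that only these two layouts occur in the data; it is not a general date parser.

```r
library(dplyr)
library(tidyr)

# Question data
df <- data.frame(col1 = c("NA", "1999-07-01", "NA"),
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251, 16121, 1209),
                 stringsAsFactors = FALSE)

earliest <- df %>%
  # one row per (id, column) pair
  pivot_longer(starts_with("col"), names_to = "key", values_to = "date") %>%
  # try mm-dd-yyyy first, then yyyy-mm-dd; the string "NA" fails both and becomes NA
  mutate(date = coalesce(as.Date(date, format = "%m-%d-%Y"),
                         as.Date(date, format = "%Y-%m-%d"))) %>%
  group_by(id) %>%
  summarise(earliestDate = min(date, na.rm = TRUE), .groups = "drop")

# attach the per-id earliest date to the original rows and sort by it
result <- df %>%
  left_join(earliest, by = "id") %>%
  arrange(earliestDate)

result[, c("id", "earliestDate")]
```

For this particular sample, the earliest date of each `id` happens to be the same as its left-most available date, so both readings of the question produce the order 16121, 1209, 1251.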

To start, the current `"NA"` values are not actually R's `NA` values; convert them:

df[df == "NA"] <- NA
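To see why this step matters: `is.na()` only recognizes R's missing value, not the two-character string `"NA"`. A small sketch on the question's data compares the base-R replacement with a `dplyr::na_if()` alternative. The `na_if()` version is applied only to the character `col*` columns, so the numeric `id` column is left untouched.

```r
library(dplyr)

# Question data
df <- data.frame(col1 = c("NA", "1999-07-01", "NA"),
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251, 16121, 1209),
                 stringsAsFactors = FALSE)

is.na("NA")          # FALSE: "NA" is an ordinary character value

# Base R: replace every "NA" string in the data frame with a real NA
df_base <- df
df_base[df_base == "NA"] <- NA

# dplyr: apply na_if() to the character date columns only
df_dplyr <- df %>%
  mutate(across(starts_with("col"), ~ na_if(.x, "NA")))

sum(is.na(df))       # 0
sum(is.na(df_base))  # 6
```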

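I can see two challenges in the data provided by the OP:

1. The date formats are inconsistent: the year is sometimes at the start and sometimes at the end.
2. The columns have a preferred order: `col1` is considered first, then `col2`, and so on.

To handle dates in heterogeneous formats, you can use the `parse_date_time` function from `lubridate`. Using `coalesce` on the columns gives `col1` data priority, then `col2`, and so on.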

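library(dplyr)
library(lubridate)

df %>%
  # parse each date column, accepting both yyyy-mm-dd and mm-dd-yyyy;
  # unparseable entries such as the string "NA" become NA
  mutate(across(col1:col6, ~ parse_date_time(.x, orders = c("ymd", "mdy"), quiet = TRUE))) %>%
  # take the first non-missing value, in column order col1 to col6
  mutate(col = coalesce(col1, col2, col3, col4, col5, col6)) %>%
  select(id, col)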

#      id        col
# 1  1251 2015-04-01
# 2 16121 1999-07-01
# 3  1209 2009-01-12
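If adding `lubridate` is not an option, a similar sketch works with base `as.Date()`, assuming the only layouts present are the two in this sample (`mm-dd-yyyy` and `yyyy-mm-dd`). Because the coalesced value is an ordinary column, it can also be passed straight to `arrange()`, which is what the question originally asked for:

```r
library(dplyr)

# Question data
df <- data.frame(col1 = c("NA", "1999-07-01", "NA"),
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251, 16121, 1209),
                 stringsAsFactors = FALSE)

sorted <- df %>%
  # parse each column: try mm-dd-yyyy, then yyyy-mm-dd; "NA" strings become NA
  mutate(across(col1:col6, ~ coalesce(as.Date(.x, format = "%m-%d-%Y"),
                                      as.Date(.x, format = "%Y-%m-%d")))) %>%
  # first available date in the column hierarchy col1 to col6
  mutate(col = coalesce(col1, col2, col3, col4, col5, col6)) %>%
  # sort rows by that fallback date
  arrange(col) %>%
  select(id, col)

sorted
```

On the sample data this yields the same dates as the `lubridate` version, sorted into the order 16121 (1999-07-01), 1209 (2009-01-12), 1251 (2015-04-01).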

Data:

df <- data.frame(col1 = c("NA", "1999-07-01", "NA"), 
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251,16121,1209))

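I'm not sure what you're trying to do. `arrange` sorts rows according to the values in specific columns; it does not change or manipulate the data. Could you post your expected output for the sample data you provided?

Should have clarified: I'm not looking for the earliest date overall, but for the earliest date given the hierarchy.

@S31 In the code above I'm not looking for the earliest date overall, but for the earliest date per `id`. That's why we group by `id`.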

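# For each row (dropping the id column, df[-7]), take the first non-NA value.
# which.min(is.na(x)) returns the position of the first FALSE, i.e. the first
# available value. This assumes the "NA" strings have been converted to real NA.
df$left_most <- apply(df[-7], 1, function(x) x[which.min(is.na(x))])

df
        col1       col2       col3       col4       col5       col6    id  left_most
1       <NA>       <NA> 04-01-2015 04-01-2015       <NA> 04-01-2015  1251 04-01-2015
2 07-01-1999 09-22-2011 09-22-2011       <NA> 09-22-2011 09-22-2011 16121 07-01-1999
3       <NA> 01-12-2009 01-12-2009 01-12-2009 01-12-2009       <NA>  1209 01-12-2009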
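The `apply()` call loops over rows. A vectorized base-R sketch of the same left-most lookup uses `max.col()` with `ties.method = "first"` to find, per row, the first column that holds a non-missing value. Like the `apply()` version, it assumes the `"NA"` strings have already been turned into real `NA`; the intermediate names `m`, `present` and `first_col` are just illustrative.

```r
# Question data
df <- data.frame(col1 = c("NA", "1999-07-01", "NA"),
                 col2 = c("NA", "09-22-2011", "01-12-2009"),
                 col3 = c("04-01-2015", "09-22-2011", "01-12-2009"),
                 col4 = c("04-01-2015", "NA", "01-12-2009"),
                 col5 = c("NA", "09-22-2011", "01-12-2009"),
                 col6 = c("04-01-2015", "09-22-2011", "NA"),
                 id = c(1251, 16121, 1209),
                 stringsAsFactors = FALSE)
df[df == "NA"] <- NA

# character matrix of the six date columns (id excluded)
m <- as.matrix(df[paste0("col", 1:6)])

# 1 where a value is present, 0 where it is missing
present <- (!is.na(m)) + 0L

# column index of the first present value in each row; ties go to the left-most.
# A row with no values at all gets column 1, which then yields NA below.
first_col <- max.col(present, ties.method = "first")

# pick one element per row with (row, column) matrix indexing
df$left_most <- m[cbind(seq_len(nrow(m)), first_col)]

df$left_most
#> [1] "04-01-2015" "1999-07-01" "01-12-2009"
```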