R: collapse rows where some rows are all NA and the columns mix factor, character and numeric classes

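There are a few similar questions on Stack Overflow, but I have not found a solution that works for a data frame with mixed column classes.

I have a data frame, df: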

df <- structure(list(ID = c("ID1", "ID1", "ID1", "ID1", "ID1", "ID1", 
"ID1", "ID1", "ID1"), COLOUR = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("BLUE", "RED"), class = "factor"), 
    DATE = structure(c(17378, 17378, 17378, 17378, 17378, 17400, 
    17925, 17925, 17925), class = "Date"), size1 = c(NA, 496.4647, 
    332.4, NA, NA, NA, NA, 23, NA), size2 = c(NA, NA, 90, NA, NA, 
    NA, NA, NA, NA), length1 = c(NA, NA, NA, NA, 343.8446, NA, 
    NA, NA, NA), length2 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), width1 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), width2 = c(NA, NA, NA, NA, NA, NA, NA, 
    34.682, NA), group1 = c(NA, NA, NA, NA, NA, NA, NA, NA, "CAT!"
    )), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))

# A tibble: 9 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr> 
1 ID1   RED    2017-07-31   NA     NA     NA       NA     NA   NA   NA    
2 ID1   RED    2017-07-31  496.    NA     NA       NA     NA   NA   NA    
3 ID1   RED    2017-07-31  332.    90     NA       NA     NA   NA   NA    
4 ID1   RED    2017-07-31   NA     NA     NA       NA     NA   NA   NA    
5 ID1   RED    2017-07-31   NA     NA    344.      NA     NA   NA   NA    
6 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA    
7 ID1   RED    2019-01-29   NA     NA     NA       NA     NA   NA   NA    
8 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 NA    
9 ID1   RED    2019-01-29   NA     NA     NA       NA     NA   NA   CAT!
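I want to collapse the rows within each group so that the non-NA values are kept, even though some rows are entirely NA and the columns mix factor, character and numeric classes. My attempts so far, listed with their output further down, either throw an error or drop the non-numeric columns.

Can anyone help me find a solution?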

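Following the comments, you could try the following and see whether it returns the expected result when a group may contain more than one value: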

library(dplyr)

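df %>%
  group_by(ID, COLOUR, DATE) %>%
  # n = largest non-NA count of any column in the group (at least 1);
  # each column keeps its non-NA values, padded with NA up to n rows
  summarise(
    across(everything(),
           ~ na.omit(.x)[1:pmax(first(max(colSums(!is.na(cur_data())))), 1)]),
    .groups = "drop"
  )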

# A tibble: 3 x 10
  ID    COLOUR DATE       size1 size2 length1 length2 width1 width2 group1
  <chr> <fct>  <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <chr> 
1 ID1   RED    2017-07-31  496.    90    344.      NA     NA   NA   NA    
2 ID1   RED    2017-08-22   NA     NA     NA       NA     NA   NA   NA    
3 ID1   RED    2019-01-29   23     NA     NA       NA     NA   34.7 CAT!  
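For readers who prefer to avoid across() and cur_data(), the same padding idea can be sketched in base R: every value column in a group is padded with NA up to the largest non-NA count found in that group. This is only a sketch under assumptions, not the answerer's code. The data frame toy, the vector keys and the function collapse_group are hypothetical names used for illustration.

```r
# Toy data (hypothetical, a cut-down version of the df in the question)
toy <- data.frame(
  ID     = c("ID1", "ID1", "ID1", "ID1"),
  COLOUR = factor(c("RED", "RED", "RED", "RED")),
  DATE   = as.Date(c("2017-07-31", "2017-07-31", "2017-07-31", "2019-01-29")),
  size1  = c(NA, 496.4647, 332.4, 23),
  group1 = c(NA, NA, NA, "CAT!"),
  stringsAsFactors = FALSE
)

keys <- c("ID", "COLOUR", "DATE")  # grouping columns (assumed)

# Collapse one group: keep the non-NA values of every value column,
# padded with NA to the largest non-NA count in the group (at least 1).
collapse_group <- function(g) {
  vals <- g[setdiff(names(g), keys)]
  n <- max(1L, vapply(vals, function(col) sum(!is.na(col)), integer(1)))
  out <- g[rep(1L, n), keys, drop = FALSE]   # repeat the key values n times
  for (nm in names(vals)) {
    col <- vals[[nm]]
    out[[nm]] <- col[!is.na(col)][seq_len(n)]  # indexing past the end gives NA
  }
  out
}

pieces <- lapply(split(toy, toy[keys], drop = TRUE), collapse_group)
result <- do.call(rbind, unname(pieces))
rownames(result) <- NULL
print(result)
```

In this toy data the 2017-07-31 group has two non-NA values in size1, so it collapses to two rows instead of one; the column classes (factor, Date, character, numeric) are left unchanged.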

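Comments:

Is there only one valid value, or none, per variable in each group?

No, each variable may have more than one valid value per group.

What is the expected output if a group has two or more valid values? In your example every variable has at most one valid value per group, so the values can be collapsed into a single row.

Sorry, the ID and date are repeated. I have updated my question to reflect this.

Is there a solution that does not require across()?

The attempts from the question, with their output:
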
# Attempt 1: sum each column within a group, returning NA when every value is NA
sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

df %>%
    group_by(ID, DATE) %>%
    summarise_all(funs(sum_NA))

Error in Summary.factor(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), na.rm = TRUE) : 
  ‘sum’ not meaningful for factors
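
The error arises because sum() is not defined for a factor such as COLOUR, and it fails for character columns as well. One possible way around it, sketched here as an assumption and not as the asker's or answerer's method, is a helper that sums numeric columns and keeps the first non-NA value of any other class. The name collapse_col is hypothetical.

```r
# Hypothetical helper: sum numeric columns, keep the first non-NA value otherwise.
collapse_col <- function(x) {
  if (all(is.na(x))) return(x[NA_integer_])   # length-1 NA of the same class
  if (is.numeric(x)) sum(x, na.rm = TRUE) else x[!is.na(x)][1]
}

num <- collapse_col(c(NA, 496.4647, 332.4))        # numeric: values are summed
col <- collapse_col(factor(c("RED", "RED")))       # factor: first value kept
chr <- collapse_col(c(NA, "CAT!"))                 # character: first non-NA kept
dte <- collapse_col(as.Date(c(NA, "2019-01-29")))  # Date: class is preserved
```

With dplyr 1.0 or later it could presumably be applied per group through summarise(across(everything(), collapse_col)). Like sum_NA, it adds up multiple numeric values in a group instead of keeping them on separate rows.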

# Attempt 2: restricting to numeric columns runs, but drops COLOUR and group1
df %>%
    group_by(ID, DATE) %>%
    summarise_if(is.numeric, funs(sum_NA))

  ID    DATE       size1 size2 length1 length2 width1 width2
  <chr> <date>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 ID1   2017-07-31  829.    90    344.      NA     NA   NA  
2 ID1   2017-08-22   NA     NA     NA       NA     NA   NA  
3 ID1   2019-01-29   23     NA     NA       NA     NA   34.7

# Attempt 3: data.table, dropping the NAs of each column within a group
library(data.table)
df <- setDT(df)[, lapply(.SD, na.omit), by = c("ID", "DATE")]
Error in `[.data.table`(setDT(df), , lapply(.SD, na.omit), by = c("ID",  : 
  Supplied 2 items for column 2 of group 1 which has 5 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
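
The message reflects that lapply(.SD, na.omit) returns columns of different lengths within one group: in the first group COLOUR keeps all 5 values while size1 keeps only 2. Below is a small base R sketch of that mismatch, using the first group's values from df. The names g, lens and padded are illustrative only, and padding the value columns to a common length is one assumed way to make them line up.

```r
# First group of df (ID1, 2017-07-31), reduced to three columns
g <- data.frame(
  COLOUR = factor(rep("RED", 5)),
  size1  = c(NA, 496.4647, 332.4, NA, NA),
  size2  = c(NA, NA, 90, NA, NA)
)

# Length of each column after na.omit(): these differ, hence the error
lens <- vapply(g, function(x) length(na.omit(x)), integer(1))
print(lens)

# Pad only the value columns to their largest non-NA count
vals <- g[c("size1", "size2")]
n <- max(lens[names(vals)])
padded <- as.data.frame(lapply(vals, function(x) x[!is.na(x)][seq_len(n)]))
print(padded)
```

Treating COLOUR as a grouping key instead of a value column, as the answer does, keeps it from forcing every group back to its full row count.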