R: find members that are in only one of two groups, and members that are in both

If this is my data:

Number        Group  Length    
4432          1      NA        
4432          2      2.34      
4564          1      5.89      
4389          1      NA        
6578          2      3.12       
4389          2      NA            
4355          1      4.11      
4355          2      6.15       
4689          1      6.22      
4689          1      NA

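I am trying to find the vessel numbers (Number) that appear only in group 1 or only in group 2, and those that appear in both group 1 and group 2: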

Number        Group  Length    Results
4432          1      NA        Both 1 & 2
4432          2      2.34      Both 1 & 2
4564          1      5.89      1
4389          1      NA        1
6578          2      3.12      2 
4389          2      NA        2    
4355          1      4.11      Both 1 & 2
4355          2      6.15      Both 1 & 2 
4689          1      6.22      1
4689          1      NA        1

I can do this with loops and subsetting, but I am interested in dplyr or any other way of creating the Results column. Thanks for your help.

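We can use n_distinct to count the unique 'Group' values within each group of rows; if there is more than one, we paste the unique 'Group' values after the prefix "Both", otherwise we return the single 'Group' value as a string.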

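library(stringr)
library(dplyr)
library(data.table)  # for rleid()

# df1 is the data frame from the question
df1 %>% 
   # give each run of adjacent identical Number values its own group id
   group_by(grp = rleid(Number)) %>%
   mutate(Results = case_when(n_distinct(Group) > 1 ~ 
                      str_c("Both ", str_c(unique(Group), collapse = " & ")),
     # first(Group) has length 1, so this branch recycles safely in groups of any size
     TRUE ~ as.character(first(Group)))) %>%
   ungroup() %>%
   select(-grp)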
# A tibble: 10 x 4
#   Number Group Length Results   
#    <int> <int>  <dbl> <chr>     
# 1   4432     1  NA    Both 1 & 2
# 2   4432     2   2.34 Both 1 & 2
# 3   4564     1   5.89 1         
# 4   4389     1  NA    1         
# 5   6578     2   3.12 2         
# 6   4389     2  NA    2         
# 7   4355     1   4.11 Both 1 & 2
# 8   4355     2   6.15 Both 1 & 2
# 9   4689     1   6.22 1         
#10   4689     1  NA    1
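
For readers without data.table, the run-based grouping that rleid(Number) provides can be approximated in base R with rle(). The following is a minimal sketch, not part of the original answer: the names dat, run_id, and label_groups are made up for illustration, and the Number and Group columns are re-typed from the question.

```r
# Number and Group columns re-typed from the question (Length is not needed here)
dat <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L)
)

# One id per run of adjacent identical Number values (the same idea as rleid)
runs <- rle(dat$Number)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Hypothetical helper: label one run from the Group values it contains
label_groups <- function(g) {
  u <- unique(g)
  if (length(u) > 1) paste("Both", paste(u, collapse = " & ")) else u
}

# ave() applies the helper within each run and recycles the label over its rows
dat$Results <- ave(as.character(dat$Group), run_id, FUN = label_groups)
dat
```

With the question's data, the two 4389 rows are not adjacent and so fall into different runs; they are labelled 1 and 2 rather than Both 1 & 2, which matches the expected output in the question.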

Base R solution:

# For each Number, concatenate its Group values, separated by " & ":
aggregated_df <- aggregate(list(Results = df$Group), list(Number = df$Number), paste0, collapse = " & ")

# Keep only the unique elements (this drops the ampersand when the values are duplicates):
aggregated_df$Results <- sapply(strsplit(aggregated_df$Results, " & "),
                                function(x) paste0(unique(x), collapse = " & "))

# If the string still contains an ampersand, put "Both " in front of it:
aggregated_df$Results <- ifelse(grepl(" & ", aggregated_df$Results),
                                paste0("Both ", aggregated_df$Results),
                                aggregated_df$Results)

# Merge the labels back onto the original data frame:
df <- merge(df, aggregated_df, by = "Number", all.x = TRUE, sort = FALSE)
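
Note that grouping by Number over the whole data frame, as this base R version does, treats non-adjacent rows with the same Number as one group. A minimal sketch of that behaviour follows, assuming the question's data; dat and label_groups are hypothetical names used only for illustration.

```r
# Number and Group columns re-typed from the question
dat <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L)
)

# Hypothetical helper: label one Number from the Group values it contains
label_groups <- function(g) {
  u <- unique(g)
  if (length(u) > 1) paste("Both", paste(u, collapse = " & ")) else u
}

# Group by Number itself, regardless of where the rows sit in the data frame
dat$Results <- ave(as.character(dat$Group), dat$Number, FUN = label_groups)

# Inspect the two 4389 rows (rows 4 and 6)
dat[dat$Number == 4389L, ]
```

Here both 4389 rows come out as Both 1 & 2, whereas the expected output in the question labels them 1 and 2. Which behaviour is wanted depends on whether adjacency matters for the data at hand.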

Base R solution 2 (split, apply, combine):

# Split the data frame by Number, label each piece, and row-bind the pieces back together:
df2 <- data.frame(do.call("rbind", lapply(split(df, df$Number), function(x) {
  res <- paste0(unique(x$Group), collapse = " & ")
  x$Results <- ifelse(grepl(" & ", res), paste0("Both ", res), res)
  x
})), row.names = NULL)

# split() returns the pieces sorted by Number; order(order(...)) maps the rows back to the original order:
df2 <- df2[order(order(df$Number)), ]

Data:

df <- structure(
  list(
    Number = c(
      4432L,
      4432L,
      4564L,
      4389L,
      6578L,
      4389L,
      4355L,
      4355L,
      4689L,
      4689L
    ),
    Group = c(1L, 2L, 1L, 1L,
              2L, 2L, 1L, 2L, 1L, 1L),
    Length = c(NA, 2.34, 5.89, NA, 3.12,
               NA, 4.11, 6.15, 6.22, NA)
  ),
  class = "data.frame",
  row.names = c(NA,-10L)
)

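Your initial solution worked. I haven't tried the latest version yet; I'll try it too. Thanks.

@Science11 My initial solution grouped by Number, but when I checked your expected output I saw that you only group identical adjacent elements, so I changed it to use rleid.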
