R: find members that are in only one of two groups versus in both groups
If this is my data:
Number Group Length
4432 1 NA
4432 2 2.34
4564 1 5.89
4389 1 NA
6578 2 3.12
4389 2 NA
4355 1 4.11
4355 2 6.15
4689 1 6.22
4689 1 NA
I am trying to find the vessel numbers that are only in group 1 or only in group 2, and those that are in both group 1 and group 2:
Number Group Length Results
4432 1 NA Both 1 & 2
4432 2 2.34 Both 1 & 2
4564 1 5.89 1
4389 1 NA 1
6578 2 3.12 2
4389 2 NA 2
4355 1 4.11 Both 1 & 2
4355 2 6.15 Both 1 & 2
4689 1 6.22 1
4689 1 NA 1
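I can do this with loops and subsetting, but I am interested in dplyr or other ways of creating the Results column. Any help is appreciated. Thanks.

We can use n_distinct to check the number of unique 'Group' values, and paste the unique 'Group' values onto the prefix "Both":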
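library(stringr)
library(dplyr)
library(data.table)  # provides rleid()

df1 %>%
  # rleid() gives every run of identical adjacent Number values its own id
  group_by(grp = rleid(Number)) %>%
  mutate(Results = case_when(
    # more than one distinct Group in the run: label it "Both 1 & 2"
    n_distinct(Group) > 1 ~ str_c("Both ", str_c(unique(Group), collapse = " & ")),
    # otherwise the label is the single Group value
    TRUE ~ as.character(unique(Group))
  )) %>%
  ungroup() %>%
  select(-grp)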
# A tibble: 10 x 4
# Number Group Length Results
# <int> <int> <dbl> <chr>
# 1 4432 1 NA Both 1 & 2
# 2 4432 2 2.34 Both 1 & 2
# 3 4564 1 5.89 1
# 4 4389 1 NA 1
# 5 6578 2 3.12 2
# 6 4389 2 NA 2
# 7 4355 1 4.11 Both 1 & 2
# 8 4355 2 6.15 Both 1 & 2
# 9 4689 1 6.22 1
#10 4689 1 NA 1
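A variant of the pipeline above, offered as a sketch rather than as the answer's own code. It assumes dplyr 1.1.0 or later, where `consecutive_id()` plays the role of `rleid()`, and it swaps `case_when()` for a plain `if`/`else`. `case_when()` evaluates both right-hand sides, and in a "Both" run `unique(Group)` has length 2, which stricter dplyr versions may refuse to recycle. `df1` is rebuilt here from the question's table so that the block runs on its own.

```r
library(dplyr)

# The question's data, rebuilt so the example is self-contained
df1 <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L),
  Length = c(NA, 2.34, 5.89, NA, 3.12, NA, 4.11, 6.15, 6.22, NA)
)

res <- df1 %>%
  # consecutive_id() starts a new id whenever Number differs from the previous row
  group_by(run = consecutive_id(Number)) %>%
  mutate(Results = if (n_distinct(Group) > 1) {
    # several distinct groups in this run: one label, recycled over the run
    paste("Both", paste(unique(Group), collapse = " & "))
  } else {
    # a single group in this run: use it as the label
    as.character(Group)
  }) %>%
  ungroup() %>%
  select(-run)
```

Because the `if` condition is a single TRUE or FALSE per run, only one branch is evaluated for each run, so the length of `unique(Group)` never comes into play.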
Base R solution:
# For each Number, concatenate its Group values, separated by " & ":
aggregated_df <- aggregate(list(Results = df$Group), list(Number = df$Number), paste0, collapse = " & ")
# Keep only the unique elements (this drops the ampersand when every element is a duplicate):
aggregated_df$Results <- sapply(strsplit(aggregated_df$Results, " & "),
                                function(x){paste0(unique(x), collapse = " & ")})
# If the string still contains an ampersand, prepend "Both " to it:
aggregated_df$Results <- ifelse(grepl(" & ", aggregated_df$Results),
                                paste0("Both ", aggregated_df$Results),
                                aggregated_df$Results)
# Merge the labels back onto the original data frame:
df <- merge(df, aggregated_df, by = "Number", all.x = TRUE, sort = FALSE)
Base R solution 2 (split, apply, combine):
# Split the data frame by Number, label each piece, then row-bind the pieces back together:
df2 <- data.frame(
  do.call("rbind", lapply(split(df, df$Number), function(x) {
    res <- paste0(unique(x$Group), collapse = " & ")
    x$Result <- ifelse(grepl(" & ", res), paste0("Both ", res), res)
    x
  })),
  row.names = NULL
)
# split() returns the pieces sorted by Number; restore the original row order of df:
df2 <- df2[order(order(df$Number)), ]
Data:
df <- structure(
list(
Number = c(
4432L,
4432L,
4564L,
4389L,
6578L,
4389L,
4355L,
4355L,
4689L,
4689L
),
Group = c(1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L, 1L, 1L),
Length = c(NA, 2.34, 5.89, NA, 3.12,
NA, 4.11, 6.15, 6.22, NA)
),
class = "data.frame",
row.names = c(NA,-10L)
)
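Your initial solution worked. I have not tried the latest version yet; I will try that too. Thanks.

@Science11 My initial solution grouped on Number, but when I checked your expected output I saw that you only group identical adjacent elements, so I changed it to rleid(Number).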
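The same adjacent-run grouping can be sketched in base R as well. This is an illustration, not part of either answer: the helper `label_groups` and the variable `run_id` are names made up for the example. The base R answers above group on Number as a whole, so both rows of 4389 would come out as "Both 1 & 2". The sketch below builds run ids with `rle()` so that non-adjacent repeats stay separate, as in the expected output.

```r
# The question's data, rebuilt so the example is self-contained
df <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L, 4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L),
  Length = c(NA, 2.34, 5.89, NA, 3.12, NA, 4.11, 6.15, 6.22, NA)
)

# Run id: a new id starts whenever Number differs from the previous row
runs   <- rle(df$Number)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Hypothetical helper: build the label for the Group values of one run
label_groups <- function(g) {
  u <- unique(g)
  if (length(u) > 1) paste("Both", paste(u, collapse = " & ")) else as.character(u)
}

# ave() applies the function within each run and keeps the original row order
df$Results <- ave(as.character(df$Group), run_id,
                  FUN = function(g) rep(label_groups(g), length(g)))
```

Since `ave()` writes each result back into the positions it came from, no reordering step is needed afterwards.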