How to merge rows by group and keep only the highest values in R
Assume the following data frame:
dfX <- data.frame('a' = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'),
                  'b' = c('c2', 'c2', 'c8', 'c8', 'c4', 'c7', 'c7', 'c9', 'c9', 'c9'),
                  'c' = c('f34', 'f34', 'm92', 'm92', 'm92', 'g22', 'g22', 'i41', 'i41', 'i41'),
                  'd' = c('Check', 'Check', 'Check', 'Check', 'UnCheck', 'Check', 'Check', 'Check', 'Check', 'Check'),
                  'val1' = c(54, '', 37, '', '', 51, '', 74, '', ''),
                  'val2' = c('', 59, '', 87, 84, '', 62, '', 27, 85))
dfX
   a  b   c       d val1 val2
1  A c2 f34   Check   54
2  A c2 f34   Check        59
3  B c8 m92   Check   37
4  B c8 m92   Check        87
5  B c4 m92 UnCheck        84
6  C c7 g22   Check   51
7  C c7 g22   Check        62
8  D c9 i41   Check   74
9  D c9 i41   Check        27
10 D c9 i41   Check        85
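With column d added, and the condition that any row marked 'UnCheck' in that column should be dropped, I cannot find a solution. Getting only the max() value for the rows of group D (rows 8 to 10) as output also fails.
The desired output should look like this: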
dfY
  a  b   c     d val1 val2
1 A c2 f34 Check   54   59
2 B c8 m92 Check   37   87
3 C c7 g22 Check   51   62
4 D c9 i41 Check   74   85
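We need to convert these columns to numeric to get the max. 'val1' and 'val2' are of class character. (Note: this uses R 4.0.0, where data.frame construction defaults to stringsAsFactors = FALSE. If the R version is < 4.0, the default is stringsAsFactors = TRUE, in which case the as.numeric(.) below should be changed to as.numeric(as.character(.)).)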
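The note about factors can be illustrated with a small sketch. The vector f below is made up for illustration and is not part of the original data; it only shows why as.numeric() applied directly to a factor is not what we want here.

```r
# Hypothetical factor, mimicking how val1 would be stored when
# stringsAsFactors = TRUE (the default before R 4.0).
f <- factor(c("54", "", "37"))

# Direct conversion returns the internal level codes, not the labels.
codes <- as.numeric(f)

# Going through character first parses the labels; "" becomes NA.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

The second form is the one to use inside max() whenever the columns might be factors; on character columns it is harmless, since as.character() leaves them unchanged.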
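In base R, we can use aggregate:
# convert the value columns to numeric; the empty strings become NA
dfX[c('val1', 'val2')] <- lapply(dfX[c('val1', 'val2')], as.numeric)
# keep only the 'Check' rows and take the max of each value column per group
aggregate(. ~ a + b + c + d, dfX, subset = d == 'Check', max,
          na.rm = TRUE, na.action = NULL)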
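As a self-contained sketch of the base R approach, the snippet below rebuilds dfX and also shows why na.action = NULL seems to matter here: every 'Check' row has an NA in either val1 or val2 after conversion, so the default NA handling (na.omit) would leave nothing to aggregate. The names checked, rows_left and res are introduced only for this illustration, and the final reordering by a is an added convenience, not part of the original answer.

```r
dfX <- data.frame(
  a = c("A", "A", "B", "B", "B", "C", "C", "D", "D", "D"),
  b = c("c2", "c2", "c8", "c8", "c4", "c7", "c7", "c9", "c9", "c9"),
  c = c("f34", "f34", "m92", "m92", "m92", "g22", "g22", "i41", "i41", "i41"),
  d = c("Check", "Check", "Check", "Check", "UnCheck",
        "Check", "Check", "Check", "Check", "Check"),
  val1 = c(54, "", 37, "", "", 51, "", 74, "", ""),
  val2 = c("", 59, "", 87, 84, "", 62, "", 27, 85),
  stringsAsFactors = FALSE
)

# Empty strings become NA when converted to numeric.
dfX[c("val1", "val2")] <- lapply(dfX[c("val1", "val2")], as.numeric)

# Each 'Check' row is missing one of the two values, so dropping
# incomplete rows would remove all of them.
checked <- subset(dfX, d == "Check")
rows_left <- nrow(na.omit(checked))

# Keep the NA rows (na.action = NULL) and let max() skip the NAs.
res <- aggregate(. ~ a + b + c + d, dfX, subset = d == "Check",
                 FUN = max, na.rm = TRUE, na.action = NULL)

# Sort by a so the rows line up with the desired output,
# whatever order aggregate() returns them in.
res <- res[order(res$a), ]
print(rows_left)
print(res)
```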
Currently, your data.frame has val1 and val2 as factors, so we can do this:
library(dplyr)
dfX %>%
  mutate_at(c("val1", "val2"), ~ replace(as.character(.x), .x == "", NA)) %>%
  filter(d == "Check") %>%
  group_by(a, b, c, d) %>%
  summarize_all(~ max(as.numeric(.x), na.rm = TRUE))
# A tibble: 4 x 6
# Groups:   a, b, c [4]
  a     b     c     d      val1  val2
  <fct> <fct> <fct> <fct> <dbl> <dbl>
1 A     c2    f34   Check    54    59
2 B     c8    m92   Check    37    87
3 C     c7    g22   Check    51    62
4 D     c9    i41   Check    74    85
Quick answer. You can also use aggregate(cbind(val1, val2) ~ …).
With dplyr (1.0.0 or later, for across()), converting to numeric inside summarise:
library(dplyr)
dfX %>%
  filter(d == 'Check') %>%
  group_by(a, b, c, d) %>%
  summarise(across(starts_with('val'), ~ max(as.numeric(.), na.rm = TRUE)))
With aggregate and cbind on the left-hand side, after val1 and val2 have been converted to numeric as in the base R code above:
aggregate(cbind(val1, val2) ~ ., dfX, subset = d == 'Check', max,
          na.rm = TRUE, na.action = NULL)
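One edge case worth keeping in mind with all of the variants above: if a group has no value at all in one of the columns, max(..., na.rm = TRUE) is called on an empty vector and returns -Inf with a warning. The helper below is a minimal sketch of one way to return NA instead. safe_max is a hypothetical name, not part of base R, dplyr, or the original answers.

```r
# safe_max: hypothetical helper for illustration only.
# Converts via character (so factors are handled too), then returns
# NA when nothing is left to compare instead of -Inf.
safe_max <- function(x) {
  x <- suppressWarnings(as.numeric(as.character(x)))
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

with_values <- safe_max(c("", "27", "85"))  # group with some values
all_missing <- safe_max(c("", ""))          # group with no values

print(with_values)
print(all_missing)
```

It could be passed in place of max in the aggregate call or inside across(), assuming an NA result is preferable to -Inf for empty groups.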