How to merge rows by group and keep only the highest value in R


Assume the following data frame:

dfX <- data.frame('a' = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'),
              'b' = c('c2', 'c2', 'c8', 'c8', 'c4', 'c7', 'c7', 'c9', 'c9','c9'),
              'c' = c('f34', 'f34', 'm92', 'm92', 'm92', 'g22', 'g22', 'i41', 'i41', 'i41'),
              'd' = c('Check', 'Check', 'Check', 'Check', 'UnCheck', 'Check', 'Check', 'Check', 'Check','Check'),
              'val1' = c(54, '', 37, '', '', 51, '', 74, '', ''),
              'val2' = c('', 59, '', 87, 84, '', 62, '', 27, 85))

dfX
    a   b   c    d        val1  val2
1   A   c2  f34  Check    54
2   A   c2  f34  Check          59
3   B   c8  m92  Check    37
4   B   c8  m92  Check          87
5   B   c4  m92  UnCheck        84
6   C   c7  g22  Check    51
7   C   c7  g22  Check          62
8   D   c9  i41  Check    74
9   D   c9  i41  Check          27
10  D   c9  i41  Check          85
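
However, with the added column `d` and the condition that rows marked `UnCheck` in this column should be removed, I cannot find a solution. Getting only the `max()` value for the `D` rows as output also fails.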

The desired output should look like this:

dfY
    a   b   c    d       val1  val2
1   A   c2  f34  Check   54    59
2   B   c8  m92  Check   37    87
3   C   c7  g22  Check   51    62
4   D   c9  i41  Check   74    85

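We need to convert the columns to numeric to get the `max`. `val1` and `val2` are of class `character`. (Note: this assumes R 4.0.0, where the `data.frame` constructor defaults to `stringsAsFactors = FALSE`. If the R version is < 4.0, the default is `stringsAsFactors = TRUE`, and `as.numeric(.)` below should be changed to `as.numeric(as.character(.))`.)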
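
To see why the extra `as.character()` step matters when the columns are factors, here is a minimal sketch in base R. The vector `x` is a made-up example, not part of the original data. Converting a factor directly with `as.numeric()` returns its internal level codes rather than the printed values.

```r
# Hypothetical vector mimicking 'val1' stored as a factor (the R < 4.0 default)
x <- factor(c('54', '', '37'))

# Direct conversion returns the integer level codes, not the values
codes <- as.numeric(x)

# Going through character first recovers the numbers; '' becomes NA
vals <- as.numeric(as.character(x))

print(codes)
print(vals)
```

With character columns (the R >= 4.0 default) the intermediate `as.character()` is harmless, so it can be kept if the code has to run on both versions.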


In base R, we can use `aggregate`:

dfX[c('val1', 'val2')] <- lapply(dfX[c('val1', 'val2')], as.numeric)
aggregate(. ~ a + b + c+ d, dfX,subset = d == 'Check', max,
      na.rm = TRUE, na.action = NULL)
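
A note on `na.action = NULL` in the call above: as far as I understand the formula method of `aggregate`, its default `na.action = na.omit` drops every row containing an `NA` before grouping, and in this data every row has an `NA` in either `val1` or `val2`. The sketch below uses a reduced, made-up data frame `small` (an illustrative name, not from the question) to show the same call shape, where `max(..., na.rm = TRUE)` handles the missing values within each group.

```r
# Reduced, made-up data in the same shape as dfX (names are illustrative)
small <- data.frame(a = c('A', 'A', 'B', 'B', 'B'),
                    d = c('Check', 'Check', 'Check', 'Check', 'UnCheck'),
                    val1 = c(54, NA, 37, NA, NA),
                    val2 = c(NA, 59, NA, 87, 84))

# Keep rows with NA (na.action = NULL), drop 'UnCheck' rows via subset,
# and let max() skip the NAs inside each group
res <- aggregate(. ~ a + d, small, subset = d == 'Check', max,
                 na.rm = TRUE, na.action = NULL)
print(res)
```

If a group had no non-missing value at all in one column, `max(..., na.rm = TRUE)` would return `-Inf` with a warning, so that case may need separate handling.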

Currently, your data.frame has `val1` and `val2` as factors, so we can do this:

dfX %>% 
mutate_at(c("val1","val2"),~replace(as.character(.x),.x=="",NA)) %>% 
filter(d=="Check") %>% 
group_by(a,b,c,d) %>% 
summarize_all(~max(as.numeric(.x),na.rm=TRUE))

# A tibble: 4 x 6
# Groups:   a, b, c [4]
  a     b     c     d     val1  val2 
  <fct> <fct> <fct> <fct> <chr> <chr>
1 A     c2    f34   Check 54    59   
2 B     c8    m92   Check 37    87   
3 C     c7    g22   Check 51    62   
4 D     c9    i41   Check 74    85 

Quick answer. You can also use `aggregate(cbind(val1, val2) ~ ...)`.

With dplyr, the same result can be obtained with `across`:

library(dplyr)

dfX %>%
 filter(d == 'Check') %>% 
 group_by(a, b, c, d) %>% 
 summarise(across(starts_with('val'), ~ max(as.numeric(.), na.rm = TRUE)))

In base R, the `cbind` form puts the value columns on the left-hand side of the formula:

# convert the value columns to numeric first ('' becomes NA)
dfX[c('val1', 'val2')] <- lapply(dfX[c('val1', 'val2')], as.numeric)
aggregate(cbind(val1, val2) ~ ., dfX, subset = d == 'Check', max,
      na.rm = TRUE, na.action = NULL)