R - group by 2 variables and set the highest value as a new column?

I have the following data:

df <- data.frame("group1" = c("A","B","B","C","D","D","C","E","E","A","B","B","C","D","D","C","E","E"),
                 "group2" = c("X","Y","Z","Z","W","F","Z","N","M","D","F","U","T","R","R","S","S","O"),
                 "val" = c(232,200,3321,400,600,500,22,33,1200,555,200,888,43,600,500,800,900,3213))
df %>% group_by(group1,group2) %>% summarise("totalval" = sum(val)) %>% arrange(group1, desc(totalval))
# A tibble: 16 x 3
# Groups:   group1 [5]
   group1 group2 totalval
 1 A      D           555
 2 A      X           232
 3 B      Z          3321
 4 B      U           888
 5 B      F           200
 6 B      Y           200
 7 C      S           800
 8 C      Z           422
 9 C      T            43
10 D      R          1100
11 D      W           600
12 D      F           500
13 E      O          3213
14 E      M          1200
15 E      S           900
16 E      N            33
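So, for group1 = "A", I would want new column 1 to have the value "D", since "D" is the group2 value with the highest total within that group. New column 2 would show 555/(555+232) = 0.70 for all rows where group1 is "A".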
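
As a quick check of the arithmetic for group1 = "A", here is a minimal base R sketch. The names a_totals, col1_a and col2_a are hypothetical and introduced only for illustration; the two totals are taken from the summary output above.

```r
# Totals per group2 value within group1 == "A" (from the summary output)
a_totals <- c(D = 555, X = 232)

# Column 1: the group2 label with the largest total
col1_a <- names(a_totals)[which.max(a_totals)]

# Column 2: the largest total as a share of the group's overall total
col2_a <- max(a_totals) / sum(a_totals)

print(col1_a)
print(round(col2_a, 3))
```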

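I found a way to get column 1 by creating a temporary table with the highest values and joining it back to the main table, but that seems rather convoluted, and I am sure there is a cleaner way. I am also not sure how to add the percentage (column 2 as described above).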

My solution so far:

#add in overall val to use for percentages
df <- df %>% group_by(group1) %>% mutate("g1_total_val" = sum(val)) %>% ungroup()

#create temp table with selected values
t2 <- df %>% group_by(group1,group2) %>% summarise("totalval" = sum(val)) %>% arrange(group1, desc(totalval)) %>% 
        slice(1:1) %>% mutate("highest_g2" = group2) %>% select(group1, highest_g2)

df <- df %>% left_join(t2, by = "group1")

Any help with how to get column 2, and with a simpler way to add column 1, would be much appreciated.

You can use which.max to get the index of the maximum value for the first column, and divide the max by the sum for the second column, as follows:

library(tidyverse)

df %>%
  group_by(group1, group2) %>%
  summarise(totalval = sum(val)) %>%
  arrange(group1, desc(totalval)) %>% 
  mutate(col1 = group2[which.max(totalval)],
         col2 = max(totalval) / sum(totalval))
which gives:

   group1 group2 totalval col1   col2
   <fct>  <fct>     <dbl> <fct> <dbl>
 1 A      D           555 D     0.705
 2 A      X           232 D     0.705
 3 B      Z          3321 Z     0.721
 4 B      U           888 Z     0.721
 5 B      F           200 Z     0.721
 6 B      Y           200 Z     0.721
 7 C      S           800 S     0.632
 8 C      Z           422 S     0.632
 9 C      T            43 S     0.632
10 D      R          1100 R     0.5  
11 D      W           600 R     0.5  
12 D      F           500 R     0.5  
13 E      O          3213 O     0.601
14 E      M          1200 O     0.601
15 E      S           900 O     0.601
16 E      N            33 O     0.601
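Note that summarise automatically "peels off" the second grouping variable but mutate does not, so in the mutate version (the last code block below) I redo the grouping by group1 manually.

That version returns the original 18 rows with the 2 columns added.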
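
The grouping difference between summarise and mutate can be checked directly with dplyr's group_vars(). This is a minimal sketch on a small made-up data frame; toy, g1, g2, after_summarise and after_mutate are hypothetical names, not part of the asker's data.

```r
library(dplyr)

# Small hypothetical data frame, used only to inspect the grouping
toy <- data.frame(g1 = c("A", "A", "B"),
                  g2 = c("X", "Y", "X"),
                  val = c(1, 2, 3))

# summarise() drops the last grouping variable; .groups = "drop_last"
# spells out what it does by default for a one-row-per-group summary
after_summarise <- toy %>%
  group_by(g1, g2) %>%
  summarise(totalval = sum(val), .groups = "drop_last")

# mutate() leaves the grouping untouched
after_mutate <- toy %>%
  group_by(g1, g2) %>%
  mutate(totalval = sum(val))

print(group_vars(after_summarise))  # only g1 remains
print(group_vars(after_mutate))     # g1 and g2 both remain
```

This is why a mutate-based pipeline needs an explicit group_by(group1) before the per-group1 columns are computed.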

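Thanks for your answer. My actual data has many more columns, so summarise messes up the data (sorry, I should have mentioned that). Is there a way to do this without summarise?

The summarise is a copy-paste of the code you provided; I just added the mutate. What is the problem with summarise? Share your input data and expected output so that I can help you better.

The original dataset has 18 rows and the summary above has 16 rows. I would like the final result to give me back the original dataset, just with the two new columns. Sorry, I should have made that clear.

I made an edit to my answer. Is this what you want?

Thanks! Yes, almost. Column 1 is exactly right, but the col2 formula should be max(totalval)/sum(val), rather than having sum(totalval) in the denominator. Easy to fix though, thanks again!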
df %>%
  group_by(group1, group2) %>%
  mutate(totalval = sum(val)) %>%
  group_by(group1) %>% 
  arrange(group1, desc(totalval)) %>% 
  mutate(col1 = group2[which.max(totalval)],
         col2 = max(totalval) / sum(totalval))
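
Following the correction suggested in the comments (sum(val) rather than sum(totalval) in the denominator), the full pipeline might look like the sketch below. It rebuilds the df from the question so that it runs on its own. The name result and the final ungroup() are additions for illustration, and arrange() is left out because it does not affect the two new columns.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  group1 = c("A","B","B","C","D","D","C","E","E","A","B","B","C","D","D","C","E","E"),
  group2 = c("X","Y","Z","Z","W","F","Z","N","M","D","F","U","T","R","R","S","S","O"),
  val = c(232,200,3321,400,600,500,22,33,1200,555,200,888,43,600,500,800,900,3213)
)

result <- df %>%
  group_by(group1, group2) %>%
  mutate(totalval = sum(val)) %>%          # total per (group1, group2), kept on every row
  group_by(group1) %>%                     # regroup: mutate() does not drop group2
  mutate(col1 = group2[which.max(totalval)],
         col2 = max(totalval) / sum(val)) %>%  # sum(val) counts each original row once
  ungroup()

print(nrow(result))
print(result)
```

With sum(totalval) in a mutate-based pipeline, a pair that occurs on more than one row (such as D/R in this data) contributes its total once per row, which inflates the denominator; sum(val) avoids that.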