Max value of each column from different data frame rows in R

Assume there is an ordered df with an ID column and other columns containing numeric data, sorted by the last column (Rating):

ID <- c(123, 142, 21, 562, 36, 721, 847, 321)
A <- c(96, 83, 73, 47, 88, 65, 72, 67)
B <- c(72, 69, 88, 75, 63, 89, 48, 80)
C <- c(95, 94, 94, 94, 65, 81, 75, 75)
D <- c(63, 88, 89, 88, 89, 79, 88, 79)
Rating <- c(97, 95, 92, 87, 85, 83, 79, 77)
df <- data.frame(ID, A, B, C, D, Rating)
df
#   ID  A  B  C  D Rating
#1 123 96 72 95 63     97
#2 142 83 69 94 88     95
#3  21 73 88 94 89     92
#4 562 47 75 94 88     87
#5  36 88 63 65 89     85
#6 721 65 89 81 79     83
#7 847 72 48 75 88     79
#8 321 67 80 75 79     77

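Is there a cleaner, more concise or efficient way to pick the maximum of each column, with each maximum coming from a different row? The result need not keep the same layout; it could be just the ID and role, such as: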

#   ID  Group
#1 123    A
#2 721    B
#3 142    C
#4 21     D

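Here is a dplyr solution that handles duplicate IDs. First, we pivot all the letter columns into a single column. Then we group by those letters. Finally, within each letter, we order by value, using Rating to break ties in value, and pick the first element to get each ID.

library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr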

df %>% 
  pivot_longer(cols = c("A", "B", "C", "D")) %>% 
  group_by(Group = name) %>%
  summarise(ID = ID[order(-value, -Rating)[1]])
#> # A tibble: 4 x 2
#>   Group    ID
#>   <chr> <dbl>
#> 1 A       123
#> 2 B       721
#> 3 C       123
#> 4 D        21
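
To make the tie-breaking step concrete, here is a minimal base R sketch that reproduces only the `order(-value, -Rating)` part for column D, where the top value 89 appears twice. The vectors are copied from the question; the name `ord` is just illustrative.

```r
# Column D, Rating and ID as given in the question
ID     <- c(123, 142, 21, 562, 36, 721, 847, 321)
D      <- c(63, 88, 89, 88, 89, 79, 88, 79)
Rating <- c(97, 95, 92, 87, 85, 83, 79, 77)

# Sort by D descending; rows with equal D are ordered by Rating descending
ord <- order(-D, -Rating)

ID[ord[1]]  # 21: D = 89 and the higher Rating (92)
ID[ord[2]]  # 36: D = 89 too, but Rating is only 85
```

Without the second key, `order(-D)` would still return row 3 first here because the data is already sorted by Rating, but the explicit key keeps the result independent of row order.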

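Another dplyr/purrr solution, not as concise as Allan's:

library(dplyr)
library(purrr)  # map_dfr()

# For one column name gg, return the ID of the row with the largest value,
# breaking ties by Rating and then by ID (all descending).
find_max <- function(gg){
    tibble(
        group = gg,
        ID = df %>% select(all_of(c(gg, "Rating", "ID"))) %>%
            arrange_all(desc) %>% slice(1) %>% pull(ID))
}

c("A","B","C","D") %>% map_dfr(find_max)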

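Another approach using dplyr:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(max.val = pmax(A, B, C, D)[which.max(Rating)]) %>%
  summarise_all(max) %>%
  mutate(top.col = apply(.[, 2:5], 1, function(x) names(x)[which.max(x)])) %>%
  select(-c(A, B, C, D, Rating))

You get:

# A tibble: 8 x 3
#      ID max.val top.col
# 1    21      94 C
# 2    36      89 D
# 3   123      96 A
# 4   142      94 C
# 5   321      80 B
# 6   562      94 C
# 7   721      89 B
# 8   847      88 D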

I found that some of the solutions do not handle duplicate IDs. For example, groups A and C both end up with ID 123.
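
A quick way to see the clash is to take the per-column argmax without any uniqueness constraint and then look for repeated IDs. This is only a sketch in base R; `top_id` and `clash` are illustrative names, and the data is copied from the question.

```r
ID <- c(123, 142, 21, 562, 36, 721, 847, 321)
A <- c(96, 83, 73, 47, 88, 65, 72, 67)
B <- c(72, 69, 88, 75, 63, 89, 48, 80)
C <- c(95, 94, 94, 94, 65, 81, 75, 75)
D <- c(63, 88, 89, 88, 89, 79, 88, 79)
df <- data.frame(ID, A, B, C, D)

# ID of the first row holding each column's maximum (no uniqueness check)
top_id <- sapply(df[c("A", "B", "C", "D")], function(x) df$ID[which.max(x)])
top_id
#   A   B   C   D
# 123 721 123  21

# Columns whose winning ID is shared with another column
clash <- top_id[duplicated(top_id) | duplicated(top_id, fromLast = TRUE)]
names(clash)  # "A" "C"
```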

To get output similar to the final result in the question, here is another solution that handles duplicate IDs:

library(dplyr)

# initialization: one row per column; the IDs are filled in by the loop
variables <- c("A", "B", "C", "D")
df_max <- data.frame(ID = numeric(length(variables)), Group = variables)

for(column in variables){
  temp_id <- df %>% 
    filter(!(ID %in% df_max$ID)) %>%   # skip IDs that are already assigned
    arrange(desc(!!rlang::sym(column)), desc(Rating)) %>%   # ties broken by Rating
    slice(1) %>% 
    pull(ID)
  df_max[df_max$Group == column, "ID"] <- temp_id
}

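Comments:

You can simplify getting all the maxima into one line with apply(df[, 2:5], 2, function(x) df$ID[which.max(x)]), but you will still have the problem with duplicate IDs.

I don't understand your question. You say you want the maximum of each group/column together with its ID, and that each pair needs to come from a different row (unique ID). So for column A the maximum is 96 with ID 123, but for column C the maximum again belongs to ID 123. What decides whether ID 123 should be assigned to A or to C? Can you clarify what you want?

This would suggest that you first take the maximum group value by row, but in that case ID 23 should be assigned to C, since that is the maximum of that row?

We want to maximize the sum of the values, so A+B+C+D should be as large as possible across the IDs. That is why I chose 96 as A for that ID: it is larger than the C value of 95.

If all rows except the first had very low C values, would you still prefer choosing A over C for the first row? In other words, are you trying to maximize the sum of the values A-D when choosing a set of unique IDs?

I wasn't sure about your output, Roland; then I re-read the OP's question and realized I had grouped by the wrong variable. I like your answer too, +1.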
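
The comments read the goal as maximizing A+B+C+D while taking each value from a different ID. Under that reading (an assumption, not something the thread confirms as the final requirement), a brute-force sketch in base R can simply try every way of giving each column its own row. All names other than the question's data (`m`, `picks`, `totals`, `best`, `result`) are illustrative.

```r
ID <- c(123, 142, 21, 562, 36, 721, 847, 321)
A <- c(96, 83, 73, 47, 88, 65, 72, 67)
B <- c(72, 69, 88, 75, 63, 89, 48, 80)
C <- c(95, 94, 94, 94, 65, 81, 75, 75)
D <- c(63, 88, 89, 88, 89, 79, 88, 79)
df <- data.frame(ID, A, B, C, D)

cols <- c("A", "B", "C", "D")
m <- as.matrix(df[cols])

# Every combination of one row index per column (8^4 = 4096 candidates)
picks <- expand.grid(rep(list(seq_len(nrow(df))), length(cols)))
names(picks) <- cols

# Keep only combinations in which all four row indices differ
picks <- picks[apply(picks, 1, function(r) !anyDuplicated(r)), ]

# Sum of the selected cells: row r[j] in column j
totals <- apply(picks, 1, function(r) sum(m[cbind(r, seq_along(cols))]))

best <- unlist(picks[which.max(totals), ])
result <- data.frame(ID = df$ID[best], Group = cols)
max(totals)  # 368, i.e. 96 + 89 + 94 + 89
```

For this data several assignments reach the same total, so `result` is only one of the optimal choices (whichever `which.max` meets first). Exhaustive search is fine for 8 rows and 4 columns but grows quickly; for larger inputs an assignment solver such as `solve_LSAP()` from the clue package may be a better fit.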

Output of the find_max solution:

# A tibble: 4 x 2
  group    ID
  <chr> <dbl>
1 A       123
2 B       721
3 C       123
4 D        21

Output of the loop-based solution:

# > df_max
#
#    ID Group
# 1 123     A
# 2 721     B
# 3 142     C
# 4  21     D