R: For each row, return the column name of the largest value


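I have a roster of employees, and I need to know which department they work in most often. It is trivial to tabulate employee ID against department name, but it is trickier to return the department name, rather than the roster count, from the frequency table. A simple example is below (column names = departments, row names = employee IDs).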

One option using your data (for future reference, use set.seed() to make examples that use sample reproducible):

…where ties.method can be "random", "first", or "last".
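
A minimal sketch of what such a one-liner could look like, assuming base R's which.max() and max.col() and a small hand-made DF in place of sampled data (both are illustrative choices, not taken from the answer above):

```r
# Illustrative data: rows are employees, columns are departments.
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# Row-wise position of the maximum, mapped back to column names.
by_apply <- colnames(DF)[apply(DF, 1, which.max)]

# max.col() works on the whole table at once; ties.method decides
# which column wins when several share the row maximum.
by_maxcol <- colnames(DF)[max.col(DF, ties.method = "first")]

print(by_apply)
print(by_maxcol)
```

which.max() always returns the first maximum, so it agrees with ties.method = "first". max.col() defaults to "random", so the argument is worth spelling out when reproducibility matters.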
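If two columns are both equal to the maximum, this will of course cause problems. I am not sure what you would want to do in that case, since some rows will then have more than one result. For example: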
DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
apply(DF,1,function(x) which(x==max(x)))

[[1]]
V2 V3 
 2  3 

[[2]]
V1 
 1 

[[3]]
V2 
 2 
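
If a single value per row is still wanted when ties occur, one possibility is to collapse the tied column names into one string. This is only a sketch; the comma separator and the name all_max are assumptions made for illustration:

```r
# Same tied data as in the example above: row 1 has V2 == V3 == 7.
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(7, 6, 4))

# For each row, keep every column name that reaches the row maximum
# and join them with commas, so the result is one string per row.
all_max <- apply(DF, 1, function(x) paste(names(x)[x == max(x)], collapse = ","))
all_max <- unname(all_max)

print(all_max)
```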

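If you're interested in a data.table solution, here's one. It's a bit tricky because you prefer to get the id of the first maximum; it would be much easier if you wanted the last maximum instead. Still, it's not that complicated, and it's fast.
Here I've generated data with your dimensions (26746 * 18).
Data
Benchmarking:
Based on the suggestions above, the following data.table solution worked very quickly for me: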
library(data.table)

set.seed(45)
DT <- data.table(matrix(sample(10, 10^7, TRUE), ncol=10))

system.time(
  DT[, col_max := colnames(.SD)[max.col(.SD, ties.method = "first")]]
)
#>    user  system elapsed 
#>    0.15    0.06    0.21
DT[]
#>          V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 col_max
#>       1:  7  4  1  2  3  7  6  6  6   1      V1
#>       2:  4  6  9 10  6  2  7  7  1   3      V4
#>       3:  3  4  9  8  9  9  8  8  6   7      V3
#>       4:  4  8  8  9  7  5  9  2  7   1      V4
#>       5:  4  3  9 10  2  7  9  6  6   9      V4
#>      ---                                       
#>  999996:  4  6 10  5  4  7  3  8  2   8      V3
#>  999997:  8  7  6  6  3 10  2  3 10   1      V6
#>  999998:  2  3  2  7  4  7  5  2  7   3      V4
#>  999999:  8 10  3  2  3  4  5  1  1   4      V2
#> 1000000: 10  4  2  6  6  2  8  4  7   4      V1


If we need the column name of the minimum value, as suggested by @lwshang, just use -.SD:
DT[, col_min := colnames(.SD)[max.col(-.SD, ties.method = "first")]]
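
The negation trick does not depend on data.table. A minimal base-R sketch, assuming a small illustrative DF (not the DT generated above), with which.min() as a cross-check:

```r
# Illustrative data, not the benchmark table from the answer.
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# Negating the values turns each row minimum into the row maximum,
# so max.col() can be reused to locate the minimum.
col_min <- colnames(DF)[max.col(-as.matrix(DF), ties.method = "first")]

# Cross-check with the slower row-wise which.min().
col_min_check <- colnames(DF)[apply(DF, 1, which.min)]

print(col_min)
```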

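One solution is to reshape the data from wide to long, putting all the departments in one column and the counts in another, group by the employee id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too: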
library(tidyverse)

# sample data frame with a tie
df <- tibble(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))  # tibble() replaces the deprecated data_frame()

# If you aren't worried about ties:  
df %>% 
  rownames_to_column('id') %>%  # creates an ID number
  gather(dept, cnt, V1:V3) %>% 
  group_by(id) %>% 
  slice(which.max(cnt)) 

# A tibble: 3 x 3
# Groups:   id [3]
  id    dept    cnt
  <chr> <chr> <dbl>
1 1     V3       9.
2 2     V1       8.
3 3     V2       5.


# If you're worried about keeping ties:
df %>% 
  rownames_to_column('id') %>%
  gather(dept, cnt, V1:V3) %>% 
  group_by(id) %>% 
  filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
  arrange(id)

# A tibble: 4 x 3
# Groups:   id [3]
  id    dept    cnt
  <chr> <chr> <dbl>
1 1     V3       9.
2 2     V1       8.
3 3     V2       5.
4 3     V3       5.


# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
df %>% 
  rownames_to_column('id') %>%
  gather(dept, cnt, V1:V3) %>% 
  group_by(id) %>% 
  mutate(dept_rank  = rank(-cnt, ties.method = "first")) %>% # or 'last'
  filter(dept_rank == 1) %>% 
  select(-dept_rank) 

# A tibble: 3 x 3
# Groups:   id [3]
  id    dept    cnt
  <chr> <chr> <dbl>
1 2     V1       8.
2 3     V2       5.
3 1     V3       9.

# if you wanted to keep the original wide data frame
df %>% 
  rownames_to_column('id') %>%
  left_join(
    df %>% 
      rownames_to_column('id') %>%
      gather(max_dept, max_cnt, V1:V3) %>% 
      group_by(id) %>% 
      slice(which.max(max_cnt)), 
    by = 'id'
  )

# A tibble: 3 x 6
  id       V1    V2    V3 max_dept max_cnt
  <chr> <dbl> <dbl> <dbl> <chr>      <dbl>
1 1        2.    7.    9. V3            9.
2 2        8.    3.    6. V1            8.
3 3        1.    5.    5. V2            5.

A simple for loop can also be handy:
> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
> df
  V1 V2 V3
1  2  7  9
2  8  3  6
3  1  5  4
> df2<-data.frame()
> for (i in 1:nrow(df)){
+   df2[i,1]<-colnames(df[which.max(df[i,])])
+ }
> df2
  V1
1 V3
2 V1
3 V2
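
The loop above grows df2 one row at a time. As a sketch of an alternative that keeps the same row-by-row logic but returns a plain character vector, vapply() could be used; the name max_col is an assumption made for illustration:

```r
df <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# For each row index, flatten the row to a named numeric vector and
# look up the name of the column holding its (first) maximum.
max_col <- vapply(
  seq_len(nrow(df)),
  function(i) names(df)[which.max(unlist(df[i, ]))],
  character(1)
)

print(max_col)
```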

A dplyr solution:
Idea:

add the row IDs as a column
reshape to long format
filter for the maximum in each group

Code:
library(tidyverse)  # dplyr, tidyr (gather) and tibble (rownames_to_column)
DF = data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
DF %>% 
  rownames_to_column() %>%
  gather(column, value, -rowname) %>%
  group_by(rowname) %>% 
  filter(rank(-value) == 1) 

Result:
# A tibble: 3 x 3
# Groups:   rowname [3]
  rowname column value
  <chr>   <chr>  <dbl>
1 2       V1         8
2 3       V2         5
3 1       V3         9

Here is an answer that works with data.table and is simpler. It assumes your data.table is named yourDF:
j1 <- max.col(yourDF[, .(V1, V2, V3, V4)], "first")
yourDF$newCol <- c("V1", "V2", "V3", "V4")[j1]

With dplyr 1.0.0, one option could be:
DF %>%
 rowwise() %>%
 mutate(row_max = names(.)[which.max(c_across(everything()))])

     V1    V2    V3 row_max
  <dbl> <dbl> <dbl> <chr>  
1     2     7     9 V3     
2     8     3     6 V1     
3     1     5     4 V2     

Sample data:
DF <- structure(list(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 
4)), class = "data.frame", row.names = c(NA, -3L))

This one is fast:
with(DF, {
  names(DF)[(V1 > V2 & V1 > V3) * 1 + (V2 > V3 & V2 > V1) * 2 + (V3 > V1 & V3 > V2)*3]
})

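Comments:

How big is your actual data?
@Arun > dim(test) [1] 26746 18
An interesting generalization would be the column names of the largest n values per row.
If I have two equal columns, I usually just pick the first. These are border cases that do not upset my statistical analysis.
@dmvianna - using which.max will be fine then.
I assume the order is preserved, so I can create a new column with this vector that will align correctly with the employee IDs. Is that correct?
apply converts the data.frame to a matrix internally. You may not see a performance difference on these dimensions, though.
@pankajkandal - assuming distinct values, how about this: colnames(DF)[max.col(replace(DF, cbind(seq_len(nrow(DF)), max.col(DF, ties.method="first")), -Inf), "first")]
Actually, I don't care whether it is the first or the last maximum. I go for simplicity first, but I'm sure a data.table solution will come in handy in the future, thanks!
I have a similar requirement, but I want the column name with the minimum value for each row. We don't seem to have min.col in R. Do you know what the equivalent solution would be?
Hi @user1412. Thanks for your interesting question. Right now I don't have any idea other than using which.min, which would look something like DT[, min := colnames(.SD)[apply(.SD, 1, which.min)]] or DT[, MIN2 := colnames(.SD)[which.min(.SD)], by = 1:nrow(DT)] on the dummy data above. This does not consider ties and returns only the first minimum. Maybe consider asking a separate question. I would also be curious what answers you would get.
A trick to get the minimum column is to send the negative of the data frame to max.col, like: colnames(.SD)[max.col(-.SD, ties.method="first")].
Can you comment on the difference between this approach and sbha's answer above? They look about the same to me.
If I…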
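
The replace()/-Inf one-liner in the comments does not say what it computes. Read literally, it masks each row's first maximum with -Inf and then asks for the maximum again, which would give the column of the second-largest value. A step-by-step sketch of that reading, using a matrix copy m and the illustrative DF from the answers (the names m, first_idx and second are assumptions):

```r
DF <- data.frame(V1 = c(2, 8, 1), V2 = c(7, 3, 5), V3 = c(9, 6, 4))

# Work on a matrix copy so that matrix indexing by (row, col) pairs applies.
m <- as.matrix(DF)

# Column index of the first maximum in each row.
first_idx <- max.col(m, ties.method = "first")

# Mask those cells, then take the row maximum again.
m[cbind(seq_len(nrow(m)), first_idx)] <- -Inf
second <- colnames(m)[max.col(m, ties.method = "first")]

print(second)
```

As the comment itself notes, this assumes distinct values within a row; with ties, the "second" column may hold the same value as the first.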
The same dplyr approach extends to the top n columns per row; for n = 2:
DF %>% 
  rownames_to_column() %>%
  gather(column, value, -rowname) %>%
  group_by(rowname) %>% 
  mutate(rk = rank(-value)) %>%
  filter(rk <= 2) %>% 
  arrange(rowname, rk) 

# A tibble: 6 x 4
# Groups:   rowname [3]
  rowname column value    rk
  <chr>   <chr>  <dbl> <dbl>
1 1       V3         9     1
2 1       V2         7     2
3 2       V1         8     1
4 2       V3         6     2
5 3       V2         5     1
6 3       V3         4     2

A variant of the dplyr 1.0.0 option that uses purrr's pmap() instead of rowwise():
DF %>%
    mutate(row_max = pmap(across(everything()), ~ names(c(...)[which.max(c(...))])))
