R: find the most frequent color and its count


I have a matrix in the following format:

     [,1]     [,2]  [,3]    [,4]   [,5]   [,6]  [,7]    [,8]   [,9]  
[1,] "blue"   "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[2,] "green"  "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[3,] "yellow" "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[4,] "red"    "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[5,] "blue"   "red" "green" "blue" "blue" "red" "green" "blue" "blue"
[6,] "green"  "red" "green" "blue" "blue" "red" "green" "blue" "blue"
 ...
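How can I quickly compute the most frequent color, and its count, for each row?

For example, for row 1 the result would be "blue, 6". I currently do this with an apply call that invokes table.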


However, my matrix has 1.9 million rows, so this takes too long. How can I vectorize it?
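The apply/table code itself is not shown in the question. As a point of reference only, a row-wise baseline of that kind might look like the sketch below. The function name top_colour and the two-row stand-in matrix are assumptions made for illustration, not the asker's actual code.

```r
# Small stand-in for the matrix in the question (first two rows only).
mat <- rbind(
  c("blue",  "red", "blue", "blue", "blue", "red", "green", "blue", "blue"),
  c("green", "red", "blue", "blue", "blue", "red", "green", "blue", "blue")
)

# Hypothetical helper: tabulate one row and keep its most frequent colour.
top_colour <- function(x) {
  tab <- table(x)
  top <- which.max(tab)            # index of the first maximum if there is a tie
  paste(names(tab)[top], tab[[top]], sep = ", ")
}

# table() is called once per row, which is what becomes slow on millions of rows.
res <- apply(mat, 1, top_colour)
res
# [1] "blue, 6" "blue, 5"
```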

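How many different values can each cell of the matrix take? Is it the same set as in your example? If so, something like the following may be faster: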

dat <- structure(c("blue", "green", "yellow", "red", "blue", "green", 
    "red", "red", "red", "red", "red", "red", "red", "red", "blue", 
    "blue", "blue", "blue", "green", "green", "red", "blue", "blue", 
    "blue", "blue", "blue", "blue", "red", "blue", "blue", "blue", 
    "blue", "blue", "blue", "blue", "red", "red", "red", "red", "red", 
    "red", "blue", "green", "green", "green", "green", "green", "green", 
    "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", 
    "blue", "blue", "blue", "blue", "blue", "blue", "green"), .Dim = c(7L, 
    9L))

values <- c("blue", "red", "green", "yellow")
counts <- vapply(values, function(value) rowSums(dat == value), 
    numeric(nrow(dat))) # Thanks to @RichardScriven for the improvement :)
counts 
#      blue red green yellow
# [1,]    6   2     1      0
# [2,]    5   2     2      0
# [3,]    5   2     1      1
# [4,]    5   3     1      0
# [5,]    5   2     2      0
# [6,]    4   2     3      0
# [7,]    4   4     1      0

max.value.col <- max.col(counts)
max.value <- colnames(counts)[max.value.col]
max.counts <- counts[cbind(1:nrow(counts), max.value.col)]
paste(max.value, max.counts, sep = ", ")
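# [1] "blue, 6" "blue, 5" "blue, 5" "blue, 5" "blue, 5" "blue, 4" "blue, 4"
# Row 7 is a tie (blue 4, red 4). max.col breaks ties at random by default,
# so the last element may also come out as "red, 4".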
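The answer above hard-codes the set of colors and relies on the default tie-breaking of max.col. A minimal sketch of one variation follows: it derives the values from the data and passes ties.method = "first" so that repeated runs agree. The toy matrix and the names top.col, top.value and top.count are made up for illustration.

```r
# Toy matrix: row 3 is a deliberate tie between "blue" and "red".
dat <- rbind(
  c("blue", "red",  "blue", "green"),
  c("red",  "red",  "blue", "red"),
  c("blue", "blue", "red",  "red")
)

# Derive the distinct colours instead of typing them in.
values <- sort(unique(as.vector(dat)))   # "blue" "green" "red"

# One vectorised rowSums() per colour gives an nrow(dat) x length(values) matrix.
counts <- vapply(values, function(value) rowSums(dat == value),
                 numeric(nrow(dat)))

# ties.method = "first" always picks the left-most maximal column.
top.col   <- max.col(counts, ties.method = "first")
top.value <- colnames(counts)[top.col]
top.count <- counts[cbind(seq_len(nrow(counts)), top.col)]

res <- paste(top.value, top.count, sep = ", ")
res
# [1] "blue, 2" "red, 3"  "blue, 2"
```

With "first", which color wins a tie depends on the column order of counts, here alphabetical because of sort().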

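I think this is a workable data.table solution. It uses data.table's fast .N to count the frequencies in each row (mat is the matrix from the question).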

library(data.table)

flip <- data.table(t(mat))

tally <- lapply(names(flip), 
                function(x) {
                  setnames(flip[, .N, by=eval(x)][order(-N)][1,],
                           c('clr', 'N')) } )
do.call(rbind, tally)

#     clr N
# 1: blue 6
# 2: blue 5
# 3: blue 5
# 4: blue 5
# 5: blue 5
# 6: blue 4
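A possible variation on the same idea, sketched here under assumptions, reshapes the matrix to long form so that neither t() nor a per-column lapply is needed. The column names row and clr and the two-row toy matrix are illustrative only, and no timing claim is made for it.

```r
library(data.table)

# Toy stand-in for the matrix in the question.
mat <- rbind(
  c("blue",  "red", "blue", "blue"),
  c("green", "red", "red",  "red")
)

# One record per cell: the row index and the colour found there.
long <- data.table(row = as.vector(row(mat)), clr = as.vector(mat))

# Count each colour within each row using .N.
tally <- long[, .N, by = .(row, clr)]

# Sort by row, largest count first, then keep the first record of every row.
setorder(tally, row, -N)
res <- unique(tally, by = "row")
res
```

Here res has one record per matrix row, holding the top color in clr and its count in N.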

Can you show the code you are currently using, for comparison? How long is "too long"? How fast do you need it to be? If you can't answer that, then I don't think you can say how long "too long" is.

Someone has since posted a solution that speeds things up considerably. My code usually ran in about 40 seconds; that solution takes only about a second, which is perfect :-).
vapply(values, function(value) rowSums(dat == value), numeric(nrow(dat))) may be even faster than sapply.

@konvas If there are ties among the maximum counts, max.col seems to pick one of them arbitrarily. Is there an equivalent of max.col that finds all the maxima?

@RichardScriven Good point! That will speed things up a lot.

@Khashaa Have a look at ?max.col. You can adjust the ties.method argument, but only three options are available: "random", "first" and "last". What exactly did you have in mind?

It would just be nice if it had ties.method = "all".

The t() operation is slow on large matrices, though. Never mind.
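max.col has no "all" option, but the effect asked about in the comments can be approximated by comparing each cell with its row maximum. The following is only a sketch of that idea, with a hand-made counts matrix and the assumed names row.max, is.max and all.max.

```r
# Hand-made counts: row 2 is a tie between "blue" and "red".
counts <- rbind(c(6, 2, 1, 0),
                c(4, 4, 1, 0))
colnames(counts) <- c("blue", "red", "green", "yellow")

# Largest count in each row, looked up through max.col.
row.max <- counts[cbind(seq_len(nrow(counts)),
                        max.col(counts, ties.method = "first"))]

# TRUE wherever a cell equals its row's maximum. row.max is recycled
# down the columns, so cell [i, j] is compared with row.max[i].
is.max <- counts == row.max

# Join every tied colour of a row into one string.
all.max <- apply(is.max, 1,
                 function(r) paste(colnames(counts)[r], collapse = "/"))
all.max
# [1] "blue"     "blue/red"
```

The final apply still loops over rows, so on a very large matrix it may be worth restricting it to the rows where rowSums(is.max) > 1.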