An efficient way to count the distinct elements in each row in R

r performance loops

I have the following dataset

library(dplyr)

However, because this approach relies on a loop, it is slow. Do you have any suggestions for speeding it up?

My laptop is garbage, so

sapply(as.data.frame(t(d)), function(x) n_distinct(x))
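To make the transpose trick concrete, here is a minimal sketch on a tiny made-up data frame. The original dataset is not shown in the post, so `d` below is purely hypothetical, and `length(unique(x))` is used as a base R stand-in for `n_distinct(x)` so the snippet runs without extra packages.

```r
# Hypothetical example data (NOT the original dataset, which is not shown)
d <- data.frame(
  a = c(1, 2, 3),
  b = c(1, 2, 4),
  c = c(1, 5, 6)
)

# t(d) turns rows into columns, so each column of the transposed
# data frame holds one original row; sapply then visits them one by one.
# length(unique(x)) is a base R stand-in for dplyr::n_distinct(x).
per_row <- sapply(as.data.frame(t(d)), function(x) length(unique(x)))

print(unname(per_row))
```

Row 1 contains only the value 1, row 2 contains 2 and 5, and row 3 contains three different values, so the counts increase from one row to the next.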

Here are some options that, on my machine, are faster than the OP's method (the approaches from the other answers are included). You can try:

library(dplyr)       # provides n_distinct
library(data.table)  # provides uniqueN

system.time({ # @nicola's function
  d <- as.matrix(d)
  uniqueValues <- unique(as.vector(d))
  Reduce("+", lapply(uniqueValues, function(x) rowSums(d == x) > 0))
})
#   user  system elapsed
#   0.61    0.00    0.61

system.time(colSums(apply(d, 1, function(i) !duplicated(i)))) # @Sotos function
#   user  system elapsed
#   8.16    0.00    8.18

system.time(apply(d, 1, function(x) sum(!duplicated(x))))
#   user  system elapsed
#   8.19    0.01    8.25

system.time(apply(d, 1, uniqueN)) # uniqueN from `data.table`
#   user  system elapsed
#  15.59    0.03   15.74

system.time(apply(d, 1, n_distinct)) # n_distinct from `dplyr`
#   user  system elapsed
#  31.50    0.04   53.82

system.time(sapply(as.data.frame(t(d)), function(x) n_distinct(x)))
#   user  system elapsed
#  70.12    0.36   72.03
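Before comparing timings, it can be worth confirming that the candidates return the same counts. The following is a rough sketch under stated assumptions: the data frame `d` is randomly generated here (its size and value range are arbitrary choices, not the original data), and only the base R variants are compared so that no packages are needed.

```r
# Hypothetical test data: 200 rows, 5 columns, values drawn from 1..6
set.seed(42)
d <- as.data.frame(matrix(sample(1:6, 200 * 5, replace = TRUE), ncol = 5))

# Variant 1: loop over the distinct values, vectorised over rows
m <- as.matrix(d)
uniqueValues <- unique(as.vector(m))
res_reduce <- Reduce("+", lapply(uniqueValues, function(x) rowSums(m == x) > 0))

# Variant 2: flag first occurrences per row, then sum per row
res_colsums <- colSums(apply(d, 1, function(i) !duplicated(i)))

# Variant 3: same idea, summing inside the row-wise function
res_apply <- apply(d, 1, function(x) sum(!duplicated(x)))

# All three should agree element by element
agree <- all(as.numeric(res_reduce) == as.numeric(res_colsums)) &&
  all(as.numeric(res_reduce) == as.numeric(res_apply))
print(agree)
```

Wrapping each variant in `system.time()` on your own data then gives timings that are comparable on your machine.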

If there are not too many distinct values, you can try:

system.time(colSums(apply(d, 1, function(i) !duplicated(i))))
#   user  system elapsed
#   6.50    0.02    6.53
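The reason `colSums` (rather than `rowSums`) appears in the `colSums(apply(...))` call is that `apply` over rows returns its results as columns. A small sketch on made-up data (the data frame `d` below is hypothetical) shows the shape:

```r
# Hypothetical data: 2 rows, 3 columns
d <- data.frame(a = c(1, 2), b = c(1, 3), c = c(2, 4))

# For each row, flag the first occurrence of every value.
# apply() stores each row's result as a COLUMN, so the result is 3 x 2.
flags <- apply(d, 1, function(i) !duplicated(i))
print(dim(flags))

# Summing each column therefore counts the distinct values per original row.
counts <- colSums(flags)
print(unname(counts))
```

Row 1 is `1, 1, 2` (two distinct values) and row 2 is `2, 3, 4` (three distinct values).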

Close… very close :) I included yours in the mix as well, since timings differ from system to system. It is closer now.

d <- as.matrix(d)
uniqueValues <- unique(as.vector(d))
Reduce("+", lapply(uniqueValues, function(x) rowSums(d == x) > 0))
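The `Reduce` approach iterates over the distinct values instead of over the rows, which is presumably why it pays off when the number of distinct values is small relative to the number of rows. Here is a step-by-step sketch on a tiny matrix; the matrix `m` and the intermediate name `indicators` are illustrative and not taken from the post.

```r
# Hypothetical 3 x 3 matrix
m <- matrix(c(1, 1, 2,
              2, 3, 3,
              4, 4, 4), nrow = 3, byrow = TRUE)

# Every value that occurs anywhere in the matrix
uniqueValues <- unique(as.vector(m))

# One logical vector per distinct value: does row i contain that value?
# rowSums(m == x) counts the matches per row in a single vectorised call.
indicators <- lapply(uniqueValues, function(x) rowSums(m == x) > 0)

# Adding the indicator vectors gives the number of distinct values per row
counts <- Reduce("+", indicators)
print(counts)
```

The loop runs once per distinct value (four times here), and each iteration works on the whole matrix at once. Note that this sketch assumes there are no missing values: comparing against `NA` with `==` yields `NA`, which would propagate into the counts.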