R for loop to remove duplicates within multiple trials

I have a data set (called eyeData) which, in a very shortened version, looks like this:

sNumber runningTrialNo  wordTar                             
1       1               vital       
1       1               raccoon                             
1       1               vital                               
1       1               accumulates                             
1       2               tornado                             
1       2               destroys                                
1       2               tornado                             
1       2               destroys                                
1       2               property                                
4       51              denounces                               
4       51              brings                              
4       51              illegible                               
4       51              frequently                              
4       51              brings                          
4       61              cerebrum
4       61              vital
4       61              knowledge
4       61              vital
4       61              cerebrum
I wrote a loop that removes all duplicates (identical words) in the wordTar column, separately for each trial, so that the data looks like this:

sNumber runningTrialNo  wordTar
1       1               vital
1       1               raccoon
1       1               accumulates
1       2               tornado
1       2               destroys
1       2               property
4       51              denounces
4       51              brings
4       51              illegible
4       51              frequently
4       61              cerebrum
4       61              vital
4       61              knowledge
The code is as follows:

for (sno in eyeData$sNumber) {
  for (trial in eyeData$runningTrialNo) {
    ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
    ss.s <- ss[!duplicated(ss$wordTar), ]
  }
}
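
Two things in the loop above are worth noting: it iterates over every row's value of sNumber and runningTrialNo (repeats included) rather than over the distinct values, and ss.s is overwritten on each pass, so nothing is accumulated. A minimal sketch of a loop that keeps the per-trial results might look like the following; the sample data, `pieces`, and `result` are illustrative names, not part of the original code.

```r
# Small sample in the shape of eyeData from the question (illustrative only)
eyeData <- data.frame(
  sNumber        = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4),
  runningTrialNo = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 51, 51, 51, 51, 51),
  wordTar        = c("vital", "raccoon", "vital", "accumulates",
                     "tornado", "destroys", "tornado", "destroys", "property",
                     "denounces", "brings", "illegible", "frequently", "brings"),
  stringsAsFactors = FALSE
)

pieces <- list()  # hypothetical accumulator for the per-trial results
for (sno in unique(eyeData$sNumber)) {
  # only visit the trials that actually belong to this subject
  for (trial in unique(eyeData$runningTrialNo[eyeData$sNumber == sno])) {
    ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
    pieces[[length(pieces) + 1]] <- ss[!duplicated(ss$wordTar), ]
  }
}

# stack the de-duplicated trials back into one data frame
result <- do.call(rbind, pieces)
```

Iterating over unique() values also avoids repeating the same subset many times, which is a large part of the loop's cost.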
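For loops are usually slow in R. You typically want a vectorized approach instead. There are many ways to do this; here is an example using the dplyr library: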

library(dplyr)
eyeData %>% group_by(runningTrialNo) %>%
            distinct(wordTar)
This is much faster, as we can see by using microbenchmark. Here we run each piece of code 100 times and see how long it takes:

library(microbenchmark)

microbenchmark(dplyr = eyeData %>% group_by(runningTrialNo) %>%
                   distinct(wordTar), 
               old = for (sno in eyeData$sNumber) {
                       for(trial in eyeData$runningTrialNo) {
                           ss <- subset(eyeData, sNumber == sno & runningTrialNo == trial)
                           ss.s <- ss[!duplicated(ss$wordTar), ]
                       }
                   })

Unit: milliseconds
  expr        min         lq       mean     median         uq       max neval
 dplyr   1.256438   1.287158   1.567518   1.495092   1.550579  12.29212   100
   old 102.203029 110.265423 112.664063 111.789698 113.166710 304.58312   100
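
Note that the dplyr call above groups only by runningTrialNo; if trial numbers could repeat across subjects, sNumber would need to be part of the key as well. As a sketch of the same idea without a loop and without extra packages, base R's duplicated() can be applied to all three key columns at once. The sample data and the names `key` and `deduped` are assumptions for illustration.

```r
# Small sample in the shape of eyeData from the question (illustrative only)
eyeData <- data.frame(
  sNumber        = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
  runningTrialNo = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  wordTar        = c("vital", "raccoon", "vital", "accumulates",
                     "tornado", "destroys", "tornado", "destroys", "property"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows, so a row is flagged
# when the same subject, trial, and word have already been seen
key     <- eyeData[, c("sNumber", "runningTrialNo", "wordTar")]
deduped <- eyeData[!duplicated(key), ]
```

Because the whole data frame is indexed, any additional columns in eyeData are kept for the surviving rows.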
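102 minutes for the loop... that really is a lot. What does %>% refer to in the code? Pasting my code?

It is milliseconds, so 102 is not too bad, as long as the data is not too large.

%>% is the chaining operator from magrittr, which dplyr uses; it passes the previous output as the first argument to the next function. Try reading up on it for more details.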
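
To make the chaining behaviour concrete, here is a small sketch (assuming the magrittr package is installed; the vector `words` is made up for illustration) showing that a piped expression and its nested equivalent give the same result:

```r
library(magrittr)

words <- c("vital", "raccoon", "vital")  # illustrative data

# The left-hand side becomes the first argument of the call on the right
piped  <- words %>% unique() %>% length()
nested <- length(unique(words))

# Extra arguments stay in place: this is the same as head(words, 2)
first_two <- words %>% head(2)
```

Reading left to right, the piped form lists the steps in the order they are applied, which is why it is popular with dplyr.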