R: counting occurrences without changing the original order

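I'm currently looking for a simple way to count occurrences without changing the order of the dates. One column of my data frame contains a lot of dates, and I want to count how many times each date appears.

Suppose I have the following list: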

data[,1]
18/12/2015
18/12/2015
18/12/2015
01/01/2016
02/02/2016
02/02/2016
I can use the function table(), but the result looks like this:

   Var       freq
01/01/2016    1
02/02/2016    2
18/12/2015    3
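I don't want this order; I want to keep the original order shown above. I looked for an option that turns off the sorting in the function, but there doesn't seem to be one (the same goes for aggregate()).

Does anyone have an idea?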

Here are two options.

First, I'll create some data:

> set.seed(123)
> x <- sample(LETTERS[1:5], 10, TRUE)
> x
 [1] "B" "D" "C" "E" "E" "A" "C" "E" "C" "C"
@akrun suggested creating a factor with explicitly specified levels to get the desired order:

> y <- factor(x, levels=unique(x))
> table(y)
y
B D C E A 
1 1 4 3 1 
Thanks to @lmo, a more concise way is:

> table(x)[unique(x)]
x
B D C E A 
1 1 4 3 1 
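
Applied to the dates from the question, both variants might look like the sketch below. It is only an illustration: the names dates, counts_factor, counts_index and result, and the columns date and freq, are made up here, and the dates are assumed to be stored as character strings.

```r
# Dates from the question, assumed to be character strings.
dates <- c("18/12/2015", "18/12/2015", "18/12/2015",
           "01/01/2016", "02/02/2016", "02/02/2016")

# Variant 1: fix the factor levels to the order of first appearance,
# so table() reports the counts in that order.
counts_factor <- table(factor(dates, levels = unique(dates)))

# Variant 2: tabulate as usual, then reorder the table by name.
counts_index <- table(dates)[unique(dates)]

# Either result can be turned back into a data frame.
# (date and freq are hypothetical column names.)
result <- data.frame(date = names(counts_index),
                     freq = as.vector(counts_index),
                     stringsAsFactors = FALSE)
print(result)
```

Both variants rely on unique() returning values in order of first appearance, which is what restores the original order.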

Another idea, using dplyr:

library(dplyr)
# V1 is the date column of df (see the dput output under "Data")
unique(df %>% 
          group_by(V1) %>% 
          mutate(count = n()))

#Source: local data frame [3 x 2]
#Groups: V1 [3]

#          V1 count
#      (fctr) (int)
#1 18/12/2015     3
#2 01/01/2016     1
#3 02/02/2016     2
Data

dput(df)
structure(list(Var = structure(c(3L, 3L, 3L, 1L, 2L, 2L), .Label = c("01/01/2016", 
"02/02/2016", "18/12/2015"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-6L))
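Edit

A simpler way (pointed out by @lukeA) is count() from dplyr, called as count(df, Var, sort = TRUE). Note that sort = TRUE orders the rows by descending count.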

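Another option is data.table: dt[ , .N , by = V1 ] counts the rows per group and returns the groups in order of first appearance.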

Setting up timing tests is a bit tricky, because not all of the answers take a data.table as input. Here is what I did:

library(dplyr)
library(data.table)
library(microbenchmark)

# dplyr: count() with sort = TRUE orders by descending count
sotos <- function(testdat) {
  count(testdat, V1, sort = TRUE)
}

# data.table: .N per group, groups in order of first appearance
simon <- function(testdat) {
  dt <- data.table(testdat)
  dt[, .N, by = V1]
}

# base R: tabulate, then reorder by first appearance
mrip <- function(x) {
  table(x)[unique(x)]
}

# make a dataset
set.seed(42)
x <- sample(LETTERS[1:15], 1e4, TRUE)
x2 <- data.table(x)
colnames(x2) <- 'V1'

microbenchmark(sotos(x2), simon(x2), mrip(x), times = 10)

Unit: microseconds
      expr      min       lq      mean    median       uq      max neval
 sotos(x2) 2183.645 2256.855 2984.7473 2352.6430 2507.616 8629.209    10
 simon(x2)  770.417  780.338  831.5502  784.7845  846.021 1116.624    10
   mrip(x)  745.101  827.206  844.3107  850.4685  865.863  898.021    10
# compare the answers:
> mrip(x)
x
  N   O   E   M   J   H   L   C   K   G   D   B   I   F   A 
666 676 659 656 669 631 679 734 677 665 592 672 674 654 696 
    > t(simon(x2))
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11] [,12]
    V1 "N"   "O"   "E"   "M"   "J"   "H"   "L"   "C"   "K"   "G"   "D"   "B"  
    N  "666" "676" "659" "656" "669" "631" "679" "734" "677" "665" "592" "672"
       [,13] [,14] [,15]
    V1 "I"   "F"   "A"  
    N  "674" "654" "696"
    > t(sotos(x2))
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11] [,12]
    V1 "C"   "A"   "L"   "K"   "O"   "I"   "B"   "J"   "N"   "G"   "E"   "M"  
    n  "734" "696" "679" "677" "676" "674" "672" "669" "666" "665" "659" "656"
       [,13] [,14] [,15]
    V1 "F"   "H"   "D"  
    n  "654" "631" "592"

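You could convert the column to a factor with the levels specified as its unique elements, and then call table.

I think you need ave, which is a wrapper around the split replacement function.

library(data.table); dt

library(dplyr); yourdf %>% group_by(V1) %>% summarise(c_freq = n())

Replace Var with the name of your date column!!! Or your data.frame is not really called data.

I think you could also use count(df, V1) to do this with dplyr.

count has a sort argument, which defaults to FALSE.

Ah... right. I looked for a sort argument in table and didn't find one, so I assumed count didn't have one either :)

What was Var in the previous version?

Yes, I tried this and it works very well. I thought I had done this at the start, but I guess I got it wrong.

@lmo shorter and faster :')

@Frank - check the code under "make a dataset": I build x2 from x.

@Frank ooops, sorry. I'll fix it and rerun.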
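
The ave suggestion from the comments could be sketched in base R roughly as follows. This is an illustration rather than the commenter's exact code: the data frame df, the result object and the column names V1 and freq are assumptions.

```r
# Dates from the question in a one-column data frame (V1 is an assumed name).
df <- data.frame(V1 = c("18/12/2015", "18/12/2015", "18/12/2015",
                        "01/01/2016", "02/02/2016", "02/02/2016"),
                 stringsAsFactors = FALSE)

# ave() applies FUN within each group and returns one value per row,
# so every row receives the size of its own group.
df$freq <- ave(seq_along(df$V1), df$V1, FUN = length)

# Keep only the first row of each date; the row order is left untouched.
result <- df[!duplicated(df$V1), ]
rownames(result) <- NULL
print(result)
```

Because nothing is tabulated or sorted, the dates stay in the order in which they first appear.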
# count() from dplyr, as pointed out by @lukeA
# (replace Var with the name of your date column)
library(dplyr)
count(df, Var, sort = TRUE)

#Source: local data frame [3 x 2]

#         Var     n
#      (fctr) (int)
#1 18/12/2015     3
#2 02/02/2016     2
#3 01/01/2016     1
# Your data
data <- read.table(text="18/12/2015
18/12/2015
18/12/2015
01/01/2016
02/02/2016
02/02/2016")

library(data.table)
dt <- data.table( data )

#  Your data looks like this:
dt
#           V1
#1: 18/12/2015
#2: 18/12/2015
#3: 18/12/2015
#4: 01/01/2016
#5: 02/02/2016
#6: 02/02/2016

#  The result is this:
dt[ , .N , by = V1 ]
#          V1 N
#1: 18/12/2015 3
#2: 01/01/2016 1
#3: 02/02/2016 2
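
A brief sketch of why this keeps the original order, assuming the data.table package is installed: by returns the groups in order of first appearance, whereas keyby sorts them. The object names in_order and sorted are made up for illustration.

```r
library(data.table)

# Dates from the question, stored as character strings.
dt <- data.table(V1 = c("18/12/2015", "18/12/2015", "18/12/2015",
                        "01/01/2016", "02/02/2016", "02/02/2016"))

# by: groups come back in order of first appearance.
in_order <- dt[, .N, by = V1]

# keyby: groups come back sorted by V1.
sorted <- dt[, .N, keyby = V1]

print(in_order)
print(sorted)
```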
Timings from a second run:
Unit: microseconds
      expr      min       lq      mean   median       uq      max neval
 sotos(x2) 2533.274 2708.089 3067.2971 2804.391 2947.218 5598.176    10
 simon(x2)  500.154  518.286  621.3618  577.641  740.995  787.179    10
   mrip(x)  816.942  950.020 1065.2408  969.007 1282.887 1459.755    10