R: counting occurrences without modifying the original order
I am looking for a simple way to count occurrences without changing the order of the dates. One column of my data frame contains many dates, and I want to count how many times each date appears. Suppose I have the following list:
data[,1]
18/12/2015
18/12/2015
18/12/2015
01/01/2016
02/02/2016
02/02/2016
I can use the table() function, but the result looks like this:
Var freq
01/01/2016 1
02/02/2016 2
18/12/2015 3
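I don't want this order; I want to keep the original order shown above. I looked for an option that turns off the function's sorting, but there doesn't seem to be one. (The same goes for aggregate().)
Does anyone have an idea?
Here are two options.
First, I'll create some data: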
> set.seed(123)
> x <- sample(LETTERS[1:5], 10, TRUE)
> x
[1] "B" "D" "C" "E" "E" "A" "C" "E" "C" "C"
@akrun suggested creating a factor with explicitly specified levels to get the desired order:
> y <- factor(x, levels=unique(x))
> table(y)
y
B D C E A
1 1 4 3 1
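Applied to the dates from the question, the factor approach might look like the sketch below. The variable names dates, date_factor and counts are only illustrative, and the dates are assumed to be stored as character strings rather than as Date objects.

```r
# Dates from the question, in their original order (assumed to be character strings)
dates <- c("18/12/2015", "18/12/2015", "18/12/2015",
           "01/01/2016", "02/02/2016", "02/02/2016")

# unique() returns values in order of first appearance, so the factor
# levels follow the data instead of being sorted alphabetically
date_factor <- factor(dates, levels = unique(dates))

# table() reports counts in level order
counts <- table(date_factor)
print(counts)
```

Because table() follows the factor's level order, the sorting is controlled through the levels rather than through an argument of table().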
Thanks to @lmo, a more concise way is:
> table(x)[unique(x)]
x
B D C E A
1 1 4 3 1
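If a two-column result like the Var/freq layout in the question is wanted, the reordered counts can be put into a data frame. This is a minimal sketch on the question's dates; the names dates, counts and result are illustrative, and the column names simply mirror the question.

```r
# Dates from the question, in their original order (assumed to be character strings)
dates <- c("18/12/2015", "18/12/2015", "18/12/2015",
           "01/01/2016", "02/02/2016", "02/02/2016")

# Count with table(), then reorder by first appearance using the names
counts <- table(dates)[unique(dates)]

# Build the data frame explicitly from the names and the counts
result <- data.frame(Var = names(counts),
                     Freq = as.vector(counts),
                     stringsAsFactors = FALSE)
print(result)
```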
Another idea, using dplyr:
library(dplyr)
# the column in df is named V1 (see the dput output below)
unique(df %>%
  group_by(V1) %>%
  mutate(count = n()))
#Source: local data frame [3 x 2]
#Groups: V1 [3]
# V1 count
# (fctr) (int)
#1 18/12/2015 3
#2 01/01/2016 1
#3 02/02/2016 2
Data
dput(df)
structure(list(Var = structure(c(3L, 3L, 3L, 1L, 2L, 2L), .Label = c("01/01/2016",
"02/02/2016", "18/12/2015"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-6L))
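The same group-then-deduplicate idea can also be sketched in base R with ave(), which is mentioned in the comments. This is not the answer's own code: the data frame is rebuilt here with a character column, and the names df and result are illustrative.

```r
# Rebuild the example data with a character column (an assumption;
# the dput output above stores V1 as a factor)
df <- data.frame(V1 = c("18/12/2015", "18/12/2015", "18/12/2015",
                        "01/01/2016", "02/02/2016", "02/02/2016"),
                 stringsAsFactors = FALSE)

# ave() applies length() within each group and writes the group size
# back to every row of that group, leaving the row order untouched
df$count <- ave(seq_along(df$V1), df$V1, FUN = length)

# Keep the first row of each date, which preserves first-appearance order
result <- df[!duplicated(df$V1), ]
rownames(result) <- NULL
print(result)
```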
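EDIT
A simpler way (pointed out by @lukeA) is count(df, Var, sort = TRUE) from dplyr.
Timing tests are a bit tricky, because not all of the answers take a data.table as input. Here is what I did: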
sotos <-function(testdat){
#library(dplyr)
return(count(testdat, V1,sort = TRUE))
}
simon <-function(testdat){
#require(data.table)
dt <- data.table( testdat )
return(dt[ , .N , by = V1 ])
}
mrip <-function(x){
return(table(x)[unique(x)])
}
# make a dataset
set.seed(42)
x<-sample(LETTERS[1:15],1e4,TRUE)
x2 <- data.table(x)
colnames(x2) <- 'V1'
library(microbenchmark)
microbenchmark(sotos(x2),simon(x2),mrip(x),times=10)
Unit: microseconds
expr min lq mean median uq max neval
sotos(x2) 2183.645 2256.855 2984.7473 2352.6430 2507.616 8629.209 10
simon(x2) 770.417 780.338 831.5502 784.7845 846.021 1116.624 10
mrip(x) 745.101 827.206 844.3107 850.4685 865.863 898.021 10
# compare the answers:
> mrip(x)
x
N O E M J H L C K G D B I F A
666 676 659 656 669 631 679 734 677 665 592 672 674 654 696
> t(simon(x2))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
V1 "N" "O" "E" "M" "J" "H" "L" "C" "K" "G" "D" "B"
N "666" "676" "659" "656" "669" "631" "679" "734" "677" "665" "592" "672"
[,13] [,14] [,15]
V1 "I" "F" "A"
N "674" "654" "696"
> t(sotos(x2))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
V1 "C" "A" "L" "K" "O" "I" "B" "J" "N" "G" "E" "M"
n "734" "696" "679" "677" "676" "674" "672" "669" "666" "665" "659" "656"
[,13] [,14] [,15]
V1 "F" "H" "D"
n "654" "631" "592"
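The comparison above shows that the table()[unique()] and data.table results share the first-appearance order, while count(..., sort = TRUE) orders by frequency. A base-R cross-check can be sketched with tabulate() and match(); this is a substitute technique that is not part of the benchmark, and the names first_seen, by_table and by_match are illustrative.

```r
# Same kind of test vector as in the benchmark
set.seed(42)
x <- sample(LETTERS[1:15], 1e4, TRUE)

# Values in order of first appearance
first_seen <- unique(x)

# Approach from the answers: count, then reorder by name
by_table <- table(x)[first_seen]

# Alternative: map each value to its first-appearance position and count positions
by_match <- tabulate(match(x, first_seen), nbins = length(first_seen))
names(by_match) <- first_seen

# Both keep first-appearance order and give the same counts
stopifnot(all(as.vector(by_table) == by_match))

# Sorting by frequency instead gives the kind of order seen with sort = TRUE
by_freq <- sort(by_match, decreasing = TRUE)
print(head(by_freq))
```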
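Comments:
You can convert the column to a factor with the levels specified as its unique elements, then call table.
I think you need ave, which is a wrapper around split and its replacement function.
library(data.table); dt…
library(dplyr); yourdf %>% group_by(V1) %>% summarise(c_freq = n())
Replace Var with the name of your date column.
!!! Or your data.frame is not really data.
I think you could also use count(df, V1) for the dplyr operation.
count has a sort argument, which defaults to FALSE.
Ah... right. I looked for a sort argument in table but didn't find one, so I assumed count didn't have one either :)
What was Var in the previous version?
Yes, I tried this method and it works very well. I thought I had done it this way at first, but I guess I got it wrong.
@lmo shorter and faster :')
@Frank - check the "make a dataset" code; I build x2 from x.
@Frank ooops, sorry. I'll fix it and re-run.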
The count() version from the edit:
library(dplyr)
count(df, Var, sort = TRUE)
#Source: local data frame [3 x 2]
# Var n
# (fctr) (int)
#1 18/12/2015 3
#2 02/02/2016 2
#3 01/01/2016 1
Using data.table:
# Your data
data <- read.table(text="18/12/2015
18/12/2015
18/12/2015
01/01/2016
02/02/2016
02/02/2016")
require(data.table)
dt <- data.table( data )
# Your data looks like this:
dt
# V1
#1: 18/12/2015
#2: 18/12/2015
#3: 18/12/2015
#4: 01/01/2016
#5: 02/02/2016
#6: 02/02/2016
# The result is this:
dt[ , .N , by = V1 ]
# V1 N
#1: 18/12/2015 3
#2: 01/01/2016 1
#3: 02/02/2016 2
Timings from a second run of the benchmark:
Unit: microseconds
expr min lq mean median uq max neval
sotos(x2) 2533.274 2708.089 3067.2971 2804.391 2947.218 5598.176 10
simon(x2) 500.154 518.286 621.3618 577.641 740.995 787.179 10
mrip(x) 816.942 950.020 1065.2408 969.007 1282.887 1459.755 10