How to assign a counter to specific subsets of a data.frame defined by factor combinations?


r indexing


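My problem: I have a data frame containing some factor variables. I now want to add a new vector to this data frame that holds an index within each subset defined by these factor variables.

   data <- data.frame(fac1 = factor(rep(1:2, 5)), fac2 = sample(letters[1:3], 10, replace = TRUE))

What I want is a combination counter that numbers the occurrences of each factor combination in the order they appear. Like this: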

        fac1 fac2  counter
     1     1    a        1
     2     2    c        1
     3     1    b        1
     4     2    a        1
     5     1    c        1
     6     2    b        1
     7     1    a        2
     8     2    a        2
     9     1    b        2
     10    2    a        3

So far I have considered using tapply to get the counters for all factor combinations, which works fine:

counter <- tapply(data$fac1, list(data$fac1, data$fac2), function(x) seq_along(x))
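
To illustrate why the tapply() result is hard to attach to the data frame: it comes back as a matrix of lists with one cell per combination, not one value per row. The sketch below is only an illustration under assumptions: the names df, by_cell and g are made up here, and fac2 is fixed by hand instead of sampled so that the result is reproducible. It uses split() and unsplit() to put the per-group sequences back in row order.

```r
# Hand-picked data (no sampling) so the result is reproducible;
# df, by_cell and g are illustrative names, not from the question
df <- data.frame(fac1 = factor(rep(1:2, 5)),
                 fac2 = c("a", "c", "b", "a", "c", "b", "a", "a", "b", "a"))

# tapply() returns a matrix of lists: one cell per combination, not per row
by_cell <- tapply(df$fac1, list(df$fac1, df$fac2), function(x) seq_along(x))
dim(by_cell)

# Same idea, but unsplit() puts the per-group sequences back in row order
g <- interaction(df$fac1, df$fac2, drop = TRUE)
df$counter <- unsplit(lapply(split(seq_len(nrow(df)), g), seq_along), g)
df
```

With this hand-picked fac2 the counter column should match the one in the desired table above.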

Here is a base R way that avoids (explicit) loops. This is a job for the ave() function:
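# Use set.seed for reproducible examples
#   when random number generation is involved
# (the output below matches the sampler used before R 3.6.0; newer versions
#  draw a different sample unless RNGkind(sample.kind = "Rounding") is set)
set.seed(1)
myDF <- data.frame(fac1 = factor(rep(1:2, 7)),
                   fac2 = sample(letters[1:3], 14, replace = TRUE),
                   stringsAsFactors = FALSE)
# Passing a row index as the first argument keeps the counter an integer column
myDF$counter <- ave(seq_along(myDF$fac2), myDF$fac1, myDF$fac2, FUN = seq_along)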
myDF
#    fac1 fac2 counter
# 1     1    a       1
# 2     2    b       1
# 3     1    b       1
# 4     2    c       1
# 5     1    a       2
# 6     2    c       2
# 7     1    c       1
# 8     2    b       2
# 9     1    b       2
# 10    2    a       1
# 11    1    a       3
# 12    2    a       2
# 13    1    c       2
# 14    2    b       3
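
One detail worth checking when using ave() this way: it returns a vector of the same type as its first argument, so passing a character column yields counters stored as character strings. A minimal sketch with hand-picked data (the names df, chr_counter and int_counter are made up for illustration):

```r
# Hand-picked data; every fac1 x fac2 combination occurs at least once
df <- data.frame(fac1 = factor(rep(1:2, 4)),
                 fac2 = c("a", "a", "b", "b", "a", "a", "b", "a"),
                 stringsAsFactors = FALSE)

# First argument is character, so the counters are coerced to character
chr_counter <- ave(df$fac2, df$fac1, df$fac2, FUN = seq_along)
class(chr_counter)

# First argument is an integer row index, so the counters stay integer
int_counter <- ave(seq_along(df$fac2), df$fac1, df$fac2, FUN = seq_along)
class(int_counter)
```

Both versions print the same way inside a data frame, so the difference only shows up when the counter is used in arithmetic or sorting.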

Here is a variant with a bit of a loop (I have renamed your variable to "x", since "data" is otherwise used):

x <- data.frame(fac1 = rep(1:2, 5), fac2 = sample(letters[1:3], 10, replace = TRUE))
x$fac3 <- paste(x$fac1, x$fac2, sep = "")   # key identifying each combination
x$ctr <- 1
y <- table(x$fac3)
for (i in seq_along(rownames(y))) {
  idx <- x$fac3 == rownames(y)[i]           # rows belonging to this combination
  x$ctr[idx] <- seq_len(sum(idx))
}
x <- x[-3]                                  # drop the helper key column

A data.table solution:

library(data.table)
DT <- data.table(data)
DT[, counter := seq_len(.N), by = list(fac1, fac2)]

Does it need to be in order, or do you only need the net count? If you only need counts, table(paste(data$fac1, data$fac2, sep="-")) might help.

Hi! The order within each fac1 x fac2 combination matters. (One can think of it as a person "fac1" seeing the letter "fac2".)

You can use the same basic strategy but switch from tapply to plyr's ddply or, if your data is huge and performance is an issue, to data.table.

Possible duplicate. In terms of efficiency, I compared mrdwab's solution and mine (I could not get @mplourde's to work), and mrdwab's is twice as fast: for 1,000,000 rows it takes 1.693 seconds rather than 3.382 seconds.
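
The distinction raised in the comments between a net count and an ordered counter can be illustrated with a small base R sketch. The data are hand-picked rather than sampled, and the names df, net and n are assumptions made for this example only.

```r
# Hand-picked data so the numbers are reproducible
df <- data.frame(fac1 = rep(1:2, 4),
                 fac2 = c("a", "a", "b", "b", "a", "a", "b", "a"))

# Net count: one number per combination, order of appearance is lost
net <- table(paste(df$fac1, df$fac2, sep = "-"))
net

# Running counter: one number per row, counting up within each combination
df$counter <- ave(seq_len(nrow(df)), df$fac1, df$fac2, FUN = seq_along)

# Group size repeated on every row, for comparison with the net counts
df$n <- ave(seq_len(nrow(df)), df$fac1, df$fac2, FUN = length)
df
```

The table() result has one entry per combination, while the counter column has one entry per row, which is what the question asks for.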