How to create a sequence column within groups in R

I'm using R and want to create a column that shows a sequence (or rank) within groups defined by two factors, hhid and perid.

For example, I have the following dataset:

hhid perid
1000 1
1000 1
1000 1
1000 2
1000 2
2000 1
2000 1
2000 1
2000 1
2000 2
2000 2

I want to add a column named "actno", as follows:

hhid perid actno
1000 1     1
1000 1     2
1000 1     3
1000 2     1
1000 2     2
2000 1     1
2000 1     2
2000 1     3
2000 1     4
2000 2     1
2000 2     2

The plyr package does this nicely:

library(plyr)
dat <- structure(list(hhid = c(1000L, 1000L, 1000L, 1000L, 1000L, 2000L, 
2000L, 2000L, 2000L, 2000L, 2000L), perid = c(1L, 1L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("hhid", "perid"), class = "data.frame", row.names = c(NA, 
-11L))

ddply(dat, .(hhid, perid), transform, actno=seq_along(perid))

   hhid perid actno
1  1000     1     1
2  1000     1     2
3  1000     1     3
4  1000     2     1
5  1000     2     2
6  2000     1     1
7  2000     1     2
8  2000     1     3
9  2000     1     4
10 2000     2     1
11 2000     2     2

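If your data is called urdat, then without plyr you can do the following:
df <- urdat[order(urdat$hhid, urdat$perid), ]
# Take run lengths over the hhid/perid pair so that a run cannot span two households
df$actno <- sequence(rle(paste(df$hhid, df$perid))$lengths)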
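One thing to watch with a run-length approach is that runs computed on perid alone can merge across households when one household ends on the same perid that the next one starts with. The sketch below uses a small made-up data frame (`toy`, not part of the question's data) to show the difference between counting runs on perid only and counting runs on the hhid/perid pair.

```r
# Hypothetical data: household 1000 has only perid 2, and household 2000 starts at perid 2
toy <- data.frame(hhid  = c(1000L, 1000L, 2000L, 2000L, 2000L),
                  perid = c(2L,    2L,    2L,    2L,    3L))
toy <- toy[order(toy$hhid, toy$perid), ]

# Runs on perid alone: the four consecutive 2s form a single run across both households
naive_actno <- sequence(rle(toy$perid)$lengths)

# Runs on the hhid/perid pair: the counter restarts when the household changes
pair_actno <- sequence(rle(paste(toy$hhid, toy$perid))$lengths)

print(cbind(toy, naive_actno, pair_actno))
```

With the question's sample data both variants agree, because every household there starts again at perid 1.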

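There is no need for plyr. Just use ave with seq_along (safer than plain seq, which expands a single value such as 1000 into 1:1000):
> dat$actno <- with(dat, ave(hhid, hhid, perid, FUN = seq_along))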
> dat
   hhid perid actno
1  1000     1     1
2  1000     1     2
3  1000     1     3
4  1000     2     1
5  1000     2     2
6  2000     1     1
7  2000     1     2
8  2000     1     3
9  2000     1     4
10 2000     2     1
11 2000     2     2

The first argument in this example can be any column. You can also use the slightly less elegant but perhaps clearer:
dat$actno <- with( dat, ave(hhid, hhid, perid, FUN=function(x) seq(length(x) ) ) )
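As a quick check of the ave approach, here is a minimal sketch on a deliberately unsorted data frame (`shuffled` is a made-up example, not the question's data). It suggests that ave leaves the rows where they are and numbers each hhid/perid group in order of appearance, so no prior sorting is needed.

```r
# Hypothetical unsorted data with the same two grouping columns
shuffled <- data.frame(hhid  = c(2000L, 1000L, 2000L, 1000L, 1000L, 2000L),
                       perid = c(1L,    1L,    2L,    1L,    2L,    1L))

# The first argument only supplies a vector of the right length and type;
# the remaining arguments define the groups
shuffled$actno <- with(shuffled, ave(hhid, hhid, perid, FUN = seq_along))

print(shuffled)
```

If the numbering should follow some other order (for example a time stamp), sort the data frame by that column first.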

If you have many groups or large data, data.table is the way to go for time and memory efficiency:
# assuming your data is in a data.frame called DF
library(data.table)
DT <- data.table(DF)


DT[, ActNo := seq_len(.N), by = list(hhid,perid)]
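For a self-contained run of the data.table approach, the sketch below rebuilds the question's sample data and adds the counter in two ways: with `seq_len(.N)` per group as above, and with data.table's `rowid()` helper, which should give the same numbering without a `by` clause. The column name `ActNo2` is only there for the comparison and is not part of the original answer.

```r
library(data.table)

# Sample data from the question
DF <- data.frame(hhid  = c(1000L, 1000L, 1000L, 1000L, 1000L,
                           2000L, 2000L, 2000L, 2000L, 2000L, 2000L),
                 perid = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L))

DT <- as.data.table(DF)   # makes a copy; DF itself is left unchanged

# := adds the column by reference; .N is the number of rows in the current group
DT[, ActNo := seq_len(.N), by = .(hhid, perid)]

# rowid() numbers the occurrences of each hhid/perid combination in row order
DT[, ActNo2 := rowid(hhid, perid)]

print(DT)
```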

Pseudocode:
For each unique value `h` of `hhid`
    For each unique value `p` of `perid`
        counter = 0;
        For each row of table where `hhid==h && perid==p`
            counter++;
            Assign counter to `actno` of this row

This should be simple to implement, especially using ...
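A literal translation of the pseudocode into base R might look like the sketch below. It is meant only to make the logic concrete. The explicit loops will be much slower than the vectorised answers on large data, and restricting the inner loop to the perid values seen in each household is a small shortcut that is not in the pseudocode.

```r
# Sample data from the question
dat <- data.frame(hhid  = c(1000L, 1000L, 1000L, 1000L, 1000L,
                            2000L, 2000L, 2000L, 2000L, 2000L, 2000L),
                  perid = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L))

dat$actno <- NA_integer_
for (h in unique(dat$hhid)) {
  # Only visit the perid values that actually occur in this household
  for (p in unique(dat$perid[dat$hhid == h])) {
    counter <- 0L
    for (i in which(dat$hhid == h & dat$perid == p)) {
      counter <- counter + 1L
      dat$actno[i] <- counter   # assign the running count to this row
    }
  }
}

print(dat)
```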
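Thank you very much, Justin... It works for my dataset, but because there are a large number of groups it takes a long time, and my computer slowed down noticeably after running the code. Do you have any suggestions?

@user1663986 plyr is a nice way to work through data as long as the data is small. Any of the other answers, especially DWin's, will be very fast and work well on large data.

@user1663986 So how did you get on with mnel's answer?

Is there a quick way to handle ties in data.table?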