R: Create columns based on counts in data.table

My data looks like this:

Col1  Col2   Col3  Col4 
1     7000     73     6  
1     7000     73     7   
1     7000     73     8   
1     7000     73     9   
1     7000     73    10   
1     7000     73    11   
1     7000     73    12   
1     4000    117     6 
1     4000    117     9

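I want to count the rows for each combination of Col1 and Col2, and then create 5 new columns based on that count. I know how to do the counting, but how do I create the 5 new columns?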

Col1  Col2   Count   NewCol1  NewCol2  NewCol3  NewCol4  NewCol5  
1     7000       7         6        7        8        9       10  
1     4000       2         6        9        NA      NA       NA

Col3 can actually be ignored.

One more thing: the new columns should only cover positions 1 to 5. If the count is greater than 5, I don't need NewCol6, NewCol7, etc.

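We create a frequency column with `add_count`, grouped by "Col1" and "Col2", then create a sequence column "nm1" for the new column names, use `complete` to expand the data with the missing combinations, and reshape it to "wide" format with `pivot_wider`.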
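
A minimal sketch of that pipeline, assuming the dplyr and tidyr packages. It is a reconstruction from the description above, not the answer's own code; `df` and `out` are hypothetical names. Rows beyond the fifth in each group are filtered out before reshaping, and `complete` makes sure all of NewCol1 to NewCol5 exist even when a group has fewer values.

```r
library(dplyr)
library(tidyr)

# Example data from the question (Col3 is ignored)
df <- data.frame(
  Col1 = rep(1L, 9),
  Col2 = c(rep(7000L, 7), 4000L, 4000L),
  Col3 = c(rep(73L, 7), 117L, 117L),
  Col4 = c(6:12, 6L, 9L)
)

out <- df %>%
  # Count = number of rows in each (Col1, Col2) group
  add_count(Col1, Col2, name = "Count") %>%
  group_by(Col1, Col2, Count) %>%
  # Sequence number within the group becomes the future column name
  mutate(nm1 = paste0("NewCol", row_number())) %>%
  # Keep at most the first 5 values per group
  filter(row_number() <= 5) %>%
  ungroup() %>%
  select(-Col3) %>%
  # Add the missing NewCol1..NewCol5 combinations (Col4 becomes NA there)
  complete(nesting(Col1, Col2, Count), nm1 = paste0("NewCol", 1:5)) %>%
  pivot_wider(names_from = nm1, values_from = Col4)
```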

Another data.table option:

Output:

   Col1 Col2 V1 V2 V3 V4 V5
1:    1 7000  6  7  8  9 10
2:    1 4000  6  9 NA NA NA

Data:

This is just a simple example I made up. In the real data there are more than 5 rows for the same Col1 and Col2, but if there are more than 5, I don't need NewCol6 and beyond. I tried the code on my data, but it produced NewCol6, NewCol7, ... and it is also slow. The group sizes returned by `unique(dt[, .N, .(Col1, Col2)]$N)` go up to a maximum of 22. No, Col3 is not needed. I updated the example so that one count exceeds 5. @PeterChen Updated the post with data.table.

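library(data.table)
# n := .N stores each group's size; head(.SD, 5) keeps at most five rows per
# (Col1, Col2) group; rowid() numbers those rows to build the NewCol names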
dcast(setDT(df2)[, n  := .N, .(Col1, Col2)][,
   head(.SD, 5), .(Col1, Col2)], Col1 + Col2 + n ~  
  factor(paste0("NewCol", rowid(Col1, Col2)), 
       levels = paste0("NewCol", 1:5)), value.var = 'Col4')
#   Col1 Col2 n NewCol1 NewCol2 NewCol3 NewCol4 NewCol5
#1:    1 4000 2       6       9      NA      NA      NA
#2:    1 7000 7       6       7       8       9      10

df1 <- structure(list(Col1 = c(1L, 1L, 1L), Col2 = c(7000L, 7000L, 4000L
), Col3 = c(73L, 73L, 117L), Col4 = c(6L, 7L, 6L)), 
 class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(Col1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    Col2 = c(7000L, 7000L, 7000L, 7000L, 7000L, 7000L, 7000L, 
    4000L, 4000L), Col3 = c(73L, 73L, 73L, 73L, 73L, 73L, 73L, 
    117L, 117L), Col4 = c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 
    9L)), class = "data.frame", row.names = c(NA, -9L))
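
Since the comments mention that this is slow on the real data, here is a sketch of a variant that replaces the per-group `head(.SD, 5)` with a single vectorised `rowid()` filter, which is usually cheaper in data.table. The helper columns `idx` and `pos` and the objects `capped` and `wide` are hypothetical names. Note that `dcast` only creates columns for positions that actually occur in the data, so NewCol3 to NewCol5 would be missing if no group had that many rows.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  Col1 = rep(1L, 9),
  Col2 = c(rep(7000L, 7), 4000L, 4000L),
  Col3 = c(rep(73L, 7), 117L, 117L),
  Col4 = c(6:12, 6L, 9L)
)

# Group size, position within the group, and the target column name
DT[, Count := .N, by = .(Col1, Col2)]
DT[, idx := rowid(Col1, Col2)]
DT[, pos := paste0("NewCol", idx)]

# Keep at most five rows per group with one vectorised filter
capped <- DT[idx <= 5L]

# One row per (Col1, Col2, Count), one column per position
wide <- dcast(capped, Col1 + Col2 + Count ~ pos, value.var = "Col4")
```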

DT[, as.list(head(c(Col4, rep(NA_real_, 5L)), 5L)), .(Col1, Col2)]

   Col1 Col2 V1 V2 V3 V4 V5
1:    1 7000  6  7  8  9 10
2:    1 4000  6  9 NA NA NA

library(data.table)
DT <- fread("Col1  Col2   Col3  Col4 
1     7000     73     6  
1     7000     73     7   
1     7000     73     8   
1     7000     73     9   
1     7000     73    10   
1     7000     73    11   
1     7000     73    12   
1     4000    117     6 
1     4000    117     9")
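
The padding trick above (append five NAs, then keep the first five values) always yields exactly five columns, whatever the group size. A sketch extending it to also return the `Count` column and the NewCol names from the question; `k` and `res` are hypothetical names, and `NA_integer_` is used here instead of `NA_real_` to keep Col4's integer type.

```r
library(data.table)

# Example data from the question (Col3 dropped)
DT <- data.table(
  Col1 = rep(1L, 9),
  Col2 = c(rep(7000L, 7), 4000L, 4000L),
  Col4 = c(6:12, 6L, 9L)
)

k <- 5L  # maximum number of value columns to keep

# Per group: the group size, then Col4 padded with NA and cut to k values,
# returned as a named list so data.table spreads it into k columns
res <- DT[, c(list(Count = .N),
              setNames(as.list(head(c(Col4, rep(NA_integer_, k)), k)),
                       paste0("NewCol", seq_len(k)))),
          by = .(Col1, Col2)]
```

Groups appear in `res` in the order they first occur in `DT`, as in the output shown above.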