R: How do I create an id variable by grouping sequential numbers?

r database

I want to add an ID variable to this data. Rows whose receipt_id values are consecutive numbers should share the same ID.

CUST_NO_ID  receipt_id      dollar
  12         29             20.84
  12         30             20.21
  12         86             24.50
  12         87             20.68
  12        108             25.79
  12        109             24.93
  12        125             20.63
  12        126              9.90
  19        193             69.48
  19        194             46.88

This is the result I want:

CUST_NO_ID  receipt_id      dollar       ID
  12         29             20.84        1
  12         30             20.21        1
  12         86             24.50        2
  12         87             20.68        2
  12        108             25.79        3
  12        109             24.93        3
  12        110             24.93        3
  12        125             20.63        4
  12        126              9.90        4
  19        193             69.48        5
  19        194             46.88        5

Something like this:

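id <- 1

for (row in seq_len(nrow(data))) {
  # Gap to the previous receipt; the first row has no predecessor, so treat it as consecutive
  if (row == 1) {
    dif <- 1
  } else {
    dif <- data[row, 'receipt_id'] - data[row - 1, 'receipt_id']
  }

  # Any gap other than 1 starts a new ID
  if (dif != 1) {
    id <- id + 1
  }

  data[row, 'ID'] <- id
}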

Assuming your data frame is already sorted by CUST_NO_ID and receipt_id, you can use cumsum on a condition vector in which TRUE marks each position where a new ID should start:
df$ID <- cumsum(c(TRUE, diff(df$receipt_id) != 1 | diff(df$CUST_NO_ID) != 0))

df
#   CUST_NO_ID receipt_id dollar ID
#1          12         29  20.84  1
#2          12         30  20.21  1
#3          12         86  24.50  2
#4          12         87  20.68  2
#5          12        108  25.79  3
#6          12        109  24.93  3
#7          12        125  20.63  4
#8          12        126   9.90  4
#9          19        193  69.48  5
#10         19        194  46.88  5
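
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea in base R. The intermediate names (gap, new_customer, starts_run) are illustrative and not part of the original answer, and the data frame is rebuilt from the question's sample rows.

```r
# Sample data from the question (assumed sorted by CUST_NO_ID, then receipt_id)
df <- data.frame(
  CUST_NO_ID = c(12, 12, 12, 12, 12, 12, 12, 12, 19, 19),
  receipt_id = c(29, 30, 86, 87, 108, 109, 125, 126, 193, 194),
  dollar     = c(20.84, 20.21, 24.50, 20.68, 25.79, 24.93, 20.63, 9.90, 69.48, 46.88)
)

# TRUE where receipt_id does not follow its predecessor by exactly 1
gap <- diff(df$receipt_id) != 1

# TRUE where the customer number changes between adjacent rows
new_customer <- diff(df$CUST_NO_ID) != 0

# diff() returns n - 1 values, so prepend TRUE to open the first run
starts_run <- c(TRUE, gap | new_customer)

# Every TRUE raises the running total by one, which gives the run ID
df$ID <- cumsum(starts_run)
```

Because the customer check is combined with the gap check, two customers whose receipt numbers happen to be adjacent would still get separate IDs.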

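A similar idea to @Psidom's, but he beat me to it with cumsum. Here is a dplyr solution. Adding group_by gives extra flexibility if you want the IDs to restart for each customer number.
library(dplyr)

df %>% 
  mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1)))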
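
The group_by variant mentioned above might look like the following sketch. It assumes dplyr is installed and that the rows are already sorted by receipt_id within each customer. The name df_by_customer is illustrative, and the data frame is rebuilt from the question's sample rows.

```r
library(dplyr)

# Sample rows from the question (dollar column omitted for brevity)
df <- data.frame(
  CUST_NO_ID = c(12, 12, 12, 12, 12, 12, 12, 12, 19, 19),
  receipt_id = c(29, 30, 86, 87, 108, 109, 125, 126, 193, 194)
)

# Grouping first makes cumsum() run separately per customer,
# so the id counter restarts at 1 for each CUST_NO_ID
df_by_customer <- df %>%
  group_by(CUST_NO_ID) %>%
  mutate(id = cumsum(c(TRUE, diff(receipt_id) != 1))) %>%
  ungroup()
```

With the sample data, customer 12 gets ids 1 to 4 and customer 19 starts again at 1. Without group_by, the counter keeps increasing across customers.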

We can use data.table:

library(data.table)
setDT(df)[, id := cumsum(c(TRUE, diff(receipt_id)!=1))]

Or use shift:

setDT(df)[, id := cumsum((receipt_id - shift(receipt_id, fill=receipt_id[1]))!=1)]
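
To see why the shift version works, here is a small sketch on a plain vector, assuming data.table is installed. The names x, lagged and run_id are illustrative only.

```r
library(data.table)

x <- c(29, 30, 86, 87)

# shift() lags the vector by one position; fill supplies the missing first value
lagged <- shift(x, fill = x[1])   # 29 29 30 86

# The first difference is 0 (x[1] - x[1]), so it counts as "not 1" and opens run 1;
# after that, only a gap other than 1 opens a new run
run_id <- cumsum((x - lagged) != 1)   # 1 1 2 2
```

Filling with receipt_id[1] plays the same role as the leading TRUE in the diff version: it guarantees that the first row starts a run.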
