How to restart a count when the value of a categorical variable changes in R
Here is an example of my dataset:
datatrain=structure(list(DELT = c(10266L, 10266L, 10266L, 9635L, 9635L,
9635L, 10334L, 10334L, 10061L, 10061L, 10061L, 9512L, 9512L,
9512L, 10394L, 10394L, 9631L, 10376L, 10376L, 10376L, 10376L,
10046L, 9678L, 10332L, 10332L, 9985L, 9850L, 9850L, 10074L, 9746L,
9746L), EP_OBJECTID = c(86913544L, 86913544L, 86913544L, 86913544L,
86913544L, 86913544L, 86913544L, 86913544L, 86913544L, 86913544L,
86913544L, 86913544L, 86913544L, 86913544L, 86913544L, 86913544L,
86913544L, 86913544L, 86913544L, 86913544L, 90093693L, 90093693L,
90093693L, 90093693L, 90093693L, 90093693L, 90093693L, 90093693L,
90093693L, 90093693L, 90093693L), DELTDMR = c(0L, 0L, 0L, 8L,
8L, 8L, 0L, 0L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 65L, 65L,
65L, 65L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 3L)), class = "data.frame", row.names = c(NA,
-31L))
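EP_OBJECTID is the categorical variable. There are only two categories here: 86913544 and 90093693.
How can I number the rows in steps of 10000, starting at the first value of category 86913544 and running up to the last row before the new category 90093693 begins?
I.e., the result: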
DELT EP_OBJECTID DELTDMR
10000 86913544 0
20000 86913544 0
30000 86913544 0
40000 86913544 8
50000 86913544 8
60000 86913544 8
70000 86913544 0
80000 86913544 0
90000 86913544 2
100000 86913544 2
110000 86913544 2
120000 86913544 0
130000 86913544 0
140000 86913544 0
150000 86913544 0
160000 86913544 0
170000 86913544 0
180000 86913544 65
190000 86913544 65
200000 86913544 65
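Then, how do I remove the last value of category 86913544 (=65) that sits at the boundary with the new category? In this example the new category 90093693 also begins with the value 65, but the new count should start from the second value of category 90093693 (=0), again in steps of 10000.
I.e., the result: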
DELT EP_OBJECTID DELTDMR
10000 86913544 0
20000 86913544 0
30000 86913544 0
40000 86913544 8
50000 86913544 8
60000 86913544 8
70000 86913544 0
80000 86913544 0
90000 86913544 2
100000 86913544 2
110000 86913544 2
120000 86913544 0
130000 86913544 0
140000 86913544 0
150000 86913544 0
160000 86913544 0
170000 86913544 0
180000 86913544 65
190000 86913544 65
200000 86913544 65
10000 90093693 0
20000 90093693 0
30000 90093693 0
40000 90093693 0
50000 90093693 0
60000 90093693 0
70000 90093693 0
80000 90093693 0
90000 90093693 3
100000 90093693 3
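The same applies to every category.
How can I do this?

Remove the rows where EP_OBJECTID differs from the previous row's EP_OBJECTID and DELTDMR equals the previous row's DELTDMR. Then create a sequence for each EP_OBJECTID, starting at 10000 with a step of 10000.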
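To make the rule concrete, here is a minimal sketch on a hypothetical toy data frame (the names `toy`, `id`, `val`, `boundary` and `is_carry` are illustrative and not part of the original data). It shows which row the condition flags and how `lag()` behaves on the very first row, where there is no previous value to compare against.

```r
library(dplyr)

# Hypothetical toy data: two categories, with the value 65 repeated
# across the category boundary (rows 3 and 4).
toy <- data.frame(
  id  = c(1L, 1L, 1L, 2L, 2L, 2L),
  val = c(0L, 65L, 65L, 65L, 0L, 3L)
)

# TRUE where the category changed but the value did not.
# lag() yields NA on the first row, so the flag is NA there as well.
boundary <- toy$id != lag(toy$id) & toy$val == lag(toy$val)
print(boundary)

# filter() keeps only rows whose condition is TRUE, so an NA condition
# would drop the row; coalescing NA to FALSE keeps the first row.
is_carry <- coalesce(boundary, FALSE)
kept <- toy[!is_carry, ]
print(kept)
```

Only row 4 (the first row of the second category, which repeats the value 65) is flagged, so `kept` retains the other five rows, including the first one.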
library(dplyr)
datatrain %>%
  # lag() is NA on the first row and filter() drops rows whose condition
  # is NA; defaulting the lagged ID to the first ID keeps that row.
  filter(!(EP_OBJECTID != lag(EP_OBJECTID, default = first(EP_OBJECTID)) &
             DELTDMR == lag(DELTDMR))) %>%
  group_by(EP_OBJECTID) %>%
  mutate(DELT = seq(10000, length.out = n(), by = 10000))
This returns:
#     DELT EP_OBJECTID DELTDMR
#1   10000    86913544       0
#2   20000    86913544       0
#3   30000    86913544       0
#4   40000    86913544       8
#5   50000    86913544       8
#6   60000    86913544       8
#7   70000    86913544       0
#8   80000    86913544       0
#9   90000    86913544       2
#10 100000    86913544       2
#11 110000    86913544       2
#12 120000    86913544       0
#13 130000    86913544       0
#14 140000    86913544       0
#15 150000    86913544       0
#16 160000    86913544       0
#17 170000    86913544       0
#18 180000    86913544      65
#19 190000    86913544      65
#20 200000    86913544      65
#21  10000    90093693       0
#22  20000    90093693       0
#23  30000    90093693       0
#24  40000    90093693       0
#25  50000    90093693       0
#26  60000    90093693       0
#27  70000    90093693       0
#28  80000    90093693       0
#29  90000    90093693       3
#30 100000    90093693       3
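For comparison, the same two steps could be sketched with data.table. This is only one possible formulation, not the method from the answer above. To keep it self-contained, the block rebuilds the EP_OBJECTID and DELTDMR columns from the question's data with `rep()` and leaves out the original DELT values, since they are overwritten anyway. The names `dt`, `carry` and `res` are illustrative.

```r
library(data.table)

# Rebuild the two columns that matter from the question's data; DELT is
# left out because it gets overwritten anyway.
dt <- data.table(
  EP_OBJECTID = rep(c(86913544L, 90093693L), times = c(20L, 11L)),
  DELTDMR     = rep(c(0L, 8L, 0L, 2L, 0L, 65L, 0L, 3L),
                    times = c(3L, 3L, 2L, 3L, 6L, 4L, 8L, 2L))
)

# Flag rows where the category changed but DELTDMR did not.
# fill = first ID makes the comparison FALSE on row 1 instead of NA.
carry <- dt[, EP_OBJECTID != shift(EP_OBJECTID, fill = EP_OBJECTID[1L]) &
              DELTDMR == shift(DELTDMR)]

# Drop flagged rows, then number the remaining rows within each category.
res <- dt[!carry][, DELT := seq_len(.N) * 10000L, by = EP_OBJECTID]
setcolorder(res, c("DELT", "EP_OBJECTID", "DELTDMR"))
print(res)
```

Here `shift()` plays the role of `lag()`, and `seq_len(.N) * 10000L` with `by = EP_OBJECTID` restarts the count at 10000 for each category.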