R: How to group only consecutive runs of identical values
I have a column in a data.frame that consists of runs of identical values. I want to group the data.frame by this column, but rows should land in the same group only when the same value appears in consecutive rows; a value that reappears later starts a new group. So, with data like this:
structure(list(var = c(0.753821034682915, 0.753821034682915,
0.846493156161159, 0.140008716611192, 0.140008716611192, 0.140008716611192,
0.140008716611192, 0.753821034682915, 0.846493156161159, 0.770532198715955,
0.846493156161159, 0.140008716611192, 0.770532198715955, 0.770532198715955,
0.770532198715955, 0.846493156161159, 0.770532198715955, 0.846493156161159,
0.770532198715955, 0.846493156161159)), class = "data.frame", row.names = c(NA,
-20L))
I would like the groups to be:
structure(list(var = c(0.753821034682915, 0.753821034682915,
0.846493156161159, 0.140008716611192, 0.140008716611192, 0.140008716611192,
0.140008716611192, 0.753821034682915, 0.846493156161159, 0.770532198715955,
0.846493156161159, 0.140008716611192, 0.770532198715955, 0.770532198715955,
0.770532198715955, 0.846493156161159, 0.770532198715955, 0.846493156161159,
0.770532198715955, 0.846493156161159), group = c(1, 1, 2, 3,
3, 3, 3, 4, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 14)), class = "data.frame", row.names = c(NA,
-20L))
Then I could use group_by(group). How can I achieve this?

If you only want to use base R, you can do the following:
rep(seq_along(rle(df$var)$lengths), rle(df$var)$lengths)
[1] 1 1 2 3 3 3 3 4 5 6 7 8 9 9 9 10 11 12 13 14
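The one-liner above calls rle() twice. A minimal sketch of the same idea, computing the run-length encoding once and storing the result as a column, might look like this. The short vector is a made-up stand-in for the question's data, and the names x_df and runs are only illustrative. Note that rle() compares values exactly, so floating-point values that differ by rounding error would start a new run.

```r
# Hypothetical small example; not the data from the question
x_df <- data.frame(var = c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75))

# Run-length encoding: one entry per run of consecutive equal values
runs <- rle(x_df$var)

# Repeat each run's index as many times as the run is long
x_df$group <- rep(seq_along(runs$lengths), times = runs$lengths)

print(x_df$group)
```

The second 0.75 gets its own group because it is not adjacent to the first run of 0.75.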
However, I would prefer a data.table solution.

A dplyr option:
library(dplyr)
df %>% mutate(group = c(0, cumsum(diff(var) != 0)) + 1)
# var group
#1 0.7538210 1
#2 0.7538210 1
#3 0.8464932 2
#4 0.1400087 3
#5 0.1400087 3
#6 0.1400087 3
#7 0.1400087 3
#8 0.7538210 4
#9 0.8464932 5
#10 0.7705322 6
#11 0.8464932 7
#12 0.1400087 8
#13 0.7705322 9
#14 0.7705322 9
#15 0.7705322 9
#16 0.8464932 10
#17 0.7705322 11
#18 0.8464932 12
#19 0.7705322 13
#20 0.8464932 14
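Once the group column exists, it can be passed to group_by() as the question intends. The following is a rough sketch on a made-up vector (the names small_df, res, value and n are assumptions for illustration). Because diff() is used, this form assumes var is numeric.

```r
library(dplyr)

# Hypothetical small example; not the data from the question
small_df <- data.frame(var = c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75))

res <- small_df %>%
  # A new group starts wherever the value differs from the previous row
  mutate(group = c(0, cumsum(diff(var) != 0)) + 1) %>%
  group_by(group) %>%
  # One row per run: the run's value and its length
  summarise(value = first(var), n = n())

print(res)
```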
Sample data: df is the data.frame shown in the question.
library(data.table); rleid(df$var) will create a sequence of IDs that changes whenever the value of var changes.

dplyr is not needed at all, since cumsum() is base R: df[["group"]] ...
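The two comments above can be put side by side. This is only a sketch on a made-up vector: the base R assignment is one plausible way to complete the truncated df[["group"]] expression (an assumption, reusing the cumsum(diff()) idea from the dplyr answer), and the column name group_rleid is hypothetical.

```r
library(data.table)

# Hypothetical small example; not the data from the question
df <- data.frame(var = c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75))

# Base R: cumsum() over "value changed" flags, no dplyr involved
df[["group"]] <- c(0, cumsum(diff(df$var) != 0)) + 1

# data.table: rleid() yields the same run IDs directly (as integers)
dt <- as.data.table(df)
dt[, group_rleid := rleid(var)]

print(dt)
```

Both columns should agree row by row; rleid() also works on non-numeric columns, where diff() would not.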