R: How to group only sequences of identical values (R, Dplyr) - Fatal编程技术网

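I have a column in a data.frame that consists of sequences of identical values. I want to group the data.frame by this column, but only so that identical values in consecutive rows end up in the same group: if the next row does not have the same value, it should not be grouped with the current one, and the same value appearing again later should start a new group. So, with data like this: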

df <- structure(list(var = c(0.753821034682915, 0.753821034682915, 
0.846493156161159, 0.140008716611192, 0.140008716611192, 0.140008716611192, 
0.140008716611192, 0.753821034682915, 0.846493156161159, 0.770532198715955, 
0.846493156161159, 0.140008716611192, 0.770532198715955, 0.770532198715955, 
0.770532198715955, 0.846493156161159, 0.770532198715955, 0.846493156161159, 
0.770532198715955, 0.846493156161159)), class = "data.frame", row.names = c(NA, 
-20L))
I want the groups to be:

structure(list(var = c(0.753821034682915, 0.753821034682915, 
0.846493156161159, 0.140008716611192, 0.140008716611192, 0.140008716611192, 
0.140008716611192, 0.753821034682915, 0.846493156161159, 0.770532198715955, 
0.846493156161159, 0.140008716611192, 0.770532198715955, 0.770532198715955, 
0.770532198715955, 0.846493156161159, 0.770532198715955, 0.846493156161159, 
0.770532198715955, 0.846493156161159), group = c(1, 1, 2, 3, 
3, 3, 3, 4, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 14)), class = "data.frame", row.names = c(NA, 
-20L))

Then I could use group_by(group). How can I achieve this?

If you only want to use base R, you can do the following:

rep(seq_along(rle(df$var)$lengths), rle(df$var)$lengths)
[1]  1  1  2  3  3  3  3  4  5  6  7  8  9  9  9 10 11 12 13 14
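The one-liner above calls rle() twice. A minimal sketch of the same idea that computes the run-length encoding only once, run on a short made-up vector (the values are illustrative, not the question's data):

```r
# Illustrative values: two separate runs of 0.75 with other values in between
var <- c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75)

# rle() collapses consecutive identical values into runs
runs <- rle(var)
runs$lengths
#> [1] 2 1 2 1

# Number the runs 1, 2, 3, ... and repeat each number once per row in its run
group <- rep(seq_along(runs$lengths), times = runs$lengths)
group
#> [1] 1 1 2 3 3 4
```

The final 0.75 gets id 4 rather than 1, because it is not adjacent to the first run; that is the behaviour the question asks for.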

But I prefer a data.table solution:

A dplyr option:

library(dplyr)
df %>% mutate(group = c(0, cumsum(diff(var) != 0)) + 1)
#         var group
#1  0.7538210     1
#2  0.7538210     1
#3  0.8464932     2
#4  0.1400087     3
#5  0.1400087     3
#6  0.1400087     3
#7  0.1400087     3
#8  0.7538210     4
#9  0.8464932     5
#10 0.7705322     6
#11 0.8464932     7
#12 0.1400087     8
#13 0.7705322     9
#14 0.7705322     9
#15 0.7705322     9
#16 0.8464932    10
#17 0.7705322    11
#18 0.8464932    12
#19 0.7705322    13
#20 0.8464932    14

Sample data: df (the data.frame from the question)

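To see why the dplyr expression works, here is a minimal base-R sketch of the same steps on a short illustrative vector (not the question's data). diff() returns one element fewer than its input, which is why a 0 is prepended for the first row. Note that != compares values exactly: that is fine when repeated values are exact copies, as in the question, but runs may be split if "equal" values come out of floating-point arithmetic.

```r
var <- c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75)

# TRUE wherever a row differs from the row before it, i.e. where a new run starts
starts_new_run <- diff(var) != 0
# FALSE TRUE TRUE FALSE TRUE

# The running count of run starts gives 0-based ids for rows 2..n;
# prepending 0 covers row 1, and adding 1 makes the ids start at 1
group <- c(0, cumsum(starts_new_run)) + 1
# 1 1 2 3 3 4

# Base-R analogue of group_by(group) %>% summarise(n = n()): rows per run
run_sizes <- tapply(var, group, length)
```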
library(data.table); rleid(df$var)
creates a sequence of IDs that changes whenever the value of var changes. More information here:

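A minimal sketch of rleid() on a short illustrative vector. The requireNamespace() guard is an assumption added so the snippet also runs where data.table is not installed; in that case it falls back to the base-R rle() construction, which yields the same ids.

```r
var <- c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75)

if (requireNamespace("data.table", quietly = TRUE)) {
  # rleid() returns an integer id that increases by 1 each time the value changes
  group <- data.table::rleid(var)
} else {
  # Fallback when data.table is not installed: the same ids via base R
  runs <- rle(var)
  group <- rep(seq_along(runs$lengths), times = runs$lengths)
}
group
#> [1] 1 1 2 3 3 4
```

With data.table loaded, setDT(df)[, group := rleid(var)] adds the id column to the question's df by reference (setDT() converts df to a data.table in place).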
dplyr is not needed at all, since cumsum() is part of base R:
df[["group"]]
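A minimal base-R sketch, assuming the intent is to store the same diff()/cumsum() ids in a new column without loading dplyr (this is not necessarily the commenter's exact code). It uses a small illustrative data.frame rather than the question's data and then summarises each run with aggregate() in place of group_by().

```r
df <- data.frame(var = c(0.75, 0.75, 0.85, 0.14, 0.14, 0.75))

# Same ids as the dplyr mutate() call, computed with base R only
df[["group"]] <- c(0, cumsum(diff(df$var) != 0)) + 1

# One row per run; in the result, the var column holds the number of rows in each run
runs <- aggregate(var ~ group, data = df, FUN = length)
runs
```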