How to count the repetitions of the first occurring value with dplyr


I have a data frame containing groups, which basically looks like this:

DF <- data.frame(state = c(rep("A", 3), rep("B",2), rep("A",2)))

DF
  state
1     A
2     A
3     A
4     B
5     B
6     A
7     A

The result in this example would be 5. Suggestions, preferably a dplyr solution, would be much appreciated.

You can try:

rle(as.character(DF$state))$lengths[1]
[1] 3
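To make the one-liner above easier to follow, here is a small sketch of what rle() returns for the example vector. The variable names runs and first_run are only illustrative. The as.character() call in the answer matters when state is a factor (the data.frame default before R 4.0.0), because rle() expects an atomic vector.

```r
# Same example vector as in the question
state <- c(rep("A", 3), rep("B", 2), rep("A", 2))

# rle() collapses consecutive duplicates into runs
runs <- rle(state)
runs$lengths  # length of each consecutive run
runs$values   # value carried by each run

# Length of the very first run, i.e. how often the first value
# repeats before a different value appears
first_run <- runs$lengths[1]
first_run
```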

In your dplyr chain:

DF %>% summarize(count_first = rle(as.character(state))$lengths[1])

#   count_first
# 1           3

Or, overusing pipes, with dplyr and magrittr:

library(dplyr)
library(magrittr)
DF %>% summarize(count_first = state %>%
                   as.character %>%
                   rle %$%
                   lengths %>%
                   first)

#   count_first
# 1           3

This also works for grouped data:

DF <- data.frame(group = c(rep(1,4),rep(2,3)),state = c(rep("A", 3), rep("B",2), rep("A",2)))

#   group state
# 1     1     A
# 2     1     A
# 3     1     A
# 4     1     B
# 5     2     B
# 6     2     A
# 7     2     A

DF %>% group_by(group) %>% summarize(count_first = rle(as.character(state))$lengths[1])

# # A tibble: 2 x 2
#    group count_first
#    <dbl>       <int>
#  1     1           3
#  2     2           1
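As a possible alternative that stays within dplyr, the same per-group count can be sketched with cumall(), which stays TRUE until the first FALSE. This is not taken from the answers above, just one way to express the same idea without rle(); the name res is illustrative.

```r
library(dplyr)

DF <- data.frame(group = c(rep(1, 4), rep(2, 3)),
                 state = c(rep("A", 3), rep("B", 2), rep("A", 2)))

# state == first(state) marks rows equal to the group's first value;
# cumall() keeps TRUE only for the leading unbroken stretch,
# so the sum is the length of the first run in each group
res <- DF %>%
  group_by(group) %>%
  summarize(count_first = sum(cumall(state == first(state))))

res
```

Both versions should agree on this data; the rle() version is arguably easier to extend if later runs are also of interest.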


dplyr isn't needed here, but you can adapt this example to use it with dplyr. The key is the function rle:

state = c(rep("A", 3), rep("B",2), rep("A",2))

x = rle(state)
DF = data.frame(len = x$lengths, state = x$values)
DF

# get the longest run of consecutive "A"
max(DF[DF$state == "A",]$len)
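Depending on which count is actually wanted, the run table built above can answer several slightly different questions. The sketch below uses the illustrative names runs, first_value, first_run_len, total_count and longest_run; none of them come from the original posts.

```r
state <- c(rep("A", 3), rep("B", 2), rep("A", 2))

x <- rle(state)
runs <- data.frame(len = x$lengths, state = x$values)

# The value that occurs first in the data
first_value <- runs$state[1]

# Length of the first run of that value
first_run_len <- runs$len[1]

# Total number of rows carrying that value, across all of its runs
total_count <- sum(runs$len[runs$state == first_value])

# Longest single run of that value
longest_run <- max(runs$len[runs$state == first_value])

c(first_run_len, total_count, longest_run)
```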

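OK, thanks. That seems to work. I now need to work out how to apply it to subgroups. Say I have a grouping variable ID and want this count for each value of ID.

The $state you are calling is just a vector, so the groups are not handled correctly. Just use state, so that dplyr can work its magic.

OK, thank you very much!

I'm afraid I inadvertently deleted the comment you are referring to here, but I think I was suggesting Df %>% group_by(ID) %>% summarize(r = rle(.$state)$lengths[1])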
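The point made in the comments about .$state versus state can be checked on the grouped example data from the answer above. This is a sketch that uses the column group in place of the ID variable mentioned in the comments; whole and per_group are illustrative names.

```r
library(dplyr)

DF <- data.frame(group = c(rep(1, 4), rep(2, 3)),
                 state = c(rep("A", 3), rep("B", 2), rep("A", 2)))

# .$state pulls the full column out of the piped data frame,
# so every group sees the same vector and gets the same answer
whole <- DF %>%
  group_by(group) %>%
  summarize(r = rle(as.character(.$state))$lengths[1])

# Bare state is evaluated by dplyr separately within each group
per_group <- DF %>%
  group_by(group) %>%
  summarize(r = rle(as.character(state))$lengths[1])

whole
per_group
```

Here whole should report the first run of the entire column for both groups, while per_group reports each group's own first run.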
