R: Partitioning rows based on a string


I would like to create a new column in the data frame below that depends on certain strings, in this case "next section".

library(tidyverse)
set.seed(123)

df1

We can use lag to get the text from the previous row, and use cumsum to increment the count whenever, within each article, the current row is 'section' and the previous row is 'next'.

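library(dplyr)

final_df %>%
  group_by(article) %>%
  # count each 'section' that directly follows 'next', then shift the count
  # down one row so the 'section' row itself stays in the first part
  mutate(label = lag(cumsum(text == 'section' & lag(text) == 'next'),
                     default = 0) + 1)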

#  text         article label
#   <chr>        <chr>   <dbl>
# 1 cantaloupe   df1         1
# 2 quince       df1         1
# 3 kiwi fruit   df1         1
# 4 next         df1         1
# 5 section      df1         1
# 6 cantaloupe   df1         2
# 7 date         df1         2
# 8 rambutan     df2         1
# 9 passionfruit df2         1
#10 next         df2         1
#11 section      df2         1
#12 rock melon   df2         2
#13 blood orange df3         1
#14 guava        df3         1
#15 next         df3         1
#16 section      df3         1
#17 strawberry   df3         2
#18 cherimoya    df3         2
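
To see why the outer lag is needed, the intermediate steps can be traced on a single article. This is only a sketch: the vector below is copied by hand from the df1 rows printed above, and the names hit, running and label are illustrative rather than part of the original answer.

```r
library(dplyr)

# Hypothetical single-article vector, copied from the df1 rows shown above
text <- c("cantaloupe", "quince", "kiwi fruit", "next", "section",
          "cantaloupe", "date")

# TRUE only where the current row is 'section' and the previous row is 'next'
hit <- text == "section" & lag(text) == "next"

# Running count of markers seen so far (already 1 on the 'section' row itself)
running <- cumsum(hit)

# Shift by one row so the 'section' row still belongs to the first part
label <- lag(running, default = 0) + 1

data.frame(text, hit, running, label)
```

Without the outer lag, the 'section' row would already carry label 2 instead of closing the first part.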
If the output is needed in that form, 1 and 2 can be replaced with 'first' and 'second'.

final_df %>% 
  mutate(label = c("first","first","first","first","first", "second", "second",
                   "first","first","first","first","second",
                   "first","first","first","first","second","second"))

# A tibble: 18 x 3
   text         article label 
   <chr>        <chr>   <chr> 
 1 cantaloupe   df1     first 
 2 quince       df1     first 
 3 kiwi fruit   df1     first 
 4 next         df1     first 
 5 section      df1     first 
 6 cantaloupe   df1     second
 7 date         df1     second
 8 rambutan     df2     first 
 9 passionfruit df2     first 
10 next         df2     first 
11 section      df2     first 
12 rock melon   df2     second
13 blood orange df3     first 
14 guava        df3     first 
15 next         df3     first 
16 section      df3     first 
17 strawberry   df3     second
18 cherimoya    df3     second
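
Rather than typing every label by hand, the numeric label could be used as an index into a lookup vector. The following is a minimal sketch under assumptions: labelled is a hypothetical stand-in holding only the df1 rows, and words is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for the numerically labelled result (df1 rows only)
labelled <- data.frame(
  text    = c("cantaloupe", "quince", "kiwi fruit", "next", "section",
              "cantaloupe", "date"),
  article = "df1",
  label   = c(1, 1, 1, 1, 1, 2, 2)
)

# Lookup vector: position 1 -> "first", position 2 -> "second"
words <- c("first", "second")

relabelled <- labelled %>%
  mutate(label = words[label])

relabelled
```

An article with more than two parts would get NA for any label beyond the length of words, so the lookup vector would need one entry per possible part.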
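The same logic with data.table, where shift plays the role of lag:

library(data.table)

# setDT converts final_df in place; := adds label by reference, per article
setDT(final_df)[, label := shift(cumsum(text == 'section' &
                            shift(text) == 'next'), fill = 0) + 1, by = article]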
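
The grouping can be checked on a small table with two articles. This is a sketch only: dt is a hypothetical sample assembled from a few of the rows shown earlier, not the original final_df.

```r
library(data.table)

# Hypothetical two-article sample built from rows shown earlier
dt <- data.table(
  text    = c("kiwi fruit", "next", "section", "date",
              "passionfruit", "next", "section", "rock melon"),
  article = rep(c("df1", "df2"), each = 4)
)

# by = article restarts the count for every article; := adds the column in place
dt[, label := shift(cumsum(text == "section" & shift(text) == "next"),
                    fill = 0) + 1,
   by = article]

dt
```

Because the count restarts per article, the first rows of df2 get label 1 again rather than continuing from df1.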