R: partitioning rows based on a string
Tags: r, stringr

I want to create a new column in the data frame below whose value depends on certain strings, in this case "next section".
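library(tidyverse)
set.seed(123)
df1 ...

We can use lag to get text from the previous row, and cumsum to increment a counter, within each article, whenever the current row is 'section' and the previous row is 'next'. The outer lag shifts that counter down by one row, so the 'section' row itself stays in the first group and the label only changes on the row after it.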
library(dplyr)
final_df %>%
   group_by(article) %>%
   # count 'next' + 'section' pairs, shift by one row, and start numbering at 1
   mutate(label = lag(cumsum(text == 'section' & lag(text) == 'next'),
                      default = 0) + 1)
# text article label
# <chr> <chr> <dbl>
# 1 cantaloupe df1 1
# 2 quince df1 1
# 3 kiwi fruit df1 1
# 4 next df1 1
# 5 section df1 1
# 6 cantaloupe df1 2
# 7 date df1 2
# 8 rambutan df2 1
# 9 passionfruit df2 1
#10 next df2 1
#11 section df2 1
#12 rock melon df2 2
#13 blood orange df3 1
#14 guava df3 1
#15 next df3 1
#16 section df3 1
#17 strawberry df3 2
#18 cherimoya df3 2
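To make the logic easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The construction of final_df is not shown in full above, so the data frame here is rebuilt by hand from the printed output (an assumption), and the column names hit and count are hypothetical helpers added only for illustration.

```r
library(dplyr)

# Assumption: rebuilt by hand from the printed output, not the original code.
final_df <- tibble(
  text = c("cantaloupe", "quince", "kiwi fruit", "next", "section",
           "cantaloupe", "date",
           "rambutan", "passionfruit", "next", "section", "rock melon",
           "blood orange", "guava", "next", "section", "strawberry",
           "cherimoya"),
  article = rep(c("df1", "df2", "df3"), times = c(7, 5, 6))
)

steps <- final_df %>%
  group_by(article) %>%
  mutate(
    # TRUE only on a 'section' row that directly follows a 'next' row
    hit = text == "section" & lag(text, default = "") == "next",
    # running number of markers seen so far, including the current row
    count = cumsum(hit),
    # shift down one row so the marker row stays in the earlier group
    label = lag(count, default = 0) + 1
  ) %>%
  ungroup()

print(steps, n = Inf)
```

Using default = "" in the inner lag is a small deviation from the answer: it keeps the comparison from returning NA if an article happens to start with 'section'.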
If the output is needed in the form below, 1 and 2 can be replaced with 'first' and 'second':
final_df %>%
mutate(label = c("first","first","first","first","first", "second", "second",
"first","first","first","first","second",
"first","first","first","first","second","second"))
# A tibble: 18 x 3
text article label
<chr> <chr> <chr>
1 cantaloupe df1 first
2 quince df1 first
3 kiwi fruit df1 first
4 next df1 first
5 section df1 first
6 cantaloupe df1 second
7 date df1 second
8 rambutan df2 first
9 passionfruit df2 first
10 next df2 first
11 section df2 first
12 rock melon df2 second
13 blood orange df3 first
14 guava df3 first
15 next df3 first
16 section df3 first
17 strawberry df3 second
18 cherimoya df3 second
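Rather than typing the labels by hand, the numeric counter can be used as an index into a lookup vector. The following is a sketch under assumptions: it uses only the rows of article df1 copied from the output above, and the names ordinals and part are hypothetical.

```r
library(dplyr)

# Article df1 only, copied from the printed output above.
df1_rows <- tibble(
  text = c("cantaloupe", "quince", "kiwi fruit", "next", "section",
           "cantaloupe", "date"),
  article = "df1"
)

# Hypothetical lookup vector, indexed by the numeric counter (1, 2, ...).
ordinals <- c("first", "second", "third")

labelled <- df1_rows %>%
  group_by(article) %>%
  mutate(
    # numeric part counter, same logic as the answer
    part = lag(cumsum(text == "section" & lag(text, default = "") == "next"),
               default = 0) + 1,
    # translate 1, 2, ... into words
    label = ordinals[part]
  ) %>%
  ungroup()

print(labelled)
```

If an article contains more parts than ordinals has entries, the extra rows get NA, so the lookup vector should be at least as long as the largest counter value.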
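The same logic with data.table, using shift in place of lag:

library(data.table)
# setDT converts final_df to a data.table by reference; label is added per article
setDT(final_df)[, label := shift(cumsum(text == 'section' &
           shift(text) == 'next'), fill = 0) + 1, by = article]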
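For a version that can be run without final_df, here is a small self-contained data.table sketch. The table dt is an assumption, built by hand from the rows of articles df2 and df3 in the output above. shift plays the role of lag, with fill in place of default.

```r
library(data.table)

# Assumption: rows of articles df2 and df3, copied from the printed output.
dt <- data.table(
  text = c("rambutan", "passionfruit", "next", "section", "rock melon",
           "blood orange", "guava", "next", "section", "strawberry",
           "cherimoya"),
  article = rep(c("df2", "df3"), times = c(5, 6))
)

# Add label by reference, restarting the counter for each article.
dt[, label := shift(cumsum(text == "section" &
                             shift(text, fill = "") == "next"),
                    fill = 0) + 1,
   by = article]

print(dt)
```

Because := modifies dt in place, no reassignment is needed.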