Efficient coding: R regex to duplicate a row for each match (regex, r, reshape)


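I have been doing some data wrangling on a column that contains free text. I want to identify a specific set of strings in this text, create a column that records each match, and then duplicate the row whenever a single field contains more than one matching string. This is how far I have got (apologies to anyone not in the festive spirit):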

# Example data frame
require(stringr)
dats

Try this:
library(quanteda)

s <- "rudolph the red nosed reindeer"

words <- strsplit(s, " ")[[1]]
do.call(rbind, lapply(words, kwic, x = s))

Try cSplit from the splitstackshape package:
library(splitstackshape)
dats$value <- lapply(str_extract_all(dats$text, reg.patt), toString)
cSplit(dats, 'value', direction="long")
# ID                           text    value
#  1:  1                        rudolph  rudolph
#  2:  2                    rudolph the  rudolph
#  3:  2                    rudolph the      the
#  4:  3                rudolph the red  rudolph
#  5:  3                rudolph the red      the
#  6:  3                rudolph the red      red
#  7:  4          rudolph the red nosed  rudolph
#  8:  4          rudolph the red nosed      the
#  9:  4          rudolph the red nosed      red
# 10:  4          rudolph the red nosed    nosed
# 11:  5 rudolph the red nosed reindeer  rudolph
# 12:  5 rudolph the red nosed reindeer      the
# 13:  5 rudolph the red nosed reindeer      red
# 14:  5 rudolph the red nosed reindeer    nosed
# 15:  5 rudolph the red nosed reindeer reindeer
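For anyone who would rather avoid extra packages, the same row duplication can be sketched in base R. Note that the question's original `dats` and `reg.patt` did not survive in this copy of the thread, so the data frame and pattern below are assumptions reconstructed from the output shown above, not the asker's actual objects.

```r
# Hypothetical sample data, modelled on the output above
# (the asker's real `dats` is not available).
dats <- data.frame(
  ID = 1:5,
  text = c("rudolph",
           "rudolph the",
           "rudolph the red",
           "rudolph the red nosed",
           "rudolph the red nosed reindeer"),
  stringsAsFactors = FALSE
)

# Hypothetical pattern: an alternation of the strings to look for.
reg.patt <- "rudolph|the|red|nosed|reindeer"

# Extract every match per row; the result is a list with one
# character vector per element of dats$text.
matches <- regmatches(dats$text, gregexpr(reg.patt, dats$text))

# Repeat each row index once per match, then attach the matches.
# Rows with zero matches are repeated zero times, i.e. dropped.
n_matches <- lengths(matches)
long <- dats[rep(seq_len(nrow(dats)), n_matches), , drop = FALSE]
long$value <- unlist(matches)
rownames(long) <- NULL

print(long)
```

The idea is the same as with cSplit: count the matches per row, repeat the row that many times, and put one match in each copy. If rows without any match should be kept, they would need to be handled separately, for example by treating an empty match vector as NA before repeating.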

Output of the quanteda kwic call on s:
                        contextPre  keyword              contextPost
[text1, 1]                       [  rudolph ] the red nosed reindeer
[text1, 2]               rudolph [      the     ] red nosed reindeer
[text1, 3]           rudolph the [      red         ] nosed reindeer
[text1, 4]       rudolph the red [    nosed               ] reindeer
[text1, 5] rudolph the red nosed [ reindeer                       ] 
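A keyword-in-context layout like the one above can also be approximated in base R. This is only a rough sketch and not how quanteda implements kwic: it walks the words by position rather than matching a pattern, so a repeated word produces one row per position. The name `context` is made up for this illustration.

```r
s <- "rudolph the red nosed reindeer"
words <- strsplit(s, " ")[[1]]

# Build one row per word position: the words before it,
# the word itself, and the words after it.
context <- do.call(rbind, lapply(seq_along(words), function(i) {
  data.frame(
    contextPre  = paste(words[seq_len(i - 1)], collapse = " "),
    keyword     = words[i],
    contextPost = paste(words[-seq_len(i)], collapse = " "),
    stringsAsFactors = FALSE
  )
}))

print(context)
```

Each row keeps the full sentence split around one keyword, which is effectively one duplicated row per match with the match pulled out into its own column.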
