How can I efficiently parse the numeric parts of a regular expression match in R?

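Sorry, I have never learned regular expressions, and perhaps because of that I have never managed to get far into R's help documentation on the subject.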

I have a list of outputs annotated with page numbers, like this:

val <- "Output 1: Page 1 of 1 \n
  Content content \f
  Output 2: Page 1 of 2 \n
  content content \f 
  Page 2 of 2 content content"

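Here is one option: extract every "Page n of m" match with str_extract_all and let read.table split the pieces into columns.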

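library(stringr)
# Extract every "Page n of m" string from val
matches <- str_extract_all(val, "Page \\d+ of \\d+")[[1]]
# Read the matches as a whitespace-separated table, then keep the two numeric columns
read.table(text = str_c(matches, collapse = "\n"), header = FALSE,
   col.names = c("V1", "page", "V3", "of"))[c("page", "of")]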
#  page of
#1    1  1
#2    1  2
#3    2  2
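
If the read.table() round trip feels indirect, the numbers can also be taken straight from the capture groups. The sketch below assumes stringr's str_match_all(), which returns one character matrix per input string: column 1 holds the full match and the following columns hold the parenthesised groups. The names m and pages are only illustrative.

```r
library(stringr)

val <- "Output 1: Page 1 of 1 \n
  Content content \f
  Output 2: Page 1 of 2 \n
  content content \f 
  Page 2 of 2 content content"

# One row per match; columns are: full match, first group, second group
m <- str_match_all(val, "Page (\\d+) of (\\d+)")[[1]]

# The groups come back as character, so convert them to integer
pages <- data.frame(page = as.integer(m[, 2]), of = as.integer(m[, 3]))
pages
```

Only the two digit groups are captured here, so there are no placeholder columns to drop afterwards.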

Thanks, stringr looks promising.

Or another option, using separate_rows and extract:

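library(dplyr)
library(tidyr)
# str_detect() comes from stringr, loaded above
tibble(col1 = val) %>%
   # one row per line of text
   separate_rows(col1, sep = "\\s*\n\\s*") %>%
   # keep only the lines that carry a page annotation
   filter(str_detect(col1, "Page")) %>%
   # capture the two numbers; convert = TRUE turns them into integers
   extract(col1, into = c("page", "of"),
        ".*Page (\\d+) of (\\d+).*", convert = TRUE)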
# A tibble: 3 x 2
#   page    of
#  <int> <int>
#1     1     1
#2     1     2
#3     2     2
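
Since the question mentions being new to regular expressions, here is a base R sketch that needs no extra packages, offered as an alternative to the two options above rather than a replacement. The pattern has only two moving parts: \\d+ means "one or more digits", and each pair of parentheses marks a group to capture. It assumes strcapture() from utils (available since R 3.4.0). The names pattern, hits and pages are illustrative.

```r
val <- "Output 1: Page 1 of 1 \n
  Content content \f
  Output 2: Page 1 of 2 \n
  content content \f 
  Page 2 of 2 content content"

# \\d+ matches one or more digits; the parentheses capture each number
pattern <- "Page (\\d+) of (\\d+)"

# gregexpr() locates every match in val; regmatches() returns the matched text
hits <- regmatches(val, gregexpr(pattern, val))[[1]]

# strcapture() fills a data frame shaped like `proto` from the captured groups,
# converting each group to the column type given there (integer in this case)
pages <- strcapture(pattern, hits,
                    proto = data.frame(page = integer(), of = integer()))
pages
```

The proto argument does the type conversion, so no separate as.integer() step is needed.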