How can I efficiently parse the numeric parts of a regex match in R?

Tags: r, regex


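Sorry, I never learned regular expressions, and perhaps because of that I have never been able to get far into R's help documentation on the topic.

I have a list of outputs annotated with page numbers, like this: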

val <- "Output 1: Page 1 of 1 \n
  Content content \f
  Output 2: Page 1 of 2 \n
  content content \f 
  Page 2 of 2 content content"
Here is one option:

library(stringr)
read.table(text = str_c(str_extract_all(val,
   "(Page) (\\d+) (of) (\\d+)")[[1]], collapse='\n'), header = FALSE, 
   col.names = c('V1', 'page', 'V3', 'of'))[c('page', 'of')]
#  page of
#1    1  1
#2    1  2
#3    2  2
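
For comparison, the same extraction can be sketched in base R without any extra packages. This is only a sketch under the assumption that every label has exactly the form "Page X of Y"; the names labels and pages are illustrative and not part of the original answer. regmatches with gregexpr pulls out each label, and utils::strcapture (available since R 3.4.0) turns the two captured groups into integer columns.

```r
# val is repeated here (on one line) so the snippet runs on its own
val <- "Output 1: Page 1 of 1 \n Content content \f Output 2: Page 1 of 2 \n content content \f Page 2 of 2 content content"

# Step 1: pull out every "Page X of Y" label as a character vector
labels <- regmatches(val, gregexpr("Page \\d+ of \\d+", val))[[1]]

# Step 2: capture the two numbers; proto fixes the column names and types
pages <- strcapture("Page (\\d+) of (\\d+)", labels,
                    proto = data.frame(page = integer(), of = integer()))
pages
#   page of
# 1    1  1
# 2    1  2
# 3    2  2
```

Because proto declares both columns as integer, no separate conversion step is needed.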
Or another option, using separate_rows and extract:

library(dplyr)
library(tidyr)
library(stringr)  # for str_detect
tibble(col1 = val) %>%
   # split the string into one row per line
   separate_rows(col1, sep = "\\s*\n\\s*") %>%
   # keep only the rows that carry a page label
   filter(str_detect(col1, 'Page')) %>%
   # capture the two numbers; convert = TRUE turns them into integers
   extract(col1, into = c("page", "of"), 
        ".*Page (\\d+) of (\\d+).*", convert = TRUE) 
# A tibble: 3 x 2
#   page    of
#  <int> <int>
#1     1     1
#2     1     2
#3     2     2

Thanks. stringr looks promising.
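
If only the two numbers are needed, stringr can also return the capture groups directly, which avoids the round trip through read.table. The following is a minimal sketch, not the answerer's code: it assumes the same val as in the question, and the names m and pages are illustrative. str_match_all returns one character matrix per input string, with the full match in column 1 and each capture group in the following columns.

```r
library(stringr)

# val is repeated here (on one line) so the snippet runs on its own
val <- "Output 1: Page 1 of 1 \n Content content \f Output 2: Page 1 of 2 \n content content \f Page 2 of 2 content content"

# One row per match: column 1 is the full match, columns 2 and 3 are the groups
m <- str_match_all(val, "Page (\\d+) of (\\d+)")[[1]]

# The matrix is character, so convert the captured digits explicitly
pages <- data.frame(page = as.integer(m[, 2]), of = as.integer(m[, 3]))
pages
```

Only the groups that are actually wanted are parenthesized here, so there are no throwaway columns to drop afterwards.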