R tidyr regex: extract ordered numbers from a character column

Suppose I have a data frame like this:

df <- data.frame(x=c("This script outputs 10 visualizations.", 
                     "This script outputs 1 visualization.", 
                     "This script outputs 5 data files.", 
                     "This script outputs 1 data file.", 
                     "This script doesn't output any visualizations or data files", 
                     "This script outputs 9 visualizations and 28 data files.", 
                     "This script outputs 1 visualization and 1 data file."))
                                                            x
1                      This script outputs 10 visualizations.
2                        This script outputs 1 visualization.
3                           This script outputs 5 data files.
4                            This script outputs 1 data file.
5 This script doesn't output any visualizations or data files
6     This script outputs 9 visualizations and 28 data files.
7        This script outputs 1 visualization and 1 data file.
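
Is there a simple way, using the Tidyverse, to extract the number of visualizations and the number of files for each row? When there are no visualizations (or no data files, or neither), I would like to extract 0. Basically, I would like the end result to look like this: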

    viz   files
1    10       0
2     1       0
3     0       5
4     0       1
5     0       0
6     9      28
7     1       1

I have tried using something like this:

str_extract(df$x, "(?<=This script outputs )(.*)(?= visualizatio(n\\.$|ns\\.$))")

We can use a regex lookaround in str_extract to extract one or more digits (\\d+) that are followed by a space and "vis" or "data file(s)" into two columns:

library(dplyr)
library(stringr)
library(tidyr)  # provides replace_na()
df %>% 
  transmute(viz = as.numeric(str_extract(x, "\\d+(?= vis)")),
            files = as.numeric(str_extract(x, "\\d+(?= data files?)"))) %>%
  mutate_all(replace_na, 0)
#  viz files
#1  10     0
#2   1     0
#3   0     5
#4   0     1
#5   0     0
#6   9    28
#7   1     1
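In the first case, the pattern matches one or more digits (\\d+) followed by a lookahead ((?=...)) that requires a space and the text "vis". For the second column, it extracts the digits that are followed by a space and "data file" or "data files".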
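
To make the lookahead behaviour concrete, here is a minimal sketch on a single string. It assumes only that stringr is installed; the variable name s is introduced here for illustration.

```r
library(stringr)

s <- "This script outputs 9 visualizations and 28 data files."

# The lookahead checks what follows the digits but leaves it out of the match
str_extract(s, "\\d+(?= vis)")           # "9"
str_extract(s, "\\d+(?= data files?)")   # "28"

# Without the lookahead, the trailing text becomes part of the match
str_extract(s, "\\d+ vis")               # "9 vis"

# No digits directly before " vis" means no match, so str_extract() returns NA
str_extract("This script outputs 1 data file.", "\\d+(?= vis)")  # NA
```

Because a non-matching row yields NA rather than an empty string, the NA-to-0 replacement step is what produces the zeros in the final table.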

A base R approach:

# Capture the digits directly before " visualization" / " data file".
# Rows without a match are returned unchanged by sub(), so as.numeric() gives NA there.
df$viz <- suppressWarnings(as.numeric(sub(".*?(\\d+) visualization.*", "\\1", df$x, perl = TRUE)))
df$files <- suppressWarnings(as.numeric(sub(".*?(\\d+) data file.*", "\\1", df$x, perl = TRUE)))
df[is.na(df)] <- 0

df
#                                                             x viz files
# 1                      This script outputs 10 visualizations.  10     0
# 2                        This script outputs 1 visualization.   1     0
# 3                           This script outputs 5 data files.   0     5
# 4                            This script outputs 1 data file.   0     1
# 5 This script doesn't output any visualizations or data files   0     0
# 6     This script outputs 9 visualizations and 28 data files.   9    28
# 7        This script outputs 1 visualization and 1 data file.   1     1
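
As a variation on the base R idea, the same lookahead used with str_extract can also be applied through regexpr() and regmatches() with perl = TRUE, which avoids the NA round trip. This is only a sketch: the helper name extract_count and its label argument are hypothetical and not part of any package.

```r
# Hypothetical helper: digits directly followed by " <label>", or 0 when absent
extract_count <- function(x, label) {
  x <- as.character(x)
  m <- regexpr(paste0("\\d+(?= ", label, ")"), x, perl = TRUE)
  out <- numeric(length(x))                      # start with all zeros
  out[m != -1] <- as.numeric(regmatches(x, m))   # fill in only the rows that matched
  out
}

x <- c("This script outputs 10 visualizations.",
       "This script doesn't output any visualizations or data files",
       "This script outputs 9 visualizations and 28 data files.")

extract_count(x, "visualization")  # 10 0 9
extract_count(x, "data file")      # 0 0 28
```

regmatches() returns only the matched substrings, so they are assigned back to the positions where regexpr() did not return -1.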
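
You can use the package unglue to get a readable solution, since you have a limited set of possible patterns, and then replace the NAs with 0:

library(unglue)
patterns <- ...
#> 2   1     0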
#> 3   0     5
#> 4   0     1
#> 5   0     0
#> 6   9    28
#> 7   1     1

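Could you please modify it so that it works on both "data file" and "data files"? Oh, I confirm that I simply replaced "data files" with "data file".

I know very little about regex; would you mind writing a short explanation of what the symbols are doing?

@Euler_Salter. I added s? so that it is handled even when there is no trailing s.

Sure, will update!

I would like the NAs to be zero, but other than that this looks good.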