R: How to extract all numbers from a string as a vector
Is there a way to extract all the numbers from a string as a vector? I have a large dataset that does not follow any specific pattern, so an extract + regex pattern will not necessarily pull out every number. For example, for each row of the data frame shown below:
c("3.2% 1ST $100000 AND 1.1% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY",
"$4000", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE",
"3.2 - $100000")
[1] "3.2% 1ST $100000 AND 1.1% BALANCE"
[2] "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY"
[3] "$4000"
[4] "3.3% 1ST $100000 AND 1.2% BALANCE"
[5] "3.3% 1ST $100000 AND 1.2% BALANCE"
[6] "3.2 - $100000"
I would like an output such as:
[1] "3.2 100000 1.1"
[2] "3.3 100000 1.2 3000"
[3] "4000"
[4] "3.3 100000 1.2 "
[5] "3.3 100000 1.2 "
[6] "3.2 100000 "
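I went through the references and found the following link:
The function above seems to work, but it cannot handle all kinds of numbers at once. I know that
"[[:digit:]]+"
only finds integers, but how can we change it so that it covers all kinds of numbers? What do we need to add to the matching pattern?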
sapply(regmatches(x, gregexpr("\\b[[:digit:].]+\\b", x)), paste, collapse= ' ')
#[1] "3.2 100000 1.1"
#[2] "3.3 100000 1.2 3000"
#[3] "4000"
#[4] "3.3 100000 1.2"
#[5] "3.3 100000 1.2"
#[6] "3.2 100000"
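The question title asks for a vector, while the call above pastes the matches back into one string per row. A minimal sketch of keeping the matches as numeric vectors instead, using the same word-boundary pattern, might look like this. The shortened `x` and the names `matches` and `nums` are only illustrative, and a token such as `1.2.3` would also match the pattern and become `NA` after conversion.

```r
# Illustrative subset of the data from the question
x <- c("3.2% 1ST $100000 AND 1.1% BALANCE", "$4000", "3.2 - $100000")

# Same pattern as above: runs of digits and dots between word boundaries,
# so the "1" in "1ST" is skipped because no boundary follows it
matches <- regmatches(x, gregexpr("\\b[[:digit:].]+\\b", x))

# Convert each element of the list to a numeric vector
nums <- lapply(matches, as.numeric)
```

Here `nums` is a list with one numeric vector per input string, which can be stored in a list column if the data lives in a data frame.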
Akrun's answer is perfect, but just to add another solution, here is one that uses a package I recently discovered for building regex patterns:
library(stringr)
library(rebus)
library(magrittr)
# digits, optionally followed by a dot and more digits
pattern <- one_or_more(DIGIT) %R% optional(DOT) %R% optional(one_or_more(DIGIT))
# drop the literal "1ST" first so its "1" is not matched
str_remove(x, "1ST") %>%
  str_match_all(pattern = pattern) %>%
  lapply(function(m) paste(as.vector(m), collapse = " ")) %>%
  unlist()
You can use a negative lookahead regex:
stringr::str_extract_all(x, '\\d+(\\.\\d+)?(?![A-Z])')
#[[1]]
#[1] "3.2" "100000" "1.1"
#[[2]]
#[1] "3.3" "100000" "1.2" "3000"
#[[3]]
#[1] "4000"
#[[4]]
#[1] "3.3" "100000" "1.2"
#[[5]]
#[1] "3.3" "100000" "1.2"
#[[6]]
#[1] "3.2" "100000"
If you want the output as a single string:
sapply(stringr::str_extract_all(x, '\\d+(\\.\\d+)?(?![A-Z])'), paste, collapse = ' ')
#[1] "3.2 100000 1.1" "3.3 100000 1.2 3000" "4000"
#[4] "3.3 100000 1.2" "3.3 100000 1.2" "3.2 100000"
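To see what the `(?![A-Z])` lookahead contributes, the following sketch compares the pattern with and without it on one of the sample strings. It uses base R with `perl = TRUE` rather than stringr, which is a substitution made here so that the example has no package dependency; the names `s`, `without_la` and `with_la` are illustrative.

```r
s <- "3.3% 1ST $100000 AND 1.2% BALANCE"

# Without the lookahead, the "1" in "1ST" is picked up as a number
without_la <- regmatches(s, gregexpr("\\d+(\\.\\d+)?", s, perl = TRUE))[[1]]
# "3.3" "1" "100000" "1.2"

# With (?![A-Z]), a match that is directly followed by an uppercase
# letter is rejected, so "1ST" no longer contributes anything
with_la <- regmatches(s, gregexpr("\\d+(\\.\\d+)?(?![A-Z])", s, perl = TRUE))[[1]]
# "3.3" "100000" "1.2"
```

One caveat: because the regex engine can backtrack, a multi-digit prefix such as `12ST` would still yield `1` with this pattern, so the word-boundary approach may be the safer choice if such tokens occur in the data.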
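Thanks @akrun, but it extracts the 1 from 1ST. I am only looking for pure numbers.
@Roozbeh_you Sorry, updated with a word boundary. Can you check?
Thanks, John. Your answer is also correct; however, one thing I would add is that not all strings contain exactly 1ST. That could cause problems when using str_remove, right?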