R: How to extract all numbers from a string as a vector


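Is there a way to extract all the numbers from a string as a vector? I have a large dataset that does not follow any particular pattern, so an extract + regex pattern will not necessarily pick up every number. For example, for each row of the data frame shown below: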

x <- c("3.2% 1ST $100000 AND 1.1% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY", 
"$4000", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE", 
"3.2 - $100000")

[1] "3.2% 1ST $100000 AND 1.1% BALANCE"                                
[2] "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY"
[3] "$4000"                                                            
[4] "3.3% 1ST $100000 AND 1.2% BALANCE"                                
[5] "3.3% 1ST $100000 AND 1.2% BALANCE"                                
[6] "3.2 - $100000"

I would like an output such as:

[1] "3.2 100000 1.1"                                
[2] "3.3 100000 1.2 3000"
[3] "4000"                                                            
[4] "3.3 100000 1.2 "                                
[5] "3.3 100000 1.2 "                                
[6] "3.2 100000 "

I looked through the references and found the following link:


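The function above seems to work, but it does not handle every kind of number at once. I know that "[[:digit:]]+" only finds integers, but how can we change it so that it covers all kinds of numbers?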

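We need to add the . to the character class in the matching pattern, together with word boundaries (\\b) so that the 1 in 1ST is not picked up: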

sapply(regmatches(x, gregexpr("\\b[[:digit:].]+\\b", x)), paste, collapse= ' ')
#[1] "3.2 100000 1.1"    
#[2] "3.3 100000 1.2 3000" 
#[3] "4000"              
#[4] "3.3 100000 1.2"   
#[5] "3.3 100000 1.2"     
#[6] "3.2 100000"   
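
Since the question asks for the numbers "as a vector", the same matches can also be kept as a list of numeric vectors instead of being pasted into one string per row. The following is only a minimal base R sketch of that idea; extract_numbers is a hypothetical helper name used for illustration, and the input is a shortened copy of the sample data from the question.

```r
# Shortened sample input copied from the question.
x <- c("3.2% 1ST $100000 AND 1.1% BALANCE",
       "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY",
       "$4000",
       "3.2 - $100000")

# Hypothetical helper: find every run of digits and dots that sits between
# word boundaries, then convert each match to a number.
# "1ST" is skipped because there is no word boundary between "1" and "S".
extract_numbers <- function(strings) {
  matches <- regmatches(strings, gregexpr("\\b[[:digit:].]+\\b", strings))
  lapply(matches, as.numeric)
}

nums <- extract_numbers(x)
nums[[1]]  # numeric vector for the first string
```

Each list element is a plain numeric vector, so it can be passed directly to functions such as sum() or max().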

Akrun's answer is perfect, but just to add another solution, this one uses a package I recently discovered for building regex patterns:

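library(stringr)
library(rebus)
library(magrittr)

# One or more digits, an optional dot, then optional further digits
pattern <- one_or_more(DIGIT) %R% optional(DOT) %R% optional(one_or_more(DIGIT))

# Remove the ordinal "1ST" first so that its digit is not extracted
str_remove(x, "1ST") %>%
  str_match_all(pattern = pattern) %>%
  lapply(function(x) paste(as.vector(x), collapse = " ")) %>%
  unlist()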
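
To see why the str_remove() step is there, it may help to write the pattern as a plain regex. The sketch below assumes that the rebus pattern above matches roughly the same text as "[0-9]+\\.?[0-9]*" (I have not checked the exact string rebus generates), and it uses base R only; extract_all is a hypothetical helper name.

```r
x <- "3.2% 1ST $100000 AND 1.1% BALANCE"

# Assumed plain-regex equivalent: digits, an optional dot, optional digits.
pattern <- "[0-9]+\\.?[0-9]*"

# Hypothetical helper returning all matches for each input string.
extract_all <- function(strings) {
  regmatches(strings, gregexpr(pattern, strings))
}

# Without the removal step, the "1" of "1ST" is matched as well.
raw <- extract_all(x)[[1]]

# Removing the literal "1ST" first leaves only the real numbers.
cleaned <- extract_all(sub("1ST", "", x, fixed = TRUE))[[1]]

raw      # "3.2" "1" "100000" "1.1"
cleaned  # "3.2" "100000" "1.1"
```

The pattern has no notion of what surrounds a number, which is why the unwanted token has to be removed by name beforehand.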


You can use a negative lookahead regex:

stringr::str_extract_all(x, '\\d+(\\.\\d+)?(?![A-Z])')

#[[1]]
#[1] "3.2"    "100000" "1.1"   

#[[2]]
#[1] "3.3"    "100000" "1.2"    "3000"  

#[[3]]
#[1] "4000"

#[[4]]
#[1] "3.3"    "100000" "1.2"   

#[[5]]
#[1] "3.3"    "100000" "1.2"   

#[[6]]
#[1] "3.2"    "100000"

If you want the output as a single string:

sapply(stringr::str_extract_all(x, '\\d+(\\.\\d+)?(?![A-Z])'), paste, collapse = ' ')
#[1] "3.2 100000 1.1"      "3.3 100000 1.2 3000" "4000"               
#[4] "3.3 100000 1.2"      "3.3 100000 1.2"      "3.2 100000"  

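Thanks @akrun, but it extracts the 1 in 1ST. I am only looking for pure numbers.

@Roozbeh_you Sorry, I have updated the answer with word boundaries. Can you check?

Thanks, John. Your answer is also correct; however, one thing I would add is that not all of the strings contain 1ST in exactly that form. That could cause problems when using str_remove, right?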
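
The concern in the last comment can be illustrated with a pattern that does not depend on any particular ordinal. This is only a sketch: the "2ND" and "31ST" inputs are made up for illustration and do not come from the original data, standalone_numbers is a hypothetical helper name, and the lookarounds require perl = TRUE in base R.

```r
x <- c("3.2% 1ST $100000 AND 1.1% BALANCE",
       "3.3% 2ND $100000 AND 1.2% BALANCE",  # made-up variant with another ordinal
       "31ST $4000")                         # made-up multi-digit ordinal

# A number counts as standalone when it is neither preceded by a letter,
# digit or dot, nor followed by a letter or digit.
pattern <- "(?<![A-Za-z0-9.])\\d+(?:\\.\\d+)?(?![A-Za-z0-9])"

# Hypothetical helper: extract the standalone numbers and convert to numeric.
standalone_numbers <- function(strings) {
  matches <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
  lapply(matches, as.numeric)
}

res <- standalone_numbers(x)
res
```

Because nothing is removed by name, tokens such as 2ND or 31ST are skipped in the same way as 1ST. The word-boundary pattern from the first answer does not hard-code 1ST either.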