Regex: extract numbers from a vector of strings (regex, R)




I have strings like this:

years<-c("20 years old", "1 years old")

How can I extract just the numeric part of each string?

How about:

# capture the first run of digits; the match runs to the end of the string, so everything after the digits is dropped
as.numeric(gsub("([0-9]+).*$", "\\1", years))
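A minimal base-R sketch of the same idea, using sub() instead of gsub(): the pattern already consumes everything from the digits to the end of the string, so a single replacement is enough. The third input, "no number here", is an assumed example added to show the failure mode: when nothing matches, the string comes back unchanged and as.numeric() turns it into NA.

```r
years <- c("20 years old", "1 years old", "no number here")

# Replace "digits + rest of string" with just the captured digits.
# sub() is enough: the match already extends to the end of the string.
out <- sub("([0-9]+).*$", "\\1", years)
# out is c("20", "1", "no number here")

# Unmatched strings are returned unchanged, so as.numeric() gives NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning).
nums <- suppressWarnings(as.numeric(out))
nums
# [1] 20  1 NA
```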


Here is an alternative to Arun's first solution, using a simpler Perl-like regular expression:

as.numeric(gsub("[^\\d]+", "", years, perl=TRUE))

You could also remove all the letters:

as.numeric(gsub("[[:alpha:]]", "", years))

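This is probably less general, though: only letters are removed, so any other non-digit characters stay in the string.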
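To make "less general" concrete, here is a small check with one assumed extra input, "20 years-old": removing only letters leaves the hyphen behind, and as.numeric() cannot parse what remains. (as.numeric() does tolerate the leftover leading and trailing spaces in the first string.)

```r
x <- c("20 years old", "20 years-old")

# Only letters are removed; spaces and punctuation survive.
stripped <- gsub("[[:alpha:]]", "", x)
stripped
# [1] "20  " "20 -"

# Surrounding whitespace is ignored by as.numeric(), but "20 -" is not a number.
vals <- suppressWarnings(as.numeric(stripped))
vals
# [1] 20 NA
```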

I think substitution is an indirect way of getting to the solution. If you want to retrieve all the numbers, I recommend gregexpr:

matches <- regmatches(years, gregexpr("[[:digit:]]+", years))
as.numeric(unlist(matches))
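If only the first number in each string is needed, regexpr() (one match per string) can stand in for gregexpr(). A sketch with one caveat: regmatches() silently drops strings that have no match, so the result can be shorter than the input. The third string below is an assumed example of that.

```r
years <- c("20 years old", "1 years old", "unknown age")

# regexpr() gives the position of the first match in each string (-1 if none).
m <- regexpr("[[:digit:]]+", years)

# regmatches() returns only the matched substrings, so "unknown age" disappears.
first <- as.numeric(regmatches(years, m))
first
# [1] 20  1
length(first) == length(years)
# [1] FALSE
```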

Matching approach from Gabor Grothendieck.
Or simply:
as.numeric(gsub("\\D", "", years))
# [1] 20  1
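Removing every non-digit works as long as each string holds a single integer. With more than one number, the digits are glued together; the first input string here is an assumed example:

```r
x <- c("20 years old and 21", "1 years old")

# \\D matches any non-digit, so both "20" and "21" survive and are joined.
joined <- as.numeric(gsub("\\D", "", x))
joined
# [1] 2021    1
```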

Update
Since extract_numeric is deprecated, we can use parse_number from the readr package:

library(readr)
parse_number(years)
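Where readr is not available, a rough base-R stand-in for "the first number in each string" is a regexpr() match that also allows an optional leading minus sign and a decimal part. This pattern is an assumption for illustration, not what parse_number() does internally: unlike parse_number() it does not understand grouping marks such as "27,633", and strings without digits are dropped by regmatches().

```r
x <- c("20 years old", "-12345 units", "2.5 kg")

# Assumed pattern: optional "-", one or more digits, optional ".digits" fraction.
pat <- "-?[0-9]+(\\.[0-9]+)?"

# First match per string, converted to numeric.
first_num <- as.numeric(regmatches(x, regexpr(pat, x)))
first_num
```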

Here is another option, with extract_numeric from tidyr:

library(tidyr)
extract_numeric(years)
#[1] 20  1

A stringr pipeline solution:
library(stringr)
years %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric

Extract the number at the start of each string:
x <- gregexpr("^[0-9]+", years)  # one or more digits, anchored at the start
x2 <- as.numeric(unlist(regmatches(years, x)))
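The anchored pattern ^[0-9]+ only matches strings that begin with a digit; for the others, regmatches() returns character(0), which unlist() silently drops. A sketch that keeps one value per input (NA where nothing matched), using an assumed second string that does not start with a number:

```r
x <- c("20 years old", "age: 1 year")

m <- regmatches(x, gregexpr("^[0-9]+", x))
# m[[2]] is character(0), so unlist() loses that element entirely.
dropped <- as.numeric(unlist(m))

# One value per string: the first match as a number, or NA if there was none.
aligned <- vapply(m, function(s) if (length(s) > 0) as.numeric(s[1]) else NA_real_,
                  numeric(1))
aligned
# [1] 20 NA
```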

Extract every number, wherever it appears in the string:
x <- gregexpr("[0-9]+", years)  # one or more digits, anywhere
x2 <- as.numeric(unlist(regmatches(years, x)))
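Calling unlist() flattens away which string each number came from. Keeping the list returned by regmatches() and converting each element separately preserves that grouping; the inputs below are assumed examples.

```r
years <- c("20 years old and 21", "1 years old", "no digits")

# One character vector of matches per input string.
matches <- regmatches(years, gregexpr("[0-9]+", years))

# Convert each element on its own; strings without digits become numeric(0).
per_string <- lapply(matches, as.numeric)
per_string
```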

We can also use str_extract from stringr:

years<-c("20 years old", "1 years old")
as.integer(stringr::str_extract(years, "\\d+"))
#[1] 20  1
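For comparison, str_extract() returns NA for strings with no match, whereas regmatches() with regexpr() drops them. A base-R sketch that keeps the output aligned with the input (the third string is an assumed example):

```r
years <- c("20 years old", "1 years old", "unknown")

m <- regexpr("[0-9]+", years)
matched <- m != -1  # TRUE where regexpr() found a match

# Start with NA for every string, then fill in the ones that matched.
out <- rep(NA_character_, length(years))
out[matched] <- regmatches(years, m)

nums <- as.integer(out)
nums
# [1] 20  1 NA
```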

Using the unglue package, we can do the following:
# install.packages("unglue")
library(unglue)
#> [1] 20  1

Created on 2019-11-06 (v0.3.0)
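A comparable idea in base R, matching each string against a fixed template and capturing the number, is a sub() call with a fully anchored pattern. This is only a sketch that assumes every string has exactly the form "<number> years old"; anything else is returned unchanged and becomes NA.

```r
years <- c("20 years old", "1 years old")

# Anchored template: digits, then the literal suffix " years old".
nums <- as.numeric(sub("^([0-9]+) years old$", "\\1", years))
nums
# [1] 20  1
```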
More information:
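Why is the .* needed? If you want the digits at the start, why not use ^[[:digit:]]+?
The .* is needed because you have to match the entire string; without it, nothing gets removed. Also note that sub can be used here instead of gsub. If the numbers don't have to be at the start of the string, use this: gsub(".*?([0-9]+).*", "\\1", years)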
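A sketch of the "first number anywhere" case with sub(), which replaces only once; the inputs, including "June 27-30", are assumed examples. It uses perl = TRUE so the lazy .*? prefix follows PCRE semantics. The trailing .* is deliberately greedy: it consumes the rest of the string, so a later number such as 30 is dropped rather than matched again, as gsub() with a lazy trailing .*? would do (which is how a result like "2730" arises).

```r
x <- c("20 years old", "June 27-30")

# .*? lazily skips the shortest prefix before the first run of digits,
# ([0-9]+) captures that run, and .* consumes the rest of the string.
first <- sub(".*?([0-9]+).*", "\\1", x, perl = TRUE)
first
# [1] "20" "27"
```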
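I want to get 27. I don't understand why adding a condition (e.g. an escaped "-") makes the result longer… gsub(".*?([0-9]+).*?", "\\1", "June 27-30")
result: [1] "2730"
gsub(".*?([0-9]+)\\-.*?", "\\1", "June 27-30")
result: [1] "June 27-30"
I didn't expect it, but this solution is slower than any of the others, by an order of magnitude.
@MatthewLundberg the gregexpr, the regexpr, or both?
gregexpr. I hadn't tried regexpr until now. Huge difference. Using regexpr puts it between Andrew's and Arun's solutions (second fastest) on a 1e6 set. Perhaps also interesting: using sub in Andrew's solution does not improve the speed.
This splits on the decimal point; for example, 2.5 becomes c('2', '5').
Oddly, Andrew's solution beats this by a factor of 5 on my machine.
as.numeric(sub("\\D+", '', years)). If there are letters both before and after the number, gsub works for this application, but keep in mind that parse_number does not handle negative numbers. Try parse_number("–27633").
@Nettle Yes, that's right; it won't work if there are multiple instances.
The negative-number parsing bug has been fixed: readr::parse_number("-12345") # [1] -12345
extract_numeric is now deprecated, and you will get a warning to use readr::parse_number().
I did specify that in the update, if you noticed.
Thanks Joe, but this answer does not extract the minus sign in front of numbers in the string.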
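If a string contains more than one number, str_extract returns only the first, while str_extract_all returns all of them:
years<-c("20 years old and 21", "1 years old")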
stringr::str_extract(years, "\\d+")
#[1] "20"  "1"

stringr::str_extract_all(years, "\\d+")

#[[1]]
#[1] "20" "21"

#[[2]]
#[1] "1"