R: pull out the number before a phrase
I'm struggling with regular expressions, so any insight would be helpful. I have a list like this:
[1] "collected 1 hr total. wind >15 mph." "collected 4 hr total. wind ~15 mph."
[3] "collected 10 hr total. gusts 5-10 mph." "collected 1 hr total. breeze at 1mph,"
[5] "collected 2 hrs." [6]
I would like:
[1] > 15 mph
[2] ~15 mph
[3] 5-10 mph
[4] 1mph
[5]
[6]
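I want to work out the wind speed for each row. Can you suggest the right regular expression? As you can see,
a) there can be a varying number of spaces between the number and "mph"
b) the number before "mph" can be preceded by different symbols, such as ">"

Assuming each string has at most one match, we can try using sapply and sub: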
input <- c("collected 1 hr total. wind >15 mph.",
           "collected 4 hr total. wind ~15 mph.",
           "collected 10 hr total. gusts 5-10 mph.",
           "collected 1 hr total. breeze at 1mph,",
           "collected 2 hrs.")
# For each string, return the matched speed text, or "" when nothing matches
matches <- sapply(input, function(x) {
  ifelse(grepl("[>~0-9-]+\\s*mph", x),
         sub(".*?([>~0-9-]+\\s*mph).*", "\\1", x),
         "")
})
# Replace the default names (the input strings) with positional indices
names(matches) <- seq_along(matches)
matches
1 2 3 4 5
">15 mph" "~15 mph" "5-10 mph" "1mph" ""
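Since grepl() and sub() are both vectorized, the same idea can probably be written without sapply. This is only a minimal sketch, assuming the same input vector and the same regular expression as above; the variable name pattern is introduced here purely for illustration.

```r
input <- c("collected 1 hr total. wind >15 mph.",
           "collected 4 hr total. wind ~15 mph.",
           "collected 10 hr total. gusts 5-10 mph.",
           "collected 1 hr total. breeze at 1mph,",
           "collected 2 hrs.")

# Same pattern as in the answer: symbols/digits, optional whitespace, then "mph"
pattern <- "[>~0-9-]+\\s*mph"

# grepl() and sub() operate on the whole vector, so ifelse() can too
matches <- ifelse(grepl(pattern, input),
                  sub(paste0(".*?(", pattern, ").*"), "\\1", input),
                  "")
matches
```

This returns an unnamed character vector, so the names() step is not needed.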
An option with str_extract:
library(stringr)
trimws(str_extract(v1, "[>~]?[0-9- ]+mph"))
#[1] ">15 mph" "~15 mph" "5-10 mph" "1mph" NA
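If you would rather avoid the stringr dependency, a base R sketch along the same lines uses regexpr() with regmatches(). The stricter pattern below is an assumption rather than part of the answer above: it expects an optional ">" or "~" directly before the digits, and ranges written as digits-hyphen-digits. Because it never matches a leading space, trimws() should not be needed.

```r
v1 <- c("collected 1 hr total. wind >15 mph.",
        "collected 4 hr total. wind ~15 mph.",
        "collected 10 hr total. gusts 5-10 mph.",
        "collected 1 hr total. breeze at 1mph,",
        "collected 2 hrs.")

# Assumed pattern: optional ">" or "~", digits, optional "-digits" range,
# any amount of whitespace, then "mph"
pattern <- "[>~]?[0-9]+(-[0-9]+)?\\s*mph"

# regexpr() gives the position of the first match per string, or -1 if none
m <- regexpr(pattern, v1)

# Start with NA everywhere, then fill in the strings that did match;
# regmatches() returns one value per matched element only
speeds <- rep(NA_character_, length(v1))
speeds[m != -1] <- regmatches(v1, m)
speeds
```

Strings without a speed come back as NA, as with str_extract, rather than as an empty string.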
Data
A reproducible example of v1 (you can generate one with dput()):
v1 <- c("collected 1 hr total. wind >15 mph.",
        "collected 4 hr total. wind ~15 mph.",
        "collected 10 hr total. gusts 5-10 mph.",
        "collected 1 hr total. breeze at 1mph,",
        "collected 2 hrs.")