R: extract only the numbers from a string
How can I extract only the numbers from the following data frame?
last_run<-c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
'Last run 22 days ago','1st up after 177 days','1st up after 364 days')%>%
as.data.frame()
The desired output is only the number from each string (15, 126, 21, 22, 177, 364).
My attempt was:
new_df<-sapply(str_split(last_run$last_run," run"|"after"),'[',2)%>%
as.data.frame()
strsplit parses last_run and returns a list in which each element is a character vector holding one sentence split into words:
> strsplit(last_run, " ")
[[1]]
[1] "Last" "run" "15" "days" "ago"
[[2]]
[1] "1st" "up" "after" "126" "days"
[[3]]
[1] "Last" "run" "21" "days" "ago"
[[4]]
[1] "Last" "run" "22" "days" "ago"
[[5]]
[1] "1st" "up" "after" "177" "days"
[[6]]
[1] "1st" "up" "after" "364" "days"
as.numeric tries to convert each word to a number and returns NA where that is not possible:
> as.numeric(strsplit(last_run, " ")[[1]])
[1] NA NA 15 NA NA
na.omit removes the NA values from the vector:
na.omit(as.numeric(strsplit(last_run, " ")[[1]]))[[1]]
[1] 15
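na.omit returns the vector with the NA values removed (plus an na.action attribute recording which positions were dropped); [[1]] then picks the first remaining element, which is the number.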
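The steps above can be combined into a single call with sapply. This is only a sketch: it assumes last_run is a plain character vector (not the data frame from the question), and extract_first_number is a hypothetical helper name introduced here for illustration.

```r
# Assumption: last_run is a character vector, as in the strsplit example above.
last_run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
              'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Hypothetical helper: return the first word that converts cleanly to a number.
extract_first_number <- function(words) {
  # as.numeric() warns when it produces NA; silence that expected warning.
  nums <- suppressWarnings(as.numeric(words))
  # Drop the NA values and keep the first number that remains.
  na.omit(nums)[[1]]
}

# Split each sentence into words, then apply the helper to every element.
days <- sapply(strsplit(last_run, " "), extract_first_number)
print(days)
```

A word such as "1st" does not convert to a number, so it becomes NA and is dropped along with the other words.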
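sapply applies a function to each element of the list and returns a vector.

You can do this with a regex: extract the number that follows the word 'run' or 'after'. Using base R sub: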
as.numeric(sub('.*(run|after)\\s(\\d+).*', '\\2', last_run))
#[1] 15 126 21 22 177 364
Using stringr::str_extract:
as.numeric(stringr::str_extract(last_run, '(?<=(run|after)\\s)\\d+'))

You can use a regular expression to extract the values and add them to the data.frame:
run = c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
'Last run 22 days ago','1st up after 177 days','1st up after 364 days')
as.numeric(sub("(.* )([[:digit:]]+)( .*)", '\\2', run))
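The snippet above only prints the extracted values. One possible way to store them in a data frame is sketched below; the data frame name df and the column name days are assumptions made for illustration and do not come from the original answer.

```r
run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
         'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Assumed layout: one column holding the original strings.
df <- data.frame(last_run = run, stringsAsFactors = FALSE)

# Keep only the second capture group (the digits surrounded by spaces)
# and store it as a new numeric column.
df$days <- as.numeric(sub("(.* )([[:digit:]]+)( .*)", "\\2", df$last_run))
print(df)
```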

In base R or with stringr::str_extract, put the pattern \\d+ between the word-boundary markers \\b to avoid matching the digit in strings such as "1st".
1. Base R
gsub(".*(\\b\\d+\\b).*", "\\1", last_run)
#[1] "15" "126" "21" "22" "177" "364"
as.integer(gsub(".*(\\b\\d+\\b).*", "\\1", last_run))
#[1] 15 126 21 22 177 364
2. Package stringr
stringr::str_extract(last_run, "\\b\\d+\\b")
#[1] "15" "126" "21" "22" "177" "364"
as.integer(stringr::str_extract(last_run, "\\b\\d+\\b"))
#[1] 15 126 21 22 177 364
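To see why the \\b markers matter, the following sketch compares the two patterns using base R regmatches and regexpr. The variable names unbounded and bounded are illustrative, and perl = TRUE is an assumption chosen here so that \\b is handled by the PCRE engine.

```r
last_run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
              'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Without boundaries the first run of digits wins, so "1st" contributes "1".
unbounded <- regmatches(last_run, regexpr("\\d+", last_run, perl = TRUE))

# With boundaries, the "1" in "1st" is rejected because "s" follows it directly.
bounded <- regmatches(last_run, regexpr("\\b\\d+\\b", last_run, perl = TRUE))

print(unbounded)
print(bounded)
```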

Similar to Ronak's answer, a lookahead can be applied: as.numeric(str_extract(last_run$last_run, '\\d+(?= days)'))