Remove minutes and seconds from character dates in R
I have this vector of timestamps:
c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04",
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36",
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39",
"01/09/2019 10:52:20")
I want to strip the minutes and seconds from the character vector, so that I am left with 01/09/2019 9
and 01/09/2019 10.
What is the most efficient way to do this?

Here is one way:
datevec <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04",
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36",
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39",
"01/09/2019 10:52:20")
format(as.POSIXct(datevec, format = "%d/%m/%Y %H:%M:%OS"), "%d/%m/%Y %H")
# Result
[1] "01/09/2019 09" "01/09/2019 09" "01/09/2019 09" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"
[7] "01/09/2019 10" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"
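Note that %H zero-pads the hour, so this yields "09" where the question shows "9". If the unpadded form matters, one option (a sketch, not the only way) is to strip the pad afterwards with sub():

```r
datevec <- c("01/09/2019 9:51:03", "01/09/2019 10:52:20")
out <- format(as.POSIXct(datevec, format = "%d/%m/%Y %H:%M:%OS"),
              "%d/%m/%Y %H")
# drop a zero-padded hour ("09" -> "9"); the only space precedes the hour
sub(" 0", " ", out, fixed = TRUE)
# [1] "01/09/2019 9"  "01/09/2019 10"
```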
What output class do you want? How about this:
v <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04",
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36",
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39",
"01/09/2019 10:52:20")
strptime(v, "%m/%d/%Y %H")
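Note that strptime() returns a POSIXlt date-time, not a character vector; the minutes and seconds are parsed as zero rather than removed. If a character vector like the question shows is needed, one sketch (assuming the day/month ordering used in the other answers) is to wrap it in format():

```r
v <- c("01/09/2019 9:51:03", "01/09/2019 10:52:20")
x <- strptime(v, "%d/%m/%Y %H")  # trailing ":MM:SS" is ignored by the format
class(x)                         # "POSIXlt" "POSIXt"
format(x, "%d/%m/%Y %H")         # "01/09/2019 09" "01/09/2019 10"
```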
Another:
dates <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04",
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36",
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39",
"01/09/2019 10:52:20")
unlist(lapply(dates,function(x) strsplit(x,":")[[1]][1]))
This looks neat:
unlist(strsplit(mystring, split = ":", fixed=TRUE))[c(TRUE, FALSE,FALSE)]
(made with some help)
An alternative could be
sapply(strsplit(mystring, split=':', fixed=TRUE), `[`, 1)
Using some benchmarking, and Ronak's recent comment that fixed=TRUE makes the methods faster, we see that method four (the one above) is the fastest:
mystring <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04",
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36",
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39",
"01/09/2019 10:52:20")
library(microbenchmark)  # for microbenchmark()
library(stringr)         # for str_extract()
microbenchmark(one = sapply(strsplit(mystring, split=':', fixed=TRUE), `[`, 1),
two = unlist(lapply(mystring,function(x) strsplit(x,":", fixed=TRUE)[[1]][1])),
three = strptime(mystring, "%m/%d/%Y %H"),
four = unlist(strsplit(mystring, split = ":", fixed=TRUE))[c(TRUE, FALSE,FALSE)],
five = format(as.POSIXct(mystring, format = "%d/%m/%Y %H:%M:%OS"), "%d/%m/%Y %H"),
six = gsub("(.*?):.*", "\\1", mystring),
seven = str_extract(mystring, ".+(?=:.+:)"),
times = 100000)
Unit: microseconds
expr min lq mean median uq max neval
one 42.792 49.471 85.63742 52.572 57.1310 669280.96 1e+05
two 64.637 70.618 114.16364 73.252 77.6840 582466.94 1e+05
three 129.456 134.771 156.82308 136.188 139.2030 339715.94 1e+05
four 12.860 15.641 22.75699 17.254 18.5440 305703.52 1e+05
five 482.888 505.647 633.15388 512.880 552.1155 551274.28 1e+05
six 37.889 43.121 52.79030 45.567 49.1880 32954.59 1e+05
seven 53.432 59.051 88.05015 62.326 69.9320 1180361.17 1e+05
Here is another one using gsub(), capturing a pattern via () and referring to the captured group with \\1. The ? is needed to make the regex lazy, since there are multiple : in each string:
gsub("(.*?):.*", "\\1", dates)
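To see why the ? matters, compare the greedy and lazy versions on a single element (a quick illustration):

```r
d <- "01/09/2019 9:51:03"
# greedy: (.*) runs to the LAST colon, so the minutes survive
gsub("(.*):.*", "\\1", d)   # "01/09/2019 9:51"
# lazy: (.*?) stops at the FIRST colon, which is what we want
gsub("(.*?):.*", "\\1", d)  # "01/09/2019 9"
```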
You can also use str_extract from stringr:
library(stringr)
date_strings <- c("01/09/2019 9:51:03", "01/09/2019 9:51:39", "01/09/2019 9:57:04",
"01/09/2019 10:01:41", "01/09/2019 10:06:06", "01/09/2019 10:09:36",
"01/09/2019 10:11:55", "01/09/2019 10:21:15", "01/09/2019 10:21:39",
"01/09/2019 10:52:20")
str_extract(date_strings, ".+(?=:.+:)")
[1] "01/09/2019 9" "01/09/2019 9" "01/09/2019 9" "01/09/2019 10"
[5] "01/09/2019 10" "01/09/2019 10" "01/09/2019 10" "01/09/2019 10"
[9] "01/09/2019 10" "01/09/2019 10"
Yes, I think fixed=TRUE makes it faster. Very interesting; indeed method four with fixed=TRUE is the fastest, I changed the benchmark to show this. Interesting accepted answer; what is your definition of efficiency? I am heavily biased towards the tidyverse packages :) Ah I see, can't blame you :) Next time you should probably put that in the question...