R: concatenating a character vector with a | separator
I have a data structure containing a character vector (see below). It is a bit messy because it comes from a JSON source. I need to combine/concatenate it into one big string in which the lat/long pairs are separated by |, the lat and long values within a pair are separated by a comma, and the names are removed, i.e. "53.193418,-2.881248|53.1905138631287,-2.89043889005541" and so on. I tried:
piped.data<-unname(paste(b, sep="|", collapse=","))
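A minimal sketch of why that attempt does not produce the pairs. The real `b` is not shown, so the vector below is a hypothetical stand-in, and the names `latitude`/`longitude` are assumptions. With a single vector argument, `sep` has nothing to separate, so only `collapse` takes effect:

```r
# Hypothetical stand-in for `b`; names and length are assumptions.
b <- c(latitude = "53.193418", longitude = "-2.881248",
       latitude = "53.1905138631287", longitude = "-2.89043889005541")

# `sep` goes between the arguments of paste(); there is only one argument
# here, so every element ends up joined by the `collapse` string alone.
attempt <- unname(paste(b, sep = "|", collapse = ","))
print(attempt)
```

Every value is joined by a comma and no `|` appears, which is why the pairs have to be built first and collapsed second.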
You can try
paste(sapply(split(b,cumsum(grepl('latitude',names(b)))),
toString),collapse="|")
If you don't want the space after the comma
paste(sapply(split(b,cumsum(grepl('latitude',names(b)))),
paste, collapse=","), collapse="|")
Or use vapply, which should be a bit faster
paste(vapply(split(b,cumsum(grepl('latitude',names(b)))),
paste, collapse=",", character(1L)), collapse="|")
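The grouping step can be sketched on a small example. This assumes, as the `grepl()` call above does, that every pair starts with an element whose name contains "latitude"; the sample vector is hypothetical:

```r
# Hypothetical stand-in for `b`; the names are an assumption.
b <- c(latitude = "53.193418", longitude = "-2.881248",
       latitude = "53.1905138631287", longitude = "-2.89043889005541")

# TRUE wherever a new pair starts
starts <- grepl("latitude", names(b))
# Running count of starts gives one group id per pair
grp <- cumsum(starts)
# One list element per pair
pairs <- split(b, grp)

# Join each pair with ",", then join the pairs with "|"
res <- paste(vapply(pairs, paste, character(1L), collapse = ","), collapse = "|")
print(res)
```

If the names are missing or named differently, `grp` will not mark the pairs correctly, and one of the position-based approaches is safer.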
I would convert your "b" to a 2-column matrix and paste:
apply(matrix(b, ncol = 2, byrow = TRUE), 1, paste, collapse = "|")
# [1] "53.193418|-2.881248" "53.1905138631287|-2.89043889005541"
# [3] "53.186744|-2.890165" "53.189836|-2.893896"
# [5] "53.1884117|-2.88802" "53.1902965|-2.8919373"
# [7] "53.1940384|-2.8972299" "53.1934748|-2.8814698"
# [9] "53.1894004|-2.8886692" "53.1916771|-2.8846099"
Edit
I think I misunderstood your question.
If what you want is a single long string, joined first by commas and then by pipes, you need to paste twice:
paste(apply(matrix(b, ncol = 2, byrow = TRUE), 1, paste, collapse = ","),
collapse = "|")
You could do:
tmp <- apply(matrix(b, ncol = 2, byrow = TRUE), MARGIN = 1, FUN = paste, collapse = ",")
paste(tmp, collapse = "|")
# [1] "53.193418,-2.881248|53.1905138631287,-2.89043889005541|53.186744,-2.890165|53.189836,-2.893896|53.1884117,-2.88802|53.1902965,-2.8919373|53.1940384,-2.8972299|53.1934748,-2.8814698|53.1894004,-2.8886692|53.1916771,-2.8846099"
Another option is
paste(tapply(b, gl(length(b)/2, 2), toString), collapse = "|")
# [1] "53.193418, -2.881248|53.1905138631287, -2.89043889005541|53.186744, -2.890165|53.189836,
# -2.893896|53.1884117, -2.88802|53.1902965, -2.8919373|53.1940384, -2.8972299|53.1934748,
# -2.8814698|53.1894004, -2.8886692|53.1916771, -2.8846099"
If you don't want a space after the comma, do the following
paste(tapply(b, gl(length(b)/2, 2), paste, collapse = ","), collapse = "|")
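To make the grouping explicit, here is a small sketch of what `gl()` produces. It assumes `b` has an even length with values in lat, long order; the sample vector is hypothetical:

```r
# Hypothetical stand-in for `b` (unnamed; only positions matter here).
b <- c("53.193418", "-2.881248", "53.1905138631287", "-2.89043889005541")

# gl(n, k) builds a factor with n levels, each repeated k times: 1 1 2 2
grp <- gl(length(b) / 2, 2)

# Join each group with ",", then join the groups with "|"
res <- paste(tapply(b, grp, paste, collapse = ","), collapse = "|")
print(res)
```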
Edit:
@akrun and @SvenHohenstein were able to vectorize their solutions, so here are some benchmarks for illustration
b <- rep(b, 1e3)
library(microbenchmark)
library(data.table)  # needed for as.data.table() in akrun3
microbenchmark(
SH = paste(paste(b[c(TRUE, FALSE)], b[c(FALSE, TRUE)], sep = ","), collapse = "|"),
akrun1 = paste(c(rbind(b,rep(c(',','|'), length.out = length(b))))[-length(b)*2], collapse = ""),
akrun2 = paste(vapply(split(b,cumsum(grepl('latitude',names(b)))), paste, collapse=",", character(1L)), collapse="|"),
akrun3 = as.data.table(matrix(b, ncol=2, byrow=TRUE))[, paste(V1, V2, sep=',',collapse="|")],
AM = paste(apply(matrix(b, ncol = 2, byrow = TRUE), 1, paste, collapse = ","), collapse = "|"),
DA = paste(tapply(b, gl(length(b)/2, 2), paste, collapse = ","), collapse = "|"),
BA = do.call(paste, c(data.frame(matrix(b, ncol=2, byrow=TRUE)), list(sep=",", collapse="|")))
)
# Unit: milliseconds
#    expr       min        lq      mean    median        uq        max neval
#      SH  6.207338  6.275886  6.633830  6.472943  6.915140  10.556983   100
#  akrun1  8.738792  8.790045  9.301718  9.049665  9.611671  11.899290   100
#  akrun2 40.676819 42.329860 45.361688 43.887247 46.427638 109.963421   100
#  akrun3  4.648384  4.831599  5.019834  4.901934  5.217579   5.798325   100
#      AM 38.322320 40.905073 43.108411 42.457375 44.875023  56.236726   100
#      DA 47.102466 49.679579 52.092028 51.237417 53.694154  68.123738   100
#      BA  5.227204  5.366769  6.147758  5.494207  5.806313  55.938247   100
You can use logical indexing and vector recycling:
paste(paste(b[c(TRUE, FALSE)], b[c(FALSE, TRUE)], sep = ","), collapse = "|")
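As a rough illustration of the recycling, the two-element logical index is repeated along the whole vector, so `c(TRUE, FALSE)` keeps positions 1, 3, 5, ... and `c(FALSE, TRUE)` keeps positions 2, 4, 6, .... The sample vector is a hypothetical stand-in for `b`, and `lat`/`lon` are illustrative names:

```r
# Hypothetical stand-in for `b`, values in lat, long, lat, long order.
b <- c("53.193418", "-2.881248", "53.1905138631287", "-2.89043889005541")

lat <- b[c(TRUE, FALSE)]   # odd positions
lon <- b[c(FALSE, TRUE)]   # even positions

# paste() is vectorized: the inner call builds one "lat,lon" string per pair,
# the outer call collapses the pairs into a single string.
res <- paste(paste(lat, lon, sep = ","), collapse = "|")
print(res)
```

Both `paste()` calls operate on whole vectors, with no per-pair function call, which is consistent with this variant being among the faster ones in the benchmark.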
Another option is to reshape the vector into a data.frame
do.call(paste, c(data.frame(matrix(b, ncol=2, byrow=TRUE)),
list(sep=",", collapse="|")))
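A short sketch of what the `do.call()` form expands to. The data frame columns become the vectors passed to `paste()`, and `sep`/`collapse` are appended as named arguments, so a single `paste()` call does both joins. The sample vector is hypothetical, and `m`, `args`, and `direct` are illustrative names:

```r
# Hypothetical stand-in for `b`.
b <- c("53.193418", "-2.881248", "53.1905138631287", "-2.89043889005541")

m <- matrix(b, ncol = 2, byrow = TRUE)

# Argument list: two columns plus the named sep and collapse arguments
args <- c(data.frame(m), list(sep = ",", collapse = "|"))
res <- do.call(paste, args)

# Equivalent direct call on the matrix columns
direct <- paste(m[, 1], m[, 2], sep = ",", collapse = "|")
print(res)
```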
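@LeeJH If you don't need the space, use paste with collapse=","
You can always find a way to vectorize everything, can't you :)
@DavidArenburg I was just exploring a way to use paste only once.
@DavidArenburg I don't know whether this is faster than any of the other solutions. It was just an attempt :-)
I realized my solution is basically an extension of yours, but it would have looked worse formatted in a comment
@baptiste, no worries. I had considered that option too, but I was posting from my tablet and didn't want to try too much. In any case, your option turns out faster.
Everyone's contributions are appreciated. This gives us a lot to think about. Thanks @DavidArenburg for taking the time to run the benchmarks.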