R: collapsing runs of consecutive integers into range strings
I have some data in a list in which I need to find runs of consecutive integers (my brain thinks rle, but I don't know how to use it here).
It's easier to look at the data set and explain what I'm after.
Here's a view of the data:
$greg
[1] 7 8 9 10 11 20 21 22 23 24 30 31 32 33 49
$researcher
[1] 42 43 44 45 46 47 48
$sally
[1] 25 26 27 28 29 37 38 39 40 41
$sam
[1] 1 2 3 4 5 6 16 17 18 19 34 35 36
$teacher
[1] 12 13 14 15
Desired output:
$greg
[1] 7:11, 20:24, 30:33, 49
$researcher
[1] 42:48
$sally
[1] 25:29, 37:41
$sam
[1] 1:6, 16:19, 34:36
$teacher
[1] 12:15
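Using base packages, how can I replace each consecutive span with a colon between its lowest and highest values, and put commas between the non-consecutive parts? Note that the data goes from a list of integer vectors to a list of character vectors.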
MWE data:
z <- structure(list(greg = c(7L, 8L, 9L, 10L, 11L, 20L, 21L, 22L,
23L, 24L, 30L, 31L, 32L, 33L, 49L), researcher = 42:48, sally = c(25L,
26L, 27L, 28L, 29L, 37L, 38L, 39L, 40L, 41L), sam = c(1L, 2L,
3L, 4L, 5L, 6L, 16L, 17L, 18L, 19L, 34L, 35L, 36L), teacher = 12:15), .Names = c("greg",
"researcher", "sally", "sam", "teacher"))
I think diff is the solution. You might need some extra tricks to deal with the singletons, but:
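lapply(z, function(x) {
  diffs <- c(1, diff(x))
  # a run starts at position 1 and wherever the step from the previous value exceeds 1
  start_indexes <- c(1, which(diffs > 1))
  # each run ends just before the next start; the leading 0 is dropped by R when indexing
  end_indexes <- c(start_indexes - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})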
$greg
[1] "7:11, 20:24, 30:33, 49:49"
$researcher
[1] "42:48"
$sally
[1] "25:29, 37:41"
$sam
[1] "1:6, 16:19, 34:36"
$teacher
[1] "12:15"
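The "49:49" in the output above is the singleton case the answer warns about. Here is a minimal sketch of one way to handle it, assuming the input is a sorted integer vector; the helper name collapse_runs is made up for illustration and is not part of the original answer:

```r
# Sketch: same start/end idea as above, but a run of length one
# is printed as "49" instead of "49:49".
collapse_runs <- function(x) {                 # hypothetical helper name
  starts <- c(1, which(diff(x) != 1) + 1)      # first index of each run
  ends   <- c(starts[-1] - 1, length(x))       # last index of each run
  labels <- ifelse(starts == ends,
                   as.character(x[starts]),    # singleton: just the value
                   paste(x[starts], x[ends], sep = ":"))
  paste(labels, collapse = ", ")
}

collapse_runs(c(7:11, 20:24, 30:33, 49L))
# "7:11, 20:24, 30:33, 49"
```

Applied to the whole list, this would be lapply(z, collapse_runs).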
Using IRanges:
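require(IRanges)
lapply(z, function(x) {
  # merge the width-1 ranges into maximal runs, keeping the start and end columns
  t <- as.data.frame(reduce(IRanges(x,x)))[,1:2]
  # unique() turns a singleton such as 49, 49 into just "49"
  apply(t, 1, function(x) paste(unique(x), collapse=":"))
})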
# $greg
# [1] "7:11" "20:24" "30:33" "49"
#
# $researcher
# [1] "42:48"
#
# $sally
# [1] "25:29" "37:41"
#
# $sam
# [1] "1:6" "16:19" "34:36"
#
# $teacher
# [1] "12:15"
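I have a very similar solution to Marius's. His works just as well as mine, but the mechanism is slightly different, so I thought I might as well post it: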
findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unname(sapply(difflist, function(x){
    if(length(x) == 1) as.character(x) else paste0(x[1], ":", x[length(x)])
  }))
}
lapply(z, findIntRuns)
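To make the grouping step easier to follow, here is a small walk-through of the key that split() receives, using the greg vector from the question. The variable names breaks, key and runs are only illustrative:

```r
x <- c(7:11, 20:24, 30:33, 49L)

# TRUE wherever a new run starts (the step from the previous value is not 1)
breaks <- c(1, diff(x)) != 1

# Running count of breaks: constant within a run, one higher in each new run
key <- cumsum(breaks)
key
# 0 0 0 0 0 1 1 1 1 1 2 2 2 2 3

# One vector per run; each can then be summarised by its first and last element
runs <- split(x, key)
lengths(runs)
```

Each element of runs is one consecutive stretch, which is why the length-one check is enough to spot singletons.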
Here's an attempt using diff and tapply that returns a character vector:
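runs <- lapply(z, function(x) {
  # positions after which the sequence breaks
  z <- which(diff(x) != 1)
  # endpoints of every run, in order
  results <- x[sort(unique(c(1, length(x), z, z + 1)))]
  lr <- length(results)
  # pair the endpoints two at a time; this assumes a singleton can only be the last element
  collapse <- rep(seq_len(ceiling(lr/2)), each = 2, length.out = lr)
  as.vector(tapply(results, collapse, paste, collapse = ':'))
})
runs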
$greg
[1] "7:11" "20:24" "30:33" "49"
$researcher
[1] "42:48"
$sally
[1] "25:29" "37:41"
$sam
[1] "1:6" "16:19" "34:36"
$teacher
[1] "12:15"
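One caveat seems worth checking before reusing this on other data: the endpoints are paired two at a time, which works for the sample data because its only singleton (49) is the last element. The sketch below wraps the same logic in a function and compares it with a run-label grouping on a made-up vector that has a singleton in the middle. The function names and the test vector are assumptions for illustration only:

```r
# The endpoint-pairing logic from the answer above, wrapped in a function
pair_endpoints <- function(x) {               # hypothetical name
  gaps <- which(diff(x) != 1)
  results <- x[sort(unique(c(1, length(x), gaps, gaps + 1)))]
  lr <- length(results)
  grp <- rep(seq_len(ceiling(lr / 2)), each = 2, length.out = lr)
  as.vector(tapply(results, grp, paste, collapse = ":"))
}

# Alternative: label each run first, then summarise it by its range
by_run <- function(x) {                       # hypothetical name
  as.vector(tapply(x, cumsum(c(TRUE, diff(x) != 1)), function(y)
    paste(unique(range(y)), collapse = ":")))
}

x <- c(1L, 2L, 5L, 8L, 9L)   # made-up input, singleton 5 in the middle
pair_endpoints(x)   # "1:2" "5:8" "9"
by_run(x)           # "1:2" "5"   "8:9"
```

On the data in the question both functions give the same result.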
Another short solution using lapply and tapply:
lapply(z, function(x)
  unname(tapply(x, c(0, cumsum(diff(x) != 1)), FUN = function(y)
    paste(unique(range(y)), collapse = ":")
  ))
)
The result:
$greg
[1] "7:11" "20:24" "30:33" "49"
$researcher
[1] "42:48"
$sally
[1] "25:29" "37:41"
$sam
[1] "1:6" "16:19" "34:36"
$teacher
[1] "12:15"
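Since the question mentions rle, here is a hedged sketch of how it could be used for the same grouping. It relies on the fact that, within a run of consecutive integers, x - seq_along(x) stays constant, and it assumes the input is strictly increasing. The name rle_ranges is made up for illustration:

```r
# Sketch with rle(): one rle run per stretch of consecutive integers
rle_ranges <- function(x) {                  # hypothetical helper name
  r      <- rle(x - seq_along(x))            # offset is constant inside a run
  ends   <- cumsum(r$lengths)                # index of the last element of each run
  starts <- ends - r$lengths + 1             # index of the first element of each run
  ifelse(starts == ends,
         as.character(x[starts]),            # singleton: just the value
         paste(x[starts], x[ends], sep = ":"))
}

rle_ranges(c(1:6, 16:19, 34:36))
# "1:6" "16:19" "34:36"
```

As with the other answers, lapply(z, rle_ranges) would apply it to every element of the list.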
Late to the party, but here's a one-liner based on deparse:
lapply(z,function(x) paste(sapply(split(x,cumsum(c(1,diff(x)-1))),deparse),collapse=", "))
$greg
[1] "7:11, 20:24, 30:33, 49L"
$researcher
[1] "42:48"
$sally
[1] "25:29, 37:41"
$sam
[1] "1:6, 16:19, 34:36"
$teacher
[1] "12:15"
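The "49L" for greg appears because deparse() writes a single integer as an integer literal. A minimal sketch of one possible cleanup is to strip that trailing "L" before collapsing; the variable names pieces and out are only illustrative:

```r
x <- c(7:11, 20:24, 30:33, 49L)

# Same split-and-deparse step as the one-liner above
pieces <- sapply(split(x, cumsum(c(1, diff(x) - 1))), deparse)

# Drop the trailing "L" that deparse() adds to a lone integer
out <- paste(sub("L$", "", pieces), collapse = ", ")
out
# "7:11, 20:24, 30:33, 49"
```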
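Your question is somewhat similar to this question:
Thanks for sharing your idea. +1
Just when I think I'm getting good at R, I look at code like this and realize I have a lot to learn. +1
I'm not quite sure that's a compliment :)
No, it is. There's a combination of functions there I wouldn't have thought to put together :-) I like the creativity.
Works well. Not in base, but useful for future searchers. Thank you. +1
Sure, for anything interval-related it's best to use a package that implements interval trees.
Yes, this is the first time I've seen IRanges.
I like this one best because I can understand everything you did. I made a small tweak to turn 49:49 into 49, but that was the easy part. Thanks.
Nice approach. +1
Definitely late ;)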