R: find the hi/low range by grouping values when the gap between values is less than 3 positions


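My data consists of 4 columns: date, low, high, and position.

I am trying to find ranges by aggregating the data into groups based on the position field.

If diff(position) < 3, group the data and apply the range function to each group.

If diff(position) >= 3, compute the range from only the current point and the previous point.

An example with the first 15 positions, the 4th field of the data: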

c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)
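To make the rule concrete, the gaps between consecutive positions can be inspected with diff. This is only a small illustration on the example vector; the variable names x and d are chosen here for convenience.

```r
x <- c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)

# gap between each position and the one before it
d <- diff(x)
d
# 2 3 1 1 1 1 1 2 4 5 3 1 1 5

# indices of the gaps that are >= 3, i.e. where a grouped run is interrupted
which(d >= 3)
# 2 9 10 11 14
```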

The expected result is the groups (12,14), then (17:24), (24,28), (28,33), (33,36), (36:38), and finally (38,43), with the range found for each group.

Using IRanges:

require(IRanges)
x <- c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)
o <- reduce(IRanges(x, width=1), min.gapwidth=2)

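This solves half of your problem. For the ranges where width=1, you want to pick up the appropriate previous value as the start. So let's convert the result to a data.frame: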

o <- as.data.frame(o)
o$start[o$width == 1] <- o$end[which(o$width == 1)-1]
o$width <- NULL

#   start end
# 1    12  14
# 2    17  24
# 3    24  28
# 4    28  33
# 5    36  38
# 6    38  43

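Here is a function, groupIndexes (defined further below), that uses base R functions to return a list of position indices grouped according to the stated rules. If the values might not be monotonic and you only care about absolute differences, I think changing diff(x) to abs(diff(x)) (and removing the subsequent monotonicity check) would be enough.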
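The abs(diff(x)) change described above could look like the following sketch. The name groupIndexesAbs and the sample vector x2 are hypothetical and used only for illustration; the body otherwise follows groupIndexes with the monotonicity check removed.

```r
# Hypothetical variant of groupIndexes: uses absolute gaps and
# does not require x to be increasing
groupIndexesAbs <- function(x, gap = 3) {
    d <- abs(diff(x))
    is.near <- (d < gap)
    # catch case of a single group
    if (all(is.near)) return(list(seq_along(x)))
    # near gaps collapse into runs of 0; each far gap keeps its own index
    runs <- rle(ifelse(is.near, 0, seq_along(is.near)))
    gr <- rep(seq_along(runs$lengths), times = runs$lengths)
    lapply(unique(gr), function(i) {
        ind <- if (runs$values[i] > 0) match(i, gr) else which(gr == i)
        # a run of gaps covers one more point than it has gaps
        c(ind, max(ind) + 1)
    })
}

# made-up, non-monotonic sample positions
x2 <- c(10, 12, 11, 20, 18, 30)
groups <- lapply(groupIndexesAbs(x2), function(ind) x2[ind])
groups
```

With this sample the first group holds the three close values (10, 12, 11), and each remaining group is a pair of neighbouring points.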
Here is an option that uses diff to identify the boundaries between groups:
groupBy <- function(dat, thresh=3)  {
    # bounds will grab the *END* of every group (except last element)
    bounds <- which(! diff(dat) < thresh) 

    # add the last index of dat to the "stops" indices
    stops  <- c(bounds, length(dat))

    # starts are 1 more than the bounds. We also add the first element 
    starts <- c(1, bounds+1) 

    # mapply to get `seq(starts, stops)`; SIMPLIFY=FALSE keeps a list
    # even when every group happens to have the same length
    indices <- mapply(seq, from=starts, to=stops, SIMPLIFY=FALSE)

    # return: lapply over each index to get the results
    lapply(indices, function(i) dat[i])
}

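I think creating a factor field along the position field that increases by 1 at each "gap" of 3 or more would solve half of the problem, leaving the overlap issue open. The factor field I have in mind would look like c(1,1,2,2,2,2,2,2,3,4,5,5,5,6).

Shouldn't (14,17) also be included in the output groups?

Yes, you are right, (14,17) should be included.

Thanks for your answer! I was getting familiar with the BioLite() package. I had a problem with my installation, but it works now.

Thanks for your answer. I prefer using the apply family of functions, and your code does that.

I appreciate all three answers. Thank you all!

When there are back-to-back groups at an element, the code as-is produces a grouping error. When the data is: dat … 3 but is grouped into a single group.

@user2004820, good catch. Sorry, I didn't see that originally. Please see the edit (now wrapped in a nice little function!).

Very nice, thank you very much!!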
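The factor-field idea from the comments can be sketched with cumsum. This is only an illustration of that suggestion, not code from any of the answers, and the variable name grp is an assumption. As the comment notes, it gives non-overlapping groups, so the shared end points still have to be handled separately.

```r
x <- c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)

# start at 1 and add 1 at every gap of 3 or more
grp <- cumsum(c(1, diff(x) >= 3))
grp
# 1 1 2 2 2 2 2 2 2 3 4 5 5 5 6

# positions split by that group id (no overlap between groups)
split(x, grp)
```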
ir <- IRanges(x, width = 1)
o1 <- reduce(ir, min.gapwidth = 2)
o2 <- gaps(o1)
start(o2) <- start(o2) - 1
end(o2) <- end(o2) + 1
o1 <- as.data.frame(o1[width(o1) > 1])
o2 <- as.data.frame(o2)
out <- rbind(o1, o2)
out <- out[with(out, order(start, end)), ]

#   start end width
# 1    12  14     3
# 4    14  17     4
# 2    17  24     8
# 5    24  28     5
# 6    28  33     6
# 7    33  36     4
# 3    36  38     3
# 8    38  43     6

groupIndexes <- function(x, gap=3) {
    d <- diff(x)
    # currently assuming x is in increasing order
    if (any(d<0)) stop("x must be monotonically increasing")
    is.near <- (d < gap)
    # catch case of a single group
    if (all(is.near)) return(list(seq_along(x)))
    runs <- rle(ifelse(is.near, 0, seq_along(is.near)))
    gr <- rep(seq.int(runs$lengths), times=runs$lengths)
    lapply(unique(gr), function(i) {
        ind <- if(runs$values[i]>0) {
            match(i, gr)
        } else {
            which(gr==i)
        }
        c(ind, max(ind)+1)
    })
}

x <- c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)
lapply(groupIndexes(x), function(ind) x[ind])

lapply(groupIndexes(dat$position), function(ind) range(dat$low[ind]))
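To tie this back to the hi/low question, here is a self-contained sketch that computes the lowest low and highest high for each overlapping group. The data frame below is made up purely for illustration (the dates, low and high values are not from the question), and the cumsum-based grouping is an alternative formulation assumed here, not the groupIndexes code itself.

```r
x <- c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)

# toy data: only the position column comes from the question
dat <- data.frame(
    date     = as.Date("2013-01-01") + 0:14,
    low      = 100:114,
    high     = 105:119,
    position = x
)

# one flag per gap between consecutive positions
near <- diff(dat$position) < 3

# consecutive near gaps share a group id; every far gap gets its own id
gid <- cumsum(c(TRUE, !(near[-1] & near[-length(near)])))

# a run of gaps j..k covers the points j..(k + 1)
first <- as.integer(tapply(seq_along(gid), gid, min))
last  <- as.integer(tapply(seq_along(gid), gid, max)) + 1L

res <- data.frame(
    from = dat$position[first],
    to   = dat$position[last],
    low  = mapply(function(a, b) min(dat$low[a:b]),  first, last),
    high = mapply(function(a, b) max(dat$high[a:b]), first, last)
)
res
```

The from and to columns show the position span of each group, so the result can be checked against the groups listed in the question.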

dat1 <- c(12,14,17,18,19,20,21,22,24,28,33,36,37,38,43)
dat2 <- c(5,6,7,9,13,17,21,35,36,41)

groupBy(dat1)
groupBy(dat2)
groupBy(dat2, 5)