R: Detecting one or more inflection points in a simple numeric vector

All,

I am looking for a robust, unsupervised way to detect change points in relatively short vectors. Consider the following two examples:

v1 = c(0.299584,0.314446,0.357783,0.388896,0.410417,0.427182,0.450383,0.466671,0.474884,0.474749,0.493566,0.500374,0.522482,0.529851,0.538387,0.577901,0.610939,0.639383,0.662433,0.692656,0.720543,0.738255,0.748055,0.7591,0.770595,0.781811,0.794479,0.794588,0.789448,0.77667,0.765406,0.75152,0.740408,0.726898,0.720766,0.709445,0.69896,0.687508,0.673382,0.65795,0.639214,0.620445,0.590047,0.561773,0.526807,0.486848,0.439681,0.387545,0.313369,0.282872,0.279908,0.271836,0.269088,0.262727,0.259782)

v2 = c(0.081309,0.206263,0.429069,0.511859,0.565194,0.578792,0.56919,0.51985,0.432563,0.193907,0.0771,0.086603,0.18303,0.177608,0.169706,0.260917,0.292062,0.2979,0.263249,0.270576,0.250422,0.25219,0.182878,0.080623,0.079443,0.088944,0.087623,0.126403,0.155563,0.273942,0.312054,0.370195,0.357087,0.336452,0.300574,0.243105,0.243105,0.25593,0.227401,0.218047,0.15857,0.157727,0.139801,0.125742,0.129142,0.142166,0.142166,0.136748,0.107755,0.064377,0.072801,0.060093,0.103441,0.111704,0.124544)
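If you look at

plot(v1,type='l')
plot(v2,type='l')

you can see that for v1 I would like to detect the change at index 28, and for v2 the changes at indices 8, 11, 18, 25, 32 and 51. So far I have experimented with a Bayesian change point algorithm (the bcp package), which works well for identifying where inflection points may occur (regions of low posterior probability), but it still forces me to rely on visual inspection for the final determination: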

install.packages('bcp')
library(bcp)

test = bcp(v1,w0=0.2,p0=0.01)
plot(v1,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)

test = bcp(v2,w0=0.2,p0=0.01)
plot(v2,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)
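Is there a way to automatically make an unsupervised selection of the estimates for multiple change points in this kind of data? Maybe I am just searching in vain for a substitute for human intuition :P I have also looked at the changepoint package, but it does not seem to be designed for this kind of data.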

Thanks,
Aaron

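So, here is a simple solution (code at the end of the post). You can modify the parameters k and cutoff to return different (more/fewer, sensitive/insensitive) inflection points (or regions, as for your data).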

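Have you seen the questions on S.O. about peak detection or local minima/maxima? I like to use pastecs::turnpoints, with or without pre-smoothing depending on the quality of the input data.

This is a very simple and elegant solution. Thanks! PS: is this from a paper somewhere, or did you come up with it on the fly?

I just came up with it.

It looks like your second assignment to scores overwrites the first. Presumably the first version was an alternative that only looks at changes in one direction (for example only in the past, which could be useful when you do not have the chance to collect new observations). In that case, however, the sapply for the first version should not drop the first and last k observations. The only reason for doing so is the index syntax used in the iterating function, which could be rewritten as something like i:(k+i-1) and (i+k):(i+2*k-1) instead of i-1:k and i+0:k.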
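The index expressions discussed in the last comment depend on R's operator precedence: the colon operator binds more tightly than binary + and -. The following is a small sketch with arbitrary example values i = 10 and k = 3 (assumptions, not taken from the thread) showing which indices each expression selects:

```r
i <- 10  # arbitrary example position (assumption)
k <- 3   # arbitrary example window size (assumption)

# ':' binds tighter than binary '-' and '+',
# so these evaluate as i - (1:k) and i + (0:k)
left_old  <- i - 1:k   # 9 8 7
right_old <- i + 0:k   # 10 11 12 13

# The windows suggested in the comment: two adjacent blocks of k indices
left_new  <- i:(k + i - 1)            # 10 11 12
right_new <- (i + k):(i + 2 * k - 1)  # 13 14 15

print(left_old)
print(right_old)
print(left_new)
print(right_new)
```

Note that i+0:k spans k+1 indices, whereas each of the suggested windows spans exactly k.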
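# Answer code: flag points whose k neighbours on each side all lie above or all lie below them.
plot(v2, type="l", col="darkblue", lwd=2)
# v2 <- smooth(v2, kind="3")  # optional running-median smoothing
lines(v2, lwd=1, col="red")   # overlays the (optionally smoothed) series

k <- 5         # neighbours inspected on each side
cutoff <- 10   # minimum score; 2*k means all neighbours must agree

# Version 1 (sign of first differences). The second sapply() below
# overwrites scores, so only version 2 determines the result.
d2 <- (diff(v2) > 0) * 2 - 1   # +1 for a rise, -1 for a fall
scores <- sapply(k:(length(d2)-k), FUN=function(i){
  # i-1:k parses as i-(1:k) and i+0:k as i+(0:k), because ':' binds tighter than '-' and '+'
  abs(mean(-d2[ i-1:k ], na.rm=TRUE) + mean(d2[ i+0:k ], na.rm=TRUE))
})

# Version 2 (the one actually used): +1 for each neighbour below v2[i], -1 otherwise
scores <- sapply(k:(length(v2)-k), FUN=function(i){
  left <- (v2[pmax(i-1:k, 1)] < v2[i])*2-1             # indices clamped at 1
  right <- (v2[pmin(i+1:k, length(v2))] < v2[i])*2-1   # indices clamped at length(v2)
  abs(sum(left) + sum(right))
})

inflections <- (k:(length(v2)-k))[scores>=cutoff]

plot(v2, type="l")
abline(v=inflections, col="red", lwd=3)
print(inflections) #  6 11 18 25 32 (missed 51, if you make cutoff=8 it'll catch it...)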
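The second scoring loop above can be wrapped in a reusable function so that k and cutoff are easy to vary. This is a minimal sketch, not the answer author's code: the function name find_turning_points and the short synthetic vector x are assumptions made for illustration, and the candidate range k:(n-k) is kept as in the answer.

```r
# Hypothetical helper (name and defaults are assumptions for illustration).
# Each candidate index gets +1 for every neighbour strictly below it and -1
# otherwise, over k neighbours on each side; abs(score) == 2*k means the point
# is higher than all of its neighbours or not higher than any of them.
find_turning_points <- function(x, k = 5, cutoff = 2 * k) {
  n <- length(x)
  stopifnot(n > 2 * k)
  candidates <- k:(n - k)                 # same range as in the answer
  scores <- vapply(candidates, function(i) {
    left  <- pmax(i - seq_len(k), 1)      # clamp indices at the start
    right <- pmin(i + seq_len(k), n)      # clamp indices at the end
    votes <- ifelse(c(x[left], x[right]) < x[i], 1, -1)
    abs(sum(votes))
  }, numeric(1))
  candidates[scores >= cutoff]
}

# Synthetic example: one peak at index 6 and one trough at index 11
x <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7)
turning <- find_turning_points(x, k = 3)
print(turning)  # 6 11
```

Lowering cutoff below 2*k tolerates a few disagreeing neighbours, which makes the detection more sensitive at the cost of extra candidates.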