kmeans returns an error in R for my time series dataset


I have a time series dataset. The data is available in Excel format at the linked URL. I want to cluster the data with k-means, but I get an error.

Note that FinDat is the data I obtained from the linked source.

> head(FinDat)
# A tibble: 6 x 10
  date                 ISE...2  ISE...3       SP      DAX     FTSE   NIKKEI  BOVESPA       EU
  <dttm>                 <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 2009-01-05 00:00:00  0.0358   0.0384  -0.00468  0.00219  3.89e-3  0        0.0312   0.0127 
2 2009-01-06 00:00:00  0.0254   0.0318   0.00779  0.00846  1.29e-2  0.00416  0.0189   0.0113 
3 2009-01-07 00:00:00 -0.0289  -0.0264  -0.0305  -0.0178  -2.87e-2  0.0173  -0.0359  -0.0171 
4 2009-01-08 00:00:00 -0.0622  -0.0847   0.00339 -0.0117  -4.66e-4 -0.0401   0.0283  -0.00556
5 2009-01-09 00:00:00  0.00986  0.00966 -0.0215  -0.0199  -1.27e-2 -0.00447 -0.00976 -0.0110 
6 2009-01-12 00:00:00 -0.0292  -0.0424  -0.0228  -0.0135  -5.03e-3 -0.0490  -0.0538  -0.0125 
# ... with 1 more variable: EM <dbl>

silhouette_score <- function(k){
  km <- kmeans(FinDat, centers = k, nstart=25)
  ss <- silhouette(km$cluster, dist(FinDat))
  mean(ss[, 3])
}
k <- 2:10
avg_sil <- sapply(k, silhouette_score)

which returns:

        Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In storage.mode(x) <- "double" : NAs introduced by coercion
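The warning points at the cause. Below is a minimal sketch that reproduces the coercion on a hypothetical two-row data frame (df, m and m_ok are illustrative names, not from the question). It assumes, as the warning message suggests, that kmeans() turns its input into a matrix and then sets storage.mode(x) <- "double":

```r
# Hypothetical two-row stand-in for FinDat: a POSIXct date plus one return column
df <- data.frame(
  date = as.POSIXct(c("2009-01-05", "2009-01-06"), tz = "UTC"),
  SP   = c(-0.00468, 0.00779)
)

m <- as.matrix(df)   # the date-time column forces a character matrix
typeof(m)            # "character"

# Same step as in the warning: strings such as "2009-01-05" cannot become
# doubles, so they turn into NA ("NAs introduced by coercion")
suppressWarnings(storage.mode(m) <- "double")
anyNA(m)             # TRUE

# Without the date column the matrix is numeric from the start
m_ok <- as.matrix(df[-1])
typeof(m_ok)         # "double"
```

Those NAs are what the compiled k-means routine then rejects with "NA/NaN/Inf in foreign function call".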

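It seems kmeans doesn't like the date column, so you may want to exclude it. kmeans() coerces its input to a numeric matrix; with a date-time (POSIXct) column present, as.matrix() produces a character matrix, and converting the date strings to double yields the NAs that the compiled routine rejects. Dropping the first column avoids this: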

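library(cluster)  # provides silhouette()
silhouette_score <- function(k) {
  # silhouette widths are only defined for 2 <= k <= n - 1
  stopifnot(k < nrow(FinDat))
  # FinDat[-1] drops the date column, leaving only the numeric returns
  km <- kmeans(FinDat[-1], centers = k, nstart = 25)
  ss <- silhouette(km$cluster, dist(FinDat[-1]))
  setNames(mean(ss[, 3]), k)  # average silhouette width, named by k
}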

k <- 2:5
avg_sil <- sapply(k, silhouette_score)
avg_sil
#         2         3         4         5 
# 0.3791762 0.3302388 0.2735529 0.2133566 
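If the date is not guaranteed to be the first column, keeping only the numeric columns may be more robust than FinDat[-1]. The sketch below uses simulated data (dat, num, avg_silhouette and best_k are hypothetical names), so its silhouette values will not match the ones above; it also picks the k with the largest average silhouette width:

```r
library(cluster)  # silhouette()

# Simulated stand-in for FinDat (assumption: a date column plus numeric returns)
set.seed(1)
n <- 30
dat <- data.frame(
  date = seq(as.POSIXct("2009-01-05", tz = "UTC"), by = "day", length.out = n),
  SP   = rnorm(n, sd = 0.01),
  DAX  = rnorm(n, sd = 0.01),
  FTSE = rnorm(n, sd = 0.01)
)

# Keep only numeric columns; is.numeric() is FALSE for POSIXct
num <- as.matrix(dat[vapply(dat, is.numeric, logical(1))])

avg_silhouette <- function(k, x) {
  stopifnot(k >= 2, k < nrow(x))   # silhouette needs 2 <= k <= n - 1
  km <- kmeans(x, centers = k, nstart = 25)
  ss <- silhouette(km$cluster, dist(x))
  mean(ss[, "sil_width"])          # average silhouette width
}

k <- 2:5
avg_sil <- setNames(sapply(k, avg_silhouette, x = num), k)
best_k <- k[which.max(avg_sil)]    # k with the highest average width
```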

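Alternatively, data.matrix() converts the date column to numbers (seconds since 1970), so kmeans() runs without dropping it:

silhouette_score2 <- function(k) {
  stopifnot(k < nrow(FinDat))
  FinDat <- data.matrix(FinDat)  # POSIXct date becomes numeric seconds
  km <- kmeans(FinDat, centers = k, nstart = 25)
  ss <- silhouette(km$cluster, dist(FinDat))
  setNames(mean(ss[, 3]), k)
}

k <- 2:5
avg_sil <- sapply(k, silhouette_score2)
avg_sil
#          2          3          4          5 
# 0.40783229 0.37777778 0.21111111 0.08333333

Note that the date values (around 1.23e9) are many orders of magnitude larger than the returns, so the Euclidean distances, and therefore the clusters, are then driven almost entirely by the date.

Data: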
FinDat <- structure(list(date = structure(c(1231110000, 1231196400, 1231282800, 
1231369200, 1231455600, 1231714800), class = c("POSIXct", "POSIXt"
), tzone = ""), ISE...2 = c(0.0358, 0.0254, -0.0289, -0.0622, 
0.00986, -0.0292), ISE...3 = c(0.0384, 0.0318, -0.0264, -0.0847, 
0.00966, -0.0424), SP = c(-0.00468, 0.00779, -0.0305, 0.00339, 
-0.0215, -0.0228), DAX = c(0.00219, 0.00846, -0.0178, -0.0117, 
-0.0199, -0.0135), FTSE = c(0.00389, 0.0129, -0.0287, -0.000466, 
-0.0127, -0.00503), NIKKEI = c(0, 0.00416, 0.0173, -0.0401, -0.00447, 
-0.049), BOVESPA = c(0.0312, 0.0189, -0.0359, 0.0283, -0.00976, 
-0.0538), EU = c(0.0127, 0.0113, -0.0171, -0.00556, -0.011, -0.0125
)), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
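If you do want the date as a feature (an assumption about your goal, not part of the answer above), one option is to standardize the columns with scale() before clustering, so that no column dominates the distances merely because of its units. A sketch on simulated data (dat, raw, scaled, km and sil are hypothetical names):

```r
library(cluster)  # silhouette()

# Simulated frame: POSIXct date plus two return columns (illustrative values only)
set.seed(42)
n <- 12
dat <- data.frame(
  date = seq(as.POSIXct("2009-01-05", tz = "UTC"), by = "day", length.out = n),
  SP   = rnorm(n, sd = 0.01),
  DAX  = rnorm(n, sd = 0.01)
)

raw <- data.matrix(dat)     # date becomes seconds since 1970 (about 1.23e9)
apply(raw, 2, sd)           # the date's spread dwarfs that of the returns

scaled <- scale(raw)        # centre each column and scale it to unit variance
km <- kmeans(scaled, centers = 3, nstart = 25)
sil <- mean(silhouette(km$cluster, dist(scaled))[, "sil_width"])
```

Whether a calendar date is a sensible clustering feature for returns data is a modelling choice; dropping it, as in the first answer, is the simpler option.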