Problem with c-means clustering failing on my data in R


How can I solve this problem for the following data?

> x=data.frame(c(v1="a" ,"b" ,"c" ,"d" ,"e"),
+ v2=c(97 ,90 ,93 ,97 ,90),
+ v3=c( 85 ,91 ,87 ,91 ,93))
> library(e1071)
> f <- cmeans(x, 2)
Error in cmeans(x, 2) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In cmeans(x, 2) : NAs introduced by coercion
2: In cmeans(x, 2) : NAs introduced by coercion
> f


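I want to apply c-means to my data. As the code above shows, it contains three vectors: v1, v2, and v3. I want the c-means result to be labelled by the vector v1.

If we look at ?cmeans, the argument x is described as:

x - the data matrix, where columns correspond to variables and rows to observations

So, after removing the character column (column 1), we can convert the data.frame to a matrix, x1.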
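The suggestion above can be sketched as follows. This is a minimal sketch rather than the answer's original code: the data frame is rebuilt with a plainly named v1 column, the name x1 follows the answer, and using v1 as row names to keep the labels is an assumption about what "labelled by v1" means. The cmeans call is guarded so that the matrix preparation still runs where e1071 is not installed.

```r
# Rebuild the question's data with a cleanly named label column (assumption:
# the original c(v1 = "a", ...) was meant to create a column called v1).
x <- data.frame(v1 = c("a", "b", "c", "d", "e"),
                v2 = c(97, 90, 93, 97, 90),
                v3 = c(85, 91, 87, 91, 93))

# Drop the character column (column 1) and convert the rest to a numeric matrix.
x1 <- as.matrix(x[, -1])

# Keep the labels as row names so each observation can still be identified.
rownames(x1) <- as.character(x$v1)

if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1)  # cmeans starts from randomly chosen centers
  f <- e1071::cmeans(x1, centers = 2)
  # Attach the labels explicitly instead of relying on row names in the result.
  print(data.frame(label = rownames(x1),
                   cluster = f$cluster,
                   round(f$membership, 3)))
}
```

Because the starting centers are random, the cluster numbering and memberships may vary between runs unless a seed is set.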

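The k-means family of partitioning clustering algorithms works on means, which inherently accept only numeric values. You get the error because the data frame contains both numeric and categorical values, which cmeans() does not accept. Also, there is no need to convert the data frame to a matrix, because that is not the actual problem.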
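To see where the NA values in the warning come from, the coercion can be reproduced in base R without calling cmeans. This is a sketch under the assumption that cmeans converts its input with as.matrix and then to numeric, which would match the "NAs introduced by coercion" warning. The data frame below uses a cleanly named v1 column rather than the question's exact construction.

```r
x <- data.frame(v1 = c("a", "b", "c", "d", "e"),
                v2 = c(97, 90, 93, 97, 90),
                v3 = c(85, 91, 87, 91, 93))

# A data frame that contains a character column becomes a character matrix.
m <- as.matrix(x)
print(is.character(m))

# Converting the label column to numbers yields only NA
# (the coercion warning is suppressed here).
v1_num <- suppressWarnings(as.numeric(m[, "v1"]))

# The columns that were numeric to begin with convert back cleanly.
v2_num <- as.numeric(m[, "v2"])

print(v1_num)
print(v2_num)
```

The NA values produced for the label column are what the foreign function call then rejects.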
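So, an alternative approach:
Discretize the character variable by assigning it numbers, then apply clustering. This way no variable needs to be removed.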
# create empty data frame
df<- setNames(data.frame(matrix(ncol = 5, nrow = 5)), c("a" ,"b" ,"c" ,"d" ,"e"))

# fill values
df$a<- c("aaaa" ,"bbbb" ,"cccc" ,"dddd" ,"eeee")
df$b<- c(97 ,90 ,93 ,97 ,90)
df$c<- c(97 ,90 ,93 ,97 ,90)
df$d<- c( 85 ,91 ,87 ,91 ,93)
df$e<- c( 85 ,91 ,87 ,91 ,93)

# show the dataframe
df
     a  b  c  d  e
1 aaaa 97 97 85 85
2 bbbb 90 90 91 91
3 cccc 93 93 87 87
4 dddd 97 97 91 91
5 eeee 90 90 93 93

# Discretize the character variable
df$a <- as.numeric( factor(df$a) ) -1
df
  a  b  c  d  e
1 0 97 97 85 85
2 1 90 90 91 91
3 2 93 93 87 87
4 3 97 97 91 91
5 4 90 90 93 93

# Apply clustering
library(e1071)
cmeans(df, 2)
Fuzzy c-means clustering with 2 clusters

Cluster centers:
      a     b     c     d     e
1 1.406 95.72 95.72 87.18 87.18
2 2.510 90.36 90.36 91.85 91.85

Memberships:
           1       2
[1,] 0.92728 0.07272
[2,] 0.04014 0.95986
[3,] 0.80061 0.19939
[4,] 0.72009 0.27991
[5,] 0.03544 0.96456

Closest hard clustering:
[1] 1 2 1 1 2

Available components:
[1] "centers"     "size"        "cluster"     "membership"  "iter"       
[6] "withinerror" "call"
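Because the character column was replaced by integer codes, it can help to keep the mapping back to the original labels when reading the result. The sketch below is an illustration, not part of the original answer: the names lab, codes, recovered and res are hypothetical. set.seed is added only because cmeans starts from random centers, so memberships and cluster numbering can differ from the output shown above.

```r
# Labels as a factor; the levels keep the original strings.
lab <- factor(c("aaaa", "bbbb", "cccc", "dddd", "eeee"))
codes <- as.numeric(lab) - 1         # same coding as in the answer: 0, 1, 2, 3, 4
recovered <- levels(lab)[codes + 1]  # map the codes back to the labels

df <- data.frame(a = codes,
                 b = c(97, 90, 93, 97, 90),
                 c = c(97, 90, 93, 97, 90),
                 d = c(85, 91, 87, 91, 93),
                 e = c(85, 91, 87, 91, 93))

if (requireNamespace("e1071", quietly = TRUE)) {
  set.seed(1)
  res <- e1071::cmeans(as.matrix(df), centers = 2)
  # "cluster" and "membership" are two of the components listed above.
  print(data.frame(label = recovered,
                   cluster = res$cluster,
                   round(res$membership, 3)))
}
```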
