R: k-means on a million observations


r plot machine-learning


I am trying to run k-means clustering on a data set with more than a million rows and 4 variables, all numeric. I am using the following code:

kmeansdf<-as.data.frame(rbind(train$V3,train$V5,train$V8,train$length))
km<-kmeans(kmeansdf,2)

This code gives the following error:

Error in plot.new() : figure margins too large

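I searched online but could not find a solution. I also tried running it from the command line, but I still get the same error (I am currently using RStudio).

Any help resolving this error would be appreciated.

TIA.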

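When I run your code on a df with 1e6 rows I do not get the same error, but the system hangs (I interrupted it after 10 minutes). Creating a scatterplot matrix with 1e6 points in every panel may simply be too much.

You might consider random sampling: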

# all this to create a df with two distinct clusters
set.seed(1)
center.1 <- c(2,2,2,2)
center.2 <- c(-2,-2,-2,-2)
n <- 5e5
f <- function(x){return(data.frame(V1=rnorm(n,mean=x[1]),
                                   V2=rnorm(n,mean=x[2]),
                                   V3=rnorm(n,mean=x[3]),
                                   V4=rnorm(n,mean=x[4])))}
df <- do.call("rbind",lapply(list(center.1,center.2),f))

km <- kmeans(df,2)         # run kmeans on full dataset
df$cluster <- km$cluster   # append cluster column to df

# sample is 10% of population (100,000 rows)
s  <- 1e5
df <- df[sample(nrow(df),s),]
plot(df[,1:4],col=df$cluster)
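
One more thing worth checking in the question's code: `rbind()` stacks the four vectors as rows, so `kmeansdf` ends up with 4 rows and one column per observation, and plotting such a data frame asks for a scatterplot matrix with an enormous number of panels. That may well be what triggers the margin error, although the question does not show the plotting call. A minimal sketch of the difference, using a small made-up `train` (only the column names come from the question; the data are hypothetical):

```r
# Hypothetical stand-in for the question's `train`; values are random
set.seed(1)
train <- data.frame(V3 = rnorm(1000), V5 = rnorm(1000),
                    V8 = rnorm(1000), length = rnorm(1000))

# As in the question: each vector becomes one ROW
by_row <- as.data.frame(rbind(train$V3, train$V5, train$V8, train$length))
dim(by_row)      # 4 rows, 1000 columns

# One row per observation, one column per variable
kmeansdf <- train[, c("V3", "V5", "V8", "length")]
dim(kmeansdf)    # 1000 rows, 4 columns

km <- kmeans(kmeansdf, 2)
length(km$cluster)   # one cluster label per observation
```

With the observations in rows, `km$cluster` has one label per row and can be appended as a column, as in the answer above.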

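Your plot area is too small. Try enlarging it manually: in RStudio, drag the border of the plot pane to make it bigger. If that does not help, you are probably trying to plot a very large amount of data. Try saving the plot directly to a file; see ?Devices for ways to do this.

Note the edit: you need to run kmeans on the full dataset first, then append the cluster column, and only then plot the sample.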
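
As a rough illustration of the "save the plot to a file" suggestion, the sketch below writes the scatterplot matrix to a PNG through the `png()` device instead of drawing it in the RStudio pane. The data, the file name, and the image size are assumptions chosen for the example, not values from the question:

```r
# Small synthetic data set with two clusters (hypothetical, for illustration only)
set.seed(1)
n  <- 5000
df <- data.frame(V1 = c(rnorm(n, 2), rnorm(n, -2)),
                 V2 = c(rnorm(n, 2), rnorm(n, -2)),
                 V3 = c(rnorm(n, 2), rnorm(n, -2)),
                 V4 = c(rnorm(n, 2), rnorm(n, -2)))

km <- kmeans(df, 2)          # cluster first
df$cluster <- km$cluster     # then append the labels

# Assumed output location and size; adjust as needed
out <- file.path(tempdir(), "kmeans_pairs.png")
png(out, width = 1600, height = 1600, res = 150)
plot(df[, 1:4], col = df$cluster, pch = ".")   # scatterplot matrix, one dot per point
dev.off()                    # close the device so the file is written

file.exists(out)
```

Because the device size is set explicitly, the result does not depend on how large the plot pane happens to be. A raster format such as PNG also keeps the file size manageable when many points are drawn.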