Using APCluster on pre-classified data: coloring the dendrogram and producing well-formatted output
Question 1: I am using the plot() function on an AggExResult object, and the clustering works as expected. In my own data I have an extra column in the input that provides a predefined "target" for classification purposes, and I would like to know whether there is a way to highlight the dendrogram labels by color (e.g. red = class 0, blue = class 1). The class of the target is factor (or character). Ultimately I am trying to show visually how many clusters contain "pure" classes and how many contain "mixed" classes. Below is slightly modified code from the online documentation that roughly shows my input data:
library(apcluster)
## class labels 0/1 stored as plain numbers; assigning as.factor(0) or
## as.factor(1) into a matrix cell would store the integer code 1 in both cases
cl1Targ <- matrix(0, nrow=50, ncol=1)
cl2Targ <- matrix(1, nrow=50, ncol=1)
## create two Gaussian clouds
#cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06))
#cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05))
cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06),cl1Targ)
cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05),cl2Targ)
x <- rbind(cl1,cl2)
colnames(x) <- c('Column 1','Column 2','Class_ID')
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)
## run affinity propagation
apres <- apcluster(sim, q=0.7)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)
## plot dendrogram
plot(aggres1, main='aggres1 w/ target') #
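One thing to be aware of in the code above: x carries Class_ID as a third numeric column, so a distance-based similarity computed on x treats the label as an extra coordinate. The following base-R sketch illustrates the effect. It is only an illustration under assumptions: negative squared Euclidean similarity is stood in for by -as.matrix(dist(m))^2 instead of calling negDistMat(), and the names negSqDist, features and classId are made up for this example.

```r
## Sketch (base R only): effect of a numeric 0/1 label column on
## negative squared Euclidean similarities. All names are illustrative.
set.seed(42)
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
features <- rbind(cl1, cl2)
classId <- c(rep(0, 50), rep(1, 50))

## stand-in for a negative squared Euclidean similarity
negSqDist <- function(m) -as.matrix(dist(m))^2

simFeatures <- negSqDist(features)                  # coordinates only
simWithLabel <- negSqDist(cbind(features, classId)) # label as third coordinate

## pairs from the same class are unaffected; pairs from different
## classes are pushed apart by (0 - 1)^2 = 1
sameClass <- outer(classId, classId, "==")
diffSame <- simWithLabel[sameClass] - simFeatures[sameClass]
diffBetween <- simWithLabel[!sameClass] - simFeatures[!sameClass]
print(range(diffSame))
print(range(diffBetween))
```

With a 0/1 label the shift is a constant, so the label column can end up dominating coordinates that live on a much smaller scale, as they do here.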
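Question 2: When I run this on my own data, I see the following (the row.names are drugs, clustered by mean fold-change values of gene expression):
This is also what I expect in terms of content, but I would like to format it for reporting purposes. I tried exporting it with dput(), but the output file contained a lot of extra, unnecessary information. I would like to know how to export the same kind of information as above, together with the object names and the target classifier mentioned earlier, into a table like the following (with the object names added to the output):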
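Many thanks to Ulrich for answering these questions so quickly by email. We wanted to share our discussion with the community, so I will let him post his solutions as answers so that he gets the credit he deserves :-)
As an update, I tried to implement the answer to question 1. The example code works as expected, but I ran into a problem when applying it to my own data. The input data comes in two parts. The first is a matrix containing the numeric measurement data, including column and row labels: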
> fci[1:3,1:3]
M30596_PROBE1 AI231309_PROBE1 NM_012489_PROBE1
amantadine_58mg1d_fc 0.05630744 -0.10441722 0.41873201
amantadine_58mg6h_fc -0.42780274 -0.26222322 0.02703001
amantadine_220mg1d_fc 0.35260779 -0.09902214 0.04067055
The second is the "target" values in factor format, each corresponding to the same row in fci above:
> targs[1:3]
amantadine_58mg1d_fc amantadine_58mg6h_fc amantadine_220mg1d_fc
0 0 0
Levels: 0 1
From here, the tree is built as follows:
# build the AggExResult:
aglomr1 <- aggExCluster(negDistMat(r=2), fci)
# convert the data
tree <- as.dendrogram(aglomr1)
# assign the color codes
colorCodes <- c("0"="red", "1"="green")
names(targs) <- rownames(fci)
xColor <- colorCodes[as.character(targs)]
names(xColor) <- rownames(fci)
# plot the colored tree
labels_colors(tree) <- xColor[order.dendrogram(tree)]
plot(tree, main="Colored Tree")
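This part appears to work as expected, and the targets have the correct colors assigned in xColor. However, the colors do not carry over to labels_colors(tree):
> head(order.dendrogram(tree))
[1] "295" "929" "488" "493" "233" "235"
> head(labels_colors(tree))
295 929 488 493 233 235
> head(xColor[order.dendrogram(tree)])
<NA> <NA> <NA> <NA> <NA> <NA>
NA NA NA NA NA NA
How do I get the labels_colors(tree) assignment to work here?

Here is my answer to question 1: The plot() method for "AggExResult" objects internally uses the plot.dendrogram() method. Since that method does not offer a way to color the leaves of the dendrogram, this does not work directly. However, the "dendextend" package provides such functionality. (By the way, I found that solution in another thread.) Since "apcluster" provides coercions to "hclust" and "dendrogram" objects, the functionality of that package can be used more or less directly.
Here is some example code: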
library(apcluster)
## create two Gaussian clouds along with class labels 0/1
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- cbind(Columns=data.frame(rbind(cl1, cl2)),
"Class_ID"=factor(as.character(c(rep(0, 50), rep(1, 50)))))
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x[, 1:2], r=2)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)
## load 'dendextend' package
## install.packages("dendextend") ## if not yet installed
library(dendextend)
## convert object
tree <- as.dendrogram(aggres1)
## assign color codes
colorCodes <- c("0"="red", "1"="green")
## look colors up by level name rather than by the factor's integer codes
xColor <- colorCodes[as.character(x$Class_ID)]
names(xColor) <- rownames(x)
## plot color-labeled tree
labels_colors(tree) <- xColor[order.dendrogram(tree)]
plot(tree)
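If installing dendextend is not an option, a similar result can be sketched in base R. plot.dendrogram() has no argument for per-leaf label colors, but it does honor a nodePar attribute stored on each leaf, which dendrapply() can set. The sketch below is an assumption-laden illustration, not the method from the answer above: it uses stats::hclust() in place of aggExCluster() so that it runs without apcluster, and the names pts, classId and colorLeaf are made up here.

```r
## Sketch (base R only): color dendrogram leaf labels by class via nodePar.
## hclust() stands in for aggExCluster(); all names are illustrative.
set.seed(1)
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
pts <- rbind(cl1, cl2)
rownames(pts) <- paste0("s", seq_len(nrow(pts)))

## class labels, named by sample so they can be looked up per leaf
classId <- factor(rep(c("0", "1"), each = 50))
names(classId) <- rownames(pts)
colorCodes <- c("0" = "red", "1" = "green")

tree <- as.dendrogram(hclust(dist(pts)))

## set the label color on every leaf, looked up by the leaf's label
colorLeaf <- function(node) {
  if (is.leaf(node)) {
    lab <- attr(node, "label")
    col <- unname(colorCodes[as.character(classId[lab])])
    attr(node, "nodePar") <- c(attr(node, "nodePar"),
                               list(lab.col = col, pch = NA))
  }
  node
}
coloredTree <- dendrapply(tree, colorLeaf)
plot(coloredTree, main = "Leaf labels colored by class")
```

Because the color is looked up by leaf label rather than by position, this variant does not depend on what order.dendrogram() returns.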
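Here is my answer to your question 2: I am sorry, but no such functionality is implemented in the "apcluster" package. Since this is a very specific request, I am reluctant to include it in the package (not to mention that show() methods cannot take additional arguments). As an alternative, I would like to offer you a custom function that displays exemplars and samples together with their labels/groups: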
library(apcluster)
## create two Gaussian clouds along with class labels 0/1
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- cbind(Columns=data.frame(rbind(cl1, cl2)),
"Class_ID"=factor(as.character(c(rep(0, 50), rep(1, 50)))))
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x[, 1:2], r=2)
## special show() function with labeled data
show.ExClust.labeled <- function(object, labels=NULL)
{
    if (!is(object, "ExClust"))
        stop("'object' is not of class 'ExClust'")

    if (is.null(labels))
    {
        show(object)
        return(invisible(NULL))
    }

    cat("\n", class(object), " object\n", sep="")

    if (!is.finite(object@l) || !is.finite(object@it))
        stop("object is not result of an affinity propagation run; ",
             "it is pointless to create 'APResult' objects yourself.")

    cat("\nNumber of samples = ", object@l, "\n")

    if (length(object@sel) > 0)
    {
        cat("Number of sel samples = ", length(object@sel),
            paste(" (", round(100*length(object@sel)/object@l,1),
                  "%)\n", sep=""))
        cat("Number of sweeps = ", object@sweeps, "\n")
    }

    cat("Number of iterations = ", object@it, "\n")
    cat("Input preference = ", object@p, "\n")
    cat("Sum of similarities = ", object@dpsim, "\n")
    cat("Sum of preferences = ", object@expref, "\n")
    cat("Net similarity = ", object@netsim, "\n")
    cat("Number of clusters = ", length(object@exemplars), "\n\n")

    if (length(object@exemplars) > 0)
    {
        if (length(names(object@exemplars)) == 0)
        {
            cat("Exemplars:\n")
            df <- data.frame("Sample"=object@exemplars,
                             Label=labels[object@exemplars])
            print(df, row.names=FALSE)

            for (i in 1:length(object@exemplars))
            {
                cat("\nCluster ", i, ", exemplar ",
                    object@exemplars[i], ":\n", sep="")
                df <- data.frame(Sample=object@clusters[[i]],
                                 Label=labels[object@clusters[[i]]])
                print(df, row.names=FALSE)
            }
        }
        else
        {
            df <- data.frame("Exemplars"=names(object@exemplars),
                             Label=labels[names(object@exemplars)])
            print(df, row.names=FALSE)

            for (i in 1:length(object@exemplars))
            {
                cat("\nCluster ", i, ", exemplar ",
                    names(object@exemplars)[i], ":\n", sep="")
                df <- data.frame(Sample=names(object@clusters[[i]]),
                                 Label=labels[names(object@clusters[[i]])])
                print(df, row.names=FALSE)
            }
        }
    }
    else
    {
        cat("No clusters identified.\n")
    }
}
## create label vector (with proper names)
label <- x$Class_ID
names(label) <- rownames(x)
## run apcluster()
apres <- apcluster(sim, q=0.3)
## show with labels
show.ExClust.labeled(apres, label)
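To address the original goal of seeing how many clusters are "pure" and how many are "mixed", the per-cluster listing can be condensed into one row per cluster. The following is a minimal sketch and not part of apcluster: summarizeClusters is a hypothetical helper, and it assumes the clusters are available as a list of index vectors, in the way object@clusters is used in the function above. The toy labels and clusters are made up so that the block runs on its own.

```r
## Sketch: one summary row per cluster with class counts and a pure/mixed flag.
## 'clusters' is assumed to be a list of integer index vectors (as in
## object@clusters above); 'labels' holds one class label per sample.
summarizeClusters <- function(clusters, labels) {
  labels <- factor(labels)
  ## count how many members of each cluster fall into each class
  counts <- do.call(rbind, lapply(clusters, function(idx)
    as.vector(table(labels[idx]))))
  colnames(counts) <- paste0("n_class_", levels(labels))
  data.frame(Cluster = seq_along(clusters),
             Size = rowSums(counts),
             counts,
             ## a cluster is "pure" if exactly one class is present
             Purity = ifelse(rowSums(counts > 0) == 1, "pure", "mixed"),
             check.names = FALSE, stringsAsFactors = FALSE)
}

## toy data: eight samples, three clusters
toyLabels <- factor(c(0, 0, 0, 1, 1, 1, 0, 1))
toyClusters <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8))

res <- summarizeClusters(toyClusters, toyLabels)
print(res, row.names = FALSE)
print(table(res$Purity))
```

With a real result this would presumably be called as summarizeClusters(apres@clusters, label), and the returned data frame can be written to a file with write.csv() for reporting.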
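A remark on how the similarities are computed: if the data has a numeric label column, negDistMat() will use that column. This is probably not what you want. So either remove the label column from the data before clustering, or put the data into a data frame. If you do the latter and the label column is a factor, the similarity measures implemented in the "apcluster" package will ignore it automatically.

In versions before 1.4.4, there was a bug in the as.dendrogram() method. That is why assigning colors with xColor[order.dendrogram(tree)] did not work. I recommend upgrading to version 1.4.4 (on CRAN since July 4, 2017), or using the workaround xColor[as.numeric(order.dendrogram(tree))] with older versions.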
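The NA values reported in the question follow from how R indexes a named vector: a character index is matched against the names, while a numeric index selects positions. The question's output shows order.dendrogram(tree) returning strings such as "295", which match none of the names of xColor. The small base-R illustration below uses made-up drug names and a made-up ordering; it does not call apcluster.

```r
## Sketch: name-based vs. position-based indexing of a named vector.
## The names and the ordering are invented for illustration.
xColor <- c(drugA = "red", drugB = "green", drugC = "red")

## an ordering returned as character strings, as in the affected versions
ord <- c("3", "1", "2")

## character index: looked up among the names "drugA", "drugB", "drugC",
## none of which is "3", "1" or "2", so every entry is NA
byName <- xColor[ord]
print(byName)

## numeric index: selects positions 3, 1, 2, which is what was intended
byPos <- xColor[as.numeric(ord)]
print(byPos)
```

This is why wrapping the ordering in as.numeric() works as a stopgap on the older versions.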