Getting the consensus of multiple partitioning methods in R


My data:

data=cbind(c(1,1,2,1,1,3),c(1,1,2,1,1,1),c(2,2,1,2,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:6)

As output, I would like two communities (and their members), based on a majority vote. Something like:
group1={item1,item2}
group2={item3}

You can try this, in base R:

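# majority vote per item (column): the most frequent label wins
res=apply(data,2,function(u) as.numeric(names(sort(table(u), decreasing=TRUE))[1]))

# list the items that share each winning label
setNames(lapply(unique(res), function(u) names(res)[res==u]), unique(res))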
#$`1`
#[1] "item 1" "item 2"

#$`2`
#[1] "item 3"
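One caveat worth checking: this column-wise vote treats the numbers as fixed labels shared by all methods. Below is a minimal sketch that reuses the four-method example raised in the comments. The names `vote_by_column` and `data4` are made up for illustration; the function simply wraps the two lines above.

```r
# Hypothetical wrapper around the column-wise vote shown above
vote_by_column <- function(data) {
  res <- apply(data, 2, function(u) {
    as.numeric(names(sort(table(u), decreasing = TRUE))[1])
  })
  setNames(lapply(unique(res), function(u) names(res)[res == u]), unique(res))
}

data4 <- cbind(c(1, 3, 2, 1), c(2, 2, 3, 3), c(3, 1, 1, 2))
colnames(data4) <- paste("item", 1:3)
rownames(data4) <- paste("method", 1:4)

# Number of distinct groups that each method (row) reports
groups_per_method <- apply(data4, 1, function(r) length(unique(r)))

# Groups produced by the column-wise vote
groups <- vote_by_column(data4)
length(groups)
```

Here every method separates the three items, yet the vote returns two groups with item 1 and item 3 together, as reported in the comments. The approach is therefore only safe when the labels mean the same thing in every row.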

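The function takes a matrix in which each column is an item and each row is a membership vector, that is, the partition of the items produced by one clustering method. The numbers in a row have no meaning beyond indicating membership, and they are reused from row to row. The function returns the majority-vote partition. When there is no consensus for an item, the partition given by the first row wins, so you can, for instance, sort the partitions by decreasing modularity before calling the function.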

consensus.final <-
  function(data){
    output=list()
    for (i in 1:nrow(data)){
      row=as.numeric(data[i,])
      output.inner=list()
      for (j in 1:length(row)){
        group=character()
        group=c(group,colnames(data)[which(row==row[j])])
        output.inner[[j]]=group
      }
      output.inner=unique(output.inner)
      output[[i]]=output.inner
    }

    # gives the mode of the vector representing the number of groups found by each method
    consensus.n.comm=as.numeric(names(sort(table(unlist(lapply(output,length))),decreasing=TRUE))[1])

    # removes the elements of the list that do not correspond to this consensus solution
    output=output[lengths(output)==consensus.n.comm]

    # 1) find intersection 
    # 2) use majority vote for elements of each vector that are not part of the intersection

    group=list()

    for (i in 1:consensus.n.comm){ 
      list.intersection=list()
      for (p in 1:length(output)){
        list.intersection[[p]]=unlist(output[[p]][i])
      }

      # candidate group i
      intersection=Reduce(intersect,list.intersection)
      group[[i]]=intersection

      # we need to reinforce that group
      for (p in 1:length(list.intersection)){
        vector=setdiff(list.intersection[[p]],intersection)
        if (length(vector)>0){
          for (j in 1:length(vector)){
            counter=vector(length=length(list.intersection))
            for (k in 1:length(list.intersection)){
              counter[k]=vector[j]%in%list.intersection[[k]]
            }
            if(length(which(counter==TRUE))>=ceiling((length(counter)/2)+0.001)){
              group[[i]]=c(group[[i]],vector[j])
            }
          }
        }
      }
    }

    group=lapply(group,unique)

    # variables for which consensus has not been reached
    unclassified=setdiff(colnames(data),unlist(group))

    if (length(unclassified)>0){
      for (pp  in 1:length(unclassified)){
        temp=matrix(nrow=length(output),ncol=consensus.n.comm)
        for (i in 1:nrow(temp)){
          for (j in 1:ncol(temp)){
            temp[i,j]=unclassified[pp]%in%unlist(output[[i]][j])
          }
        }
        # use the partition of the first method when no majority exists (this allows ordering of partitions by decreasing modularity values for instance)
        index.best=which(temp[1,]==TRUE)
        group[[index.best]]=c(group[[index.best]],unclassified[pp])
      }
    }
    # return the consensus groups and the items that needed the first-row fallback
    return(list(group=group,unclassified=unclassified))
  }
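The first loop of the function turns each membership vector into groups of item names, and that step is what makes the numeric labels irrelevant. A compact sketch of the idea, assuming a hypothetical helper `row_groups` that is not part of the answer's code:

```r
# Hypothetical helper: convert one membership vector into a list of groups
# of item names, ordered by first appearance of each label in the row
row_groups <- function(row, items) {
  unname(split(items, match(row, unique(row))))
}

data <- cbind(c(1, 1, 2, 1, 1, 3), c(1, 1, 2, 1, 1, 1), c(2, 2, 1, 2, 1, 2))
colnames(data) <- paste("item", 1:3)
rownames(data) <- paste("method", 1:6)

parts <- lapply(seq_len(nrow(data)), function(i) row_groups(data[i, ], colnames(data)))

# Rows 1 (1,1,2) and 3 (2,2,1) use different numbers but describe the same partition
same_partition <- identical(parts[[1]], parts[[3]])

# Number of groups found by each method; its mode is the consensus number of groups
n_groups <- lengths(parts)
consensus_n <- as.numeric(names(sort(table(n_groups), decreasing = TRUE))[1])
```

On this data four of the six methods find two groups, so the consensus number of groups is 2, which matches the `consensus.n.comm` step in the function.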

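Sorry, that does not work. For example: data=cbind(c(1,1,1,1,1,3),c(1,1,1,1,1,1),c(1,1,1,2,1,2)); colnames(data)=paste("item",1:3); rownames(data)=paste("method",1:6). Your method returns 3 groups, but there is clearly only one group based on the majority vote.

There was a typo in names and order. Your second example is now handled correctly by the code.

I found a new problem. For example: data=cbind(c(1,3,2,1),c(2,2,3,3),c(3,1,1,2)); colnames(data)=paste("item",1:3); rownames(data)=paste("method",1:4). Your command returns {item1,item3} and {item2} when the consensus is clearly the 3-cluster solution. Remember that the numbers are not fixed group labels: they only indicate membership and are reused from one row to the next.

In the case of equality (column 2), do you mean that you want two groups?

For each row (each method), the three items fall into different groups. I do not quite understand what you mean by "in the case of equality (column 2)".
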
data=cbind(c(1,1,2,1,1,3),c(1,1,2,1,1,1),c(2,2,1,2,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:6)
data
consensus.final(data)$group

[[1]]
[1] "item 1" "item 2"

[[2]]
[1] "item 3"

data=cbind(c(1,1,1,1,1,3),c(1,1,1,1,1,1),c(1,1,1,2,1,2)) 
colnames(data)=paste("item",1:3) 
rownames(data)=paste("method",1:6)
data
consensus.final(data)$group

[[1]]
[1] "item 1" "item 2" "item 3"

data=cbind(c(1,3,2,1),c(2,2,3,3),c(3,1,1,2))
colnames(data)=paste("item",1:3)
rownames(data)=paste("method",1:4)
data
consensus.final(data)$group

[[1]]
[1] "item 1"

[[2]]
[1] "item 2"

[[3]]
[1] "item 3"
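As a label-free cross-check on results like these, one can count how often each pair of items is placed together across methods. This is a different technique from either answer (a co-membership matrix), sketched here with a hypothetical function name `comembership`:

```r
# Hypothetical helper: fraction of methods (rows) that put each pair of
# items (columns) in the same group; the numeric labels themselves never matter
comembership <- function(data) {
  per_method <- lapply(seq_len(nrow(data)), function(i) {
    row <- data[i, ]
    outer(row, row, "==")   # TRUE where two items share a label in this row
  })
  Reduce(`+`, per_method) / nrow(data)
}

data <- cbind(c(1, 1, 2, 1, 1, 3), c(1, 1, 2, 1, 1, 1), c(2, 2, 1, 2, 1, 2))
colnames(data) <- paste("item", 1:3)
rownames(data) <- paste("method", 1:6)

cm <- comembership(data)

# Pairs that a strict majority of methods keep together
together <- cm > 0.5
```

For the first example, item 1 and item 2 are grouped together by five of the six methods, while item 3 joins them in only one, which agrees with the two groups returned by `consensus.final`.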