How can I build a correlation matrix from two ID columns in R and SAS, with zeros on the diagonal?


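I have a data frame like the one shown below. I tried to turn the two ID columns into a matrix in R, but R could not produce it. (The matrix I expect is about 700 × 700.) R stops and reports:
Reached total allocation of 12213Mb: see help(memory.size)

I would like to do the same thing in SAS. How can that be done? Or do I need different code to accomplish this in R?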

ID_r ID_c SCORE
A1   A2   0.2
A1   A3   0.2
A1   A4   0.3
A1   A5   0.2
A1   A6   0.2
A2   A3   0.6
A2   A4   0.2
A2   A5   0.2
A2   A6   0.2
A3   A4   0.2
A3   A5   0.2
A3   A6   0.2
A4   A5   0.2
A4   A6   0.9
A5   A6   0.2

ID_r <- c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c <- c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE <- c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)
# Build the data frame that the code below refers to as df
df <- data.frame(ID_r, ID_c, SCORE)

library(dplyr); library(tidyr)
df$ID_r <- as.character(df$ID_r)
df$ID_c <- as.character(df$ID_c)
ID <- unique(c(df$ID_r, df$ID_c))
diagDf <- data.frame(ID_r = ID, ID_c = ID, SCORE = "0.0")
newDf <- rbind(df, diagDf) %>% arrange(ID_r, ID_c)

resultDf <- spread(newDf, ID_r, SCORE, fill = ".")
names(resultDf)[1] <- ""
resultDf

I want to generate a matrix from these two columns, with zeros on the diagonal.

Thanks in advance.

PROC TRANSPOSE is your friend.

proc transpose data=score_data out=score_matrix;
  by id_r; 
  id id_c; *this makes variable names;
  var score;
run;
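
This gives you the upper triangle. A second proc transpose can give you the lower triangle (swap id_r and id_c), or you can handle that in a data step. You still need to create the six 0.0 rows in a data step, but that should not be particularly difficult.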

An example of doing that:

data pre_transpose;
  set score_data end=eof;
  by id_r id_c;
  output;

  *Swap R and C;
  _idtemp = id_r;
  id_r=id_c;
  id_c=_idtemp;
  output;

  *If EOF, then need that last 0,0 combo which never gets an R;
   if eof then do;
    id_c = id_r;
    score=0;
    output;
    id_c = _idtemp;
  end;

  *If first line of a new ID, then need the R=C row;
  if first.id_r then do;
    id_r=id_c;
    score=0;
    output;
  end;

run;

proc sort data=pre_transpose;
  by id_r id_c;
run;
proc transpose data=pre_transpose out=score_matrix;
  by id_r; 
  id id_c; *this makes variable names;
  var score;
run;

R solution:

library(plyr)
ID_r = c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c = c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE = c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)
df1 = data.frame(ID_r, ID_c, SCORE)
df2 = data.frame(ID_c, ID_r, SCORE)
names(df2) = c("ID_r","ID_c","SCORE")
df = rbind(df1,df2)
ID <- unique(c(ID_r, ID_c))

df1 = expand.grid(ID,ID)
names(df1) = c("ID_r","ID_c")
d = join(df1, df, by = c("ID_r","ID_c"))
d$SCORE[is.na(d$SCORE)] <- 0

a = matrix(0, nrow = length(ID), ncol = length(ID))
rownames(a) <- ID
colnames(a) <- ID
a

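# Index a by (row name, column name) pairs and take SCORE straight from d,
# so the matrix stays numeric instead of being coerced to character
b = as.matrix(d[, c("ID_r", "ID_c")])
a[b] <- d$SCORE
a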
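
For a matrix of roughly 700 × 700, the join step can probably be skipped altogether. The following is a minimal base R sketch, not the answer's original method: it assumes the pairs are in a data frame with the columns ID_r, ID_c and SCORE from the question, and the names score_df, ids, idx and m are made up for illustration.

```r
# Sample pairs from the question (each unordered pair appears once).
ID_r <- c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c <- c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE <- c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)
score_df <- data.frame(ID_r, ID_c, SCORE, stringsAsFactors = FALSE)

# All distinct IDs, in order of first appearance, define rows and columns.
ids <- unique(c(score_df$ID_r, score_df$ID_c))

# Start from all zeros, so the diagonal and any missing pair stay 0.
m <- matrix(0, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# Two-column integer index: one (row, column) position per input pair.
idx <- cbind(match(score_df$ID_r, ids), match(score_df$ID_c, ids))

m[idx] <- score_df$SCORE          # fill the (ID_r, ID_c) cells
m[idx[, 2:1]] <- score_df$SCORE   # mirror into the (ID_c, ID_r) cells
m
```

A 700 × 700 numeric matrix holds 490,000 doubles, which is about 3.9 MB, so the result itself is small. Because m stays numeric, it can be passed to write.csv(m) or to other matrix functions directly.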
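
join requires the plyr package. By default, join uses type left, which is what this problem needs.

What error did you get? Please use the edited answer. You need library(plyr). You can also go to the Packages tab in RStudio and tick the checkbox for plyr.

Can you provide more details? Your original question didn't have anything about rows/columns 1-12.

Thank you!!! After changing the IDs from A1 to 1, it generates the CSV matrix perfectly!!! Thank you very much!!!

This is great. Thank you!!!! It works great!!!! Thank you very much. I learned a lot from your answer.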