R:从标签的宽列表到连接的长列表
我有一个数据框,其中包含由1或0表示的标记的个人首选项:R:从标签的宽列表到连接的长列表,r,tags,tidyr,R,Tags,Tidyr,我有一个数据框,其中包含由1或0表示的标记的个人首选项: mydata <- data.frame( ID = c(1:4), tag1 = c(1, 0, 1, 0), tag2 = c(0, 0, 0, 0), tag3 = c(1, 0, 1, 1), tag4 = c(1, 1, 0, 1), tag5 = c(0, 1, 1, 1) ) 我的数据有很多标签,而不仅仅是5个 对于网络图,我正在寻找一种方法,将宽格式数据转换为一行中每
mydata <- data.frame(
ID = c(1:4),
tag1 = c(1, 0, 1, 0),
tag2 = c(0, 0, 0, 0),
tag3 = c(1, 0, 1, 1),
tag4 = c(1, 1, 0, 1),
tag5 = c(0, 1, 1, 1)
)
我的数据有很多标签,而不仅仅是5个
对于网络图,我正在寻找一种方法,将宽格式数据转换为一行中每对tag=1之间出现的长格式列表。对于上面的示例,它将如下所示:
mydata2 <- data.frame(
ID = c(1,1,1,2,3,3,3,4,4,4),
target = c("tag1","tag1","tag3","tag4","tag1","tag1","tag3","tag3","tag3","tag4"),
source = c("tag3","tag4","tag4","tag5","tag3","tag5","tag5","tag4","tag5","tag5")
)
library(reshape2)
M <- melt(mydata, id.vars = "ID")
M2 <- M[M$value > 0, 1:2]
MS <- split(M2$variable, M2$ID)
do.call(rbind,
lapply(names(MS), function(x) {
data.frame(ID = x, t(combn(as.character(MS[[x]]), 2)))
}))
我想使用tidyr的gather来实现这一点,但不知道如何将其用于成对的列。我可以为每一对创建新的变量并收集它们,但是对于一长串的标签,这将变得不切实际。有没有更优雅的方法?或者甚至是一个特定的函数?下面是一个基于应用的答案,用于将函数应用于每一行和组合,2查找所有对
ll <- apply(mydata,1,
function(x){
if(sum(x[-1])<2)
# otherwise you'll get errors if there are less than two
# elements selected
return(NULL)
tmp = combn(names(x[-1])[ !!(x[-1]) ],# see note below
2) # pairs
# the return value
data.frame(ID = x['ID'],
target = tmp[1,],
source = tmp[2,],
# otherwise you get names warning, which is
# annoying.j
check.names=FALSE)
})
# bind the individual results together
do.call(rbind,ll)
#> ID target source
#> 1 tag1 tag3
#> 1 tag1 tag4
#> 1 tag3 tag4
#> 2 tag4 tag5
#> 3 tag1 tag3
#> 3 tag1 tag5
#> 3 tag3 tag5
#> 4 tag3 tag4
#> 4 tag3 tag5
#> 4 tag4 tag5
注意!!x是一种标准的JavaScript技巧,用于将值强制为
逻辑,在R中也有效。使用tidyr/dplyr的选项
我能想到的最直接的方法是使用重塑2的melt,然后再次使用combn和melt: 这里有一种使用data.table的方法,尽管使用了未报告的函数vecseq
你能解释一下我们应该如何从特定的数据输入到特定的数据输出吗?
library(reshape2)
M <- melt(mydata, id.vars = "ID") ## Melt the dataset
M2 <- M[M$value > 0, 1:2] ## Subset the rows of interest
melt(
lapply(
split(M2$variable, M2$ID), function(x) {
data.frame(t(combn(as.character(x), 2))) ## Inside, we use combn
}), id.vars = c("X1", "X2")) ## We know we'll just have X1 and X2
# X1 X2 L1
# 1 tag1 tag3 1
# 2 tag1 tag4 1
# 3 tag3 tag4 1
# 4 tag4 tag5 2
# 5 tag1 tag3 3
# 6 tag1 tag5 3
# 7 tag3 tag5 3
# 8 tag3 tag4 4
# 9 tag3 tag5 4
# 10 tag4 tag5 4
library(reshape2)
M <- melt(mydata, id.vars = "ID")
M2 <- M[M$value > 0, 1:2]
MS <- split(M2$variable, M2$ID)
do.call(rbind,
lapply(names(MS), function(x) {
data.frame(ID = x, t(combn(as.character(MS[[x]]), 2)))
}))
require(data.table)
foo <- function(x) {
lx = length(x)
idx1 = data.table:::vecseq(rep.int(1L, lx), (lx-1L):0L, NULL)
idx2 = data.table:::vecseq(c(seq_len(lx)[-1L], 1L), (lx-1L):0L, NULL)
list(x[idx1], x[idx2])
}
melt(dt, id="ID")[value == 1L, foo(variable), by=ID]
# ID V1 V2
# 1: 1 tag1 tag3
# 2: 1 tag3 tag4
# 3: 1 tag1 tag4
# 4: 3 tag1 tag3
# 5: 3 tag3 tag5
# 6: 3 tag1 tag5
# 7: 4 tag3 tag4
# 8: 4 tag4 tag5
# 9: 4 tag3 tag5
# 10: 2 tag4 tag5