Using R to determine the number of shared logical values across all possible unique dyadic combinations
I have a data frame containing groups and logical vectors that indicate whether each group is located in each area.
# Create data frame
Group = c('Group1', 'Group2', 'Group3', 'Group4')
Area1 = c(TRUE, FALSE, TRUE, FALSE)
Area2 = c(TRUE, TRUE, FALSE, FALSE)
Area3 = c(FALSE, TRUE, FALSE, FALSE)
Area4 = c(FALSE, FALSE, FALSE, TRUE)
df = data.frame(Group, Area1, Area2, Area3, Area4)
# Generate unique combinations of Groups
links <- expand.grid(df$Group, df$Group) # generates all possible combinations
links$key <- apply(links, 1, function(x)paste(sort(x), collapse=''))
undirected <- subset(links, !duplicated(links$key))
undirected$ID <- seq.int(nrow(undirected))
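I'm not sure I understood your question correctly. The data structure is confusing. Does the dyad {i=2, j=4}, labelled Group2Group4, really have Area3 and Area4 in common? I would think not.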
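I'm not sure igraph is really needed here. However, this could be set up as a bipartite network G(V₁, V₂, E) that distinguishes areas ∈ V₁ from groups ∈ V₂, with edges always running from an area to a group: eⁱʲ ∈ E, i ∈ V₁, j ∈ V₂. You then get the shared areas by listing the neighbourhood of each group node, and the number of shared areas by counting the degree of the shared areas.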
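If you really, really want to see this in code, I'll post it when I have time.
In the meantime, I think this is what you are after. I won't win any code golf contest with it, but if I understood your question correctly, it works: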
# Make that same data
Group = c('Group1', 'Group2', 'Group3', 'Group4')
Area1 = c(TRUE, FALSE, TRUE, FALSE)
Area2 = c(TRUE, TRUE, FALSE, FALSE)
Area3 = c(FALSE, TRUE, FALSE, FALSE)
Area4 = c(FALSE, FALSE, FALSE, TRUE)
df = data.frame(Group, Area1, Area2, Area3, Area4)
# Take two groups (by row number) and return the numbers of the areas they have in common
is.shared <- function(i, j){
  # Make a data frame with two rows (one for i and one for j) in which each
  # area number is multiplied by the boolean that indicates whether the group
  # resides in area x. If so, the cell is x; if not, it is 0.
  n.areas <- ncol(df) - 1
  dyad <- as.data.frame(matrix(rep(seq_len(n.areas), 2), nrow=2, byrow=TRUE)) * df[c(i,j), -1]
  # The shared areas are the intersection of the two sets (a 0 may be included)
  intersect(as.numeric(dyad[1,]), as.numeric(dyad[2,]))
}
# Take a vector of area numbers and return a string that lists them.
# c(2,4,0) becomes "Area2, Area4".
list.areas <- function(vector){
  result = c()
  for(area in vector){
    if(area != 0){
      result <- c(result, paste("Area", area, sep=""))
    }
  }
  paste(result, collapse=", ")
}
# Make a table of all possible dyadic combinations (two-way, including self-pairs).
# Both columns index groups, so both run over the rows of df.
dyads <- expand.grid(1:nrow(df), 1:nrow(df))
names(dyads) <- c("Group i", "Group j")
# Each row contains a dyad - a pair (i, and j) of groups.
# Generate a unique dyadic key
dyads$Key <- apply(dyads, 1, function(x) paste(sort(x), collapse='->'))
# For each row of dyads, that is to say, for each pair (i,j), check if
# any areas are shared using is.shared(), and convert the result to a
# string using list.areas()
dyads$Shared_Areas <- sapply(1:nrow(dyads), function(x)
  list.areas(is.shared(dyads[x,1], dyads[x,2]))
)
# Count the number of shared areas by splitting the string by commas
dyads$Shared_Area_Nums <- sapply(dyads$Shared_Areas, function(x)
  length(strsplit(x,",")[[1]])
)
# Note that it is not as safe to count the result of is.shared() directly.
# A 0 appears in the result only when each of the two groups is absent from
# at least one area, so if either group resides in ALL areas, no 0 is returned.
# If we assume that no group resides in all areas, it would also be OK to
# generate the count like this:
dyads$Shared_Areas_Unsafe <- sapply(1:nrow(dyads), function(x)
  length(is.shared(dyads[x,1], dyads[x,2]))
) - 1
# Reorder columns
dyads <- dyads[,c("Group i","Group j", "Key", "Shared_Area_Nums",
                  "Shared_Areas_Unsafe", "Shared_Areas")]
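For what it's worth, the bipartite idea described earlier can also be sketched in base R without igraph. This is only a rough illustration under my own assumptions: the names incidence, neighbourhood, pairs, shared and result are made up for this example, and the pairs are restricted to unique, unordered dyads via combn().

```r
# Same example data as above
df <- data.frame(
  Group = c("Group1", "Group2", "Group3", "Group4"),
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)
groups <- as.character(df$Group)

# Incidence matrix of the bipartite network: rows are groups, columns are areas
incidence <- as.matrix(df[-1])
rownames(incidence) <- groups

# Neighbourhood of each group node: the areas it is connected to
neighbourhood <- lapply(
  setNames(groups, groups),
  function(g) colnames(incidence)[incidence[g, ]]
)

# Unique, unordered dyads of groups (hypothetical layout, one pair per list element)
pairs <- combn(groups, 2, simplify = FALSE)

# Shared areas of a dyad: the intersection of the two neighbourhoods
shared <- lapply(pairs, function(p) {
  intersect(neighbourhood[[p[1]]], neighbourhood[[p[2]]])
})

result <- data.frame(
  i = sapply(pairs, `[`, 1),
  j = sapply(pairs, `[`, 2),
  Shared_Areas = sapply(shared, paste, collapse = ", "),
  Shared_Area_Nums = lengths(shared)
)
print(result)
```

Here the count comes straight from the length of each intersection, so there is no need to parse the comma-separated string or to worry about a stray 0 in the result.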
You can get a basic count of presence/overlap using tcrossprod(as.matrix(df[-1])). I think you would need igraph or something similar to get more detailed output.
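A minimal sketch of the tcrossprod() suggestion, assuming the df from the question. The names m, overlap, idx and result are only illustrative, and taking the upper triangle is my own way of restricting the counts to unique dyads. The explicit conversion to double is a precaution and is not part of the original suggestion.

```r
# Same example data as in the question
df <- data.frame(
  Group = c("Group1", "Group2", "Group3", "Group4"),
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)

# Group-by-area presence matrix, stored as 0/1 numbers
m <- as.matrix(df[-1])
storage.mode(m) <- "double"
rownames(m) <- as.character(df$Group)

# m %*% t(m): cell [i, j] counts the areas in which both group i and group j are present
overlap <- tcrossprod(m)
print(overlap)

# Keep each unordered dyad once by reading only the upper triangle
idx <- which(upper.tri(overlap), arr.ind = TRUE)
result <- data.frame(
  i = rownames(overlap)[idx[, "row"]],
  j = colnames(overlap)[idx[, "col"]],
  shared = overlap[idx]
)
print(result)
```

The diagonal of overlap holds the number of areas each group occupies on its own, which is why it is left out of the dyad table.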