Using R to determine the number of shared logical values across all possible unique dyadic combinations

I have a data frame containing groups and logical vectors that indicate whether each group resides in each area.

# Create data frame
Group = c('Group1', 'Group2', 'Group3', 'Group4') 
Area1 = c(TRUE, FALSE, TRUE, FALSE) 
Area2 = c(TRUE, TRUE, FALSE, FALSE) 
Area3 = c(FALSE, TRUE, FALSE, FALSE) 
Area4 = c(FALSE, FALSE, FALSE, TRUE) 
df = data.frame(Group, Area1, Area2, Area3, Area4) 

# Generate unique combinations of Groups
links <- expand.grid(df$Group, df$Group) # generates all possible ordered combinations
links$key <- apply(links, 1, function(x)paste(sort(x), collapse='')) 
undirected <- subset(links, !duplicated(links$key)) 
undirected$ID <- seq.int(nrow(undirected))

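I'm not sure I understood your question correctly. The data structure is confusing. Does the dyad {i=2, j=4}, labelled Group2Group4, really have Areas 3 and 4 in common? I don't think so.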

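I'm not sure igraph is really needed here. However, this could be set up as a bipartite network G(V₁, V₂, E) that distinguishes areas ∈ V₁ from groups ∈ V₂ and makes the dyads always run from area to group: eᵢⱼ ∈ E with i ∈ V₁ and j ∈ V₂. You then get the shared areas by listing the neighbourhood of each group node and intersecting the two neighbourhoods, and the number of shared areas by counting the members of that intersection.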
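
As a rough illustration of the bipartite idea, here is a minimal base-R sketch that does not use igraph. It is not the code promised in this answer; the names `edges`, `neighbourhood` and `shared` are made up for the example, and it assumes every column after `Group` is a logical area column.

```r
# Same example data as in the question
Group <- c("Group1", "Group2", "Group3", "Group4")
df <- data.frame(Group,
                 Area1 = c(TRUE, FALSE, TRUE, FALSE),
                 Area2 = c(TRUE, TRUE, FALSE, FALSE),
                 Area3 = c(FALSE, TRUE, FALSE, FALSE),
                 Area4 = c(FALSE, FALSE, FALSE, TRUE))

# Edge list of the bipartite network: one row per (area, group) pair
# for which the group resides in the area
area_names <- names(df)[-1]
hits <- which(as.matrix(df[-1]), arr.ind = TRUE)
edges <- data.frame(Area  = area_names[hits[, "col"]],
                    Group = as.character(df$Group)[hits[, "row"]],
                    stringsAsFactors = FALSE)

# Neighbourhood of each group node = the set of areas linked to it
neighbourhood <- split(edges$Area, edges$Group)

# Shared areas of a dyad = intersection of the two neighbourhoods
# (hypothetical helper, named for this sketch only)
shared <- function(g1, g2) intersect(neighbourhood[[g1]], neighbourhood[[g2]])

shared("Group1", "Group2")          # areas the two groups have in common
length(shared("Group2", "Group4"))  # number of shared areas
```

Calling `shared()` on a pair of group names lists the common areas, and `length()` of that result gives the count.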

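If you really, really want to see this in code, I'll post it when I have time.

In the meantime, I think this is what you're after. I won't win any code golf competitions with this one, but if I understood your question correctly, it works: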

# Make that same data
Group = c('Group1', 'Group2', 'Group3', 'Group4') 
Area1 = c(TRUE, FALSE, TRUE, FALSE) 
Area2 = c(TRUE, TRUE, FALSE, FALSE) 
Area3 = c(FALSE, TRUE, FALSE, FALSE) 
Area4 = c(FALSE, FALSE, FALSE, TRUE) 
df = data.frame(Group, Area1, Area2, Area3, Area4)

# Take two groups (by number) and list the areas they have in common
is.shared <- function(i, j){
    # Build a data frame with two rows (one for i, one for j) in which each
    # area's index is multiplied by the logical flag indicating whether the
    # group resides in that area: the cell holds x if it does, 0 if not.
    n.areas <- ncol(df) - 1
    dyad <- as.data.frame(matrix(rep(seq_len(n.areas), 2), nrow=2, byrow=TRUE)) * df[c(i,j), -1]
    # The shared areas are the intersection of the two sets (a 0 is usually
    # included; list.areas() filters it out later)
    shared.areas <- intersect(as.numeric(dyad[1,]), as.numeric(dyad[2,]))
    shared.areas
}

# Take a vector of area-numbers and return a string that lists them.
# c(2,4,0) becomes "Area2, Area4".
list.areas <- function(vector){
    result = c()
    for(area in vector){
        if(area != 0){
            result <- c(result, paste("Area", area, sep=""))
        }
    }
    paste(result, collapse=", ")
}


# Make a data frame of all possible dyadic combinations (two-way)
dyads <- expand.grid(seq_len(nrow(df)), seq_len(nrow(df)))
names(dyads) <- c("Group i", "Group j")
# Each row contains a dyad - a pair (i, and j) of groups.

# Generate a unique dyadic key
dyads$Key <- apply(dyads, 1, function(x) paste(sort(x), collapse='->'))

# For each row of dyads, that is to say, for each pair (i,j), check if
# any areas are shared using is.shared(), and convert the result to a
# string using list.areas()
dyads$Shared_Areas <- sapply(1:nrow(dyads), function(x)
    list.areas(is.shared(dyads[x,1], dyads[x,2]) )
)

# Count the number of shared areas by splitting the string by commas
dyads$Shared_Area_Nums <- sapply(dyads$Shared_Areas, function(x)
    length(strsplit(x,",")[[1]])
    )

# Note that it's not as safe to count the result of is.shared() directly.
# If either group resides in ALL areas, no 0 is returned in the vector, so
# subtracting 1 undercounts. If we assume that no group resides in all
# areas, it would also be OK to generate the count like this:
dyads$Shared_Areas_Unsafe <- sapply(1:nrow(dyads), function(x)
    length(is.shared(dyads[x,1], dyads[x,2]))
) - 1

# Reorder columns
dyads <- dyads[,c("Group i","Group j", "Key", "Shared_Area_Nums",
            "Shared_Areas_Unsafe", "Shared_Areas")]
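
Since the question asks about unique dyads, here is a more compact alternative sketch that lists each unordered pair only once using `combn()`. It is not part of the answer above; the names `pairs`, `both` and `result` are illustrative, and it again assumes every column after `Group` is a logical area column.

```r
# Same example data as above
Group <- c("Group1", "Group2", "Group3", "Group4")
df <- data.frame(Group,
                 Area1 = c(TRUE, FALSE, TRUE, FALSE),
                 Area2 = c(TRUE, TRUE, FALSE, FALSE),
                 Area3 = c(FALSE, TRUE, FALSE, FALSE),
                 Area4 = c(FALSE, FALSE, FALSE, TRUE))

area_cols <- names(df)[-1]

# Every unordered pair of row indices, one pair per row
pairs <- t(combn(nrow(df), 2))

# Element-wise AND of the two groups' area flags for each pair
both <- as.matrix(df[pairs[, 1], area_cols]) &
        as.matrix(df[pairs[, 2], area_cols])

result <- data.frame(Group_i = as.character(df$Group)[pairs[, 1]],
                     Group_j = as.character(df$Group)[pairs[, 2]],
                     stringsAsFactors = FALSE)
result$Shared_Area_Nums <- unname(rowSums(both))
result$Shared_Areas <- unname(apply(both, 1, function(r)
    paste(area_cols[r], collapse = ", ")))

result
```

Counting directly on the logical matrix avoids the 0 placeholder, so no special case is needed for a group that resides in every area.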

You can get a basic count of presence/overlap with tcrossprod(as.matrix(df[-1])). I think you might need igraph or something else to get more detailed output.
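
To make the `tcrossprod()` suggestion concrete, here is a small sketch on the question's data. The conversion to 0/1, the row names, and the `pairs` table built from the upper triangle are additions for illustration, not part of the original comment.

```r
# Same example data as in the question
Group <- c("Group1", "Group2", "Group3", "Group4")
df <- data.frame(Group,
                 Area1 = c(TRUE, FALSE, TRUE, FALSE),
                 Area2 = c(TRUE, TRUE, FALSE, FALSE),
                 Area3 = c(FALSE, TRUE, FALSE, FALSE),
                 Area4 = c(FALSE, FALSE, FALSE, TRUE))

# Group-by-area incidence matrix as numeric 0/1
m <- as.matrix(df[-1]) * 1
rownames(m) <- as.character(df$Group)

# m %*% t(m): cell [i, j] counts the areas where both group i and group j
# are present; the diagonal holds each group's own number of areas
overlap <- tcrossprod(m)

# Keep each unordered pair once: upper triangle, diagonal excluded
idx <- which(upper.tri(overlap), arr.ind = TRUE)
pairs <- data.frame(Group_i = rownames(overlap)[idx[, "row"]],
                    Group_j = colnames(overlap)[idx[, "col"]],
                    Shared  = overlap[idx])

overlap
pairs
```

This gives the counts only; recovering which areas are shared still needs a per-pair comparison of the rows.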