Comparing one column's values against all other columns in R


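I'm new to R and have a question that is probably simple for the experts here.

Suppose I have a table "sales" that contains 4 customer IDs (123-126) and 4 products (A, B, C, D), with a 1 where the customer bought the product and a 0 otherwise.

I want to compute the overlap between products. For A, the number of IDs that have both A and B would be 2. Similarly, the overlap between A and C would be 0, and the overlap between A and D would be 1. Here is my code for the A and B overlap: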

overlap <- sales [which(sales [,"A"] == 1 & sales [,"B"] == 1 ),]
countAB <- count(overlap,"ID")

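You might want to take a look at the arules package. It does exactly what you are after.

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.

Here is a possible solution: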

sales <- 
read.csv(text=
"ID,A,B,C,D
123,0,1,1,0
124,1,1,0,0
125,1,1,0,1
126,0,0,0,1")

# get product names
prods <- colnames(sales)[-1]
# generate all products pairs (and transpose the matrix for convenience)
combs <- t(combn(prods,2))

# turn the combs into a data.frame with column P1,P2
res <- as.data.frame(combs)
colnames(res) <- c('P1','P2')  

# for each combination row :
# - subset sales selecting only the products in the row
# - count the number of rows summing to 2 (if sum=2 the 2 products have been sold together)
#   N.B.: length(which(logical_condition)) can be implemented with sum(logical_condition) 
#         since TRUE and FALSE are automatically coerced to 1 and 0
# finally add the resulting vector to the newly created data.frame
res$count <- apply(combs,1,function(comb){sum(rowSums(sales[,comb])==2)})

> res
  P1 P2 count
1  A  B     2
2  A  C     0
3  A  D     1
4  B  C     1
5  B  D     1
6  C  D     0
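
The same counts can also be obtained by generalizing the A-and-B condition from the question and handing it to combn() through its FUN argument. This is only a minimal sketch, assuming the sales data frame shown above (repeated here so the block runs on its own); pair_counts is an illustrative name, not something from the original answer.

```r
sales <- read.csv(text = "ID,A,B,C,D
123,0,1,1,0
124,1,1,0,0
125,1,1,0,1
126,0,0,0,1")

# product names are all columns except ID
prods <- colnames(sales)[-1]

# combn() calls FUN once per pair; p is a character vector of length 2
pair_counts <- combn(prods, 2, FUN = function(p) {
  sum(sales[[p[1]]] == 1 & sales[[p[2]]] == 1)
})

# label each count with its pair, in the order combn() generates the pairs
names(pair_counts) <- combn(prods, 2, FUN = paste, collapse = "-")
pair_counts
```

The result is a named vector rather than a data frame, which may be handier when the counts are only needed for lookup, e.g. pair_counts["A-D"].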

You can use matrix multiplication:

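library(reshape)  # melt() from reshape names the index columns X1 and X2 (reshape2 would use Var1 and Var2)
# d is the sales data frame, with the ID column first
m <- as.matrix(d[-1])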
z <- melt(crossprod(m,m))
z[as.integer(z$X1) < as.integer(z$X2),]
#    X1 X2 value
# 5   A  B     2
# 9   A  C     0
# 10  B  C     1
# 13  A  D     1
# 14  B  D     1
# 15  C  D     0
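
Why this works: entry (i, j) of t(m) %*% m sums m[k, i] * m[k, j] over all customers k, and with 0/1 data that product is 1 only when customer k bought both products. The diagonal therefore holds the per-product totals and the off-diagonal cells hold the pair overlaps. The sketch below reproduces the table with base R only, assuming the same data as d; the names cp, idx and pairs are illustrative.

```r
# one row per customer, one 0/1 column per product
d <- data.frame(ID = 123:126,
                A = c(0L, 1L, 1L, 0L),
                B = c(1L, 1L, 1L, 0L),
                C = c(1L, 0L, 0L, 0L),
                D = c(0L, 0L, 1L, 1L))

m  <- as.matrix(d[-1])
cp <- crossprod(m)  # same as t(m) %*% m

# positions of the cells above the diagonal, so each unordered pair appears once
idx <- which(upper.tri(cp), arr.ind = TRUE)

pairs <- data.frame(P1    = rownames(cp)[idx[, "row"]],
                    P2    = colnames(cp)[idx[, "col"]],
                    count = cp[idx])
pairs
```

Using upper.tri() avoids the dependency on melt() and the factor-level comparison, at the cost of a slightly longer data frame construction.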
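[Update]

To compute product affinity, i.e. for each product the other product it overlaps with most, you can do the following: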

z2 <- subset(z,X1!=X2)
do.call(rbind,lapply(split(z2,z2$X1),function(d) d[which.max(d$value),]))
#   X1 X2 value
# A  A  B     2
# B  B  A     2
# C  C  B     1
# D  D  A     1
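
The affinity lookup can also be sketched directly on the cross-product matrix, without melting it first. This assumes the same data as d; the names cp, best and affinity are illustrative. As with which.max() in the code above, ties go to the first product, which is why D is paired with A rather than B.

```r
d <- data.frame(ID = 123:126,
                A = c(0L, 1L, 1L, 0L),
                B = c(1L, 1L, 1L, 0L),
                C = c(1L, 0L, 0L, 0L),
                D = c(0L, 0L, 1L, 1L))

cp <- crossprod(as.matrix(d[-1]))
diag(cp) <- NA  # a product should not be matched with itself

# for each row, which.max() skips NA and returns the first column with the largest count
best <- apply(cp, 1, which.max)

affinity <- data.frame(product = rownames(cp),
                       partner = colnames(cp)[best],
                       count   = cp[cbind(seq_len(nrow(cp)), best)])
affinity
```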

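Thanks a lot, everyone, for the quick replies! I have to step out soon, but I will go through each solution carefully when I get back. Thanks again!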
# data frame d used in the matrix multiplication answer
d <- structure(list(ID = 123:126,
                    A = c(0L, 1L, 1L, 0L),
                    B = c(1L, 1L, 1L, 0L),
                    C = c(1L, 0L, 0L, 0L),
                    D = c(0L, 0L, 1L, 1L)),
               .Names = c("ID", "A", "B", "C", "D"),
               class = "data.frame", row.names = c(NA, -4L))