SQL-like: how to compute intersections and unions of <item, user> data
Need help with SQL. I have data with the following columns:
- ItemId
- UserId
Each row means that a given user bought a given item. For example: (example data not preserved)
For each pair of items (i, j), I want to compute the following output table:
- Users(i): number of users who bought i
- Users(j): number of users who bought j
- Users(i,j): number of users who bought both i and j
- Users(i,~j): number of users who bought i but not j
- Users(~i,j): number of users who bought j but not i
Output example (from the example above): (example output not preserved)
Note: the data table is huge (11 GB) and lives in the cloud. I have an SQL framework to work with, so I can't download the file and run, for example, Python on it.
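Only three counts per pair actually have to be aggregated. The other two follow from set arithmetic: Users(i,~j) = Users(i) - Users(i,j) and Users(~i,j) = Users(j) - Users(i,j).

The following is a minimal in-memory sketch of these definitions. It is for illustration only, because the real data has to stay in SQL. It assumes each user counts once per item, and it uses the sample rows for items 200 and 300 that also appear in the Oracle answer.

```python
from collections import defaultdict
from itertools import product

# Sample rows (item id, user id), taken from the Oracle answer's test data.
rows = [(200, "user1"), (200, "user3"), (200, "user4"),
        (300, "user5"), (300, "user3")]


def pair_counts(rows):
    """Return {(i, j): (users_i, users_j, i_and_j, i_not_j, j_not_i)}."""
    buyers = defaultdict(set)  # item id -> set of distinct user ids
    for item, user in rows:
        buyers[item].add(user)
    result = {}
    for i, j in product(sorted(buyers), repeat=2):
        n_i, n_j = len(buyers[i]), len(buyers[j])
        both = len(buyers[i] & buyers[j])
        # Users(i,~j) and Users(~i,j) are derived, not counted separately.
        result[(i, j)] = (n_i, n_j, both, n_i - both, n_j - both)
    return result


for pair, counts in pair_counts(rows).items():
    print(pair, counts)
```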
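I'm not sure there is a "simple" way to do this. One approach is brute force: use a
cross join
to generate all the rows (every pair of items), then use a subquery for each count: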
select i1.itemid, i2.itemid, i1.num as cnt1, i2.num as cnt2,
(select count(*)
from t u1 join
t u2
on u1.userid = u2.userid
where u1.itemid = i1.itemid and u2.itemid = i2.itemid
) as cnt_1_2,
(select count(*)
from t u1 left join
t u2
on u1.userid = u2.userid and u2.itemid = i2.itemid
where u1.itemid = i1.itemid and u2.itemid is null
) as cnt_1_not2,
(select count(*)
from t u2 left join
t u1
on u1.userid = u2.userid and u1.itemid = i1.itemid
where u2.itemid = i2.itemid and u1.itemid is null
) as cnt_not1_2
from (select itemid, count(*) as num from t group by itemid) i1 cross join
(select itemid, count(*) as num from t group by itemid) i2;
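To sanity-check the brute-force query, this sketch runs it on the small sample (items 200 and 300, also used in the Oracle answer) with Python's built-in sqlite3 module. SQLite is only an assumed stand-in for the asker's cloud SQL engine.

In this version, the third subquery puts u2 (the rows for item j) on the preserved side of the left join. That way, `u1.itemid is null` picks out j-buyers who have no matching purchase of i.

```python
import sqlite3

# In-memory SQLite database standing in for the real engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (itemid integer, userid text)")
conn.executemany("insert into t values (?, ?)",
                 [(200, "user1"), (200, "user3"), (200, "user4"),
                  (300, "user5"), (300, "user3")])

QUERY = """
select i1.itemid, i2.itemid, i1.num as cnt1, i2.num as cnt2,
       (select count(*)
        from t u1 join t u2 on u1.userid = u2.userid
        where u1.itemid = i1.itemid and u2.itemid = i2.itemid
       ) as cnt_1_2,
       (select count(*)
        from t u1 left join t u2
             on u1.userid = u2.userid and u2.itemid = i2.itemid
        where u1.itemid = i1.itemid and u2.itemid is null
       ) as cnt_1_not2,
       (select count(*)
        from t u2 left join t u1   -- j rows are the preserved side
             on u1.userid = u2.userid and u1.itemid = i1.itemid
        where u2.itemid = i2.itemid and u1.itemid is null
       ) as cnt_not1_2
from (select itemid, count(*) as num from t group by itemid) i1 cross join
     (select itemid, count(*) as num from t group by itemid) i2
order by 1, 2
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample, the rows match the Oracle output table. Two caveats apply:

- The query counts purchase rows with count(*), so repeat purchases would inflate the counts.
- Each output row runs three correlated subqueries, so on a large table it is probably best treated as a reference result for checking faster queries.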
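Here's the recipe:
1) Create a temporary table to collect the I and J totals.
Disclaimer: this example uses the MS SQL Server data type INT, so change it to a numeric type that your RDBMS supports.
By the way, in MS SQL Server the names of temporary tables start with #.
2) Fill in the totals.
3) Self-join the temporary table to get all the totals.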
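If you're using Oracle Database, you can compare nested tables (collections) with the multiset operators, and get the number of elements in a collection with cardinality. So what you can do is:
- Group by itemid, collecting all the users into a nested table
- Cross join this output with itself
- Use the multiset intersect/except operators as needed, and count the elements of the result with cardinality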
create table t (
ItemId int, UserId varchar2(10)
);
insert into t values ( 200 , 'user1');
insert into t values ( 200 , 'user3');
insert into t values ( 200 , 'user4');
insert into t values ( 300 , 'user5');
insert into t values ( 300 , 'user3');
commit;
create or replace type users_t as table of varchar2(10);
/
with grps as (
select itemid, cast ( collect ( userid ) as users_t ) users
from t
group by itemid
)
select g1.itemid i, g2.itemid j,
cardinality ( g1.users ) num_i,
cardinality ( g2.users ) num_j,
cardinality ( g1.users multiset intersect g2.users ) i_and_j,
cardinality ( g1.users multiset except g2.users ) i_not_j,
cardinality ( g2.users multiset except g1.users ) j_not_i
from grps g1
cross join grps g2;
I J NUM_I NUM_J I_AND_J I_NOT_J J_NOT_I
200 200 3 3 3 0 0
200 300 3 2 1 2 1
300 200 2 3 1 1 2
300 300 2 2 2 0 0
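If necessary, you can skip the except operator when i = j to improve performance, because a set minus itself is always empty. For example:
case
when g1.itemid = g2.itemid then 0
else cardinality ( g1.users multiset except g2.users )
end i_not_j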
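The choice of multiset semantics matters if a user can buy the same item more than once. In Oracle, MULTISET INTERSECT and MULTISET EXCEPT use ALL semantics unless DISTINCT is specified. Repeated user IDs in the collected nested table are therefore counted as often as they occur.

Python's collections.Counter follows the same multiset arithmetic: & keeps the smaller count and - keeps only positive differences. The sketch below uses it as an analogy, not as Oracle code. It runs on a hypothetical variation of the sample where user3 bought item 200 twice.

```python
from collections import Counter

# Hypothetical variation of the Oracle sample: user3 is assumed to have
# bought item 200 twice, so the collected nested table would hold it twice.
item_200 = Counter(["user1", "user3", "user3", "user4"])
item_300 = Counter(["user5", "user3"])


def card(multiset):
    """Element count including repeats, analogous to CARDINALITY()."""
    return sum(multiset.values())


# ALL semantics (Oracle's default): & keeps min counts, - keeps positive differences.
i_and_j = card(item_200 & item_300)
i_not_j = card(item_200 - item_300)

# DISTINCT semantics: reduce both sides to plain sets first.
i_not_j_distinct = len(set(item_200) - set(item_300))

print(card(item_200), i_and_j, i_not_j, i_not_j_distinct)  # 4 1 3 2
```

In this variation, cardinality ( g1.users ) would report 4 users for item 200, and Users(i,~j) would be 3 instead of 2. Collecting with collect ( distinct userid ) in the grps subquery keeps each user counted once.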
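Could you also provide sample data for i & j so we can take a quick look? Please state (and tag) the SQL dialect/environment you are using.
I've added a small example, @RajatJaiswal.
What if a user bought an item more than once? Does that count once, or as the actual number of purchases?
Which DBMS are you using? "SQL" is just a query language, not the name of a specific database product. Please add a tag for the database product you are using:
postgresql
, oracle
, sql server
, db2
, … Thanks. You are forcing itemId == itemId in all the join statements, but I also need the cross join to get the calculation for each pair of items. It doesn't have to be a single SQL statement; that won't be efficient.
@SamerAamar . . . There is a cross join in the from
clause,
so I don't understand that part of your comment. With the right indexes, a single query should be reasonably efficient.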
-- 1) Create a table to collect the I and J totals
create table TempTotals (iItemId int, jItemId int, TotalUsers int);
delete from TempTotals;
-- 2) Fill in the totals: distinct users per (i, j) pair; the diagonal rows (i = j) hold Users(i)
insert into TempTotals (iItemId, jItemId, TotalUsers)
select
t1.ItemId as iItemId,
t2.ItemId as jItemId,
count(distinct t1.UserId) as TotalUsers
from YourTable t1
full join YourTable t2 on (t1.UserId = t2.UserId)
group by t1.ItemId, t2.ItemId;
-- 3) Self-join the totals: Users(i) and Users(j) are read from the diagonal rows
select
ij.iItemId,
ij.jItemId,
i.TotalUsers as Users_I,
j.TotalUsers as Users_J,
ij.TotalUsers as Users_I_and_J,
(i.TotalUsers - ij.TotalUsers) as Users_I_no_J,
(j.TotalUsers - ij.TotalUsers) as Users_J_no_I
from TempTotals ij
left join TempTotals i on (i.iItemId = ij.iItemId and i.iItemId = i.jItemId)
left join TempTotals j on (j.jItemId = ij.jItemId and j.iItemId = j.jItemId);
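The three-step temp-table recipe can be tried end to end on the same sample rows. This sketch makes three substitutions, all assumptions:

- Python's sqlite3 module stands in for SQL Server.
- A TEMP table replaces the #temp table.
- A plain inner join replaces the full join. With a self-join on UserId, every row matches at least itself, so the two produce the same rows as long as UserId is never null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table YourTable (ItemId integer, UserId text);
insert into YourTable values (200, 'user1'), (200, 'user3'), (200, 'user4'),
                             (300, 'user5'), (300, 'user3');

-- 1) table that collects the I and J totals
create temp table TempTotals (iItemId integer, jItemId integer, TotalUsers integer);

-- 2) distinct users per (i, j) pair; the diagonal i = j holds Users(i)
insert into TempTotals (iItemId, jItemId, TotalUsers)
select t1.ItemId, t2.ItemId, count(distinct t1.UserId)
from YourTable t1
join YourTable t2 on t1.UserId = t2.UserId
group by t1.ItemId, t2.ItemId;
""")

# 3) self-join the totals to read Users(i) and Users(j) off the diagonal
rows = conn.execute("""
select ij.iItemId, ij.jItemId,
       i.TotalUsers, j.TotalUsers, ij.TotalUsers,
       i.TotalUsers - ij.TotalUsers,
       j.TotalUsers - ij.TotalUsers
from TempTotals ij
left join TempTotals i on i.iItemId = ij.iItemId and i.iItemId = i.jItemId
left join TempTotals j on j.jItemId = ij.jItemId and j.iItemId = j.jItemId
order by 1, 2
""").fetchall()
print(rows)
```

Item pairs with no common buyer never get a row in TempTotals. They are therefore missing from the result, rather than reported with Users_I_and_J = 0. In the sample every pair shares user3, so all four pairs appear.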