MySQL: Efficient way to compute set cardinalities for a Venn diagram
Given 4 tables, each containing items and representing one set, how can I get the item count for every compartment needed to draw a four-set Venn diagram? The computation should happen on the MySQL server, so that the items do not have to be transferred to the application server.
Example tables:
s1:        s2:        s3:        s4:
+------+   +------+   +------+   +------+
| item |   | item |   | item |   | item |
+------+   +------+   +------+   +------+
| a    |   | a    |   | a    |   | a    |
+------+   +------+   +------+   +------+
| b    |   | b    |   | b    |   | c    |
+------+   +------+   +------+   +------+
| c    |   | c    |   | d    |   | d    |
+------+   +------+   +------+   +------+
| d    |   | e    |   | e    |   | e    |
+------+   +------+   +------+   +------+
| ...  |   | ...  |   | ...  |   | ...  |
+------+   +------+   +------+   +------+
Now, I think I would compute the cardinalities of certain sets. Some examples, where I corresponds to s1, II to s2, III to s3 and IV to s4:
If I reinterpret each sx as a set, I would write:
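1. |s1 ∩ s2 ∩ s3 ∩ s4| - the white 25 in the middle
2. |(s1 ∩ s2 ∩ s4) \ s3| - the white 15 below and to the right of the center
3. |(s1 ∩ s4) \ (s2 ∪ s3)| - the white 5 at the bottom
4. |s1 \ (s2 ∪ s3 ∪ s4)| - the dark blue 60 on blue ground
Query 1 would be:
SELECT count(*) FROM (
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4) AS t;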
Query 2 would be:
SELECT count(*) FROM (
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s4
EXCEPT
SELECT item FROM s3) AS t;
And so on, resulting in 15 queries.
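On a MySQL server that lacks INTERSECT and EXCEPT (as far as I know they only arrived in MySQL 8.0.31), the same kind of count can be written with EXISTS and NOT EXISTS. Below is a minimal sketch for |(s1 ∩ s2 ∩ s4) \ s3|, assuming the example tables with a non-NULL item column; the alias n is only illustrative.

```sql
-- Count items that are in s1, s2 and s4 but not in s3.
-- DISTINCT guards against duplicate rows in s1 (a proper set has none).
SELECT COUNT(DISTINCT s1.item) AS n
FROM s1
WHERE EXISTS     (SELECT 1 FROM s2 WHERE s2.item = s1.item)
  AND EXISTS     (SELECT 1 FROM s4 WHERE s4.item = s1.item)
  AND NOT EXISTS (SELECT 1 FROM s3 WHERE s3.item = s1.item);
```

Each of the 15 compartments follows the same pattern: EXISTS for every set the compartment lies inside, NOT EXISTS for every set it lies outside. That still means 15 separate queries, which is what a single grouped query avoids.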
What comes to mind right now is to try something like this:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
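Disclaimer: I have only tested this in SQLite. You may need to set sql_mode='PIPES_AS_CONCAT' for the ANSI string concatenation (||) to work in MySQL, or use the CONCAT function instead. The WITH syntax is only supported from MySQL version 8.0 on, but you can use temporary tables or nested queries as appropriate.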
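For a MySQL server older than 8.0, the query might be rewritten with nested queries and CONCAT roughly as follows. This is a sketch under assumptions, not something I ran against a particular server version: the aliases u, d1 to d4, regions and n are made up for illustration, and the DISTINCT subqueries are an extra safeguard that the original query does not have.

```sql
-- Same idea without WITH and without ||.
SELECT region, COUNT(*) AS n
FROM (
    SELECT CONCAT(
               IF(d1.item IS NULL, '0', '1'),
               IF(d2.item IS NULL, '0', '1'),
               IF(d3.item IS NULL, '0', '1'),
               IF(d4.item IS NULL, '0', '1')
           ) AS region
    FROM (
        -- universe: every item that occurs in at least one set
        SELECT item FROM s1
        UNION SELECT item FROM s2
        UNION SELECT item FROM s3
        UNION SELECT item FROM s4
    ) AS u
    -- DISTINCT keeps one row per item even if a table holds duplicates
    LEFT JOIN (SELECT DISTINCT item FROM s1) AS d1 ON d1.item = u.item
    LEFT JOIN (SELECT DISTINCT item FROM s2) AS d2 ON d2.item = u.item
    LEFT JOIN (SELECT DISTINCT item FROM s3) AS d3 ON d3.item = u.item
    LEFT JOIN (SELECT DISTINCT item FROM s4) AS d4 ON d4.item = u.item
) AS regions
GROUP BY region;
```

If item is already unique in every table (for example a primary key), joining s1 to s4 directly instead of the DISTINCT subqueries is simpler and lets the server use the existing indexes.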
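If the sets are very large, you may want to index the item column before querying, in case the SQL optimizer does not work that out on its own.
The question is somewhat involved, yet the answer is quite simple. Let me explain K.T.'s answer: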
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
universe yields the union of all tables (with duplicates eliminated), something like
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
Then s1, s2, s3 and s4 are left-joined to it:
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
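and each row is converted into a binary string called Region (0 if the cell is NULL, 1 otherwise), where the first digit corresponds to s1, the second to s2, and so on: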
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
Finally, the rows are grouped by Region and counted:
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
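Note that regions containing 0 set elements do not show up in the result, and 0000 never will (it would mean an item that belongs to none of the sets s1, s2, s3, s4). Hence there are at most 15 regions.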
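If the drawing code needs all 15 compartments, including the empty ones, the missing rows can be filled in with zeros on the server. The sketch below assumes the grouped result has been stored in a hypothetical table region_counts(region, n), for instance with CREATE TABLE ... AS SELECT; that table name and the aliases are my own and not part of the answer.

```sql
-- Build all 16 four-digit codes, drop 0000, and attach the counts.
-- region_counts(region, n) is a hypothetical table holding the grouped result.
SELECT r.region, COALESCE(c.n, 0) AS n
FROM (
    SELECT CONCAT(b1.b, b2.b, b3.b, b4.b) AS region
    FROM       (SELECT '0' AS b UNION ALL SELECT '1') AS b1
    CROSS JOIN (SELECT '0' AS b UNION ALL SELECT '1') AS b2
    CROSS JOIN (SELECT '0' AS b UNION ALL SELECT '1') AS b3
    CROSS JOIN (SELECT '0' AS b UNION ALL SELECT '1') AS b4
) AS r
LEFT JOIN region_counts AS c ON c.region = r.region
WHERE r.region <> '0000'
ORDER BY r.region;
```

Filling in the zeros in the application would work just as well, since at most 15 rows ever leave the server.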
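If someone convincingly tells me that this would be much easier with Postgres, I will change the question accordingly. It should then read "Open-source DBMS: …", but that is too broad.
MySQL has no INTERSECT and EXCEPT. So you could use another RDBMS that provides these features.
@MadhurBhaiya I wasn't aware of that. MariaDB introduced set operations in 10.3.1.
Current solution: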