MySQL: How to (recursively) determine equivalent groupings in SQL?

mysql sql

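I have a list of products identified by SKUs. To keep things simple, I'll just name them A, B, C, D, ... here. By default, each of these SKUs is assigned an already existing GroupID, which for simplicity I'll just number 1, 2, 3, ... here. The same GroupID means "these SKUs are equivalent, so it's fine to use/buy any of them, because it makes no difference".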

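The problem is that some SKUs appear more than once because they come from different purchasing sources, and since they come from different sources, they carry different groupings.

The goal is therefore to merge the groupings and make sure equivalent SKUs end up with the same GroupID.

I apologize in advance if my illustration isn't very pretty, but I'm trying. Here is a small example of what the raw data looks like (the first row holds the column names):

The result should be: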

    Source      SKU  GroupID
    Seller1      A      1
    Seller1      B      1
    Seller1      C      1
    Seller2      B      1
    Seller2      D      1
    Seller2      E      1
    Seller3      A      1
    Seller3      B      1
    Seller4      F      4
    Seller4      G      4
    Seller4      H      4

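Basically, if any SKU in GroupID X also appears in GroupID Y, then GroupID Y = GroupID X. But this has to be applied across all GroupIDs, so it seems to be recursive.

I wish I could show the code I've already tried. I've been at it for a few days now, but honestly I've only produced garbage.

In C# I would know how to handle this, but I can't seem to wrap my head around it in SQL, since I'm not that experienced with it, and unfortunately I need this in SQL.

I'd be grateful for any help, even if it's just a hint or a direction you'd suggest I try. Thanks a lot!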

First, find all sellers that share SKUs with other sellers, based on counts. Then filter using GROUP BY:

    select table1.Source, SKU, case when table1.Source = t6.Source and t6.cnt > 1 then 1 else 2 end as GroupID
    from table1
    left join
      (select t5.Source, count(t5.cnt) as cnt from (
        select distinct t4.Source, t4.cnt from (
          select t3.Source, count(t3.SKU) as cnt from (
            select t1.Source, t1.SKU from table1 t1
            left join table1 t2 on t2.SKU = t1.SKU ) t3
          group by t3.Source, t3.SKU
          order by t3.Source) t4) as t5
       group by t5.Source) t6 on t6.Source = table1.Source

You need the correspondence between groups, which you can compute with a recursive CTE:

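    with recursive tt as (
          -- pairs of groups that share at least one SKU (each group is also paired with itself)
          select distinct t1.groupid as groupid1, t2.groupid as groupid2
          from t t1 join
               t t2
               on t1.sku = t2.sku
         ),
         cte as (
          -- anchor: direct overlaps; visited is cast wide so longer paths still fit the column
          select tt.groupid1, tt.groupid2,
                 cast(concat_ws(',', tt.groupid1, tt.groupid2) as char(1000)) as visited
          from tt
          union all
          -- recursive step: follow overlaps to groups that are not yet on the path
          select cte.groupid1, tt.groupid2, concat_ws(',', cte.visited, tt.groupid2)
          from cte join
               tt
               on cte.groupid2 = tt.groupid1
          where find_in_set(tt.groupid2, cte.visited) = 0
         )
    -- the smallest reachable group id becomes the merged group
    select groupid1, min(groupid2) as overall_group
    from cte
    group by groupid1;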

You can then join this back to the original table to get the "overall group".
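A minimal sketch of that join, assuming MySQL 8.0 and a table named `t` with columns `source`, `sku`, and `groupid` (the table name follows the query above; the sample rows and their starting group IDs 1 to 4 are hypothetical input chosen for illustration, not data from the question):

```sql
-- Hypothetical sample input: each seller starts with its own group id.
create table t (source varchar(20), sku varchar(10), groupid int);
insert into t (source, sku, groupid) values
  ('Seller1', 'A', 1), ('Seller1', 'B', 1), ('Seller1', 'C', 1),
  ('Seller2', 'B', 2), ('Seller2', 'D', 2), ('Seller2', 'E', 2),
  ('Seller3', 'A', 3), ('Seller3', 'B', 3),
  ('Seller4', 'F', 4), ('Seller4', 'G', 4), ('Seller4', 'H', 4);

with recursive tt as (
      -- pairs of groups that share at least one SKU
      select distinct t1.groupid as groupid1, t2.groupid as groupid2
      from t t1 join t t2 on t1.sku = t2.sku
     ),
     cte as (
      -- anchor: direct overlaps, with a wide column for the visited path
      select tt.groupid1, tt.groupid2,
             cast(concat_ws(',', tt.groupid1, tt.groupid2) as char(1000)) as visited
      from tt
      union all
      -- recursive step: extend the path to groups not visited yet
      select cte.groupid1, tt.groupid2, concat_ws(',', cte.visited, tt.groupid2)
      from cte join tt on cte.groupid2 = tt.groupid1
      where find_in_set(tt.groupid2, cte.visited) = 0
     ),
     g as (
      -- smallest group id reachable from each original group
      select groupid1, min(groupid2) as overall_group
      from cte
      group by groupid1
     )
-- join the mapping back to the original rows
select t.source, t.sku, g.overall_group as groupid
from t join g on g.groupid1 = t.groupid
order by t.source, t.sku;
```

With this assumed input, groups 1, 2, and 3 all share SKU B, so they should collapse to group 1 while Seller4 keeps group 4, which matches the desired result in the question.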

Here is a fiddle.

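Note: your sample data is rather "complete", so a recursive CTE is not needed for that particular data. However, I'm guessing that your real groups overlap less, in which case the recursion is necessary.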
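To illustrate why less overlap calls for recursion, here is a small sketch using a hypothetical table `chain_demo` (the name and its rows are made up for this example) in which the groups only overlap pairwise in a chain: group 1 shares B with group 2, and group 2 shares C with group 3.

```sql
-- Hypothetical chained input: 1 = {A, B}, 2 = {B, C}, 3 = {C, D}.
create table chain_demo (source varchar(20), sku varchar(10), groupid int);
insert into chain_demo (source, sku, groupid) values
  ('Seller1', 'A', 1), ('Seller1', 'B', 1),
  ('Seller2', 'B', 2), ('Seller2', 'C', 2),
  ('Seller3', 'C', 3), ('Seller3', 'D', 3);

-- One non-recursive step: the smallest group id among direct overlaps only.
select t1.groupid, min(t2.groupid) as direct_min
from chain_demo t1 join chain_demo t2 on t1.sku = t2.sku
group by t1.groupid
order by t1.groupid;
```

A single step maps group 3 only to group 2, because group 3 shares no SKU with group 1 directly. Following the overlaps repeatedly, as the recursive CTE does, is what carries group 3 on to group 1.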

Which version of MySQL?

@Nick I'm using the latest 8.0 Community Edition.

Seller2 doesn't have any subset, so why does it return GroupID 1?

@EdBangga B is a subset of GroupID 1.

I spent some time going through @GordonLinoff's code and understood it, but I appreciate everyone's input here. Thank you very much.

You're right, the real groups overlap less, so the recursion is necessary. Thank you very much, your answer was very helpful and insightful :)