Retrieve data from SQL Server and concatenate results over rows based on grouping

Tags: sql, sql-server, group-by, many-to-many, concatenation

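I have been working on a problem for several days and have finally found a solution that works for me. In case the solution is useful to others, I am asking the question and then answering it myself.

I have read-only access to a large SQL Server database containing more than 1 million records. Some of the tables in the database are linked in many-to-many relationships through lookup tables. To simplify the problem, the tables can be illustrated as follows: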

table names
|-----------|
| id | name |
|----|------|
|  1 | dave |
|  2 | phil |
|  3 | john |       table food_relationships        table clothes_relationships
|  4 | pete |       |--------------------------|    |----------------------------|
|-----------|       | id | names_id | foods_id |    | id | names_id | clothes_id |
                    |----|----------|----------|    |----|----------|------------|
table foods         |  1 |        1 |        1 |    |  1 |        1 |          1 |
|---------------|   |  2 |        1 |        3 |    |  2 |        1 |          3 |
| id | food     |   |  3 |        1 |        4 |    |  3 |        1 |          4 |
|----|----------|   |  4 |        2 |        2 |    |  4 |        2 |          2 |
|  1 | beef     |   |  5 |        2 |        3 |    |  5 |        2 |          3 |
|  2 | tomatoes |   |  6 |        2 |        4 |    |  6 |        2 |          4 |
|  3 | bacon    |   |  7 |        2 |        5 |    |  7 |        3 |          1 |
|  4 | cheese   |   |  8 |        3 |        3 |    |  8 |        3 |          3 |
|  5 | apples   |   |  9 |        3 |        5 |    |  9 |        3 |          5 |
|---------------|   | 10 |        4 |        1 |    | 10 |        4 |          2 |
                    | 11 |        4 |        2 |    | 11 |        4 |          4 |
table clothes       | 12 |        4 |        3 |    | 12 |        4 |          5 |
|---------------|   | 13 |        4 |        5 |    |----------------------------|
| id | clothes  |   |--------------------------|
|----|----------|
|  1 | trousers |
|  2 | shorts   |
|  3 | shirt    |
|  4 | socks    |
|  5 | jumper   |
|  6 | jacket   |
|---------------|

The tables can be recreated using the following SQL (adapted from a MySQL database, so it may need some tweaking to work in SQL Server):
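A minimal sketch of the same setup, using Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere. The column types, the choice of SQLite and the helper name build_sample_db are assumptions made for illustration; the table names, column names and data follow the tables and queries in this post.

```python
import sqlite3

# Sample data taken from the tables illustrated in the question.
NAMES = [(1, "dave"), (2, "phil"), (3, "john"), (4, "pete")]
FOODS = [(1, "beef"), (2, "tomatoes"), (3, "bacon"), (4, "cheese"), (5, "apples")]
CLOTHES = [(1, "trousers"), (2, "shorts"), (3, "shirt"),
           (4, "socks"), (5, "jumper"), (6, "jacket")]
# (names_id, foods_id) and (names_id, clothes_id) pairs from the lookup tables.
FOOD_LINKS = [(1, 1), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (2, 5),
              (3, 3), (3, 5), (4, 1), (4, 2), (4, 3), (4, 5)]
CLOTHES_LINKS = [(1, 1), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4),
                 (3, 1), (3, 3), (3, 5), (4, 2), (4, 4), (4, 5)]


def build_sample_db():
    """Create the five sample tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE names   (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE foods   (id INTEGER PRIMARY KEY, food TEXT);
        CREATE TABLE clothes (id INTEGER PRIMARY KEY, clothes TEXT);
        CREATE TABLE food_relationships (
            id INTEGER PRIMARY KEY, names_id INTEGER, foods_id INTEGER);
        CREATE TABLE clothes_relationships (
            id INTEGER PRIMARY KEY, names_id INTEGER, clothes_id INTEGER);
    """)
    conn.executemany("INSERT INTO names VALUES (?, ?)", NAMES)
    conn.executemany("INSERT INTO foods VALUES (?, ?)", FOODS)
    conn.executemany("INSERT INTO clothes VALUES (?, ?)", CLOTHES)
    # SQLite assigns the relationship ids automatically (1, 2, 3, ...).
    conn.executemany(
        "INSERT INTO food_relationships (names_id, foods_id) VALUES (?, ?)",
        FOOD_LINKS)
    conn.executemany(
        "INSERT INTO clothes_relationships (names_id, clothes_id) VALUES (?, ?)",
        CLOTHES_LINKS)
    conn.commit()
    return conn


conn = build_sample_db()
print(conn.execute("SELECT COUNT(*) FROM food_relationships").fetchone()[0])
```

The SELECT queries shown later in the post use only standard joins, so they should run against this connection unchanged.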

I would like to query the database and somehow obtain the following output:

|-------------------------------------------------------------|
| name | food                         | clothes               |
|------|------------------------------|-----------------------|
| dave | beef,cheese,bacon            | trousers,socks,shirt  |
| john | apples,bacon                 | jumper,shirt,trousers |
| pete | beef,apples,bacon,tomatoes   | shorts,jumper,socks   |
| phil | bacon,tomatoes,apples,cheese | shirt,shorts,socks    |
|-------------------------------------------------------------|

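However, running a SELECT query that joins the names table to one or both of the other tables (via their respective lookup tables) results in multiple rows for each name. For example: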

SELECT
    names.name,
    foods.food

FROM
    names
    LEFT JOIN food_relationships ON names.id = food_relationships.names_id
    LEFT JOIN foods ON food_relationships.foods_id = foods.id;

...produces the following set of results:

|-----------------|
| name | food     |
|------|----------|
| dave | beef     |
| dave | bacon    |
| dave | cheese   |
| phil | tomatoes |
| phil | bacon    |
| phil | cheese   |
| phil | apples   |
| john | bacon    |
| john | apples   |
| pete | beef     |
| pete | tomatoes |
| pete | bacon    |
| pete | apples   |
|-----------------|

The problem is compounded if the SELECT query returns data from both tables:

SELECT
    names.name,
    foods.food,
    clothes.clothes

FROM
    names
    LEFT JOIN food_relationships ON names.id = food_relationships.names_id
    LEFT JOIN foods ON food_relationships.foods_id = foods.id
    LEFT JOIN clothes_relationships ON names.id = clothes_relationships.names_id
    LEFT JOIN clothes ON clothes_relationships.clothes_id = clothes.id;

|-----------------------------|
| name | food     | clothes   |
|------|----------|-----------|
| dave | beef     | trousers  |
| dave | beef     | shirt     |
| dave | beef     | socks     |
| dave | bacon    | trousers  |
| dave | bacon    | shirt     |
| dave | bacon    | socks     |
| dave | cheese   | trousers  |
| dave | cheese   | shirt     |
| dave | cheese   | socks     |
| phil | tomatoes | shorts    |
| phil | tomatoes | shirt     |
| phil | tomatoes | socks     |
| phil | bacon    | shorts    |
| phil | bacon    | shirt     |
| phil | bacon    | socks     |
| phil | cheese   | shorts    |
| phil | cheese   | shirt     |
| phil | cheese   | socks     |
| phil | apples   | shorts    |
| phil | apples   | shirt     |
| phil | apples   | socks     |
| ...
| etc.
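The row count grows this way because the two LEFT JOINs are independent of each other: every food row for a person is paired with every clothes row for that person. A small sketch of the arithmetic, using the sample data (the dictionary and variable names are only for illustration):

```python
from itertools import product

# Foods and clothes per person, read off the sample relationship tables.
foods = {
    "dave": ["beef", "bacon", "cheese"],
    "phil": ["tomatoes", "bacon", "cheese", "apples"],
    "john": ["bacon", "apples"],
    "pete": ["beef", "tomatoes", "bacon", "apples"],
}
clothes = {
    "dave": ["trousers", "shirt", "socks"],
    "phil": ["shorts", "shirt", "socks"],
    "john": ["trousers", "shirt", "jumper"],
    "pete": ["shorts", "socks", "jumper"],
}

# Joining both relationships pairs every food with every item of clothing.
joined = [
    (name, food, item)
    for name in foods
    for food, item in product(foods[name], clothes[name])
]

rows_per_name = {name: len(foods[name]) * len(clothes[name]) for name in foods}
print(rows_per_name)  # {'dave': 9, 'phil': 12, 'john': 6, 'pete': 12}
print(len(joined))    # 39
```

Thirteen food links and twelve clothes links therefore turn into 39 joined rows, and each value is repeated several times within its column.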

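The question is: how can I query the SQL Server database to retrieve all of the data, but process it so that there is only one row per person?

If the database were MySQL, the solution would be relatively easy, because MySQL has a GROUP_CONCAT function that concatenates rows. So, for one of the tables, I could use: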

SELECT
    names.name,
    GROUP_CONCAT(foods.food)

FROM
    names
    LEFT JOIN food_relationships ON names.id = food_relationships.names_id
    LEFT JOIN foods ON food_relationships.foods_id = foods.id

GROUP BY (names.name);

...giving:

name    food
dave    beef,cheese,bacon
john    apples,bacon
pete    beef,apples,bacon,tomatoes
phil    bacon,tomatoes,apples,cheese

To get the equivalent data from both the foods and clothes tables, I could use the following:

SELECT
    temp_foods_table.name               AS 'name',
    temp_foods_table.food               AS 'food',
    temp_clothes_table.clothes          AS 'clothes'

FROM
(
    SELECT
        names.name,
        GROUP_CONCAT(foods.food)        AS 'food'

    FROM
        names
        LEFT JOIN food_relationships ON names.id = food_relationships.names_id
        LEFT JOIN foods ON food_relationships.foods_id = foods.id

    GROUP BY (names.name)

) AS temp_foods_table

LEFT JOIN

(
    SELECT
        names.name,
        GROUP_CONCAT(clothes.clothes)    AS 'clothes'

    FROM
        names
        LEFT JOIN clothes_relationships ON names.id = clothes_relationships.names_id
        LEFT JOIN clothes ON clothes_relationships.clothes_id = clothes.id

    GROUP BY (names.name)

) AS temp_clothes_table

ON temp_foods_table.name = temp_clothes_table.name;

...giving the following result:

name    food                            clothes
dave    beef,cheese,bacon               trousers,socks,shirt
john    apples,bacon                    jumper,shirt,trousers
pete    beef,apples,bacon,tomatoes      shorts,jumper,socks
phil    bacon,tomatoes,apples,cheese    shirt,shorts,socks
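SQLite also provides a group_concat aggregate, so the aggregate-each-side-then-join pattern can be tried from Python without a MySQL server. This is a sketch on a cut-down copy of the sample data (two people only); the pre-joined tables person_food and person_clothes are hypothetical simplifications, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_food    (name TEXT, food TEXT);
    CREATE TABLE person_clothes (name TEXT, clothes TEXT);
""")
conn.executemany("INSERT INTO person_food VALUES (?, ?)", [
    ("dave", "beef"), ("dave", "bacon"), ("dave", "cheese"),
    ("john", "bacon"), ("john", "apples"),
])
conn.executemany("INSERT INTO person_clothes VALUES (?, ?)", [
    ("dave", "trousers"), ("dave", "shirt"), ("dave", "socks"),
    ("john", "trousers"), ("john", "shirt"), ("john", "jumper"),
])

# Each subquery collapses one relationship to a single row per name, so
# joining the two subqueries cannot multiply rows.
query = """
    SELECT f.name, f.food, c.clothes
    FROM (SELECT name, group_concat(food) AS food
          FROM person_food GROUP BY name) AS f
    LEFT JOIN
         (SELECT name, group_concat(clothes) AS clothes
          FROM person_clothes GROUP BY name) AS c
    ON f.name = c.name
    ORDER BY f.name
"""
rows = conn.execute(query).fetchall()
for name, food, clothes in rows:
    # Split and sort for display: the order inside each concatenated
    # string is not guaranteed.
    print(name, sorted(food.split(",")), sorted(clothes.split(",")))
```

Because the aggregation happens before the join, no value appears twice in a concatenated string, which is the property the Pandas approach later in the post has to restore with sets.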

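However, the situation seems much less straightforward in SQL Server. For a single table, there are several solutions suggested online, including the use of common table expressions or FOR XML PATH. However, all of them give the distinct impression of being workarounds rather than purpose-built features, and each has some drawback (for example, the FOR XML PATH solution assumes that the text is XML, so special characters in the text can cause problems). In addition, some commenters expressed concern that these solutions rely on undocumented or deprecated features and so may not be reliable in the long term.

Therefore, I decided not to tie myself in knots with SQL, and to process the data after retrieval using Python and Pandas instead. I always transfer the data to a Pandas dataframe for plotting and analysis anyway, so this was not a big inconvenience. To concatenate the data over multiple columns, I used groupby(). However, because there are two many-to-many tables, each column contains duplicates, and so the final concatenated strings contained all of those duplicates. To keep only unique values, I used Python sets (which, by definition, contain only unique values). The only potential drawback of this approach is that the order of the strings is not maintained, but in my situation that is not a problem. The final Python solution looks like this:

Import the necessary libraries: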

>>> import pandas as pd
>>> import pymssql
>>> import getpass

Enter the details needed to connect to the database:

>>> myServer = input("Enter server address: ")
>>> myUser = input("Enter username: ")
>>> myPwd = getpass.getpass("Enter password: ")

Create the connection:

>>> myConnection = pymssql.connect(server=myServer, user=myUser, password=myPwd, port='1433')

Define the query to retrieve the necessary data:

>>> myQuery = """SELECT
                         names.name,
                         foods.food,
                         clothes.clothes

                     FROM
                         names
                         LEFT JOIN food_relationships ON names.id = food_relationships.names_id
                         LEFT JOIN foods ON food_relationships.foods_id = foods.id
                         LEFT JOIN clothes_relationships ON names.id = clothes_relationships.names_id
                         LEFT JOIN clothes ON clothes_relationships.clothes_id = clothes.id """

Run the query, put the results into a dataframe and close the connection:

>>> myLatestData = pd.read_sql(myQuery, con=myConnection)
>>> myConnection.close()

Concatenate the strings from multiple rows and remove duplicates:

>>> tempDF = myLatestData.groupby('name').agg(lambda col: ','.join(set(col)))

Print the final dataframe:

>>> print(tempDF)

name                          food                clothes
dave             beef,bacon,cheese   socks,trousers,shirt
john                  bacon,apples  jumper,trousers,shirt
pete    tomatoes,beef,bacon,apples    socks,jumper,shorts
phil  tomatoes,bacon,cheese,apples     socks,shorts,shirt
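A self-contained sketch of the same groupby step, with a small hand-built data frame standing in for the query result. Two details go beyond the original solution and are additions for illustration only: sorted() makes the order of the joined values deterministic, and dropna() skips the missing values that a LEFT JOIN produces for a person with no food or no clothes. The person "mary" and the helper name join_unique are hypothetical.

```python
import pandas as pd

# A hand-built stand-in for the joined query result (one row per
# name/food/clothes combination, so values repeat within each column).
raw = pd.DataFrame(
    [
        ("dave", "beef", "trousers"),
        ("dave", "beef", "shirt"),
        ("dave", "bacon", "trousers"),
        ("dave", "bacon", "shirt"),
        ("john", "apples", "jumper"),
        ("john", "bacon", "jumper"),
        ("mary", None, "jacket"),  # hypothetical person with no food rows
    ],
    columns=["name", "food", "clothes"],
)


def join_unique(col):
    """Join the distinct non-null values of a column, in sorted order."""
    return ",".join(sorted(set(col.dropna())))


result = raw.groupby("name").agg(join_unique)
print(result)
```

Without dropna(), a missing value in a group would reach ','.join() as a non-string and raise a TypeError, so the guard matters whenever the joins can return NULLs.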

To me, this solution is more intuitive than trying to do all of the data processing inside the SQL query. I hope this helps someone else.

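If it is MS SQL Server

You can use the STUFF function. For example:

DECLARE @Heroes TABLE ( [HeroName] VARCHAR(20) )

INSERT INTO @Heroes ( [HeroName] ) VALUES ( 'Superman' ), ( 'Batman' ), ( 'Ironman' ), ( 'Wolverine' )

SELECT STUFF((SELECT ',' + [HeroName] FROM @Heroes ORDER BY [HeroName] FOR XML PATH('')), 1, 1, '') AS [Output]

Output: Batman,Ironman,Superman,Wolverine

I think this should answer your question.

Thanks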

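Some commenters on other posts have suggested that the FOR XML PATH workaround will fail if the strings contain characters that can be interpreted as XML.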
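To make the concern concrete: FOR XML PATH returns its result as XML text, so the characters &, < and > come back entity-encoded rather than as typed. The sketch below uses Python's standard-library XML escaping as a stand-in for that behavior (it does not talk to SQL Server), with made-up sample values.

```python
from xml.sax.saxutils import escape, unescape

# Made-up values, two of which contain XML-special characters.
values = ["fish & chips", "salt", "a < b"]

# Joining the raw strings gives the text one would expect to get back.
plain = ",".join(values)

# Entity-encoding each value mimics what an XML serializer does to the
# characters &, < and > before they are written out as text.
as_xml_text = ",".join(escape(v) for v in values)

print(plain)        # fish & chips,salt,a < b
print(as_xml_text)  # fish &amp; chips,salt,a &lt; b

# Decoding the entities restores the original text.
assert unescape(as_xml_text) == plain
```

A commonly cited remedy on the SQL Server side is to add the TYPE directive to FOR XML PATH('') and read the result back with .value('.', 'NVARCHAR(MAX)'), which decodes the entities. On SQL Server 2017 and later, the built-in STRING_AGG function avoids the XML route altogether.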