Join CSV into columns, join with row-based data, analyze and output: can it be done efficiently?


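I have been struggling with a complex SQL Server problem, but I am stuck and hoping for some help.

I have two tables of data stored in different formats that need to be combined to produce a specified output. To make matters worse, one of the tables has some key data stored as comma-separated values (I know this is not how data should be stored; have mercy, I did not design these tables!).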

Students table:

| id |              oldSkill |                             newSkill |
+----+-----------------------+--------------------------------------+
|  1 |                  Word |                Excel,PowerPoint,Word |
|  2 | Excel,PowerPoint,Word |        Excel,Outlook,PowerPoint,Word |
|  3 |       PowerPoint,Word |                Excel,PowerPoint,Word |
|  4 |          Access,Excel | Access,Excel,Outlook,PowerPoint,Word |
|  5 |          Outlook,Word |        Excel,Outlook,PowerPoint,Word |

Skills table:

| id |      skill | assignment |
+----+------------+------------+
|  1 |       Word |          B |
|  1 |       Word |          P |
|  2 |      Excel |          P |
|  2 | PowerPoint |          B |
|  2 | PowerPoint |          P |
|  2 |       Word |          P |
|  3 | PowerPoint |          P |
|  3 |       Word |          P |
|  4 |     Access |          B |
|  4 |      Excel |          B |
|  4 |     Access |          P |
|  4 |      Excel |          P |
|  5 |    Outlook |          P |
|  5 |       Word |          B |

Here is the output I have been asked to produce:

| id | skill_1 | skill_1_primary | skill_1_backup |    skill_2 | skill_2_primary | skill_2_backup |    skill_3 | skill_3_primary | skill_3_backup |    skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
|  1 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |              Y |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  2 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |              Y |       Word |               Y |         (null) |  (null) |          (null) |         (null) |
|  3 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |         (null) |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  4 |  Access |               Y |              Y |      Excel |               Y |              Y |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |    Word |               Y |         (null) |
|  5 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |          (null) |              Y |  (null) |          (null) |         (null) |
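
To break it down, I need to:

  • Output all of the items in the newSkill column of the Students table. These values need to be split into separate columns, each with corresponding flags indicating whether the skill is a primary or a backup skill. Note that the newSkill column contains the oldSkill values.

  • If the skill is an old one, get the flag values from the Skills table, where P is a primary skill and B is a backup skill.

  • If the skill is a new one, simply flag a 'Y' value on the primary column.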

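I have been trying to look at this from different angles (CTEs, pivots, cursors, etc.), and I have had success using a UDF to split the CSV column values, but taking the data from the rows of the Skills table and combining it with the Students data into the format they want has me stumped.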

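I have also set up a SQL Fiddle that builds the test data for this post:

Thanks in advance for any help or guidance... SQL is not one of my strongest skills. I could probably do this more easily in another language, but I have been asked to build it as a stored procedure =P

Update: At this point I have done a lot of it myself, using the suggestions in the comments. I only need help with the final output. I think this could be done using a pivot with dynamic SQL, but how to pivot and aggregate the three skill-related columns, and number them as specified, has me confused.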

-- this pivots the skills table into a single row for each skill
select *
into #skillPiv
from 
(
  select id, skill, assignment,
    'assignment_'+cast(row_number() over(partition by id, skill order by skill) as varchar(10)) rn
  from skills
) d
pivot
(
  max(assignment)
  for rn in ([assignment_1], [assignment_2])
) piv
order by id;


-- this converts the student's oldSkills from CSV into rows and looks up the corresponding skill assignments in the #skillPiv table
with st(id, skill, oldSkill) as (
select id, LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',',oldSkill+',')-1),
    STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',',oldSkill+','), '')
from students
union all
select id, LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',',oldSkill+',')-1),
    STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',',oldSkill+','), '')
from st
where oldSkill > ''
)
select st.id
    ,st.skill
    ,CASE WHEN sp.assignment_1 = 'P' OR sp.assignment_2 = 'P'
        THEN 'Y'
        ELSE ''
        END AS [primary]
    ,CASE WHEN sp.assignment_1 = 'B' OR sp.assignment_2 = 'B'
        THEN 'Y'
        ELSE ''
        END AS [backup]
into #oldSkills
from st
inner join #skillPiv sp on st.id = sp.id and st.skill = sp.skill
order by id;


-- convert the newSkills column from CSV to rows and insert our default skill assignment values
with tmp(id, skill, newSkill) as (
select id, LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',',newSkill+',')-1),
    STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',',newSkill+','), '')
from students
union all
select id, LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',',newSkill+',')-1),
    STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',',newSkill+','), '')
from tmp
where newSkill > ''
)
select id
    ,skill
    ,'Y' as [primary]
    ,'' as [backup]
into #newSkills
from tmp
where skill NOT IN (
    select skill from #oldSkills where id = tmp.id
    )
order by id;


-- now combine #oldSkills and #newSkills into one table that has all the values we need
select *
into #studentSkills
from (
    select * from #newSkills
    UNION
    select * from #oldSkills
) as ss;

select * from #studentSkills;

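I had trouble getting temp tables to work on SQL Fiddle, so I moved my test code to RexTester.

In my actual code, I use DelimitedSplit8K to parse the CSV values out of the Students table.

The code above produces this final table: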

| id |      skill | primary | backup |
|----|------------|---------|--------|
|  1 |      Excel |       Y | (null) |
|  1 | PowerPoint |       Y | (null) |
|  1 |       Word |       Y |      Y |
|  2 |      Excel |       Y | (null) |
|  2 |    Outlook |       Y | (null) |
|  2 | PowerPoint |       Y |      Y |
|  2 |       Word |       Y | (null) |
|  3 |      Excel |       Y | (null) |
|  3 | PowerPoint |       Y | (null) |
|  3 |       Word |       Y | (null) |
|  4 |     Access |       Y |      Y |
|  4 |      Excel |       Y |      Y |
|  4 |    Outlook |       Y | (null) |
|  4 | PowerPoint |       Y | (null) |
|  4 |       Word |       Y | (null) |
|  5 |      Excel |       Y | (null) |
|  5 |    Outlook |       Y | (null) |
|  5 | PowerPoint |       Y | (null) |
|  5 |       Word |  (null) |      Y |

Now I just need to pivot it so that it looks like the desired output:

| id | skill_1 | skill_1_primary | skill_1_backup |    skill_2 | skill_2_primary | skill_2_backup |    skill_3 | skill_3_primary | skill_3_backup |    skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
|  1 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |              Y |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  2 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |              Y |       Word |               Y |         (null) |  (null) |          (null) |         (null) |
|  3 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |         (null) |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  4 |  Access |               Y |              Y |      Excel |               Y |              Y |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |    Word |               Y |         (null) |
|  5 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |          (null) |              Y |  (null) |          (null) |         (null) |

Thank you for your help. Thanks!

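This design is really, really, really bad :-D

Still, if you have to stick with it, you can try the following:

Note: I take your statement

Please note that the newSkill column contains the oldSkill values

to mean "there is no old skill that is not also included in newSkill!"

The solution is fully inline and set-based: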

DECLARE @students TABLE(id INT,oldSkill VARCHAR(100),newSkill VARCHAR(100));
INSERT INTO @students VALUES
 (1,'Word','Excel,PowerPoint,Word')
,(2,'Excel,PowerPoint,Word','Excel,Outlook,PowerPoint,Word')
,(3,'PowerPoint,Word','Excel,PowerPoint,Word')
,(4,'Access,Excel','Access,Excel,Outlook,PowerPoint,Word')
,(5,'Outlook,Word','Excel,Outlook,PowerPoint,Word');

DECLARE @skills TABLE(id INT, skill VARCHAR(100),assignment VARCHAR(1));
INSERT INTO @skills VALUES
 (1,'Word','B')
,(1,'Word','P')
,(2,'Excel','P')
,(2,'PowerPoint','B')
,(2,'PowerPoint','P')
,(2,'Word','P')
,(3,'PowerPoint','P')
,(3,'Word','P')
,(4,'Access','B')
,(4,'Excel','B')
,(4,'Access','P')
,(4,'Excel','P')
,(5,'Outlook','P')
,(5,'Word','B');
--The first CTE uses an XML trick to split the comma-separated values

WITH Step1 AS
(
    SELECT id
          ,A.*     
    FROM @students AS s
    OUTER APPLY(
                 SELECT CAST('<x>' + REPLACE(s.oldSkill,',','</x><x>') + '</x>' AS XML) AS OldSkillXml
                       ,CAST('<x>' + REPLACE(s.newSkill,',','</x><x>') + '</x>' AS XML) AS NewSkillXml
                ) AS A
)
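--This CTE gets the list of new skills, all flagged with "IsPrimary='Y'"
--(assumed reconstruction of the CTE body; the numbering relies on nodes()
--  returning the items in document order)
,NewSkills AS
(
    SELECT s1.id
          ,n.x.value('.','VARCHAR(100)') AS Skill
          ,'Y' AS IsPrimary
          ,ROW_NUMBER() OVER(PARTITION BY s1.id ORDER BY (SELECT NULL)) AS NewSkillOrder
    FROM Step1 AS s1
    CROSS APPLY s1.NewSkillXml.nodes('/x') AS n(x)
)
--Assumed reconstruction: the old skills that have a backup ('B') row in @skills
,OldSkills AS
(
    SELECT s1.id
          ,sk.skill AS Skill
          ,'Y' AS IsBackup
    FROM Step1 AS s1
    CROSS APPLY s1.OldSkillXml.nodes('/x') AS o(x)
    INNER JOIN @skills AS sk ON sk.id = s1.id
                            AND sk.skill = o.x.value('.','VARCHAR(100)')
                            AND sk.assignment = 'B'
)
--The intermediate list is the result before the pivot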

,IntermediateList AS
(
    SELECT ns.id
          ,ns.Skill
          ,ns.IsPrimary
          ,os.IsBackup
          ,ns.NewSkillOrder
    FROM NewSkills AS ns
    FULL OUTER JOIN OldSkills AS os ON os.id=ns.id AND os.Skill=ns.Skill 
)
--Here I use "conditional aggregation" (old-fashioned pivot), which is well suited to pivoting on multiple columns

SELECT id

      ,MAX(CASE WHEN NewSkillOrder = 1 THEN Skill END) AS skill_1
      ,MAX(CASE WHEN NewSkillOrder = 1 THEN IsPrimary END) AS skill_1_primary
      ,MAX(CASE WHEN NewSkillOrder = 1 THEN IsBackup END) AS skill_1_backup

      ,MAX(CASE WHEN NewSkillOrder = 2 THEN Skill END) AS skill_2
      ,MAX(CASE WHEN NewSkillOrder = 2 THEN IsPrimary END) AS skill_2_primary
      ,MAX(CASE WHEN NewSkillOrder = 2 THEN IsBackup END) AS skill_2_backup

      ,MAX(CASE WHEN NewSkillOrder = 3 THEN Skill END) AS skill_3
      ,MAX(CASE WHEN NewSkillOrder = 3 THEN IsPrimary END) AS skill_3_primary
      ,MAX(CASE WHEN NewSkillOrder = 3 THEN IsBackup END) AS skill_3_backup

      ,MAX(CASE WHEN NewSkillOrder = 4 THEN Skill END) AS skill_4
      ,MAX(CASE WHEN NewSkillOrder = 4 THEN IsPrimary END) AS skill_4_primary
      ,MAX(CASE WHEN NewSkillOrder = 4 THEN IsBackup END) AS skill_4_backup

      ,MAX(CASE WHEN NewSkillOrder = 5 THEN Skill END) AS skill_5
      ,MAX(CASE WHEN NewSkillOrder = 5 THEN IsPrimary END) AS skill_5_primary
      ,MAX(CASE WHEN NewSkillOrder = 5 THEN IsBackup END) AS skill_5_backup
FROM IntermediateList AS il
GROUP BY id; 

Result

+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| id | skill_1 | skill_1_primary | skill_1_backup | skill_2    | skill_2_primary | skill_2_backup | skill_3    | skill_3_primary | skill_3_backup | skill_4    | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 1  | Excel   | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | Y              | NULL       | NULL            | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 2  | Excel   | Y               | NULL           | Outlook    | Y               | NULL           | PowerPoint | Y               | Y              | Word       | Y               | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 3  | Excel   | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | NULL           | NULL       | NULL            | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 4  | Access  | Y               | Y              | Excel      | Y               | Y              | Outlook    | Y               | NULL           | PowerPoint | Y               | NULL           | Word    | Y               | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 5  | Excel   | Y               | NULL           | Outlook    | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | Y              | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
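
With 40 skill columns in the real data, typing out the conditional aggregation by hand gets tedious. Below is a minimal sketch of generating the same MAX(CASE ...) list with dynamic SQL, which also works on SQL Server 2008 R2. It assumes the rows of IntermediateList have first been saved into a temp table; the name #IntermediateList and its sample rows are hypothetical. A temp table is used because a table variable declared in the outer batch is not visible inside sp_executesql.

```sql
-- Hypothetical staging table with the same columns as IntermediateList
CREATE TABLE #IntermediateList
(
    id            INT,
    Skill         VARCHAR(100),
    IsPrimary     CHAR(1),
    IsBackup      CHAR(1),
    NewSkillOrder INT
);

-- A few sample rows (students 1 and 4 from the result above)
INSERT INTO #IntermediateList VALUES
 (1,'Excel','Y',NULL,1)
,(1,'PowerPoint','Y',NULL,2)
,(1,'Word','Y','Y',3)
,(4,'Access','Y','Y',1)
,(4,'Excel','Y','Y',2)
,(4,'Outlook','Y',NULL,3)
,(4,'PowerPoint','Y',NULL,4)
,(4,'Word','Y',NULL,5);

DECLARE @cols NVARCHAR(MAX);
DECLARE @sql  NVARCHAR(MAX);

-- Build one triple of MAX(CASE ...) expressions per skill position found in the data
SELECT @cols =
(
    SELECT ',MAX(CASE WHEN NewSkillOrder = ' + p.n + ' THEN Skill END) AS skill_' + p.n
         + ',MAX(CASE WHEN NewSkillOrder = ' + p.n + ' THEN IsPrimary END) AS skill_' + p.n + '_primary'
         + ',MAX(CASE WHEN NewSkillOrder = ' + p.n + ' THEN IsBackup END) AS skill_' + p.n + '_backup'
    FROM
    (
        SELECT DISTINCT NewSkillOrder
                       ,CAST(NewSkillOrder AS VARCHAR(10)) AS n
        FROM #IntermediateList
    ) AS p
    ORDER BY p.NewSkillOrder
    FOR XML PATH(''), TYPE
).value('.','NVARCHAR(MAX)');

-- The temp table created above is visible inside the dynamic batch
SET @sql = N'SELECT id' + @cols + N' FROM #IntermediateList GROUP BY id ORDER BY id;';

EXEC sys.sp_executesql @sql;

DROP TABLE #IntermediateList;
```

Only integers taken from the data are concatenated into the statement, so there is nothing user-supplied to inject. The number of output columns follows the highest skill position that actually occurs, so a fixed layout of 40 would need the position list generated from a numbers source instead.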
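
Note

There is one difference: your student 5 has NULL/Y for the skill "Word". I do not understand why a skill that is included in newSkill should not be "primary".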
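
If the expected NULL/Y for student 5 is intended (flags for old skills come only from the Skills table, new skills default to primary), the flag logic could be sketched as below. This is a hedged variation, not part of the query above: it assumes a skill counts as old exactly when @skills holds at least one row for that student and skill, which is true for the sample data, and it uses a hypothetical @newSkillRows table standing in for the already split newSkill values.

```sql
DECLARE @skills TABLE(id INT, skill VARCHAR(100), assignment VARCHAR(1));
INSERT INTO @skills VALUES
 (5,'Outlook','P')
,(5,'Word','B');

-- Hypothetical: the newSkill CSV of student 5, already split into rows
DECLARE @newSkillRows TABLE(id INT, Skill VARCHAR(100), NewSkillOrder INT);
INSERT INTO @newSkillRows VALUES
 (5,'Excel',1)
,(5,'Outlook',2)
,(5,'PowerPoint',3)
,(5,'Word',4);

SELECT ns.id
      ,ns.Skill
      ,ns.NewSkillOrder
       -- no rows in @skills: treated as a new skill, so primary defaults to 'Y'
      ,CASE WHEN f.Cnt = 0 THEN 'Y' ELSE f.HasPrimary END AS IsPrimary
      ,f.HasBackup AS IsBackup
FROM @newSkillRows AS ns
OUTER APPLY
(
    -- an aggregate without GROUP BY always returns exactly one row
    SELECT COUNT(*) AS Cnt
          ,MAX(CASE WHEN sk.assignment = 'P' THEN 'Y' END) AS HasPrimary
          ,MAX(CASE WHEN sk.assignment = 'B' THEN 'Y' END) AS HasBackup
    FROM @skills AS sk
    WHERE sk.id = ns.id
      AND sk.skill = ns.Skill
) AS f
ORDER BY ns.id, ns.NewSkillOrder;
```

For Word this yields IsPrimary NULL and IsBackup 'Y', because @skills has only a 'B' row for it; Excel and PowerPoint have no rows at all and fall back to 'Y'.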

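If you can use SQL 2016, you could use this useful function, if only to preprocess your data into a better form.

You can start simple... just think about skill 1 for now. You can do it all in one select, using a correlated subquery for each field (column) you need to output... get skill 1 working, then add the rest... (you may not actually need to split the strings; you can use the crude but effective CHARINDEX to see whether the word Excel exists in a block such as Excel,PowerPoint,Word)

Thanks, unfortunately I am stuck with SQL Server 2008 R2. I did write a UDF for splitting the strings and got the data into columns, but I am still not sure how to efficiently analyze and combine the data from the Skills table. I also do not want to do a whole bunch of correlated subqueries; the real use case has 40 skill columns! I am afraid I would build an SP that hangs the server...

Ha ha, you will not hang the server, not with this small amount of data :) I think correlated subqueries would be great... skip the split... try to get skill 1 working with SELECT (correlated subquery) AS COL1... etc... then you can basically cut and paste to get the other 39 columns working.

The queries here are small for my example, but the real one may need to process hundreds or thousands of rows, each with up to 40 skills. I will play with it, but I still worry it could turn into a nested-query nightmare... I appreciate your help!

I explained the number of skills for each input/output in the comments. Backup/primary is independent of old/new. Student 5 skill 4 is Word; there is no P entry in Skills, so primary is NULL; it has a B, so backup is Y. The bulleted algorithm does not agree with the data, so the asker must clarify what is right, but they say they were given the data. They inexplicably commented that the skills id is a student id. It is also not clear whether oldSkill plays a role in the output. This question is not clear enough to answer.

@philipxy, apart from student 5, Word, my result is the same, so it seems to be close. Hopefully the OP will come back to clear this up.

In some cases a student may not want to be listed as the primary contact for a skill, so it is possible for a student to have a backup assignment for Word with no primary one. As for the id column in the Skills table, that was an oversight on my part; I was trying to create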
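
The CHARINDEX idea from the comments can be sketched roughly as follows. This is only an illustration of the membership test, under the assumption that skill names never contain commas; wrapping both the list and the search term in delimiters avoids partial matches (for example a skill named 'Word' matching inside a longer name). The output column names hasExcel and hasOutlook are made up for the example.

```sql
DECLARE @students TABLE(id INT, oldSkill VARCHAR(100), newSkill VARCHAR(100));
INSERT INTO @students VALUES
 (1,'Word','Excel,PowerPoint,Word')
,(4,'Access,Excel','Access,Excel,Outlook,PowerPoint,Word');

SELECT s.id
       -- ',' + list + ',' turns every item into ',item,' so whole items are matched
      ,CASE WHEN CHARINDEX(',Excel,',   ',' + s.newSkill + ',') > 0 THEN 'Y' END AS hasExcel
      ,CASE WHEN CHARINDEX(',Outlook,', ',' + s.newSkill + ',') > 0 THEN 'Y' END AS hasOutlook
FROM @students AS s
ORDER BY s.id;
```

Note that this produces one column per named skill rather than the positional skill_1 ... skill_n layout asked for, so it fits best as a lookup inside a larger query and does not replace the split.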