Join CSV to columns, join with row-based data, analyze and output - can this be done efficiently?
I've been working on a complicated SQL Server problem, but I'm stuck and hoping to get some help.

I have two tables that store data in different formats, and I need to combine them to produce a specified output. To make matters worse, one of the tables stores some key data as comma-separated values (I know this is not how data should be stored; have mercy, I did not design these tables!).

Students table:
| id | oldSkill | newSkill |
+----+-----------------------+--------------------------------------+
| 1 | Word | Excel,PowerPoint,Word |
| 2 | Excel,PowerPoint,Word | Excel,Outlook,PowerPoint,Word |
| 3 | PowerPoint,Word | Excel,PowerPoint,Word |
| 4 | Access,Excel | Access,Excel,Outlook,PowerPoint,Word |
| 5 | Outlook,Word | Excel,Outlook,PowerPoint,Word |
Skills table:
| id | skill | assignment |
+----+------------+------------+
| 1 | Word | B |
| 1 | Word | P |
| 2 | Excel | P |
| 2 | PowerPoint | B |
| 2 | PowerPoint | P |
| 2 | Word | P |
| 3 | PowerPoint | P |
| 3 | Word | P |
| 4 | Access | B |
| 4 | Excel | B |
| 4 | Access | P |
| 4 | Excel | P |
| 5 | Outlook | P |
| 5 | Word | B |
Here is the output I have been asked to produce:
| id | skill_1 | skill_1_primary | skill_1_backup | skill_2 | skill_2_primary | skill_2_backup | skill_3 | skill_3_primary | skill_3_backup | skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
| 1 | Excel | Y | (null) | PowerPoint | Y | (null) | Word | Y | Y | (null) | (null) | (null) | (null) | (null) | (null) |
| 2 | Excel | Y | (null) | Outlook | Y | (null) | PowerPoint | Y | Y | Word | Y | (null) | (null) | (null) | (null) |
| 3 | Excel | Y | (null) | PowerPoint | Y | (null) | Word | Y | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| 4 | Access | Y | Y | Excel | Y | Y | Outlook | Y | (null) | PowerPoint | Y | (null) | Word | Y | (null) |
| 5 | Excel | Y | (null) | Outlook | Y | (null) | PowerPoint | Y | (null) | Word | (null) | Y | (null) | (null) | (null) |
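To break it down, I need to:
- Output all items from the newSkill column of the Students table. These values need to be split into separate columns, each with corresponding flags indicating whether the skill is a primary or a backup skill. Note that the newSkill column contains the oldSkill values.
- If the skill is an old one, get the flag values from the Skills table, where P is a primary skill and B is a backup skill.
- If the skill is a new one, just flag the primary column with a 'Y' value.

Pulling the data from the rows of the Skills table and combining it with the Students data into the format they want has me stumped.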
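I've also set up a SQL Fiddle to build the test data for this post.

Thanks in advance for any help or guidance. SQL is not one of my strongest skills. I could probably do this more easily in another language, but I've been asked to build it as a stored procedure =P

Update:

At this point I've done a lot of it myself, using the suggestions from the comments. I just need help with the final output. I think it can be done using PIVOT with dynamic SQL, but how to pivot and aggregate the three skill-related columns and number them as specified has me stumped.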
-- this pivots the skills table into a single row for each skill
select *
into #skillPiv
from
(
select id, skill, assignment,
'assignment_'+cast(row_number() over(partition by id, skill order by skill) as varchar(10)) rn
from skills
) d
pivot
(
max(assignment)
for rn in ([assignment_1], [assignment_2])
) piv
order by id;
-- this converts the student's oldSkills from CSV into rows and looks up the corresponding skill assignments in the #skillPiv table
with st(id, skill, oldSkill) as (
select id, LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',',oldSkill+',')-1),
STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',',oldSkill+','), '')
from students
union all
select id, LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',',oldSkill+',')-1),
STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',',oldSkill+','), '')
from st
where oldSkill > ''
)
select st.id
,st.skill
,CASE WHEN sp.assignment_1 = 'P' OR sp.assignment_2 = 'P'
THEN 'Y'
ELSE ''
END AS [primary]
,CASE WHEN sp.assignment_1 = 'B' OR sp.assignment_2 = 'B'
THEN 'Y'
ELSE ''
END AS [backup]
into #oldSkills
from st
inner join #skillPiv sp on st.id = sp.id and st.skill = sp.skill
order by id;
-- convert the newSkills column from CSV to rows and insert our default skill assignment values
with tmp(id, skill, newSkill) as (
select id, LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',',newSkill+',')-1),
STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',',newSkill+','), '')
from students
union all
select id, LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',',newSkill+',')-1),
STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',',newSkill+','), '')
from tmp
where newSkill > ''
)
select id
,skill
,'Y' as [primary]
,'' as [backup]
into #newSkills
from tmp
where skill NOT IN (
select skill from #oldSkills where id = tmp.id
)
order by id;
-- now combine #oldSkills and #newSkills into one table that has all the values we need
select *
into #studentSkills
from (
select * from #newSkills
UNION
select * from #oldSkills
) as ss;
select * from #studentSkills;
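I had trouble getting temp tables to work on SQL Fiddle, so I moved my test code to RexTester.

In my actual code, I use DelimitedSplit8K to parse the CSV values out of the Students table.

The code above produces this final table: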
| id | skill | primary | backup |
|----|------------|---------|--------|
| 1 | Excel | Y | (null) |
| 1 | PowerPoint | Y | (null) |
| 1 | Word | Y | Y |
| 2 | Excel | Y | (null) |
| 2 | Outlook | Y | (null) |
| 2 | PowerPoint | Y | Y |
| 2 | Word | Y | (null) |
| 3 | Excel | Y | (null) |
| 3 | PowerPoint | Y | (null) |
| 3 | Word | Y | (null) |
| 4 | Access | Y | Y |
| 4 | Excel | Y | Y |
| 4 | Outlook | Y | (null) |
| 4 | PowerPoint | Y | (null) |
| 4 | Word | Y | (null) |
| 5 | Excel | Y | (null) |
| 5 | Outlook | Y | (null) |
| 5 | PowerPoint | Y | (null) |
| 5 | Word | (null) | Y |
Now I just need to pivot it so that it looks like the desired output:
| id | skill_1 | skill_1_primary | skill_1_backup | skill_2 | skill_2_primary | skill_2_backup | skill_3 | skill_3_primary | skill_3_backup | skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
| 1 | Excel | Y | (null) | PowerPoint | Y | (null) | Word | Y | Y | (null) | (null) | (null) | (null) | (null) | (null) |
| 2 | Excel | Y | (null) | Outlook | Y | (null) | PowerPoint | Y | Y | Word | Y | (null) | (null) | (null) | (null) |
| 3 | Excel | Y | (null) | PowerPoint | Y | (null) | Word | Y | (null) | (null) | (null) | (null) | (null) | (null) | (null) |
| 4 | Access | Y | Y | Excel | Y | Y | Outlook | Y | (null) | PowerPoint | Y | (null) | Word | Y | (null) |
| 5 | Excel | Y | (null) | Outlook | Y | (null) | PowerPoint | Y | (null) | Word | (null) | Y | (null) | (null) | (null) |
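Thank you for your help.

This design is really, really, really bad :-D

Still, if you have to stick with it, you can try the following.

Note: I took your statement "Note that the newSkill column contains the oldSkill values" to mean that there is no old skill that does not also appear in the new skills.

This solution is fully inline and set-based: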
DECLARE @students TABLE(id INT,oldSkill VARCHAR(100),newSkill VARCHAR(100));
INSERT INTO @students VALUES
(1,'Word','Excel,PowerPoint,Word')
,(2,'Excel,PowerPoint,Word','Excel,Outlook,PowerPoint,Word')
,(3,'PowerPoint,Word','Excel,PowerPoint,Word')
,(4,'Access,Excel','Access,Excel,Outlook,PowerPoint,Word')
,(5,'Outlook,Word','Excel,Outlook,PowerPoint,Word');
DECLARE @skills TABLE(id INT, skill VARCHAR(100),assignment VARCHAR(1));
INSERT INTO @skills VALUES
(1,'Word','B')
,(1,'Word','P')
,(2,'Excel','P')
,(2,'PowerPoint','B')
,(2,'PowerPoint','P')
,(2,'Word','P')
,(3,'PowerPoint','P')
,(3,'Word','P')
,(4,'Access','B')
,(4,'Excel','B')
,(4,'Access','P')
,(4,'Excel','P')
,(5,'Outlook','P')
,(5,'Word','B');
--The first CTE uses an XML trick to split the comma-separated values
WITH Step1 AS
(
SELECT id
,A.*
FROM @students AS s
OUTER APPLY(
SELECT CAST('<x>' + REPLACE(s.oldSkill,',','</x><x>') + '</x>' AS XML) AS OldSkillXml
,CAST('<x>' + REPLACE(s.newSkill,',','</x><x>') + '</x>' AS XML) AS NewSkillXml
) AS A
)
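--This CTE gets the list of new skills, all of them marked with IsPrimary='Y'
--NOTE: the NewSkills and OldSkills CTEs are missing from this copy of the answer.
--The two definitions below are a reconstruction inferred from the columns used
--further down and from the posted result; they are an assumption, not the original text.
,NewSkills AS
(
SELECT t.id
,t.Skill
,'Y' AS IsPrimary
,ROW_NUMBER() OVER(PARTITION BY t.id ORDER BY t.Skill) AS NewSkillOrder --assumes alphabetical numbering, as in the sample data
FROM (
SELECT s1.id
,A.x.value('.','varchar(100)') AS Skill
FROM Step1 AS s1
CROSS APPLY s1.NewSkillXml.nodes('/x') AS A(x)
) AS t
)
--Assumed: old skills get IsBackup='Y' when @skills holds a 'B' row for that student and skill
,OldSkills AS
(
SELECT t.id
,t.Skill
,CASE WHEN EXISTS(SELECT 1 FROM @skills AS sk
WHERE sk.id=t.id AND sk.skill=t.Skill AND sk.assignment='B') THEN 'Y' END AS IsBackup
FROM (
SELECT s1.id
,A.x.value('.','varchar(100)') AS Skill
FROM Step1 AS s1
CROSS APPLY s1.OldSkillXml.nodes('/x') AS A(x)
) AS t
)
--IntermediateList is the result set before the pivot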
,IntermediateList AS
(
SELECT ns.id
,ns.Skill
,ns.IsPrimary
,os.IsBackup
,ns.NewSkillOrder
FROM NewSkills AS ns
FULL OUTER JOIN OldSkills AS os ON os.id=ns.id AND os.Skill=ns.Skill
)
--Here I use "conditional aggregation" (old-fashioned pivot), which is well suited to pivoting several columns at once
SELECT id
,MAX(CASE WHEN NewSkillOrder = 1 THEN Skill END) AS skill_1
,MAX(CASE WHEN NewSkillOrder = 1 THEN IsPrimary END) AS skill_1_primary
,MAX(CASE WHEN NewSkillOrder = 1 THEN IsBackup END) AS skill_1_backup
,MAX(CASE WHEN NewSkillOrder = 2 THEN Skill END) AS skill_2
,MAX(CASE WHEN NewSkillOrder = 2 THEN IsPrimary END) AS skill_2_primary
,MAX(CASE WHEN NewSkillOrder = 2 THEN IsBackup END) AS skill_2_backup
,MAX(CASE WHEN NewSkillOrder = 3 THEN Skill END) AS skill_3
,MAX(CASE WHEN NewSkillOrder = 3 THEN IsPrimary END) AS skill_3_primary
,MAX(CASE WHEN NewSkillOrder = 3 THEN IsBackup END) AS skill_3_backup
,MAX(CASE WHEN NewSkillOrder = 4 THEN Skill END) AS skill_4
,MAX(CASE WHEN NewSkillOrder = 4 THEN IsPrimary END) AS skill_4_primary
,MAX(CASE WHEN NewSkillOrder = 4 THEN IsBackup END) AS skill_4_backup
,MAX(CASE WHEN NewSkillOrder = 5 THEN Skill END) AS skill_5
,MAX(CASE WHEN NewSkillOrder = 5 THEN IsPrimary END) AS skill_5_primary
,MAX(CASE WHEN NewSkillOrder = 5 THEN IsBackup END) AS skill_5_backup
FROM IntermediateList AS il
GROUP BY id;
Result
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| id | skill_1 | skill_1_primary | skill_1_backup | skill_2 | skill_2_primary | skill_2_backup | skill_3 | skill_3_primary | skill_3_backup | skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 1 | Excel | Y | NULL | PowerPoint | Y | NULL | Word | Y | Y | NULL | NULL | NULL | NULL | NULL | NULL |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 2 | Excel | Y | NULL | Outlook | Y | NULL | PowerPoint | Y | Y | Word | Y | NULL | NULL | NULL | NULL |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 3 | Excel | Y | NULL | PowerPoint | Y | NULL | Word | Y | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 4 | Access | Y | Y | Excel | Y | Y | Outlook | Y | NULL | PowerPoint | Y | NULL | Word | Y | NULL |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 5 | Excel | Y | NULL | Outlook | Y | NULL | PowerPoint | Y | NULL | Word | Y | Y | NULL | NULL | NULL |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
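With 40 skill slots instead of 5, typing out the repeated MAX(CASE ...) triples by hand gets tedious. One possible way to generate them is dynamic SQL, sketched below for SQL Server 2008 R2. This is only an illustration under stated assumptions: the temp table `#IntermediateList`, the variable `@maxSkills` and the sample rows are hypothetical stand-ins for the intermediate result of the query above, not part of the original answer.

```sql
-- Hypothetical stand-in for the IntermediateList result set (assumption for this sketch)
CREATE TABLE #IntermediateList
(id INT, Skill VARCHAR(100), IsPrimary CHAR(1), IsBackup CHAR(1), NewSkillOrder INT);

INSERT INTO #IntermediateList VALUES
 (1,'Excel','Y',NULL,1)
,(1,'PowerPoint','Y',NULL,2)
,(1,'Word','Y','Y',3);

DECLARE @maxSkills INT = 5;      -- assumed upper bound; the real case would use 40
DECLARE @cols NVARCHAR(MAX);
DECLARE @sql  NVARCHAR(MAX);

-- Build one Skill / primary / backup column triple per slot number
WITH Numbers AS
(
    SELECT TOP (@maxSkills) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
SELECT @cols =
(
    SELECT ',MAX(CASE WHEN NewSkillOrder = ' + CAST(n AS VARCHAR(10))
         + ' THEN Skill END) AS skill_' + CAST(n AS VARCHAR(10))
         + ',MAX(CASE WHEN NewSkillOrder = ' + CAST(n AS VARCHAR(10))
         + ' THEN IsPrimary END) AS skill_' + CAST(n AS VARCHAR(10)) + '_primary'
         + ',MAX(CASE WHEN NewSkillOrder = ' + CAST(n AS VARCHAR(10))
         + ' THEN IsBackup END) AS skill_' + CAST(n AS VARCHAR(10)) + '_backup'
    FROM Numbers
    ORDER BY n
    FOR XML PATH(''), TYPE
).value('.','nvarchar(max)');

-- The temp table is visible inside sp_executesql because it was created in the calling scope
SET @sql = N'SELECT id' + @cols + N' FROM #IntermediateList GROUP BY id;';
EXEC sp_executesql @sql;

DROP TABLE #IntermediateList;
```

The generated statement has the same shape as the hand-written conditional aggregation, so the two can be compared column by column before switching over.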
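Note there is one difference: your student 5 has NULL/Y for the skill "Word". I do not understand why a skill that is contained in newSkill should not be "primary".

If you can use SQL 2016, you can use this helpful function, if only to preprocess your data into a better form. You can start simple... just think about skill 1 for now. You can do it all in one SELECT, using a correlated subquery for each field (column) you need to output... get skill 1 working, then add the rest... (you might not actually need to split the strings; you can use a crude but effective CHARINDEX to see whether the word Excel exists in a block such as Excel,PowerPoint,Word)

Thanks. Unfortunately, I'm stuck with SQL Server 2008 R2. I did write a UDF to split the strings and break the data into columns, but I'm still not sure how to efficiently analyze and combine the data from the Skills table. I also don't want to write a whole pile of correlated subqueries; the real use case has 40 skill columns! I'm afraid I'd build an SP that hangs the server...

Haha, you won't hang the server with this small amount of data :) I think correlated subqueries will be fine... skip the splitting... try to get skill 1 working with SELECT (correlated subquery) AS COL1... etc... then you can basically cut 'n' paste to get the other 39 columns working.

The query here is small for my example, but the real one may have to process hundreds or thousands of rows, each with up to 40 skills. I'll play with it, but I'm still worried it could turn into a nested-query nightmare... I appreciate your help!

I explained the number of skills per input/output in a comment.

Backup/primary is independent of old/new. Student 5's skill 4 is Word; there is no P entry for it in Skills, so primary is null, and it has a B, so backup is Y. The bulleted algorithm is inconsistent with the data, so the asker has to clarify which one is correct, but they say they were given the data. They inexplicably commented that the skill id is a student id. It is also unclear whether oldSkill plays any role in the output. This question is not clear enough to answer.

@philipxy, apart from student 5's "Word", my result is the same, so it seems to be close. Hopefully the OP comes back to sort this out.

In some cases a student may not want to be listed as the primary contact for a skill, which is why there is one student who has a backup for Word without a primary assignment. As for the skill id column, that was an oversight on my part; I was trying to create it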
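The CHARINDEX idea raised in the comments can be sketched roughly as follows. It only tests whether a skill is present in the CSV list; it does not give the skill a slot number, so it is a complement to splitting rather than a replacement. The table variable and `@skill` are assumptions for illustration. Wrapping both the needle and the list in commas avoids partial matches (a search for `Word` would otherwise also match a hypothetical `WordPerfect`).

```sql
-- Illustrative sample data (assumption: same shape as the Students table in the question)
DECLARE @students TABLE(id INT, newSkill VARCHAR(100));
INSERT INTO @students VALUES
 (1,'Excel,PowerPoint,Word')
,(4,'Access,Excel,Outlook,PowerPoint,Word');

DECLARE @skill VARCHAR(100) = 'Access';   -- hypothetical skill to look for

-- Delimit both sides with commas so only whole list items match
SELECT id
      ,CASE WHEN CHARINDEX(',' + @skill + ',', ',' + newSkill + ',') > 0
            THEN 'Y' END AS hasSkill
FROM @students;
```

With this sample data, only student 4 is flagged, because `Access` does not appear in student 1's list.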