Sql server 调整查询以解析SQL Server 2014上的XML数据

Sql server 调整查询以解析SQL Server 2014上的XML数据,sql-server,xpath,query-performance,query-tuning,Sql Server,Xpath,Query Performance,Query Tuning,我在SQL Server 2014数据库中有一个表,该表存储了VARCHARMAX列穷人CDC中记录更改的审核信息 该数据的格式如下: <span class="fieldname">Assigned To</span> changed from <span class="oldvalue">user1</span> to <span class="newvalue">user2</span><br /&g

我在SQL Server 2014数据库中有一个表,该表存储了VARCHARMAX列穷人CDC中记录更改的审核信息

该数据的格式如下:

<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />
...
存储过程试图通过将数据转换为XML,然后使用XPath检索必要的片段来实现这一点:

;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM dbo.Trans
   WHERE TransDate >= '11/1/2016'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);
这个查询的速度非常慢——这个示例只处理了10天的数据,随着数据的增加,速度会越来越慢


优化此查询的选项有哪些?

您真正需要加快的是一些xml索引。但是,由于您正在动态创建XML,因此不会发生这种情况。实际上,这相当于交叉连接,并且随着时间的推移,速度会呈指数级下降


有关详细讨论以及索引如何帮助的信息,请参阅。如果您想通过XML实现这一点,那么您确实需要存储XML,以便为XML编制索引。

交叉连接就是其中之一,它的伸缩性不太好,因为您的表越来越大,嵌套循环上的执行数呈指数增长。 在您提交的执行计划中,每个循环的数字超过60万。您的逻辑读取确实很低,但页面会一次又一次地进行处理。 如果您的查询大小超出了缓冲区大小,并将后台打印到磁盘上,那么您将受到真正的伤害

这是一个允许您利用XML索引的解决方案,它可能有助于您的情况

--PREPARE SAMPLE DATA
DROP TABLE #Trans
CREATE TABLE #Trans(TransID INT
                    ,TransDate DATE
                    ,TransDescription VARBINARY(MAX)
                    )
INSERT INTO #Trans VALUES
(
1, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(2, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(3, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))

---------------------------------------------------------------------------------------------------
--RUN BELOW THIS LINE COLLECTIVELY, THE ORIGINAL QUERY IS SHOWING UP WITH APPROX 93% OR OVERALL COST

--BUILD A TEMP TABLE TO RECIEVE XML FORMATTED DATA
DROP TABLE #XmlData
CREATE TABLE #XmlData (
    TransId INT NOT NULL,
    TransXml xml NOT NULL,
CONSTRAINT [PK_XmlData] PRIMARY KEY CLUSTERED (TransId)
) 

--INSERT DATA INTO XML TABLE
INSERT INTO #XmlData
SELECT TransId,
    CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM #Trans
WHERE TransDate >= '11/1/2015'
    AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%')

--CREATE AN XML INDEX
CREATE PRIMARY XML INDEX PXML_TransXml
ON #XmlData(TransXml)

--APPLY NODES QUERY AGAINST XML INDEX
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
FROM #XmlData TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

---------------------------------
--Original Query
;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM #Trans--dbo.Trans
   WHERE TransDate >= '11/1/2015'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))

SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
--INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

您只需要像“%Status%”这样的TransDescription,它包括像“%Ticket reopened…”这样的集合。。。“状态%”。不要认为它在性能方面会起到很大作用though@HoneyBadger是的,前导“%”使其不可挂起,因此性能已经因此受到影响。不过,根据IO统计数据,工作表受到了冲击。真正需要加速的是一些xml索引。但是,由于您正在动态创建XML,因此不会发生这种情况。实际上,这相当于交叉连接,随着时间的推移,速度会呈指数级下降。有关详细讨论和索引如何帮助的信息,请参阅。如果你想通过XML来实现这一点,你真的需要存储XML,这样你就可以索引XML了。@LaughingVergil首先将CTE转储到临时表中,然后分解XML确实解决了这个问题。非常感谢。你能把这个作为答案贴出来让我接受吗?
Table 'Trans'. Scan count 9, logical reads 27429, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 2964994, physical reads 0, read-ahead reads 0, lob logical reads 2991628, lob physical reads 0, lob read-ahead reads 0.
--PREPARE SAMPLE DATA
DROP TABLE #Trans
CREATE TABLE #Trans(TransID INT
                    ,TransDate DATE
                    ,TransDescription VARBINARY(MAX)
                    )
INSERT INTO #Trans VALUES
(
1, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(2, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))
,(3, '20160101'
,CAST('<span class="fieldname">Assigned To</span>
   changed from <span class="oldvalue">user1</span>
   to <span class="newvalue">user2</span><br />
<span class="fieldname">Status</span>
   changed from <span class="oldvalue">QA</span>
   to <span class="newvalue">Development</span><br />
<span class="fieldname">Progress</span>
   changed from <span class="oldvalue">Yes</span>
   to <span class="newvalue">No</span><br />' AS varbinary(MAX)))

---------------------------------------------------------------------------------------------------
--RUN BELOW THIS LINE COLLECTIVELY, THE ORIGINAL QUERY IS SHOWING UP WITH APPROX 93% OR OVERALL COST

--BUILD A TEMP TABLE TO RECIEVE XML FORMATTED DATA
DROP TABLE #XmlData
CREATE TABLE #XmlData (
    TransId INT NOT NULL,
    TransXml xml NOT NULL,
CONSTRAINT [PK_XmlData] PRIMARY KEY CLUSTERED (TransId)
) 

--INSERT DATA INTO XML TABLE
INSERT INTO #XmlData
SELECT TransId,
    CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
FROM #Trans
WHERE TransDate >= '11/1/2015'
    AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%')

--CREATE AN XML INDEX
CREATE PRIMARY XML INDEX PXML_TransXml
ON #XmlData(TransXml)

--APPLY NODES QUERY AGAINST XML INDEX
SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
FROM #XmlData TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);

---------------------------------
--Original Query
;WITH TT AS (
   SELECT TransId,
      CAST('<root><rec>' + REPLACE(REPLACE(TransDescription, 'Ticket reopened... Status', 'Status'), '<br />', '</rec><rec>') + '</rec></root>' AS XML) TransXml
   FROM #Trans--dbo.Trans
   WHERE TransDate >= '11/1/2015'
      AND (TransDescription LIKE '%Ticket reopened... Status%' OR TransDescription LIKE '%Status%'))

SELECT TransId,
   TransXml,
   FieldName = T.V.value('span[@class="fieldname"][1]', 'varchar(255)'),
   OldValue = NULLIF(T.V.value('span[@class="oldvalue"][1]', 'varchar(255)'), 'nothing'),
   NewValue = NULLIF(T.V.value('span[@class="newvalue"][1]', 'varchar(255)'), 'nothing')
--INTO #tmp
FROM TT
   CROSS APPLY TT.TransXml.nodes('root/rec') T(V);