SQL: GROUP BY on a large CASE expression is slow
I need to produce a report based on LC call numbers, grouped by number range. The call numbers have the format shown below, and I need to extract the second field, before the punctuation, for grouping:
CALL_NO_ID1
--------------
a!3243 .m43 12
a#435 234 1999
cs"345 1973.
...
Below is my SQL:
select count("CALL_NO_ID1") "No_of_Items",
case
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KH %') THEN 'KH 0-999 - Federal law Common and collective state law Individual states US - South AmericaGeneral '
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 100)AND ("CALL_NO_DESC1" LIKE 'DE %') THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 1050)AND ("CALL_NO_DESC1" LIKE 'TR %') THEN 'TR 1-1050 - Photography'
...
... (around 450 case conditions)
...
else "CALL_NO_ID1"
end "Primary Call"
from DWH_FACT_ITEMS
group by
case
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 0 AND 999)AND ("CALL_NO_DESC1" LIKE 'KH %') THEN 'KH 0-999 - Federal law Common and collective state law Individual states US - South AmericaGeneral '
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 100)AND ("CALL_NO_DESC1" LIKE 'DE %') THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
WHEN (LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 1050)AND ("CALL_NO_DESC1" LIKE 'TR %') THEN 'TR 1-1050 - Photography'
...
... (around 450 case conditions)
...
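However, this takes a very long time to return a result (2-3 hours). Are there any suggestions for improving my SQL?
Thanks,
Morris
I would add an extra column, CALL_NO_CLEARED, to hold this number. Apply the expression to all existing values to populate the artificial column.
You can add an insert/update trigger so that the column is filled in automatically whenever a row is added or changed.
Then you can use CALL_NO_CLEARED in the SELECT and introduce an index on it to make the query faster.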
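The steps above could look roughly like the following in Oracle. This is only a sketch: the column name CALL_NO_CLEARED comes from the suggestion, the parsing expression is copied from the question (without the LPAD wrapper, since the stored value is a plain NUMBER), and the trigger and index names are hypothetical. Whether the index actually helps a full-table aggregate depends on the execution plan. The main saving is that the regular expressions no longer run at query time.

```sql
-- 1. Extra column that holds the numeric second field.
ALTER TABLE DWH_FACT_ITEMS ADD (CALL_NO_CLEARED NUMBER);

-- 2. One-off backfill of existing rows, using the expression from the question.
UPDATE DWH_FACT_ITEMS
SET CALL_NO_CLEARED =
    CAST(REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1", '["]|[#]|[!]', ' '), '[^ ]+|["]|[#]', 1, 2), '[^0-9]+', '') AS NUMBER);

-- 3. Keep the column current on insert/update (trigger name is hypothetical).
CREATE OR REPLACE TRIGGER trg_items_call_no_cleared
BEFORE INSERT OR UPDATE OF CALL_NO_ID1 ON DWH_FACT_ITEMS
FOR EACH ROW
BEGIN
  :NEW.CALL_NO_CLEARED :=
    CAST(REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE(:NEW.CALL_NO_ID1, '["]|[#]|[!]', ' '), '[^ ]+|["]|[#]', 1, 2), '[^0-9]+', '') AS NUMBER);
END;
/

-- 4. Index on the new column (index name is hypothetical).
CREATE INDEX idx_items_call_no_cleared ON DWH_FACT_ITEMS (CALL_NO_CLEARED);
```

The CASE conditions can then compare CALL_NO_CLEARED directly, for example CALL_NO_CLEARED BETWEEN 0 AND 999.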
UPDATE:
I can suggest another approach. The most time-consuming part appears to be the call to
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0')
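so for each row we evaluate it up to 450 times (once for each of the WHEN branches mentioned).
Try moving the calculation into a subquery and applying the grouping afterwards, for example: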
select count(sub.CALL_NO_ID1) "No_of_Items" -- repeat the CASE below in the select list to show each group's label
FROM (
select
CALL_NO_ID1,
CALL_NO_DESC1, -- needed by the CASE conditions below
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') as sub_num
from DWH_FACT_ITEMS) sub
group by
case
WHEN (sub.sub_num BETWEEN 0 AND 999) AND (sub.CALL_NO_DESC1 LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
...
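You have several options for improving the query. My tests show the following:
Eliminating the CASE in the GROUP BY (by using a subquery) improves the maintainability and size of the query, but performance stays the same.
A noticeable improvement came from ordering the CASE branches so that the most common cases are at the top.
The idea is simple: if a match is found early in the CASE, the remaining conditions are skipped.
An even better improvement came from reordering the predicates inside each WHEN clause. If the CALL_NO_DESC1 prefix does not match, the regexp processing is not invoked: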
WHEN ("CALL_NO_DESC1" LIKE 'TR %') and
(LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') BETWEEN 1 AND 1050)
THEN 'TR 1-1050 - Photography'
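To decide which branches belong at the top of the CASE, a rough frequency count of the class prefixes may help. The sketch below assumes that CALL_NO_DESC1 starts with the class letters followed by a space, as the LIKE 'KG %' conditions suggest. The aliases class_prefix and row_count are made up for illustration.

```sql
-- Count rows per class prefix (the text before the first space in
-- CALL_NO_DESC1), most frequent first.
SELECT REGEXP_SUBSTR("CALL_NO_DESC1", '^[^ ]+') AS class_prefix,
       COUNT(*) AS row_count
FROM DWH_FACT_ITEMS
GROUP BY REGEXP_SUBSTR("CALL_NO_DESC1", '^[^ ]+')
ORDER BY row_count DESC;
```

The WHEN branches for the prefixes at the top of this result are the ones worth moving to the start of the CASE.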
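The final step is to invoke the REGEXP processing only once, in a subquery.
In summary, I ended up with a query that greatly reduced the run time (on my test data).
Use a WITH clause and the /*+ MATERIALIZE */ hint so that Oracle performs the expensive operation only once.
This should perform much better than 2-3 hours on 400,000 rows: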
-- Step 1: parse the numeric second field once per row.
WITH parsed_call_numbers as ( SELECT /*+ MATERIALIZE */
CALL_NO_ID1,
LPAD(CAST(regexp_replace(REGEXP_SUBSTR(REGEXP_REPLACE("CALL_NO_ID1",'["]|[#]|[!]', ' '),'[^ ]+|["]|[#]',1,2), '[^0-9]+', '') as number),7,'0') call_Number_part,
CALL_NO_DESC1
from DWH_FACT_ITEMS ) ,
-- Step 2: map each row to its group label, reusing the parsed value.
primary_calls AS ( SELECT /*+ MATERIALIZE */
CALL_NO_ID1,
case
WHEN (call_number_part BETWEEN 0 AND 999) AND ("CALL_NO_DESC1" LIKE 'KG %') THEN 'KG 0-999 - Federal law Common and collective state law Individual states US - Latin AmericaGeneral'
WHEN (call_number_part BETWEEN 0 AND 999) AND ("CALL_NO_DESC1" LIKE 'KH %') THEN 'KH 0-999 - Federal law Common and collective state law Individual states US - South AmericaGeneral '
WHEN (call_number_part BETWEEN 1 AND 100) AND ("CALL_NO_DESC1" LIKE 'DE %') THEN 'DE 1-100 - HistoryGeneral - The Mediterranean Region The Greco-Roman World'
WHEN (call_number_part BETWEEN 1 AND 1050) AND ("CALL_NO_DESC1" LIKE 'TR %') THEN 'TR 1-1050 - Photography'
--...
--... (around 450 case conditions)
--...
else "CALL_NO_ID1"
end "Primary Call"
from parsed_call_numbers )
-- Step 3: count the items per group label.
select count("CALL_NO_ID1") "No_of_Items", "Primary Call"
FROM primary_calls
group by "Primary Call"
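Before running the full report, it may be worth checking what the parsing expression returns for the sample call numbers from the question. The sketch below runs it against literal values selected from dual. The CTE name samples and the alias second_field are made up for illustration, and the LPAD wrapper is left out on the assumption that the BETWEEN comparisons are numeric anyway.

```sql
-- Apply the question's parsing expression to the three sample values.
WITH samples AS (
  SELECT 'a!3243 .m43 12' AS CALL_NO_ID1 FROM dual UNION ALL
  SELECT 'a#435 234 1999' FROM dual UNION ALL
  SELECT 'cs"345 1973.' FROM dual
)
SELECT CALL_NO_ID1,
       CAST(REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_REPLACE(CALL_NO_ID1, '["]|[#]|[!]', ' '), '[^ ]+|["]|[#]', 1, 2), '[^0-9]+', '') AS NUMBER) AS second_field
FROM samples;
```

If second_field is not the value you expect for a row, the separator pattern needs adjusting before the ranges in the CASE can be trusted.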
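Comments:
Can you tell us the logic and grouping process used in your CASE statement? If the grouping is based on LC classification, and the first or "first and second" characters must be English letters, you need an algorithm to cut down these CASEs.
Thanks for the reply. This is LC classification; the first or "first and second" characters must be English letters, followed by a number, with the field separators blank, !, # or ".
a!3243 .m43 12 (as recorded in the db) --> a (first field) --> 3243 (second field) --> .m43 (third field)
a#435 234 1999 (as recorded in the db) --> a (first field) --> 234 (second field) --> 1999 (third field)
cs"345 1973. (as recorded in the db) --> cs (first field) --> 345 (second field) --> 1973. (third field)
The requirement is to extract the total number of records per range, based on given groupings of the first and second fields. For example, field 1 (starting with a) and field 2 in the range 1-400 is group A; field 1 (starting with a) and field 2 in the range 500-4000 is group B; field 1 (starting with b) and field 2 in the range 1-400 is group C; and so on. I think the cases above are irregular.
Thanks for your suggestion; applying it should give good results. But the db is maintained by a vendor. Modifying the table may violate the contract and may cause problems with future upgrades.
Thanks for your suggestion. I tried modifying the SQL as you suggested. There was no difference in execution time.
Hi StanislavL, even just changing the order of the CASE branches gave a significant improvement. Many thanks, Morris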