How to speed up a REGEXP LEVEL query in Oracle SQL Developer

I have an INSERT INTO SELECT statement that populates a table with values parsed out of a semicolon-delimited (;) column in the source table:

INSERT INTO PC_MATERIALS_BRIDGE (MATERIAL_BRIDGE_ID, VARIABLE_ID, MATERIAL_NAME)
   SELECT PC_VAR_MATERIALS_BRIDGE_SEQ.NEXTVAL, VARIABLE_ID, MATERIAL_NAME FROM (SELECT DISTINCT E.VARIABLE_ID, LOWER(TRIM(REGEXP_SUBSTR(e.MATERIALS, '[^;]+', 1, LEVEL))) MATERIAL_NAME
        FROM (SELECT VARIABLE_ID, MATERIALS FROM SRC_VARS_OCEAN_ALL WHERE MATERIALS IS NOT NULL AND MATERIALS != 'N/A') e
        CONNECT BY LOWER(TRIM(REGEXP_SUBSTR(e.MATERIALS, '[^;]+', 1, LEVEL))) IS NOT NULL);
So data in the source table such as

VARIABLE_ID     MATERIALS
1               paper
2               paper; plastic
would appear as

MATERIAL_BRIDGE_ID     MATERIAL_NAME   
1                      paper
2                      paper
3                      plastic
in the destination table.


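The script works fine; however, it is very expensive, because the source table has nearly 40,000 records and some of them hold three values, e.g. paper; plastic; rubber. I know that LEVEL is expensive. MATERIAL_NAME is defined as VARCHAR2(255 BYTE). I am not sure how to improve this other than by writing a different kind of query (a recursive one, for example), but that could be difficult.

Is the DISTINCT also slowing it down? DISTINCT is probably no longer needed, since e.VARIABLE_ID is now a primary key.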

This is a very inefficient approach. The simple demo below shows why it causes problems, which becomes visible once you remove the DISTINCT:

create table SRC_VARS_OCEAN_ALL(
  VARIABLE_ID int, 
  MATERIALS varchar2(200)
);

insert into SRC_VARS_OCEAN_ALL values( 1, 'ala;ma;kota' );
insert into SRC_VARS_OCEAN_ALL values( 2, 'as;to;pies' );
insert into SRC_VARS_OCEAN_ALL values( 3, 'baba;jaga' );
insert into SRC_VARS_OCEAN_ALL values( 4, 'zupa;obiad' );
Now run the question's CONNECT BY query against this table with the DISTINCT removed.
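The query itself is not shown at this point. A likely reconstruction, assuming it is simply the inner query from the question with the DISTINCT and the 'N/A' filter dropped (the demo table contains no such rows), would be:

```sql
-- Assumed reconstruction of the demo query: the question's inner SELECT
-- without DISTINCT, run against the four demo rows inserted above.
SELECT e.VARIABLE_ID,
       LOWER(TRIM(REGEXP_SUBSTR(e.MATERIALS, '[^;]+', 1, LEVEL))) AS MATERIAL_NAME
FROM SRC_VARS_OCEAN_ALL e
-- No PRIOR condition here, so rows are not tied to their own parent row.
CONNECT BY LOWER(TRIM(REGEXP_SUBSTR(e.MATERIALS, '[^;]+', 1, LEVEL))) IS NOT NULL;
```

Wrapping it in SELECT COUNT(*) FROM ( ... ) is a quick way to check how many rows it produces before any deduplication happens.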

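This query generates 52 output records for only 4 input rows that contain 10 values in total. The CONNECT BY clause has no PRIOR condition tying a child row to its parent, so every row produced at one level is joined to every source row that still has a value at the next level: 4 rows at level 1, 4 × 4 = 16 at level 2 and 16 × 2 = 32 at level 3, 52 in total. You can guess how many there will be for 40,000 rows. The query generates hundreds of thousands or even millions of rows, and DISTINCT then has to sort this huge result set to eliminate the duplicates.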

The query below should perform better, because it generates only 10 records, no more and no fewer than this task requires:

SELECT  a.VARIABLE_ID, b.lev_el,
       trim( regexp_substr( a.MATERIALS, '[^;]+', 1, b.lev_el )) as MATERIAL_NAME
FROM SRC_VARS_OCEAN_ALL a
JOIN (
  SELECT level as lev_el
  FROM dual CONNECT BY level <= 100
) b
ON b.lev_el <= regexp_count( a.MATERIALS, ';' ) + 1

VARIABLE_ID     LEV_EL MATERIAL_NAME 
----------- ---------- --------------
          1          1 ala           
          2          1 as            
          3          1 baba          
          4          1 zupa          
          1          2 ma            
          2          2 to            
          3          2 jaga          
          4          2 obiad         
          1          3 kota          
          2          3 pies          

10 rows selected. 
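Plugged back into the INSERT from the question, the same join could look roughly like the sketch below. This is an untested adaptation, not part of the original answer: the outer IS NOT NULL filter is an added assumption to skip empty elements (a trailing ';', for example), and because DISTINCT is gone, a value repeated inside a single list would be inserted twice. REGEXP_COUNT requires Oracle 11g or later.

```sql
INSERT INTO PC_MATERIALS_BRIDGE (MATERIAL_BRIDGE_ID, VARIABLE_ID, MATERIAL_NAME)
SELECT PC_VAR_MATERIALS_BRIDGE_SEQ.NEXTVAL, VARIABLE_ID, MATERIAL_NAME
FROM (
  SELECT a.VARIABLE_ID,
         LOWER(TRIM(REGEXP_SUBSTR(a.MATERIALS, '[^;]+', 1, b.lev_el))) AS MATERIAL_NAME
  FROM SRC_VARS_OCEAN_ALL a
  JOIN (
    -- Number generator 1..100: assumes no list holds more than 100 values.
    SELECT LEVEL AS lev_el
    FROM dual CONNECT BY LEVEL <= 100
  ) b
    -- One joined row per list element: number of separators + 1.
    ON b.lev_el <= REGEXP_COUNT(a.MATERIALS, ';') + 1
  WHERE a.MATERIALS IS NOT NULL
    AND a.MATERIALS != 'N/A'
)
-- Added assumption: drop elements that are empty after trimming.
WHERE MATERIAL_NAME IS NOT NULL;
```

The sequence call stays in the outermost SELECT, as in the original statement, since NEXTVAL cannot be used inside the inline view.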

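I assumed that the list in each row contains no more than 100 values, hence the dual CONNECT BY level <= 100 subquery.

Great, thanks. I had been using DISTINCT but overlooked the fact that it only sorts the rows after all the possible rows have been retrieved. Cheers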