Oracle: extracting CLOB data for an insert


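I have CLOB data like this:

123456 (LED TV); 234543 (LED light); 654876 (LED monitor);
Now I need to extract all the 6-digit invoice tracking numbers from the CLOB, using the delimiter (";" in my case), and select them for an INSERT INTO a table, but only when the record does not already exist.

I have seen several examples using INSTR and SUBSTR or REGEXP, but none of them does what I need, or they are beyond my understanding of Oracle. Can someone give me an example of how to split a CLOB into rows based on a string in the CLOB, so that I can use the result for the insert later?

Note: I would prefer the fastest solution, because my CLOB data may contain more than 5 million invoice records. It will eventually be a stored procedure started from C, but this part is giving me a headache... Thanks in advance for any help.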

Here is an example.

First, the test case; the TEST table holds the source data:

SQL> create table test (col clob);

Table created.

SQL> insert into test
  2    select '123456 (LED TV); 234543 (LED light); 654876 (LED monitor);' from dual union all
  3    select '665988 (Notebook); 987654 (Mouse); 445577 (Dead Pixel);'    from dual;

2 rows created.

SQL>
The target table will hold the values extracted from the source:

SQL> create table target (itn number, name varchar2(20));

Table created.

SQL> -- This value shouldn't be inserted as it already exists in the TARGET table:
SQL> insert into target values (234543, 'LED light');

1 row created.

SQL>
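Now for the useful part. The idea is to split the column values into rows, which is what the regexp_substr part of the hierarchical query does, and then to separate the ID value from the name enclosed in parentheses. Values that already exist in the target table must not be inserted, so the query should insert 5 rows: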

SQL> insert into target (itn, name)
  2  with
  3  c2r as
  4    -- split column to rows, e.g. "123456 (LED TV)" is an example of such a row
  5    (select to_char(trim(regexp_substr(col, '[^;]+', 1, column_value))) val
  6     from test join table(cast(multiset(select level from dual
  7                                        connect by level <= regexp_count(col, ';')
  8                                       ) as sys.odcinumberlist)) on 1 = 1
  9    ),
 10  sep as
 11    -- separate ITN (invoice tracking number) and NAME
 12    (select substr(val, 1, instr(val, ' ') - 1) itn,
 13            substr(val, instr(val, ' ') + 1) name
 14     from c2r
 15    )
 16  select s.itn, replace(replace(s.name, '(', ''), ')', '')
 17  from sep s
 18  -- don't insert values that already exist in the TARGET table
 19  where not exists (select null from target t
 20                    where t.itn = s.itn
 21                   );

5 rows created.

SQL>
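To see what the c2r step does on its own, the split can be tried against a single literal string. This is only a sketch: the inline view and the names src, s, n and val are made up for illustration. It relies on there being exactly one source row, so CONNECT BY LEVEL can be used directly, without the TABLE(CAST(MULTISET(...))) construct.

```sql
-- Illustration only: split one ';'-terminated string into one row per item.
-- src, s, n and val are hypothetical names, not part of the statement above.
select level as n,
       trim(regexp_substr(s, '[^;]+', 1, level)) as val
from (select '123456 (LED TV); 234543 (LED light); 654876 (LED monitor);' as s
      from dual) src
connect by level <= regexp_count(s, ';');
```

Because every item ends with a semicolon, regexp_count(s, ';') gives the number of items, and regexp_substr(s, '[^;]+', 1, level) picks the level-th run of non-semicolon characters. In the full statement, lines 6-8 generate the same 1..N sequence once per row of TEST, which is why the row generator there is wrapped in TABLE(CAST(MULTISET(...))).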

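I tried using the DBMS_LOB package to split the CLOB into individual strings on ';' and then applied ordinary string operations to each piece to get the result.

Try the following: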

INSERT INTO INVOICE_CATEGORIZED 
SELECT TAB.INVOICE_NUMBER, TAB.INVOICE_NAME FROM
(SELECT 
TRIM(dbms_lob.SUBSTR(INVOICE_INN,6 ,1)) AS INVOICE_NUMBER, 

SUBSTR(INVOICE_INN, 
INSTR(INVOICE_INN, '(') + 1,
INSTR(INVOICE_INN, ')') - INSTR(INVOICE_INN, '(') - 1 )
 AS INVOICE_NAME

-- HERE INVOICE_INN IS STRING NOW, SO WE CAN DO STRING OPERATIONS ON IT ONWARD

FROM
(
-- DIVIDING ; SEPARATED CLOB TO INDIVIDUAL STRING
SELECT
    TRIM(CASE WHEN INVOICE_SINGLE.COLUMN_VALUE = 1 THEN
    dbms_lob.SUBSTR(INVOICE, 
    dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE) - 1,
    1 
    )
    ELSE
    dbms_lob.SUBSTR(INVOICE, 
    dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE) - 1
    - dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE - 1),
    dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE - 1) + 1)
    END) AS INVOICE_INN
FROM
    INVOICES T,
    TABLE ( CAST(MULTISET(
        SELECT
            LEVEL
        FROM
            DUAL
        CONNECT BY
            dbms_lob.INSTR(INVOICE,';',1,LEVEL) <> 0
    ) AS SYS.ODCINUMBERLIST) ) INVOICE_SINGLE)) TAB 
     WHERE NOT EXISTS (SELECT 1 FROM INVOICE_CATEGORIZED IC
    WHERE IC.INVOICE_NUMBER  = TAB.INVOICE_NUMBER
    AND IC.INVOICE_NAME = TAB.INVOICE_NAME)


Cheers
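The position arithmetic in the inner query can be checked on its own. The sketch below is an assumption-laden illustration: it uses the TEST table and COL column from the first answer rather than the INVOICES table, extracts only the second item of each CLOB, and the column aliases are made up.

```sql
-- Illustration only: locate the 1st and 2nd ';' and cut out the text between them.
-- Assumes the TEST table (col clob) created in the first answer.
select dbms_lob.instr(col, ';', 1, 1) as first_sep,
       dbms_lob.instr(col, ';', 1, 2) as second_sep,
       trim(dbms_lob.substr(
              col,
              -- amount: number of characters between the two separators
              dbms_lob.instr(col, ';', 1, 2) - dbms_lob.instr(col, ';', 1, 1) - 1,
              -- offset: first character after the first separator
              dbms_lob.instr(col, ';', 1, 1) + 1)) as second_item
from test;
```

Note that dbms_lob.substr takes its arguments as (lob, amount, offset), the reverse of the (string, position, length) order used by the plain SUBSTR function. Called from SQL it returns a VARCHAR2, so each extracted piece has to fit within the VARCHAR2 size limit.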

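Have you tested this with CLOB values longer than 4k and 32k? Those sizes tend to break many of the things we rely on in PL/SQL when processing CLOB data. For example, will trim and regexp_substr work on a 40k CLOB? I don't know, but it is best to find out before using them in a solution. Also, a short explanation of how lines 6-8 work would be helpful.

@Littlefoot, thanks for your answer. It is a bit complicated, but what worries me most is the use of regexp. I was hoping for an example with dbms_lob.instr and dbms_lob.substr, because I have read that these should be the fastest when reading a CLOB. Or am I wrong? Do you think your solution is faster from a performance point of view?

Have you tried the solution with your original data?

Yes, thank you for your answer. Your code seems to run faster when the CLOB is large, so I have marked your solution as the answer. Thank you very much.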
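For the dbms_lob.instr / dbms_lob.substr approach asked about in the comments, one possible shape is a PL/SQL loop that walks each CLOB once, separator by separator, instead of searching from the start for every item. This is a sketch, not a benchmarked solution: it assumes the TEST and TARGET tables from the first answer, all l_* variable names are made up, and the row-by-row insert would probably need batching for millions of items.

```sql
-- Sketch only: walk each CLOB once with dbms_lob.instr / dbms_lob.substr.
-- Assumes the TEST (col) and TARGET (itn, name) tables from the first answer;
-- all l_* variable names are hypothetical.
declare
  l_start integer;
  l_end   integer;
  l_item  varchar2(4000);
  l_itn   target.itn%type;
  l_name  target.name%type;
begin
  for r in (select col from test) loop
    l_start := 1;
    loop
      -- position of the next ';' at or after l_start (0 or null when none is left)
      l_end := dbms_lob.instr(r.col, ';', l_start, 1);
      exit when nvl(l_end, 0) = 0;

      -- dbms_lob.substr takes (lob, amount, offset), in that order
      l_item := trim(dbms_lob.substr(r.col, l_end - l_start, l_start));

      if l_item is not null then
        -- tracking number: everything before the first space
        l_itn  := to_number(substr(l_item, 1, instr(l_item, ' ') - 1));
        -- name: the text between '(' and ')'
        l_name := substr(l_item,
                         instr(l_item, '(') + 1,
                         instr(l_item, ')') - instr(l_item, '(') - 1);

        -- insert only when the tracking number is not already present
        insert into target (itn, name)
        select l_itn, l_name
        from dual
        where not exists (select null from target t where t.itn = l_itn);
      end if;

      l_start := l_end + 1;
    end loop;
  end loop;
end;
/
```

Each dbms_lob.instr call starts at l_start, so every CLOB is scanned once overall. Whether this beats the pure SQL versions on a CLOB with millions of items would have to be measured on real data.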