Oracle: extracting CLOB data for an insert
I have CLOB data like the following:
123456 (LED TV); 234543 (LED light); 654876 (LED monitor);
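Now I need to extract all of the 6-digit invoice tracking numbers from the CLOB, using the delimiter (; in my case), and select them for an INSERT INTO a table, but only where the record does not already exist.
I have seen several examples that use INSTR and SUBSTR or REGEXP, but none of them does what I need, or they are beyond my understanding of Oracle. Can someone show me an example of how to split a CLOB into rows based on a string in the CLOB, so that I can use the rows later for the insert?
Note: I would prefer the fastest solution, since my CLOB data may contain more than 5 million invoice records. In the end it will be a stored procedure launched from C, but this part is giving me a headache... Thanks in advance for any help.
Here is an example.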
First, the test case; the TEST table contains the source data:
SQL> create table test (col clob);
Table created.
SQL> insert into test
2 select '123456 (LED TV); 234543 (LED light); 654876 (LED monitor);' from dual union all
3 select '665988 (Notebook); 987654 (Mouse); 445577 (Dead Pixel);' from dual;
2 rows created.
SQL>
The TARGET table will contain the values extracted from the source:
SQL> create table target (itn number, name varchar2(20));
Table created.
SQL> -- This value shouldn't be inserted as it already exists in the TARGET table:
SQL> insert into target values (234543, 'LED light');
1 row created.
SQL>
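Now, something useful. The idea is to split the column values into rows, which is what the regexp_substr part inside the hierarchical query does, and then to separate the ID value from the name enclosed in parentheses. Values that already exist in the TARGET table should not be inserted, so the query should insert 5 rows: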
SQL> insert into target (itn, name)
2 with
3 c2r as
4 -- split column to rows, e.g. "123456 (LED TV)" is an example of such a row
5 (select to_char(trim(regexp_substr(col, '[^;]+', 1, column_value))) val
6 from test join table(cast(multiset(select level from dual
7 connect by level <= regexp_count(col, ';')
8 ) as sys.odcinumberlist)) on 1 = 1
9 ),
10 sep as
11 -- separate ITN (invoice tracking number) and NAME
12 (select substr(val, 1, instr(val, ' ') - 1) itn,
13 substr(val, instr(val, ' ') + 1) name
14 from c2r
15 )
16 select s.itn, replace(replace(s.name, '(', ''), ')', '')
17 from sep s
18 -- don't insert values that already exist in the TARGET table
19 where not exists (select null from target t
20 where t.itn = s.itn
21 );
5 rows created.
SQL>
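Since lines 6 to 8 of the query are the least obvious part, here is a minimal sketch of the same row generator applied to a single literal string. The SRC name and the N and VAL aliases are made up for illustration and are not part of the answer's schema.

```sql
-- Split one ';'-terminated string into rows.
-- SRC, N and VAL are illustrative names only.
with src as (
  select '123456 (LED TV); 234543 (LED light); 654876 (LED monitor);' as col
  from dual
)
select level as n,
       -- the n-th run of characters that are not ';'
       trim(regexp_substr(col, '[^;]+', 1, level)) as val
from src
-- one generated row per ';' found in the string
connect by level <= regexp_count(col, ';');
```

This should return three rows, one per invoice entry. In the full query the generator is wrapped in table(cast(multiset(...) as sys.odcinumberlist)) so that it is evaluated separately for each row of TEST; column_value is then the generated number 1..N for that row. With more than one source row, a plain CONNECT BY LEVEL without that wrapper would multiply the rows against each other and produce duplicates.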
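I tried using the DBMS_LOB package to split the CLOB on ; into individual strings, and then applied ordinary string operations to each piece to get the result. Please try the following: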
INSERT INTO INVOICE_CATEGORIZED
SELECT TAB.INVOICE_NUMBER, TAB.INVOICE_NAME FROM
(SELECT
TRIM(dbms_lob.SUBSTR(INVOICE_INN,6 ,1)) AS INVOICE_NUMBER,
SUBSTR(INVOICE_INN,
INSTR(INVOICE_INN, '(') + 1,
INSTR(INVOICE_INN, ')') - INSTR(INVOICE_INN, '(') - 1 )
AS INVOICE_NAME
-- HERE INVOICE_INN IS STRING NOW, SO WE CAN DO STRING OPERATIONS ON IT ONWARD
FROM
(
-- DIVIDING ; SEPARATED CLOB TO INDIVIDUAL STRING
SELECT
TRIM(CASE WHEN INVOICE_SINGLE.COLUMN_VALUE = 1 THEN
dbms_lob.SUBSTR(INVOICE,
dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE) - 1,
1
)
ELSE
dbms_lob.SUBSTR(INVOICE,
dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE) - 1
- dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE - 1),
dbms_lob.INSTR(INVOICE,';',1,INVOICE_SINGLE.COLUMN_VALUE - 1) + 1)
END) AS INVOICE_INN
FROM
INVOICES T,
TABLE ( CAST(MULTISET(
SELECT
LEVEL
FROM
DUAL
CONNECT BY
dbms_lob.INSTR(INVOICE,';',1,LEVEL) <> 0
) AS SYS.ODCINUMBERLIST) ) INVOICE_SINGLE)) TAB
WHERE NOT EXISTS (SELECT 1 FROM INVOICE_CATEGORIZED IC
WHERE IC.INVOICE_NUMBER = TAB.INVOICE_NUMBER
AND IC.INVOICE_NAME = TAB.INVOICE_NAME)
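A detail that is easy to misread in the query above: DBMS_LOB.SUBSTR takes its arguments as (lob, amount, offset), the reverse of SUBSTR's (string, position, length), while DBMS_LOB.INSTR takes (lob, pattern, offset, nth). The sketch below shows the two branches of the CASE expression on a small literal; the SRC name and the column aliases are assumptions made for illustration.

```sql
-- Illustrative only: the two CASE branches applied to a two-entry CLOB.
with src as (
  select to_clob('123456 (LED TV); 234543 (LED light);') as invoice
  from dual
)
select
  -- first piece: everything before the 1st ';', read from offset 1
  dbms_lob.substr(invoice,
                  dbms_lob.instr(invoice, ';', 1, 1) - 1,   -- amount
                  1                                         -- offset
                 ) as first_piece,
  -- second piece: the characters between the 1st and the 2nd ';'
  dbms_lob.substr(invoice,
                  dbms_lob.instr(invoice, ';', 1, 2) - 1
                    - dbms_lob.instr(invoice, ';', 1, 1),   -- amount
                  dbms_lob.instr(invoice, ';', 1, 1) + 1    -- offset
                 ) as second_piece
from src;
```

The second piece comes back with a leading space, which is why the original query wraps the whole CASE in TRIM.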
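Cheers
Have you tested this with CLOB values longer than 4k and 32k? Those sizes tend to break a lot of the things we rely on in PL/SQL when working with CLOB data; for example, will trim and regexp_substr work on a 40k CLOB? I don't know, but it is better to find out before using them in a solution. Also, a short explanation of how lines 6-8 work would be helpful.
@Littlefoot, thank you for your answer. It is a bit complicated, but what worries me most is the use of regexp. I was hoping for an example based on dbms_lob.instr and dbms_lob.substr, because I have read that these should be the fastest way to read a CLOB. Or am I wrong? From a performance point of view, do you think your solution is faster?
Did you try the solution with your original data?
Yes, thank you for your answer. Your code seems to run faster when the CLOB is large, so I have marked your solution as the answer. Thank you very much.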
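On the performance question raised in the comments: both SQL solutions look up the n-th ; starting from the beginning of the CLOB for every generated row, which may become expensive with millions of entries. Since the end goal is a stored procedure anyway, one alternative is a single pass in PL/SQL that resumes each search from the previous delimiter. The block below is only a sketch under that assumption and has not been benchmarked; the variable names are hypothetical, and the DBMS_OUTPUT call stands in for whatever insert logic is actually needed.

```sql
-- Single-pass sketch; run with SERVEROUTPUT ON to see the output.
declare
  l_clob  clob := '123456 (LED TV); 234543 (LED light); 654876 (LED monitor);';
  l_start pls_integer := 1;   -- offset where the current piece begins
  l_end   pls_integer;        -- offset of the next ';'
  l_piece varchar2(4000);     -- assumes a single entry fits in 4000 characters
begin
  loop
    -- search for the next ';' from the current offset, not from offset 1
    l_end := dbms_lob.instr(l_clob, ';', l_start, 1);
    exit when l_end is null or l_end = 0;

    -- DBMS_LOB.SUBSTR argument order is (lob, amount, offset)
    l_piece := trim(dbms_lob.substr(l_clob, l_end - l_start, l_start));

    -- the tracking number is the text before the first space
    dbms_output.put_line(substr(l_piece, 1, instr(l_piece, ' ') - 1));

    l_start := l_end + 1;
  end loop;
end;
/
```

For the sample value this prints 123456, 234543 and 654876, one per line. In a real procedure the extracted values could be collected and inserted in bulk with the same NOT EXISTS check used in the answers.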