SQL regular expression to parse text into new rows


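I'm trying to take a notes field, which is just one big block of text, and split it up. Here is sample data, written the way I would insert it into the table: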

create table test_table
(
job_number number,
notes varchar2(4000)
);

insert into test_table (job_number,notes)
values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');
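I need to parse it so that each notes entry gets its own record (the 10-digit number in front of each note is a Unix timestamp). So if it were exported to a pipe-delimited file, it would look like this: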

job_number | notes

12345 | 1022089483 notes

12345 | 1022094450 notes

12345 | 1022095218 notes


I really hope this makes sense. I'd greatly appreciate any insight.

There are a few ways to do this. First, the test data:

SQL> insert into test_table (job_number,notes)
  2  values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

1 row created.

SQL> insert into test_table (job_number,notes)
  2  values (12346,'1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes 1022493228 the answer is 42');

1 row created.

SQL> commit;

Commit complete.
Note: I'm using [0-9]{10} as my regular expression to identify a note (i.e. any 10-digit number is treated as the start of a note).
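To see what this pattern picks out, here is a minimal sketch in Python. Python and its `re` module are used purely for illustration and are not part of the original answer; the sample string is shortened from the test data above. Offsets printed here are 0-based, whereas Oracle's regexp_instr reports 1-based positions.

```python
import re

# Same pattern as in the answer: any run of ten digits marks the start of a note.
NOTE_START = re.compile(r"[0-9]{10}")

notes = "1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes"

# 0-based offset of every match, plus the matched timestamps themselves.
starts = [m.start() for m in NOTE_START.finditer(notes)]
stamps = NOTE_START.findall(notes)

print(starts)  # [0, 35, 50]
print(stamps)  # ['1022089483', '1022094450', '1022095218']

# Caveat: the pattern is unanchored, so a longer digit run inside the
# note text would also be treated as the start of a new note.
false_hit = NOTE_START.findall("ticket 12345678901 reopened")
print(false_hit)  # ['1234567890']
```

If the note text can itself contain long digit runs, the pattern would likely need tightening (for example by requiring whitespace around the ten digits); whether that matters depends on the real data.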

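First, we can work out the maximum number of notes in any single row, then do a Cartesian join against a row generator that produces that many rows. We then keep only the generated row numbers up to each row's own note count and cut out the corresponding note: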

SQL> with data
  2  as (select job_number, notes,
  3            (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  4        from test_table t)
  5  select job_number,
  6         substr(d.notes, regexp_instr(d.notes, '[0-9]{10}', 1, rn.l),
  7                       regexp_instr(d.notes||' 0000000000', '[0-9]{10}', 1, rn.l+1)
  8                       -regexp_instr(d.notes, '[0-9]{10}', 1, rn.l) -1
  9               ) note
 10    from data d
 11         cross join (select rownum l
 12                      from dual
 13                    connect by level <= (select max(num_of_notes)
 14                                           from data)) rn
 15   where rn.l <= d.num_of_notes
 16   order by job_number, rn.l;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.
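The arithmetic in that query can be followed step by step in a small Python sketch. This is an illustration of the same logic, not the answer's code: the helper names `count_notes`, `nth_start` and `split_notes` are hypothetical, and Python offsets are 0-based where Oracle's are 1-based.

```python
import re

PATTERN = r"[0-9]{10}"

def count_notes(notes):
    # Every match is exactly 10 characters long, so the number of characters
    # removed by the replace, divided by 10, is the number of notes.
    return (len(notes) - len(re.sub(PATTERN, "", notes))) // 10

def nth_start(text, n):
    # 0-based offset of the n-th match (n counts from 1), playing the role
    # of regexp_instr(text, pattern, 1, n) in the query.
    for i, m in enumerate(re.finditer(PATTERN, text), start=1):
        if i == n:
            return m.start()
    return None

def split_notes(notes):
    # Sentinel timestamp so that the last note also has a "next" match.
    padded = notes + " 0000000000"
    result = []
    for n in range(1, count_notes(notes) + 1):
        start = nth_start(notes, n)
        next_start = nth_start(padded, n + 1)
        # Stop one character early to drop the space before the next timestamp.
        result.append(notes[start:next_start - 1])
    return result

rows = {
    12345: "1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes",
    12346: "1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes 1022493228 the answer is 42",
}

# Pipe-delimited output, one line per note.
for job_number, notes in rows.items():
    for note in split_notes(notes):
        print(f"{job_number}|{note}")
```

The appended ' 0000000000' is what lets the final note be measured the same way as the others: without it there would be no following match to mark where the last note ends.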
Alternatively, from 10g onwards, we can use the model clause to generate the rows:

SQL> with data as (select job_number, notes,
  2                       (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  3                  from test_table)
  4  select job_number, note
  5    from data
  6  model
  7  partition by (job_number)
  8  dimension by (1 as i)
  9  measures (notes, num_of_notes, cast(null as varchar2(4000)) note)
 10  rules
 11  (
 12    note[for i from 1 to num_of_notes[1] increment 1]
 13      = substr(notes[1],
 14               regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)),
 15               regexp_instr(notes[1]||' 0000000000', '[0-9]{10}', 1, cv(i)+1)
 16               -regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)) -1
 17              )
 18  )
 19  order by job_number, i;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

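I assume the number of notes varies from row to row? Also, which version of Oracle are you using?

Yes, the number of notes will vary. I think we're on 8 or 9. Regex isn't built in, but we've created some functions to do some regex work.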