Perl regex to parse a block of notes on 10-digit numbers


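OK, here's the situation. I have an old notes table in a SQL Server text format. It stores all of a record's notes in one big chunk of data. I need to parse that chunk of text and create one row per note entry, with separate columns for the timestamp, the user, and the note text. The only way I can think of is to use a regex to locate the unix timestamp of each note and parse on that. I know there is a split function for parsing on a delimiter, but it removes the delimiter. I need to split on \d{10} but also keep the 10-digit number. Here is some sample data: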

create table test_table
(
job_number number,
notes varchar2(4000)
)

insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')
I want one record per note, like this:

JOB_NUMBER        DTTM    USER     NOTES_TEXT
----------    ----------  ----     ----------
12345         1234567890  USERNAME notes text notes text notes text notes text
12345         5468204562  USERNAME notes text notes text notes text notes text
12345         1025478510  USERNAME notes text notes text notes text notes text
12346         2345678901  USERNAME notes text notes text notes text notes text
12346         1523024512  USERNAME notes text notes text notes text notes text
12346         1578451236  USERNAME notes text notes text notes text notes text
12347         2345678902  USERNAME notes text notes text notes text notes text
12347         2365201214  USERNAME notes text notes text notes text notes text
12347         1202154215  USERNAME notes text notes text notes text notes text
Thanks for any help you can provide.

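Text::ParseWords can handle the quoted strings and the comma delimiters. You can use the flip-flop operator, 1 .. /values/, to skip ahead in the input until the line containing "values" has been read. This particular way of skipping may need adjusting for your real data.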

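After that it is a matter of parsing the string, which can be done by splitting on a lookahead assertion and then capturing the individual fields in each substring. Because the lookahead is zero-width, the split consumes nothing and each entry keeps its 10-digit timestamp. The regex in the split: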

my @entries = split /(?<!^)(?=\d{10})/, $data;
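
As a small illustration of that split, here is a sketch on a single notes string. The sub name `split_entries` and the sample text are made up for this example, and it assumes the note text itself never contains a run of 10 or more digits, since the pattern would split there as well.

```perl
use strict;
use warnings;

# Hypothetical helper: split one notes blob into entries, keeping each
# 10-digit timestamp at the start of its entry.
sub split_entries {
    my ($blob) = @_;
    # Zero-width split: break before every run of 10 digits, except at offset 0.
    my @entries = split /(?<!^)(?=\d{10})/, $blob;
    s/\s+$// for @entries;    # trim the whitespace left in front of the next timestamp
    return @entries;
}

my @entries = split_entries(
    '1234567890 alice first note 5468204562 bob second note'
);
print "$_\n" for @entries;
```

This should print two lines, each starting with its own timestamp, because the lookahead only marks the position and never removes the digits.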
The full script, followed by its output, appears after the comments below.

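What exactly does this big chunk of data look like? We don't need the whole blob, but it would help to know what you are importing.

split can keep the delimiters: if the pattern contains parentheses, additional list elements are created from each matching substring in the delimiter.

@DavidW. The sample I gave accurately represents the format of the real data. The real data is about 100 times larger, with anywhere from 1 to 100+ notes entries.

@Ekkehard.Horner, thanks for letting me know that. I didn't know. I'm going to play with this and see what I can do.

Also, if you are pulling the data from a database table, could you run select job_number, notes from table; and write the result to a pipe-delimited text file? That would surely be a fairly simple task, although a completely different one. You would probably use the DBI and Text::CSV modules.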
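
The comment above about split keeping delimiters can be sketched as follows. With capturing parentheses in the pattern, split returns each captured timestamp as its own list element between the pieces of text. The sub name `pair_entries` and the sample string are illustrative only and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return [timestamp, text] pairs from one notes blob.
sub pair_entries {
    my ($blob) = @_;
    # The capture group makes split return each delimiter as a list element:
    # (leading text, ts1, text1, ts2, text2, ...)
    my ($lead, @parts) = split /(\d{10})/, $blob;
    my @pairs;
    while (my ($ts, $text) = splice @parts, 0, 2) {
        $text = '' unless defined $text;   # blob ended right after a timestamp
        $text =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
        push @pairs, [$ts, $text];
    }
    return @pairs;
}

for my $pair (pair_entries('1234567890 alice first note 5468204562 bob second note')) {
    print join(' | ', @$pair), "\n";
}
```

Compared with the lookahead split, this variant hands back the timestamp already separated from the rest of the entry, at the cost of having to pair the list elements up again.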
Full script from the answer:

use strict;
use warnings;
use Text::ParseWords;

my $format = "%-12s %-12s %-10s %s\n";              # format for printing
my @headers = qw(JOB_NUMBER DTTM USER NOTES_TEXT);  
printf $format, @headers;
printf $format, map "-" x length, @headers;         # print underline
while (<DATA>) {
    next while 1 .. /values/;                       # skip to data
    s/^\(|\)$//g;                                   # remove parentheses
    my ($job, $data) = quotewords('\s*,\s*',0, $_); # parse string
    my @entries = split /(?<!^)(?=\d{10})/, $data;  # split into entries
    for my $entry (@entries) {                      # parse each entry
        my ($dttm, $user, $notes) = $entry =~ /^(\d+)\s+(\S+)\s+(.*)/;
        printf $format, $job, $dttm, $user, $notes; # print the captured note text, not the whole entry
    }
}

__DATA__
create table test_table
(
job_number number,
notes varchar2(4000)
)

insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

Output:

JOB_NUMBER   DTTM         USER       NOTES_TEXT
----------   ----         ----       ----------
12345        1234567890   username   notes text notes text notes text notes text
12345        5468204562   username   notes text notes text notes text notes text
12345        1025478510   username   notes text notes text notes text notes text
12346        2345678901   username   notes text notes text notes text notes text
12346        1523024512   username   notes text notes text notes text notes text
12346        1578451236   username   notes text notes text notes text notes text
12347        2345678902   username   notes text notes text notes text notes text
12347        2365201214   username   notes text notes text notes text notes text
12347        1202154215   username   notes text notes text notes text notes text