Perl regular expression to split a block of notes on 10-digit numbers
OK, here's the deal. I have a notes field stored in an old SQL Server text format. It puts all the notes for a record into one big blob of data. I need to parse that blob of text, create one row per note entry, and produce separate columns for the timestamp, the user, and the note text. The only way I can think of is to use a regular expression to locate the Unix timestamp of each note and parse on that. I know there is a split function for parsing on delimiters, but it removes the delimiter. I need to split on \d{10} but also keep the 10 digits. Here is some sample data:
create table test_table
(
job_number number,
notes varchar2(4000)
)
insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')
I would like one record per note, like this:
JOB_NUMBER DTTM USER NOTES_TEXT
---------- ---------- ---- ----------
12345 1234567890 USERNAME notes text notes text notes text notes text
12345 5468204562 USERNAME notes text notes text notes text notes text
12345 1025478510 USERNAME notes text notes text notes text notes text
12346 2345678901 USERNAME notes text notes text notes text notes text
12346 1523024512 USERNAME notes text notes text notes text notes text
12346 1578451236 USERNAME notes text notes text notes text notes text
12347 2345678902 USERNAME notes text notes text notes text notes text
12347 2365201214 USERNAME notes text notes text notes text notes text
12347 1202154215 USERNAME notes text notes text notes text notes text
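Thanks for any help you can provide.

Text::ParseWords can handle quoted strings and comma separation. You can use the flip-flop operator, 1 .. /values/, to skip ahead in the input (a constant operand is compared against the current input line number, $.). This particular skipping method may need to be modified.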
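To make the skipping step concrete, here is a minimal sketch of the flip-flop idiom on its own. The sub name rows_after_values and the in-memory filehandle are assumptions for illustration, not part of the original answer; the idea is only that every line from line 1 up to and including the first line matching /values/ is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the first line
# matching /values/. Assumes the text really contains such a line.
sub rows_after_values {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @rows;
    while (my $line = <$fh>) {
        # The constant 1 is compared with $. (the input line number), so the
        # range turns true on line 1 and stays true until /values/ matches.
        next if 1 .. $line =~ /values/;
        chomp $line;
        push @rows, $line;
    }
    close $fh;
    return @rows;
}

print "$_\n" for rows_after_values(
    "create table t\n(\n)\ninsert into t values\n(1, 'a')\n(2, 'b')\n"
);
```

If the real input has no header before the data, or the word "values" can show up earlier in it, the range condition would have to be adapted.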
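Then it is a matter of parsing the string, which can be done by splitting on a lookahead assertion and then capturing the individual fields from each substring. The regex in the split: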
my @entries = split /(?<!^)(?=\d{10})/, $data;
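The pattern above splits at every position that is followed by ten digits, except at the very start of the string, and because the lookahead consumes nothing the digits stay in the result. As a hedged variation (my assumption, not something the question requires), the sketch below also insists that the ten digits are not part of a longer number, so an 11- or 12-digit value inside the note text does not trigger a split. The sub name split_notes is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a notes blob in front of each standalone
# 10-digit number. The digits are kept because lookarounds are zero-width.
sub split_notes {
    my ($data) = @_;
    return split /(?<!^)(?<!\d)(?=\d{10}(?!\d))/, $data;
}

my @entries = split_notes('1234567890 alice first note 5468204562 bob second note');
print "[$_]\n" for @entries;   # every entry but the last keeps its trailing space
```

Note text that itself contains a standalone 10-digit number would still be split at that point; the data alone cannot tell such a number from a timestamp.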
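Comments:
What exactly does this big chunk of data look like? We don't need the whole blob, but it would help to know what is being imported.
split can keep the delimiter: if the pattern contains parentheses, additional list elements are created from each matching substring in the delimiter.
@DavidW. The sample I gave accurately represents the format of the real data. The real data is about 100 times larger, with 1 to 100+ notes entries.
@Ekkehard.Horner, thanks for letting me know. I did not know that. I'll play with it and see what I can do.
Also, what if the data were pulled from the database table with select job_number, notes from table; and the results then written to a pipe-delimited text file?
Sure, that would be a fairly simple task, although a completely different one. You would probably use the DBI and Text::CSV modules.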
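The comment about split keeping the delimiter can be sketched like this: when the pattern contains capturing parentheses, the captured text is returned as extra list elements between the fields. The sub name split_with_capture and the pairing logic are assumptions for illustration, and the sketch assumes the blob starts with a timestamp.

```perl
use strict;
use warnings;

# Hypothetical helper: split on standalone 10-digit numbers but keep them,
# returning [timestamp, rest-of-entry] pairs.
sub split_with_capture {
    my ($data) = @_;
    # The capture group makes split return each timestamp as its own element.
    my @parts = split /\b(\d{10})\b\s*/, $data;
    # A match at the very start leaves an empty leading field; drop it.
    shift @parts if @parts && $parts[0] eq '';
    my @pairs;
    while (my ($stamp, $rest) = splice @parts, 0, 2) {
        $rest = '' unless defined $rest;
        $rest =~ s/\s+\z//;          # trim trailing whitespace
        push @pairs, [ $stamp, $rest ];
    }
    return @pairs;
}

for my $pair (split_with_capture('1234567890 alice first note 5468204562 bob second note')) {
    print "$pair->[0] => $pair->[1]\n";
}
```

Compared with the lookahead version, this returns the timestamp already separated from the rest of the entry, at the cost of having to pair the list elements up again.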
Full script:
use strict;
use warnings;
use Text::ParseWords;

my $format  = "%-12s %-12s %-10s %s\n";                # format for printing
my @headers = qw(JOB_NUMBER DTTM USER NOTES_TEXT);
printf $format, @headers;
printf $format, map "-" x length, @headers;            # print underline

while (<DATA>) {
    next if 1 .. /values/;                             # skip up to and including the "values" line
    chomp;
    s/^\(|\)$//g;                                      # remove enclosing parentheses
    my ($job, $data) = quotewords('\s*,\s*', 0, $_);   # parse the quoted, comma-separated fields
    my @entries = split /(?<!^)(?=\d{10})/, $data;     # split before each 10-digit timestamp
    for my $entry (@entries) {                         # parse each entry
        my ($dttm, $user, $notes) = $entry =~ /^(\d+)\s+(\S+)\s+(.*?)\s*$/;
        printf $format, $job, $dttm, $user, $notes;    # print the captured note text, not the whole entry
    }
}
__DATA__
create table test_table
(
job_number number,
notes varchar2(4000)
)
insert into test_table values
(12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text')
(12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text')
(12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')
Output:
JOB_NUMBER   DTTM         USER       NOTES_TEXT
----------   ----         ----       ----------
12345        1234567890   username   notes text notes text notes text notes text
12345        5468204562   username   notes text notes text notes text notes text
12345        1025478510   username   notes text notes text notes text notes text
12346        2345678901   username   notes text notes text notes text notes text
12346        1523024512   username   notes text notes text notes text notes text
12346        1578451236   username   notes text notes text notes text notes text
12347        2345678902   username   notes text notes text notes text notes text
12347        2365201214   username   notes text notes text notes text notes text
12347        1202154215   username   notes text notes text notes text notes text
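As an alternative to splitting first and capturing afterwards, the fields can be pulled out in one pass with a global match. This is only a sketch under assumptions: the sub names parse_notes and pipe_lines are made up, a timestamp is taken to be a standalone 10-digit number, and the user is taken to be a single whitespace-free token. The pipe-delimited output follows the idea raised in the comments.

```perl
use strict;
use warnings;

# Hypothetical helper: return one hash per note with dttm, user and notes.
sub parse_notes {
    my ($data) = @_;
    my $stamp = qr/(?<!\d)\d{10}(?!\d)/;    # standalone 10-digit number
    my @records;
    # Each match runs from one timestamp up to the next timestamp or the end.
    while ($data =~ /($stamp)\s+(\S+)\s+(.*?)\s*(?=$stamp|\z)/gs) {
        push @records, { dttm => $1, user => $2, notes => $3 };
    }
    return @records;
}

# Hypothetical helper: format the notes of one job as pipe-delimited lines.
sub pipe_lines {
    my ($job, $data) = @_;
    return map { join '|', $job, @{$_}{qw(dttm user notes)} } parse_notes($data);
}

print "$_\n" for pipe_lines(12345,
    '1234567890 alice first note 5468204562 bob second note');
```

A plain join does no quoting, so note text that contains a | character would need a real CSV writer such as the Text::CSV module mentioned in the comments.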