Perl 确定动态清单项目
我正在尝试识别列表数据代码是:Perl 确定动态清单项目,perl,list,search,identify,Perl,List,Search,Identify,我正在尝试识别列表数据代码是: my $listdata = ' List Items: (1)LIST 1 data (a)sub data (b)sub data (c)sub data (d)sub data (i)sub-sub data (ii)sub-sub data (A)sub-sub-sub data (B)sub-sub-sub data (iii)sub-sub data (e)sub data (2)LI
my $listdata = '
List Items:
(1)LIST 1 data
(a)sub data
(b)sub data
(c)sub data
(d)sub data
(i)sub-sub data
(ii)sub-sub data
(A)sub-sub-sub data
(B)sub-sub-sub data
(iii)sub-sub data
(e)sub data
(2)LIST 2 data
(3)LIST 3 data
';
#print "\n\n\n$listdata\n\n";
###Array of multi-level patterns
my @level_check =('\(\d+\)','(?<!\()\d+\)','\([a-h]\)','(?<!\()[a-h]\)','\([A-H]\)','(?<!\()[A-H]\)','\d+\.',
'\([IVX]+\)','(?<!\()[IVX]+\)','\([ivx]+\)','(?<!\()[ivx]+\)','\-');
###pattern for each levels
my ($first_level,$second_level,$third_level,$fourth_level);
###First from each pattern
my ($first_occur,$second_occur,$third_occur,$fourth_occur);
#++++++++++++++++++++++++Pattern for multilevel list+++++++++++++++++++++++#
my $pattern = '((?:[IVX\-\(\)\d\.\-][a-z]?\)?)+)';
$listdata =~ s{$pattern}{
my ($leveltemp) = ($1);
$first_occur = $leveltemp if !$first_occur;
#print "$data";
#print "all_level: $leveltemp##\n";
#########First Level Start
for($i=0; $i<scalar(@level_check);$i++){
if($first_occur =~ /^$level_check[$i]$/){
$first_level = $level_check[$i] if !$first_level;
#print "$level_check[$i] =>is Ist: $first_level\n";
}
}
for($i=0; $i<scalar(@level_check);$i++){
if($leveltemp =~ /^$first_level$/){
$leveltemp =~ s{$pattern}{<<LIST1>>$2$3};
#print"**$data level matched: $leveltemp => $first_level\n";
############First Level End
}
else
{
######Second level Start
if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([a-h]{3,})/i){
$second_occur = $leveltemp if !$second_occur;
#print "$leveltemp :$second_occur\n";
for($i=0; $i<scalar(@level_check);$i++){
if($second_occur =~ /^$level_check[$i]$/){
$second_level = $level_check[$i] if !$second_level;
#print "$leveltemp =>is IInd: $second_level\n";
}
}
if($leveltemp =~ /^$second_level/){
$leveltemp =~ s{$pattern}{<<LIST2>>$2$3};
#print"**level matched: $leveltemp => $seconf_level\n";
######Second level End
}
else
{
########Third Level Start
if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([A-h]{3,})/i){
$third_occur = $leveltemp if !$third_occur;
for($i=0; $i<scalar(@level_check);$i++){
if($third_occur =~ /^$level_check[$i]$/){
$third_level = $level_check[$i] if !$third_level;
#print "$leveltemp =>is IIIrd: $third_level\n";
}
}
if($leveltemp =~ /^$third_level/){
$leveltemp =~ s{$pattern}{<<LIST3>>$2$3};
#print"**level matched: $leveltemp => $third_level\n";
#########Third Level End
}
else
{
########Fourth Level Start
if($leveltemp !~ /^(?:<<LIST+>>|\d{3,}\,?|\([A-z]{3,})/i){
$fourth_occur = $leveltemp if !$fourth_occur;
#print "$leveltemp :$fourth_occur\n";
for($i=0; $i<scalar(@level_check);$i++){
if($fourth_occur =~ /^$level_check[$i]$/){
$fourth_level = $level_check[$i] if !$fourth_level;
#print "$leveltemp =>is IVrth: $fourth_level\n";
}
}
if($leveltemp =~ /^$fourth_level/){
$leveltemp =~ s{$pattern}{<<LIST4>>$2$3};
#print"**$fourth_occur level matched: $leveltemp => $fourth_level\n";
#########Fourth Level End
}
#######Add Next Levels Here If Any in else loop
}
}#IV lvl else loop end
}
}#III lvl else loop end
}
}#IInd lvl else loop end
}#Ist lvl for loop end
"$leveltemp"
}gsixe;
print "$listdata\n";
my$listdata='1〕
清单项目:
(1) 清单1数据
(a) 子数据
(b) 子数据
(c) 子数据
(d) 子数据
(i) 子数据
(二)分项数据
(A) 子数据
(B) 子数据
(三)分项数据
(e) 子数据
(2) 清单2数据
(3) 清单3数据
';
#打印“\n\n\n$listdata\n\n”;
###多级模式数组
我的@level\u check=(“\(\d+\)”,“(?尝试逐行处理。以下内容确定了每个列表项所处的级别。只需跟踪上一级别,即可确定某个内容是否为子项,以及上一级别的最大值,以验证其顺序是否正确:
use strict;
use warnings;
###Array of multi-level patterns
my @level_check = (
'\(\d+\)',
'(?<!\()\d+\)',
'\([a-h]\)',
'(?<!\()[a-h]\)',
'\([A-H]\)',
'(?<!\()[A-H]\)',
'\d+\.',
'\([IVX]+\)',
'(?<!\()[IVX]+\)',
'\([ivx]+\)',
'(?<!\()[ivx]+\)',
'\-',
);
while (<DATA>) {
chomp(my $line = $_);
my $match = 0;
for my $i (0..$#level_check) {
if ($line =~ /^\s*$level_check[$i]/) {
$match = $i + 1;
last;
}
}
if ($match) {
print "Level $match - $line\n";
} else {
print "No Match - $line\n";
}
}
1;
__END__
(1)LIST 1 data
(a)sub data
(b)sub data
(c)sub data
(d)sub data
(i)sub-sub data
(ii)sub-sub data
(A)sub-sub-sub data
(B)sub-sub-sub data
(iii)sub-sub data
(e)sub data
(2)LIST 2 data
(3)LIST 3 data
使用堆栈跟踪“打开”样式,以确定新样式是子样式还是父样式
use strict;
use warnings;
my @styles = (
'\(\d+\)', '\d+\)', '\d+\.',
'\([a-h]\)', '[a-h]\)', '\([A-H]\)', '[A-H]\)',
'\([IVX]+\)', '[IVX]+\)', '\([ivx]+\)', '[ivx]+\)',
'-',
);
my @stack;
while (<>) {
for my $i (reverse 0..$#stack) {
if (/$stack[$i]/) {
splice(@stack, $i+1);
goto DONE_LINE;
}
}
for my $style (@styles) {
if (my ($spaces) = /^( *)$style/) {
push @stack, qr/^$spaces$style/;
goto DONE_LINE;
}
}
die "Unrecognized format at line $. - $_";
DONE_LINE:
s/^ *//;
printf("<<LIST%d>>%s", 0+@stack, $_);
}
改变
/^( *)$style/
到
缩进是否重要?Re“一个人只需要跟踪上一个级别来确定某个对象是否是子对象”,这不是真的。它也可能是父对象(例如第10、11和12行)。你需要跟踪所有打开的关卡,以确定某个东西是否为儿童。是的,我已经说过他需要跟踪每个关卡的最大值。他的数据到底意味着什么,我不完全确定,但希望这至少会让他走上比多个for循环更好的轨道。你不需要跟踪任何最大值是的。请看我的答案。您确实不需要最大值,我只需要跟踪它们来验证数据。无论如何,感谢您也分享了答案。您能解释一下吗-printf($)(0+@stack,$);哦,应该是printf(“%s”,0+@stack,$);
.Fixed.0+@stack
在标量上下文中计算@stack
,结果是@stack
中的元素数。其余的是对printf
的简单调用。
use strict;
use warnings;
my @styles = (
'\(\d+\)', '\d+\)', '\d+\.',
'\([a-h]\)', '[a-h]\)', '\([A-H]\)', '[A-H]\)',
'\([IVX]+\)', '[IVX]+\)', '\([ivx]+\)', '[ivx]+\)',
'-',
);
my @stack;
while (<>) {
for my $i (reverse 0..$#stack) {
if (/$stack[$i]/) {
splice(@stack, $i+1);
goto DONE_LINE;
}
}
for my $style (@styles) {
if (my ($spaces) = /^( *)$style/) {
push @stack, qr/^$spaces$style/;
goto DONE_LINE;
}
}
die "Unrecognized format at line $. - $_";
DONE_LINE:
s/^ *//;
printf("<<LIST%d>>%s", 0+@stack, $_);
}
my %re_cache = map { $_ => qr/^( *)$_/ } @styles;
/^( *)$style/
/$re_cache{$style}/