Perl 确定动态清单项目

Perl 确定动态清单项目,perl,list,search,identify,Perl,List,Search,Identify,我正在尝试识别列表数据代码是: my $listdata = ' List Items: (1)LIST 1 data (a)sub data (b)sub data (c)sub data (d)sub data (i)sub-sub data (ii)sub-sub data (A)sub-sub-sub data (B)sub-sub-sub data (iii)sub-sub data (e)sub data (2)LI

我正在尝试识别列表数据代码是:

my $listdata = '
List Items:     
(1)LIST 1 data 
(a)sub data
(b)sub data
(c)sub data
(d)sub data
    (i)sub-sub data
    (ii)sub-sub data
        (A)sub-sub-sub data
        (B)sub-sub-sub data
    (iii)sub-sub data
(e)sub data
(2)LIST 2 data 
(3)LIST 3 data 
';

    #print "\n\n\n$listdata\n\n";

    ###Array of multi-level patterns 
    my @level_check =('\(\d+\)','(?<!\()\d+\)','\([a-h]\)','(?<!\()[a-h]\)','\([A-H]\)','(?<!\()[A-H]\)','\d+\.',
                      '\([IVX]+\)','(?<!\()[IVX]+\)','\([ivx]+\)','(?<!\()[ivx]+\)','\-');

    ###pattern for each levels
    my ($first_level,$second_level,$third_level,$fourth_level);

    ###First from each pattern
    my ($first_occur,$second_occur,$third_occur,$fourth_occur);

    #++++++++++++++++++++++++Pattern for multilevel list+++++++++++++++++++++++#
    my $pattern = '((?:[IVX\-\(\)\d\.\-][a-z]?\)?)+)';

    $listdata =~ s{$pattern}{
        my ($leveltemp) = ($1);
        $first_occur = $leveltemp if !$first_occur;

        #print "$data";
        #print "all_level: $leveltemp##\n";

        #########First Level Start
        for($i=0; $i<scalar(@level_check);$i++){
            if($first_occur =~ /^$level_check[$i]$/){
                $first_level = $level_check[$i] if !$first_level;
                #print "$level_check[$i] =>is Ist: $first_level\n";
            }
        }

        for($i=0; $i<scalar(@level_check);$i++){

            if($leveltemp =~ /^$first_level$/){
                $leveltemp =~ s{$pattern}{<<LIST1>>$2$3};
                #print"**$data level matched: $leveltemp => $first_level\n";
                ############First Level End
            }
            else
            {
                ######Second level Start
                if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([a-h]{3,})/i){
                    $second_occur = $leveltemp if !$second_occur;
                    #print "$leveltemp :$second_occur\n";

                    for($i=0; $i<scalar(@level_check);$i++){
                        if($second_occur =~ /^$level_check[$i]$/){
                        $second_level = $level_check[$i] if !$second_level;
                        #print "$leveltemp =>is IInd: $second_level\n";
                        }
                    }

                    if($leveltemp =~ /^$second_level/){
                        $leveltemp =~ s{$pattern}{<<LIST2>>$2$3};
                        #print"**level matched: $leveltemp => $seconf_level\n";
                        ######Second level End
                    }
                    else
                    {
                        ########Third Level Start   
                        if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([A-h]{3,})/i){
                            $third_occur = $leveltemp if !$third_occur;

                            for($i=0; $i<scalar(@level_check);$i++){
                                if($third_occur =~ /^$level_check[$i]$/){
                                    $third_level = $level_check[$i] if !$third_level;
                                    #print "$leveltemp =>is IIIrd: $third_level\n";
                                }
                            }

                            if($leveltemp =~ /^$third_level/){
                                $leveltemp =~ s{$pattern}{<<LIST3>>$2$3};
                                #print"**level matched: $leveltemp => $third_level\n";
                            #########Third Level End
                            }
                            else
                            {
                                ########Fourth Level Start  
                                if($leveltemp !~ /^(?:<<LIST+>>|\d{3,}\,?|\([A-z]{3,})/i){

                                    $fourth_occur = $leveltemp if !$fourth_occur;
                                        #print "$leveltemp :$fourth_occur\n";
                                    for($i=0; $i<scalar(@level_check);$i++){
                                        if($fourth_occur =~ /^$level_check[$i]$/){
                                            $fourth_level = $level_check[$i] if !$fourth_level;
                                            #print "$leveltemp =>is IVrth: $fourth_level\n";
                                        }
                                    }

                                    if($leveltemp =~ /^$fourth_level/){
                                        $leveltemp =~ s{$pattern}{<<LIST4>>$2$3};
                                        #print"**$fourth_occur  level matched: $leveltemp => $fourth_level\n";
                                        #########Fourth Level End
                                    }
                                    #######Add Next Levels Here If Any in else loop


                                }
                            }#IV lvl else loop end
                        }   
                    }#III lvl else loop end
                }
            }#IInd lvl else loop end

        }#Ist lvl for loop end

        "$leveltemp"
    }gsixe;

print "$listdata\n";
my$listdata='1〕
清单项目:
(1) 清单1数据
(a) 子数据
(b) 子数据
(c) 子数据
(d) 子数据
(i) 子数据
(二)分项数据
(A) 子数据
(B) 子数据
(三)分项数据
(e) 子数据
(2) 清单2数据
(3) 清单3数据
';
#打印“\n\n\n$listdata\n\n”;
###多级模式数组

我的@level\u check=(“\(\d+\)”,“(?尝试逐行处理。以下内容确定了每个列表项所处的级别。只需跟踪上一级别,即可确定某个内容是否为子项,以及上一级别的最大值,以验证其顺序是否正确:

use strict;
use warnings;


###Array of multi-level patterns 
my @level_check = (
    '\(\d+\)',
    '(?<!\()\d+\)',
    '\([a-h]\)',
    '(?<!\()[a-h]\)',
    '\([A-H]\)',
    '(?<!\()[A-H]\)',
    '\d+\.',
    '\([IVX]+\)',
    '(?<!\()[IVX]+\)',
    '\([ivx]+\)',
    '(?<!\()[ivx]+\)',
    '\-',
);

while (<DATA>) {
    chomp(my $line = $_);

    my $match = 0;
    for my $i (0..$#level_check) {
        if ($line =~ /^\s*$level_check[$i]/) {
            $match = $i + 1;
            last;
        }
    }

    if ($match) {
        print "Level $match - $line\n";
    } else {
        print "No Match - $line\n";
    }
}

1;

__END__
(1)LIST 1 data 
(a)sub data
(b)sub data
(c)sub data
(d)sub data
    (i)sub-sub data
    (ii)sub-sub data
        (A)sub-sub-sub data
        (B)sub-sub-sub data
    (iii)sub-sub data
(e)sub data
(2)LIST 2 data 
(3)LIST 3 data 

使用堆栈跟踪“打开”样式,以确定新样式是子样式还是父样式

use strict;
use warnings;

my @styles = (
    '\(\d+\)',     '\d+\)',     '\d+\.',
    '\([a-h]\)',   '[a-h]\)',   '\([A-H]\)',   '[A-H]\)',
    '\([IVX]+\)',  '[IVX]+\)',  '\([ivx]+\)',  '[ivx]+\)',
    '-',
);

my @stack;
while (<>) {
   for my $i (reverse 0..$#stack) {
      if (/$stack[$i]/) {
         splice(@stack, $i+1);
         goto DONE_LINE;
      }
   }

   for my $style (@styles) {
      if (my ($spaces) = /^( *)$style/) {
         push @stack, qr/^$spaces$style/;
         goto DONE_LINE;
      }
   }

   die "Unrecognized format at line $. - $_";

DONE_LINE:
   s/^ *//;
   printf("<<LIST%d>>%s", 0+@stack, $_);
}
改变

/^( *)$style/


缩进是否重要?Re“一个人只需要跟踪上一个级别来确定某个对象是否是子对象”,这不是真的。它也可能是父对象(例如第10、11和12行)。你需要跟踪所有打开的关卡,以确定某个东西是否为儿童。是的,我已经说过他需要跟踪每个关卡的最大值。他的数据到底意味着什么,我不完全确定,但希望这至少会让他走上比多个for循环更好的轨道。你不需要跟踪任何最大值是的。请看我的答案。您确实不需要最大值,我只需要跟踪它们来验证数据。无论如何,感谢您也分享了答案。您能解释一下吗-printf($)(0+@stack,$);哦,应该是
printf(“%s”,0+@stack,$);
.Fixed.
0+@stack
在标量上下文中计算
@stack
,结果是
@stack
中的元素数。其余的是对
printf
的简单调用。
use strict;
use warnings;

my @styles = (
    '\(\d+\)',     '\d+\)',     '\d+\.',
    '\([a-h]\)',   '[a-h]\)',   '\([A-H]\)',   '[A-H]\)',
    '\([IVX]+\)',  '[IVX]+\)',  '\([ivx]+\)',  '[ivx]+\)',
    '-',
);

my @stack;
while (<>) {
   for my $i (reverse 0..$#stack) {
      if (/$stack[$i]/) {
         splice(@stack, $i+1);
         goto DONE_LINE;
      }
   }

   for my $style (@styles) {
      if (my ($spaces) = /^( *)$style/) {
         push @stack, qr/^$spaces$style/;
         goto DONE_LINE;
      }
   }

   die "Unrecognized format at line $. - $_";

DONE_LINE:
   s/^ *//;
   printf("<<LIST%d>>%s", 0+@stack, $_);
}
my %re_cache = map { $_ => qr/^( *)$_/ } @styles;
/^( *)$style/
/$re_cache{$style}/