Regex: How to add missing ranges in an XML file using Perl?

regex perl

I have XML files as input. These XML files contain markup like the following:

First example:

<xref ref-type="bibr" rid="perl-ch006-bib080"><sup>80</sup></xref><sup>&#x2013;</sup><xref   ref-type="bibr" rid="perl-ch006-bib082"><sup>82</sup></xref>

80–82

Second example:

<xref ref-type="bibr" rid="perl-ch001-bib009"><sup>9</sup></xref><sup>&#x2013;</sup><xref ref-type="bibr" rid="perl-ch001-bib012"><sup>12</sup></xref><sup>,</sup><xref ref-type="bibr" rid="perl-ch001-bib057"><sup>57</sup></xref><sup>&#x2013;</sup><xref ref-type="bibr" rid="perl-ch001-bib059"><sup>59</sup></xref>

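9–12, 57–59

In the two examples above, the first contains the numbers 80 and 82, with 81 missing; the second contains the ranges 9–12 and 57–59. &#x2013; is the entity for the dash (hyphen). I need to copy the entire content of each XML file and add the missing numbers of each range at that particular position.

The output should look like this. For the first example (i.e. in the pattern 80 81–82):

80 81–82

For the second example (i.e. in the pattern 9 10 11–12, 57 58–59):

9 10 11–12, 57 58–59

All changes must be made in the output file, so that the input file is left untouched.

Code: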

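#!/usr/bin/perl
use strict;
use Cwd;
use File::Basename;
use File::Copy;
my $path1 = getcwd;
# copy every XML file from the Input folder into the working directory
opendir(INP, "$path1\/Input");
my @out = grep(/(xml)$/, readdir(INP));
closedir INP;
foreach my $final (@out)
{
    my $filetobecopied = "Input\/" . $final;
    my $newfile = $final;
    copy($filetobecopied, $newfile) or die "File cannot be copied.";
}
opendir DIR, $path1 or die "cant open dir";
my @files = grep /(.*?)\.(xml)$/, (readdir DIR);
closedir DIR;
open(F6, ">Ref.txt");
print F6 "FileName\tMatched String\tOutput\n";
foreach my $f (@files)
{
    open(F1, "$path1\/Output\/$f");
    # ... ($xml_list is filled from the file here; that part is not shown)
    print OUT $xml_list;
    close OUT;
}
# remove the working copies again
foreach my $del (@files)
{
    unlink $del
}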

Any help would be appreciated.

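Your program has already come quite far. The main thing still missing is adding the missing xref parts at the right place. This addition to $xml_list can be done with four-argument substr; the offset for the insertion can be obtained from the @+ array. The core of the code then becomes: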

#$xml_list=~s/&#x2013;/-/gs;    don't do this (gives trouble when changing back)
#$xml_list=~s/&#x02013;/-/gs;   don't do this (gives trouble when changing back)

while ($xml_list=~/(<xref\ ref-type="(bibr?)"\ rid="(.*?)-ch(\d+)-(bibr?)(\d+)">
                       (<sup>)?(\d+)(<\/sup>)?
                    <\/xref>)<sup>(&\#x0?2013;)+<\/sup>
                   (<xref\ +ref-type="(bibr?)"\ rid="(.*?)-ch(\d+)-bib(\d+)">
                       (<sup>)?(\d+)(<\/sup>)?
                    <\/xref>)
                  /igsx)
{
    my $insert=$+[1];   # end of first (<xref.../xref>) submatch; here we insert
    my ($bibr,$rid,$ch,$bib)=($2,$3,$4,$5.$6);
    my $num=$8;
    my $num1=$17;
    my $endpos = pos $xml_list;
    for (my $counter=$num; ++$counter<$num1; )
    {
        ++$bib;
        my $insertion = "<xref ref-type=\"$bibr\" rid=\"$rid-ch$ch-$bib\">"
                           ."<sup>$counter</sup>"
                       ."</xref>\n";    # insert this into $xml_list at $insert 
        substr $xml_list, $insert, 0, $insertion;
        $insert += length $insertion;   # push start of next insert to the right
        $endpos += length $insertion;   # push start of next search to the right
    }
    pos $xml_list = $endpos;    # set start position of next search
}

#$xml_list=~s/-/&#x2013;/gs;    trouble: would also change normal hyphens
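
The loop above can be tried out in isolation. The following is a minimal, self-contained sketch that wraps the same idea (find a range, take the insertion offset from `@+`, splice with four-argument `substr`, then reset `pos`) in a sub and runs it on the first sample from the question. The sub name `fill_xref_ranges` and the simplified pattern are assumptions made for illustration. It also differs from the code above in two ways: the new `rid` values are built with `sprintf` instead of the string increment `++$bib`, and no newline is added after an inserted element.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): returns a copy of the
# XML text in which every "first xref, en dash entity, last xref" range has
# the missing xref elements inserted after the first xref.
sub fill_xref_ranges {
    my ($xml) = @_;
    while ($xml =~ m{
            (<xref\s+ref-type="bibr"\s+rid="([^"]*?)(\d+)">  # group 1: first xref; 2: rid prefix; 3: rid digits
                <sup>(\d+)</sup>                             # group 4: first visible number
             </xref>)
            <sup>(?:&\#x0?2013;)+</sup>                      # the en dash entity
            <xref\s+ref-type="bibr"\s+rid="[^"]*">
                <sup>(\d+)</sup>                             # group 5: last visible number
            </xref>
        }gx)
    {
        my $insert_at = $+[1];            # offset just after the first xref
        my ($prefix, $id, $from, $to) = ($2, $3, $4, $5);
        my $resume = pos $xml;            # where the next search should start
        my $width  = length $id;          # keep the zero padding of the rid
        for my $n ($from + 1 .. $to - 1) {
            my $new_id = sprintf '%0*d', $width, $id + ($n - $from);
            my $piece  = qq{<xref ref-type="bibr" rid="$prefix$new_id"><sup>$n</sup></xref>};
            substr $xml, $insert_at, 0, $piece;   # splice without removing anything
            $insert_at += length $piece;
            $resume    += length $piece;
        }
        pos($xml) = $resume;              # modifying the string resets pos
    }
    return $xml;
}

my $sample = '<xref ref-type="bibr" rid="perl-ch006-bib080"><sup>80</sup></xref>'
           . '<sup>&#x2013;</sup>'
           . '<xref   ref-type="bibr" rid="perl-ch006-bib082"><sup>82</sup></xref>';
print fill_xref_ranges($sample), "\n";
```

For the sample, the result should contain a new element with rid="perl-ch006-bib081" and the number 81 between the first xref and the dash. The sketch assumes that the numbering of the `rid` values runs in step with the visible numbers, as it does in the samples.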

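Please use an XML parser instead of regular expressions; see the link (it is about (X)HTML, but it applies to XML as well).

Is there a reason you still keep the hyphen before the last part?

Listen to Steffen, he is right. Consider using an XML parsing module.

Thank you. We will definitely look at the link above. Could someone please explain how to install and use modules? I am very new to using modules and all that...

@PatrickJ. Yes, we need the hyphen. (There is no particular reason, but it is a requirement.)

Is it possible to skip the modules and run the code above with some changes? What changes do I need to make to the code above?

Can someone help me complete the code? I am badly stuck...

Can someone help me finish the code above as soon as possible? I have tried using a substitution, but it does not work. Any new idea would be appreciated...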
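
For the file-handling part that the comments ask about, here is a minimal sketch that needs no modules beyond core Perl. It assumes, as in the question's script, that the input files sit in an `Input` directory and the results go to an `Output` directory. Every `.xml` file is read whole, passed through a transformation, and written under the same name to the output directory, so the input is never modified. `process_xml_dir` and its `$transform` callback are hypothetical names introduced here; the callback is the place where the range-filling loop would go.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: applies $transform (a code reference that takes the
# file content and returns the new content) to every .xml file in $in_dir
# and writes the result under the same name in $out_dir.
# Returns the list of file names that were processed.
sub process_xml_dir {
    my ($in_dir, $out_dir, $transform) = @_;

    unless (-d $out_dir) {
        mkdir $out_dir or die "Cannot create $out_dir: $!";
    }

    opendir my $dh, $in_dir or die "Cannot open $in_dir: $!";
    my @files = sort grep { /\.xml\z/i } readdir $dh;
    closedir $dh;

    for my $name (@files) {
        my $in_path  = File::Spec->catfile($in_dir,  $name);
        my $out_path = File::Spec->catfile($out_dir, $name);

        # read the whole input file into one string
        open my $in, '<', $in_path or die "Cannot read $in_path: $!";
        my $xml = do { local $/; <$in> };
        close $in;

        # write the transformed text; the input file is only ever read
        open my $out, '>', $out_path or die "Cannot write $out_path: $!";
        print {$out} $transform->($xml);
        close $out or die "Cannot close $out_path: $!";
    }
    return @files;
}

# Usage: runs only if an Input directory exists next to the script.
# The identity callback is a placeholder for the range-filling code.
if (-d 'Input') {
    my @done = process_xml_dir('Input', 'Output', sub { return $_[0] });
    print "Processed: @done\n";
}
```

Keeping the transformation in a callback separates the directory handling from the regex work, so either part can be changed or tested on its own.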