Error when trying to split an XML file with the XML::LibXML module
I have been trying to split XML data with this module, but it throws the following error:
Can't call method "findnodes" without a package or object reference
<xml>
<bhap id="2">
<label>cylind – II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p><text>welcome3</text></p>
</rect>
</bhap>
</xml>
My input:
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1">
<title>Short</title>
<label>1.</label>
<p><text>welcome</text></p>
</rect>
<rect id="S2">
<title>Definite</title>
<label>2.</label>
<p><text>welcome1</text></p>
</rect>
</bhap>
<bhap id="2">
<label>cylind – II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p><text>welcome3</text></p>
</rect>
<rect id=S4">
<title>Term</title>
<label>4.</label>
<p><text>welcome4</text></p>
</rect>
</bhap>
</xml>
cylind - I
premier
Short
1.
welcome
Definite
2.
welcome1
cylind – II
AUTHORITIES AND ITS EMPLOYEES
nauty.—
3.
welcome3
Term
4.
welcome4
My code:
use XML::LibXML;
my $file = shift || die "usage $0 <xmlfile>";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my @nodes = $doc->findnodes('//bhap');
foreach my $node1 (@nodes) {
my $bhap = $node1->toString(), "\n";
if ( $bhap =~ m/(<bhap.+?>.+?<\/title>)(.+?)(<\/bhap>)/is ) {
my $bhap1 = $1;
my $bhap2 = $2;
my $bhap3 = $3;
my $nodes1 = $bhap->findnodes('//rect');
foreach my $node (@$nodes1) {
my $rect = $node->toString();
if ( $rect =~ m/(<rect\s*id="(.+?)">.+?<\/rect>)/is ) {
my $var1 = $1;
my $var2 = $2;
print "file" $var2;
print "<xml>" print $bhap1;
print $var1;
print $bhap3;
print "</xml>";
}
}
}
}
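OK, so you started off well, but then fell into the "regular expression" trap. Parsing XML with regular expressions is a bad idea because doing it properly is too complicated: you need to handle and validate tag nesting, line breaks, and all sorts of other things that basically make regex-based code brittle. So please don't.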
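But most importantly, always enable use strict and use warnings before posting a question. They are your first port of call for troubleshooting.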
If you had, you would have been told about lines such as this one:
print "file" $var2;
That simply does not work. Plenty of other things in your code do not work properly either, so that is the real place to start.
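Leaving the regex approach aside for a moment, the print lines fail to compile because two terms sit next to each other with no operator or comma between them. A minimal sketch of one way to write them, where format_piece and its arguments are hypothetical names standing in for the captured strings in the question:

```perl
use strict;
use warnings;

# Hypothetical helper: assemble one output piece from the four captured strings.
sub format_piece {
    my ( $id, $bhap_open, $rect, $bhap_close ) = @_;

    # The id is interpolated into the string, and the remaining parts are
    # joined with newlines, so every term is separated by an operator or comma.
    return "file $id\n"
        . join( "\n", '<xml>', $bhap_open, $rect, $bhap_close, '</xml>' ) . "\n";
}

print format_piece( 'S1', '<bhap id="1">', '<rect id="S1"></rect>', '</bhap>' );
```

This only fixes the syntax. It does not make string-based splitting of XML any less fragile.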
Also, your XML is invalid: I think your "S4" is missing a quote.
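Anyway, assuming that was just a typo, I would start with XML::Twig (only because I know it better than LibXML, not for any particular reason) and do something like this: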
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my %children_of;
#as we process, extract all the 'rect' elements - along with a reference to their context.
sub process_rect {
my ( $twig, $rect ) = @_;
push( @{ $children_of{ $rect->parent } }, $rect->cut );
}
my $twig = XML::Twig->new(
'pretty_print' => 'indented',
'twig_handlers' => { 'rect' => \&process_rect },
);
$twig->parse( \*DATA );
#run through all the 'bhap' elements.
foreach my $bhap ( $twig->root->children('bhap') ) {
#find the rect elements under this bhap.
foreach my $rect ( @{ $children_of{$bhap} } ) {
#create a new XML document - copy the 'root' name from your original document.
my $xml = XML::Twig::Elt->new( $twig -> root -> name );
#duplicate this 'bhap' element by copying it, rather than cutting it,
#so we can paste it more than once (e.g. per 'rect')
my $subset = $bhap->copy;
#insert the 'bhap' into our new xml.
$subset->paste( last_child => $xml );
#insert our cut rect beneath this bhap.
$rect->paste( last_child => $subset );
#print the resulting XML.
print "--\n";
$xml->print;
}
}
__DATA__
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1">
<title>Short</title>
<label>1.</label>
<p><text>welcome</text></p>
</rect>
<rect id="S2">
<title>Definite</title>
<label>2.</label>
<p><text>welcome1</text></p>
</rect>
</bhap>
<bhap id="2">
<label>cylind - II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p><text>welcome3</text></p>
</rect>
<rect id="S4">
<title>Term</title>
<label>4.</label>
<p><text>welcome4</text></p>
</rect></bhap>
</xml>
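This looks at least fairly close to what you want to produce. I have skipped reading in the file and printing the output to files, because restructuring the XML is the hard part.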
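For the skipped file handling, something along these lines might do. It is only a sketch: write_element is a hypothetical helper, naming each file after the rect id is an assumption, and a temporary directory stands in for the real output location. In the loop above you would pass $xml rather than the bare rect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Twig;

# Hypothetical helper: write one element, tags included, to $path.
sub write_element {
    my ( $elt, $path ) = @_;
    open my $fh, '>:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    print {$fh} $elt->sprint, "\n";
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# For a real run, $twig->parsefile($file) would replace the inline string.
my $twig = XML::Twig->new;
$twig->parse('<xml><bhap id="1"><rect id="S1"><label>1.</label></rect></bhap></xml>');

my $dir = tempdir( CLEANUP => 1 );
my @written;
foreach my $rect ( $twig->root->descendants('rect') ) {
    # File name derived from the id attribute (an assumed naming scheme).
    push @written, write_element( $rect, "$dir/" . $rect->att('id') . '.xml' );
}
print "wrote $_\n" for @written;
```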
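I would also suggest having a look at what ships with XML::Twig, because it may be exactly what you want.
Is xml_split an option?
You assign to $bhap and so on, and then read from $bhap. use warnings; use strict; catches this kind of thing.
my $nodes1 = $bhap->findnodes('//rect'): you are calling findnodes on a string here.
I have a script that had been running for over a year and has just started throwing this error. What changed in the machine's perl or package installation? Annoying.
I am sure this is all good advice, but the question is about the error Can't call method "findnodes" without a package or object reference, and you say nothing about it.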
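On the error itself: in the question's code $bhap holds the string returned by toString(), and findnodes is a method of node objects, so the call has nothing to dispatch on. Below is a rough sketch of staying with XML::LibXML and keeping the node objects throughout. It assumes the input is well-formed (the S4 quote fixed); split_by_rect and the shortened sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns one [ id, XML string ] pair per <rect>,
# each wrapped in a copy of its parent <bhap> without the sibling <rect>s.
sub split_by_rect {
    my ($doc) = @_;
    my $root_name = $doc->documentElement->nodeName;
    my @pieces;

    foreach my $bhap ( $doc->findnodes('//bhap') ) {
        # $bhap is a node object here, not the result of toString(),
        # so calling findnodes() on it is valid. './rect' is relative to it.
        foreach my $rect ( $bhap->findnodes('./rect') ) {
            my $out  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
            my $root = $out->createElement($root_name);
            $out->setDocumentElement($root);

            # Shallow clone: the <bhap> element and its attributes only.
            my $copy = $bhap->cloneNode(0);
            foreach my $child ( $bhap->childNodes ) {
                # Skip every <rect> except the one being exported.
                next if $child->nodeName eq 'rect' && !$child->isSameNode($rect);
                $copy->appendChild( $child->cloneNode(1) );
            }

            # importNode() makes a deep copy owned by the new document.
            $root->appendChild( $out->importNode($copy) );
            push @pieces, [ $rect->getAttribute('id'), $out->toString ];
        }
    }
    return @pieces;
}

my $xml = <<'XML';
<xml>
<bhap id="1"><label>cylind - I</label><title>premier</title>
<rect id="S1"><title>Short</title></rect>
<rect id="S2"><title>Definite</title></rect>
</bhap>
<bhap id="2"><label>cylind - II</label><title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3"><title>nauty</title></rect>
</bhap>
</xml>
XML

my @pieces = split_by_rect( XML::LibXML->load_xml( string => $xml ) );
foreach my $piece (@pieces) {
    my ( $id, $content ) = @$piece;
    print "-- $id\n$content";
}
```

Each pair could then be written to a file named after the id. For a real file, load_xml( location => $file ) would replace the inline string.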