Extract data from an XML file
I have an XML file containing thousands of entries, such as:
<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
<gml:Point>
<gml:coordinates>4048313.294966287,5374397.792158723 </gml:coordinates>
</gml:Point>
</gml:PointProperty>
<property name="BEZEICHNUN">Anton-Bosch-Gasse</property>
<property name="WL_NUMMER">68</property>
</Feature>
</gml:featureMember>
<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
<gml:Point>
<gml:coordinates>4044355.0231338665,5365146.95116724 </gml:coordinates>
</gml:Point>
</gml:PointProperty>
<property name="BEZEICHNUN">Anschützgasse</property>
<property name="WL_NUMMER">67</property>
</Feature>
</gml:featureMember>
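The script should search for a name given in a list (e.g. Anton-Bosch-Gasse) and copy the whole block starting with <gml:featureMember> to a new file.
Which tool would you use for this: awk, sed, or Perl?
sed and awk are not the right tools for parsing XML. Reach for Perl: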
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
my $search = 'Anton-Bosch-Gasse';
# Put your real values here!
my $file = '1.xml';
my $uri = 'http://1.2.3';
my $xpc = XML::LibXML::XPathContext->new;
$xpc->registerNs('gml', $uri);
my $xml = XML::LibXML->load_xml(location => $file);
# Query through the XPathContext so that the registered gml prefix is used.
my @r = $xpc->findnodes("//property[.='$search']/ancestor::gml:featureMember", $xml);
print $_->serialize for @r;
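The question asks about a list of names, while the script above looks up a single one. A minimal sketch of how that could be extended, assuming the real file has one root element that declares the gml prefix: the wrapper element, the namespace URI, the second street name and the %wanted list are placeholders for illustration, not taken from the real data.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;

# Placeholder URI and wrapper element; use the values from your real file.
my $uri = 'http://1.2.3';
my $xml_text = <<"XML";
<root xmlns:gml="$uri">
<gml:featureMember><Feature>
<property name="BEZEICHNUN">Anton-Bosch-Gasse</property>
<property name="WL_NUMMER">68</property>
</Feature></gml:featureMember>
<gml:featureMember><Feature>
<property name="BEZEICHNUN">Example-Gasse</property>
<property name="WL_NUMMER">67</property>
</Feature></gml:featureMember>
</root>
XML

# Names to look for; in practice these could be read from a text file.
my %wanted = map { $_ => 1 } ('Anton-Bosch-Gasse', 'No-Such-Street');

my $doc = XML::LibXML->load_xml(string => $xml_text);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('gml', $uri);

# Compare the names in Perl instead of interpolating them into the XPath,
# so a quote character in a name cannot break the expression.
my @matches = grep {
    $wanted{ $xpc->findvalue('.//property[@name="BEZEICHNUN"]', $_) }
} $xpc->findnodes('//gml:featureMember');

print $_->toString, "\n" for @matches;
```

Redirecting the output (for example `perl extract.pl > extract.xml`) collects the matching blocks in a new file. For a real file, load_xml(location => $file) would replace the inline string.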
Using xml_grep, which ships with XML::Twig, you can write:
$ xml_grep --root 'gml:featureMember' \
--cond 'property[string()="Anton-Bosch-Gasse"]' \
to_grep.xml > extract.xml
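If several names have to be matched in one pass, the same idea can be sketched directly with XML::Twig, the module that xml_grep is built on. This is only an illustration under assumptions: the wrapper root element, the namespace URI, the second street name and the %wanted list are made up for the demo.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

# Made-up sample input; the real file would be read with parsefile().
my $xml_text = <<'XML';
<root xmlns:gml="http://1.2.3">
<gml:featureMember><Feature>
<property name="BEZEICHNUN">Anton-Bosch-Gasse</property>
<property name="WL_NUMMER">68</property>
</Feature></gml:featureMember>
<gml:featureMember><Feature>
<property name="BEZEICHNUN">Example-Gasse</property>
<property name="WL_NUMMER">67</property>
</Feature></gml:featureMember>
</root>
XML

# Hypothetical list of names to extract.
my %wanted = map { $_ => 1 } ('Anton-Bosch-Gasse');
my @found;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for every completely parsed gml:featureMember element.
        'gml:featureMember' => sub {
            my ($t, $member) = @_;
            my $name = $member->first_descendant('property[@name="BEZEICHNUN"]');
            push @found, $member->sprint if $name && $wanted{ $name->text };
            $t->purge;    # release elements that have already been handled
        },
    },
);
$twig->parse($xml_text);

print "$_\n" for @found;
```

Because each element is handled and purged as soon as it has been parsed, memory use should stay modest even with thousands of entries.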
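Here is a solution similar to choroba's, but using the Mojolicious suite. Its Mojo::DOM module traverses XML using CSS3 selectors rather than XPath. Here I first find all gml:featureMember elements, then extract the first one that has a matching descendant.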
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util qw/slurp spurt/;
my $dom = Mojo::DOM->new->xml(1);
# read in from file
# $dom->parse( slurp 'myfile.xml' );
# but for the demo ...
$dom->parse(do{ local $/; <DATA> });
my $found =
$dom->find('gml\:featureMember')
->first(sub{
$_->find('property[name="BEZEICHNUN"]')
->first( qr/\QAnton-Bosch-Gasse/ )
});
spurt "$found", 'output.xml';
__DATA__
<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
<gml:Point>
<gml:coordinates>4048313.294966287,5374397.792158723 </gml:coordinates>
</gml:Point>
</gml:PointProperty>
<property name="BEZEICHNUN">Anton-Bosch-Gasse</property>
<property name="WL_NUMMER">68</property>
</Feature>
</gml:featureMember>
<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
<gml:Point>
<gml:coordinates>4044355.0231338665,5365146.95116724 </gml:coordinates>
</gml:Point>
</gml:PointProperty>
<property name="BEZEICHNUN">Anschützgasse</property>
<property name="WL_NUMMER">67</property>
</Feature>
</gml:featureMember>
Thank you very much, I used your solution; it was the easiest one for me to implement :-)