Extracting data from an XML file

I have an XML file with several thousand entries, such as:

<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
                <gml:Point>
                  <gml:coordinates>4048313.294966287,5374397.792158723 </gml:coordinates>
                </gml:Point>

</gml:PointProperty>
<property name="BEZEICHNUN">Anton-Bosch-Gasse</property>
<property name="WL_NUMMER">68</property>
</Feature>
</gml:featureMember>


<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
                <gml:Point>
                  <gml:coordinates>4044355.0231338665,5365146.95116724 </gml:coordinates>
                </gml:Point>

</gml:PointProperty>
<property name="BEZEICHNUN">Anschützgasse</property>
<property name="WL_NUMMER">67</property>
</Feature>
</gml:featureMember>

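The script should search for names given in a list (e.g. Anton-Bosch-Gasse) and copy the entire block that begins with <gml:featureMember> to a new file.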


Which tool would you use for this: awk, sed, or perl?

sed and awk are not the right tools for parsing XML. Reach for Perl:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $search = 'Anton-Bosch-Gasse';

# Put your real values here!
my $file = '1.xml';
my $uri  = 'http://1.2.3';

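my $xml = XML::LibXML->load_xml(location => $file);

# Bind the gml prefix on an XPath context for this document and query
# through that context, so the registered prefix is the one the
# expression uses.
my $xpc = XML::LibXML::XPathContext->new($xml);
$xpc->registerNs('gml', $uri);

my @r = $xpc->findnodes("//property[.='$search']/ancestor::gml:featureMember");
print $_->serialize for @r;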
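
The question asks for matching against a list of names rather than a single one. Here is a minimal sketch of that variation, not a definitive implementation. It assumes the names sit one per line in a UTF-8 text file and that the document binds the gml prefix to the standard GML namespace URI; the file arguments and the extract_features helper name are also made up for illustration. It compares each feature's BEZEICHNUN value against a hash of wanted names instead of interpolating the name into the XPath string, which also avoids quoting problems for names that contain an apostrophe:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;

# Assumption: the standard GML namespace URI. Replace it with the URI
# your document actually binds to the "gml" prefix.
my $GML_NS = 'http://www.opengis.net/gml';

# Hypothetical helper: return the serialized gml:featureMember elements
# whose BEZEICHNUN property equals one of the wanted names.
sub extract_features {
    my ($doc, @names) = @_;
    my %wanted = map { $_ => 1 } @names;

    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(gml => $GML_NS);

    my @hits;
    for my $member ($xpc->findnodes('//gml:featureMember')) {
        # Take the first BEZEICHNUN property below this feature member.
        my ($prop) = $xpc->findnodes('.//property[@name="BEZEICHNUN"]', $member);
        next unless $prop;
        push @hits, $member->toString if $wanted{ $prop->textContent };
    }
    return @hits;
}

# Runs only when both file arguments are given: XML file, then names file.
if (@ARGV == 2) {
    my ($xml_file, $names_file) = @ARGV;

    open my $fh, '<:encoding(UTF-8)', $names_file
        or die "$names_file: $!";
    chomp(my @names = <$fh>);
    close $fh;
    @names = grep { length } @names;    # skip empty lines

    my $doc = XML::LibXML->load_xml(location => $xml_file);
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_features($doc, @names);
}
```

It could be run as perl extract.pl 1.xml names.txt > extract.xml, where the script and file names are placeholders.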
Or, if you find the example above too verbose, you can use:



Using xml_grep, which ships with XML::Twig, you can write:

$ xml_grep --root 'gml:featureMember' \
           --cond 'property[string()="Anton-Bosch-Gasse"]' \
           to_grep.xml > extract.xml

Here is a solution similar to choroba's, but using the Mojolicious suite. Its Mojo::DOM module traverses XML using CSS3 selectors rather than XPath.

Here I first find all gml:featureMember elements, then extract the first one that has a matching descendant.

#!/usr/bin/env perl

use strict;
use warnings;

use Mojo::DOM;
use Mojo::Util qw/slurp spurt/;

my $dom = Mojo::DOM->new->xml(1);

# read in from file
# $dom->parse( slurp 'myfile.xml' );
# but for the demo ...
$dom->parse(do{ local $/; <DATA> });

my $found = 
  $dom->find('gml\:featureMember')
      ->first(sub{ 
        $_->find('property[name="BEZEICHNUN"]')
          ->first( qr/\QAnton-Bosch-Gasse/ )
      });

spurt "$found", 'output.xml';


__DATA__
<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
                <gml:Point>
                  <gml:coordinates>4048313.294966287,5374397.792158723 </gml:coordinates>
                </gml:Point>

</gml:PointProperty>
<property name="BEZEICHNUN">Anton-Bosch-Gasse</property>
<property name="WL_NUMMER">68</property>
</Feature>
</gml:featureMember>


<gml:featureMember>
<Feature>
<featureType>JCSOutput</featureType>
<property name="gml2_coordsys"></property>
<gml:PointProperty>
                <gml:Point>
                  <gml:coordinates>4044355.0231338665,5365146.95116724 </gml:coordinates>
                </gml:Point>

</gml:PointProperty>
<property name="BEZEICHNUN">Anschützgasse</property>
<property name="WL_NUMMER">67</property>
</Feature>
</gml:featureMember>

Thanks a lot, I used your solution; it was the easiest one for me to implement :-)