How to parse an XML string with a deeply nested structure using Python

python xml


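A similar question has been asked here before, but I could not find what I was looking for in it.

I need to extract all the information contained in each patent-classification element whose classification-scheme child has the scheme attribute set to CPC. There are several such elements, and they are all enclosed in the patent-classifications element.

In the example below there are three such values: C 07 K 16 22 I, A 61 K 2039 505 A and C 07 K 2317 92 A.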

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="/3.0/style/exchange.xsl"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">
    <ops:meta name="elapsed-time" value="21"/>
    <exchange-documents>
        <exchange-document system="ops.epo.org" family-id="39103486" country="US" doc-number="2009234106" kind="A1">
            <bibliographic-data>
                <publication-reference>
                    <document-id document-id-type="docdb">
                        <country>US</country>
                        <doc-number>2009234106</doc-number>
                        <kind>A1</kind>
                        <date>20090917</date>
                    </document-id>
                    <document-id document-id-type="epodoc">
                        <doc-number>US2009234106</doc-number>
                        <date>20090917</date>
                    </document-id>
                </publication-reference>
                <classifications-ipcr>
                    <classification-ipcr sequence="1">
                        <text>C07K  16/    44            A I                    </text>
                    </classification-ipcr>
                </classifications-ipcr>
                <patent-classifications>
                    <patent-classification sequence="1">
                        <classification-scheme office="" scheme="CPC"/>
                        <section>C</section>
                        <class>07</class>
                        <subclass>K</subclass>
                        <main-group>16</main-group>
                        <subgroup>22</subgroup>
                        <classification-value>I</classification-value>
                    </patent-classification>
                    <patent-classification sequence="2">
                        <classification-scheme office="" scheme="CPC"/>
                        <section>A</section>
                        <class>61</class>
                        <subclass>K</subclass>
                        <main-group>2039</main-group>
                        <subgroup>505</subgroup>
                        <classification-value>A</classification-value>
                    </patent-classification>
                    <patent-classification sequence="7">
                        <classification-scheme office="" scheme="CPC"/>
                        <section>C</section>
                        <class>07</class>
                        <subclass>K</subclass>
                        <main-group>2317</main-group>
                        <subgroup>92</subgroup>
                        <classification-value>A</classification-value>
                    </patent-classification>
                    <patent-classification sequence="1">
                        <classification-scheme office="US" scheme="UC"/>
                        <classification-symbol>530/387.9</classification-symbol>
                    </patent-classification>
                </patent-classifications>
            </bibliographic-data>
        </exchange-document>
    </exchange-documents>
</ops:world-patent-data>



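If you don't have them, install BeautifulSoup and lxml:

$ pip install beautifulsoup4 lxml

Try this:

from bs4 import BeautifulSoup

# read the XML document posted in the question
with open('example.xml', 'rb') as f:
    xml = f.read()
# the 'xml' parser is provided by lxml
bs = BeautifulSoup(xml, 'xml')

# find every patent-classification element
patents = bs.find_all('patent-classification')
# keep only the ones whose classification-scheme is CPC
for pa in patents:
    if pa.find('classification-scheme', {'scheme': 'CPC'}):
        # join the text of the child elements, e.g. "C 07 K 16 22 I"
        print(pa.get_text(' ', strip=True))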


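You can use Python's standard xml module:

import xml.etree.ElementTree as ET

root = ET.parse('a.xml').getroot()

# select the parent of every classification-scheme element whose scheme is CPC
for node in root.iterfind(".//{http://www.epo.org/exchange}classification-scheme[@scheme='CPC']/.."):
    # collect the text of each child; the empty classification-scheme element has none
    data = [d.text for d in node if d.text]
    print(' '.join(data))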
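Since the question is about an XML string rather than a file, here is a minimal sketch of the same idea using ET.fromstring and a namespace mapping. The function name cpc_classifications, the prefix "ex" and the trimmed SAMPLE document are made up for illustration; only the namespace URI and the element names come from the XML in the question.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the root element of the document in the question.
# The "ex" prefix is an arbitrary local alias, not something the document defines.
NS = {"ex": "http://www.epo.org/exchange"}


def cpc_classifications(xml_text):
    """Return one space-joined string per CPC patent-classification element."""
    root = ET.fromstring(xml_text)
    results = []
    for pc in root.iterfind(".//ex:patent-classification", NS):
        scheme = pc.find("ex:classification-scheme", NS)
        if scheme is None or scheme.get("scheme") != "CPC":
            continue
        # Keep the text of every child that has some; the scheme element is empty.
        parts = [c.text.strip() for c in pc if c.text and c.text.strip()]
        results.append(" ".join(parts))
    return results


# Hypothetical, trimmed-down sample with the same nesting as the real document.
SAMPLE = """<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
  <exchange-documents><exchange-document><bibliographic-data>
    <patent-classifications>
      <patent-classification sequence="1">
        <classification-scheme office="" scheme="CPC"/>
        <section>C</section><class>07</class><subclass>K</subclass>
        <main-group>16</main-group><subgroup>22</subgroup>
        <classification-value>I</classification-value>
      </patent-classification>
      <patent-classification sequence="1">
        <classification-scheme office="US" scheme="UC"/>
        <classification-symbol>530/387.9</classification-symbol>
      </patent-classification>
    </patent-classifications>
  </bibliographic-data></exchange-document></exchange-documents>
</ops:world-patent-data>"""

print(cpc_classifications(SAMPLE))  # ['C 07 K 16 22 I']
```

Filtering in Python instead of in the path expression keeps the query short, and the namespace mapping avoids repeating the full URI in every path.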


Thanks, but where is xml being used as a variable?

The xml variable is where the XML is loaded. To try the exact code, create a file named example.xml and put in it what you posted in the question. I also edited my answer because I had missed a line. Thanks.

@user1140126 Check the answer again, I updated it. I had missed a line.