How to parse a deeply nested XML string with Python
A similar question has been asked here (), but I couldn't find what I need in it.
When a classification-scheme tag has scheme="CPC", I need to extract all the information between the enclosing patent-classification tags. There are several such elements, all wrapped in a patent-classifications tag.
In the example below there are three such values: C07 K16 22 I, A61 K2039 505 A and C07 K2317 92 A.
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="/3.0/style/exchange.xsl"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">
<ops:meta name="elapsed-time" value="21"/>
<exchange-documents>
<exchange-document system="ops.epo.org" family-id="39103486" country="US" doc-number="2009234106" kind="A1">
<bibliographic-data>
<publication-reference>
<document-id document-id-type="docdb">
<country>US</country>
<doc-number>2009234106</doc-number>
<kind>A1</kind>
<date>20090917</date>
</document-id>
<document-id document-id-type="epodoc">
<doc-number>US2009234106</doc-number>
<date>20090917</date>
</document-id>
</publication-reference>
<classifications-ipcr>
<classification-ipcr sequence="1">
<text>C07K 16/ 44 A I </text>
</classification-ipcr>
</classifications-ipcr>
<patent-classifications>
<patent-classification sequence="1">
<classification-scheme office="" scheme="CPC"/>
<section>C</section>
<class>07</class>
<subclass>K</subclass>
<main-group>16</main-group>
<subgroup>22</subgroup>
<classification-value>I</classification-value>
</patent-classification>
<patent-classification sequence="2">
<classification-scheme office="" scheme="CPC"/>
<section>A</section>
<class>61</class>
<subclass>K</subclass>
<main-group>2039</main-group>
<subgroup>505</subgroup>
<classification-value>A</classification-value>
</patent-classification>
<patent-classification sequence="7">
<classification-scheme office="" scheme="CPC"/>
<section>C</section>
<class>07</class>
<subclass>K</subclass>
<main-group>2317</main-group>
<subgroup>92</subgroup>
<classification-value>A</classification-value>
</patent-classification>
<patent-classification sequence="1">
<classification-scheme office="US" scheme="UC"/>
<classification-symbol>530/387.9</classification-symbol>
</patent-classification>
</patent-classifications>
</bibliographic-data>
</exchange-document>
</exchange-documents>
</ops:world-patent-data>
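The three values listed in the question follow a compact pattern: section and class, then subclass and main group, then subgroup, then classification value. Below is a minimal standard-library sketch that builds exactly that string for each CPC entry. The ex prefix, the NS mapping and the helpers field and cpc_codes are illustrative names. SAMPLE is a trimmed copy of the XML above, with two CPC entries plus the US "UC" one.

```python
import xml.etree.ElementTree as ET

# "ex" is an arbitrary prefix chosen here for the default namespace declared
# on the root element (xmlns="http://www.epo.org/exchange").
NS = {"ex": "http://www.epo.org/exchange"}

# Trimmed copy of the question's XML: two CPC entries and the US "UC" entry.
SAMPLE = """<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
<exchange-documents><exchange-document><bibliographic-data>
<patent-classifications>
<patent-classification sequence="1">
<classification-scheme office="" scheme="CPC"/>
<section>C</section><class>07</class><subclass>K</subclass>
<main-group>16</main-group><subgroup>22</subgroup>
<classification-value>I</classification-value>
</patent-classification>
<patent-classification sequence="7">
<classification-scheme office="" scheme="CPC"/>
<section>C</section><class>07</class><subclass>K</subclass>
<main-group>2317</main-group><subgroup>92</subgroup>
<classification-value>A</classification-value>
</patent-classification>
<patent-classification sequence="1">
<classification-scheme office="US" scheme="UC"/>
<classification-symbol>530/387.9</classification-symbol>
</patent-classification>
</patent-classifications>
</bibliographic-data></exchange-document></exchange-documents>
</ops:world-patent-data>"""


def field(elem, tag):
    """Text of the namespaced direct child `tag`, or '' if that child is missing."""
    return elem.findtext(f"ex:{tag}", default="", namespaces=NS)


def cpc_codes(root):
    """Format every CPC patent-classification as e.g. 'C07 K16 22 I' (hypothetical helper)."""
    codes = []
    for pc in root.iterfind(".//ex:patent-classification", NS):
        scheme = pc.find("ex:classification-scheme", NS)
        if scheme is None or scheme.get("scheme") != "CPC":
            continue  # skips entries from other schemes, e.g. the US "UC" one
        codes.append(
            f"{field(pc, 'section')}{field(pc, 'class')} "
            f"{field(pc, 'subclass')}{field(pc, 'main-group')} "
            f"{field(pc, 'subgroup')} {field(pc, 'classification-value')}"
        )
    return codes


for code in cpc_codes(ET.fromstring(SAMPLE)):
    print(code)
```

Looking children up through a prefix mapping avoids repeating the full {http://www.epo.org/exchange} URI in every path. Passing a default to findtext keeps an incomplete entry from raising.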
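If you don't have it yet, install BeautifulSoup (plus lxml, which provides the XML parser used below):
$ pip install beautifulsoup4 lxml
Then try this:
from bs4 import BeautifulSoup

# example.xml contains the XML from the question
with open('example.xml', 'rb') as f:
    xml = f.read()
# 'xml' selects lxml's XML parser instead of an HTML parser
bs = BeautifulSoup(xml, 'xml')
# find every patent-classification element
patents = bs.find_all('patent-classification')
# keep only the ones whose classification-scheme has scheme="CPC"
for pa in patents:
    if pa.find('classification-scheme', {'scheme': 'CPC'}):
        # join the text pieces with spaces, dropping whitespace-only ones
        print(pa.get_text(' ', strip=True))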
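If installing bs4 and lxml is not an option, you can approximate the same filter-then-get_text idea with the standard library alone. This sketch assumes you want one space-joined string per entry. cpc_texts, NS and SAMPLE are illustrative names, and itertext() stands in for get_text(). SAMPLE is a trimmed, indented copy of the question's XML with the intermediate wrapper elements dropped.

```python
import xml.etree.ElementTree as ET

NS = {"ex": "http://www.epo.org/exchange"}  # "ex" is an arbitrary prefix

# Trimmed copy of the question's XML, indented so that whitespace-only text
# appears between the tags just as in a pretty-printed file.
SAMPLE = """<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
  <patent-classifications>
    <patent-classification sequence="1">
      <classification-scheme office="" scheme="CPC"/>
      <section>C</section>
      <class>07</class>
      <subclass>K</subclass>
      <main-group>16</main-group>
      <subgroup>22</subgroup>
      <classification-value>I</classification-value>
    </patent-classification>
    <patent-classification sequence="1">
      <classification-scheme office="US" scheme="UC"/>
      <classification-symbol>530/387.9</classification-symbol>
    </patent-classification>
  </patent-classifications>
</ops:world-patent-data>"""


def cpc_texts(root):
    """One space-joined string per CPC patent-classification (hypothetical helper)."""
    results = []
    for pc in root.iterfind(".//ex:patent-classification", NS):
        # counterpart of pa.find('classification-scheme', {'scheme': 'CPC'})
        if pc.find("ex:classification-scheme[@scheme='CPC']", NS) is None:
            continue
        # itertext() also yields the indentation between tags; like
        # get_text(' ', strip=True), strip each piece and drop empty ones
        parts = [t.strip() for t in pc.itertext() if t.strip()]
        results.append(" ".join(parts))
    return results


for line in cpc_texts(ET.fromstring(SAMPLE)):
    print(line)
```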
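You can use Python's standard xml module:
import xml.etree.ElementTree as ET

# a.xml contains the XML from the question
root = ET.parse('a.xml').getroot()
# Every element lives in the default namespace http://www.epo.org/exchange, so tags
# are written as {namespace}name; '..' steps up from each CPC classification-scheme
# to its enclosing patent-classification
for node in root.iterfind(".//{http://www.epo.org/exchange}classification-scheme[@scheme='CPC']/.."):
    data = []
    # iterate over the direct children (Element.getchildren() no longer exists in Python 3.9+)
    for d in node:
        # classification-scheme is an empty element, so its text is None and it is skipped
        if d.text:
            data.append(d.text)
    print(' '.join(data))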
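Joining the texts loses track of which value is the main group and which is the subgroup. One possible variation of the ElementTree approach keeps each CPC entry as a dict keyed by the child's tag name, assuming you want the fields by name. local_name, cpc_entries, EX and SAMPLE are hypothetical helpers, not part of ElementTree. SAMPLE is a trimmed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

EX = "http://www.epo.org/exchange"  # default namespace of the EPO document

# Trimmed copy of the question's XML (wrapper elements omitted).
SAMPLE = """<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
<patent-classifications>
<patent-classification sequence="2">
<classification-scheme office="" scheme="CPC"/>
<section>A</section>
<class>61</class>
<subclass>K</subclass>
<main-group>2039</main-group>
<subgroup>505</subgroup>
<classification-value>A</classification-value>
</patent-classification>
<patent-classification sequence="1">
<classification-scheme office="US" scheme="UC"/>
<classification-symbol>530/387.9</classification-symbol>
</patent-classification>
</patent-classifications>
</ops:world-patent-data>"""


def local_name(tag):
    """Drop the '{namespace}' part: '{http://...}section' -> 'section'."""
    return tag.rpartition("}")[2]


def cpc_entries(root):
    """One dict per CPC patent-classification, keyed by child tag name."""
    entries = []
    # Same XPath as above; in the f-string, {{ and }} produce literal braces.
    for node in root.iterfind(f".//{{{EX}}}classification-scheme[@scheme='CPC']/.."):
        entry = {"sequence": node.get("sequence")}
        for child in node:
            if child.text:  # the empty classification-scheme child has text None
                entry[local_name(child.tag)] = child.text
        entries.append(entry)
    return entries


for entry in cpc_entries(ET.fromstring(SAMPLE)):
    print(entry)
```

Fields can then be read by name, e.g. entry['main-group'], instead of relying on the order of the children.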
Thanks, but where is xml used as a variable?
xml is the variable the XML gets loaded into. To try the exact code, create a file named example.xml and put the XML you posted in the question into it. I've also edited my answer because I had left out a line.
Thanks.
@user1140126 Check the answer again, I updated it; I had left out a line.
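As the comments explain, xml in the BeautifulSoup answer is just the document's contents read from example.xml. Because the title talks about an XML string, the file is actually optional. Here is a minimal standard-library sketch of both routes: ET.fromstring parses a string directly, while ET.parse reads a file. The trimmed sample and the temporary-directory handling are illustrative only; the latter just keeps the sketch from leaving a file behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The document kept in a string variable named xml, as in the BeautifulSoup answer
# (trimmed copy of the question's XML).
xml = """<?xml version="1.0" encoding="UTF-8"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
<patent-classifications>
<patent-classification sequence="1">
<classification-scheme office="" scheme="CPC"/>
<section>C</section>
<class>07</class>
<subclass>K</subclass>
<main-group>16</main-group>
<subgroup>22</subgroup>
<classification-value>I</classification-value>
</patent-classification>
<patent-classification sequence="1">
<classification-scheme office="US" scheme="UC"/>
<classification-symbol>530/387.9</classification-symbol>
</patent-classification>
</patent-classifications>
</ops:world-patent-data>"""

# Option 1: parse the string directly; no example.xml is needed.
root_from_string = ET.fromstring(xml)

# Option 2: what the answers do, i.e. save the text to example.xml and parse the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml)
    root_from_file = ET.parse(path).getroot()

# Either way the root is ops:world-patent-data, reported with its full namespace URI.
print(root_from_string.tag)
print(root_from_file.tag)
```

BeautifulSoup(xml, 'xml') likewise accepts the string itself, so the file step only matters when the document already lives on disk.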