How do I get a value from an XML tag in Python?

python xml

I have an XML file like the following:

<?xml version="1.0" encoding="UTF-8"?><searching>
   <query>query01</query>
   <document id="0">
      <title>lord of the rings.</title>
    <snippet>
      this is a snippet of a document.
    </snippet>
      <url>http://www.google.com/</url>
   </document>
   <document id="1">
      <title>harry potter.</title>
    <snippet>
            this is a snippet of a document.
    </snippet>
      <url>http://www.google.com/</url>
   </document>
   ........ #and other documents .....

  <group id="0" size="298" score="145">
      <title>
         <phrase>GROUP A</phrase>
      </title>
      <document refid="0"/>
      <document refid="1"/>
      <document refid="84"/>
   </group>
  <group id="0" size="298" score="55">
      <title>
         <phrase>GROUP B</phrase>
      </title>
      <document refid="2"/>
      <document refid="13"/>
      <document refid="3"/>
   </group>
</searching>

I have also tried BeautifulSoup, but I am new to it and do not know how to proceed. This is the code I have so far:

import codecs
from bs4 import BeautifulSoup   # assumption: BeautifulSoup 4; the question does not say which version

def outputCluster(rFile):
    documentInReadFile = {}         # dictionary to store all documents in readFile

    myfile = codecs.open(rFile, mode='r', encoding="utf8")
    soup = BeautifulSoup(myfile)
    # print all text in readFile:
    # print soup.prettify()

    # print soup.find_all('title')

outputCluster("file.xml")

Please give me some advice. Thank you.

Have you looked at an XML parser? There are plenty of examples on the web.

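ElementTree is very well suited to navigating XML. Its documentation shows many ways of working with XML, including how to get the contents of a tag. The example given there loops over the country elements with findall, reads the text of each rank child with find, and reads the name attribute with get.

You can easily adapt it to do what you want.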

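The previous poster has it right. The etree (xml.etree.ElementTree) documentation is part of the Python standard library reference, and it can help you out. A code sample that might do the trick, partly taken from that documentation, parses the file, loops over every group, reads the phrase inside its title, and then reads the refid of each document in the group.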

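Alternatively, if you want to store the id from the group tag, you can use

id = group.get('id')

instead of searching through all the refids.

Using BeautifulSoup is fine, though it can be a little surprising at first.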

soup = BeautifulSoup(myfile)

soup now holds the whole file, and you then have to search within it to find the part you want, for example:

group = soup.find(name="group", attrs={'id': '0', 'size': '298'})

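group now holds the group tag and its contents (the first matching group that was found), nested tags included.

Calling find(name='phrase') on the last thing you found will then return your answer. The result still contains the tags, so depending on your BeautifulSoup version you will need another function to get at the text.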

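Use find_all (findAll in older BeautifulSoup versions) to build a list that you can iterate over when you are looking for multiple elements. Keep the earlier tags in their own variables so that you can look up other information from them later. The alternative, soup = soup.find(...), means you are only after one specific thing and lose every tag in between, which is the same as writing soup = soup.find(...).find(...).find_all(...)[-1].find(...)['id'], for example.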

Great, thanks. :) This is exactly what I wanted. Thanks. :)

>>> for country in root.findall('country'):
...   rank = country.find('rank').text
...   name = country.get('name')
...   print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68
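
The transcript above does not show the XML it runs against. As a minimal, self-contained sketch, assume data shaped like the country sample in the ElementTree documentation; the COUNTRY_XML string and the ranks list below are stand-ins made up for this illustration, trimmed to the fields the loop reads:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the sample data (assumption: only the fields the loop reads).
COUNTRY_XML = """<data>
  <country name="Liechtenstein"><rank>1</rank></country>
  <country name="Singapore"><rank>4</rank></country>
  <country name="Panama"><rank>68</rank></country>
</data>"""

root = ET.fromstring(COUNTRY_XML)

# findall() returns the direct <country> children of the root, in document order.
ranks = []
for country in root.findall('country'):
    rank = country.find('rank').text   # text inside the <rank> child element
    name = country.get('name')         # value of the name="..." attribute
    ranks.append((name, rank))
    print(name, rank)
```

Note that .text and .get() both return strings, so the rank would need int(rank) before any numeric comparison.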

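import xml.etree.ElementTree as ET

tree = ET.parse('your_file.xml')
root = tree.getroot()

# Walk every <group> that is a direct child of the root element.
for group in root.findall('group'):
    title = group.find('title')
    titlephrase = title.find('phrase').text   # e.g. "GROUP A"
    # Each <document refid="..."/> in the group refers to a document id.
    for doc in group.findall('document'):
        refid = doc.get('refid')
        print(titlephrase, refid)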
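
The loop above reads each value one at a time. A minimal sketch of one way to collect the results, assuming a trimmed, well-formed copy of the XML from the question held in a string (SAMPLE_XML, titles, and clusters are names made up for this illustration, and the group attributes are kept exactly as posted):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: snippets and elided documents left out).
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<searching>
   <query>query01</query>
   <document id="0">
      <title>lord of the rings.</title>
      <url>http://www.google.com/</url>
   </document>
   <document id="1">
      <title>harry potter.</title>
      <url>http://www.google.com/</url>
   </document>
   <group id="0" size="298" score="145">
      <title><phrase>GROUP A</phrase></title>
      <document refid="0"/>
      <document refid="1"/>
      <document refid="84"/>
   </group>
   <group id="0" size="298" score="55">
      <title><phrase>GROUP B</phrase></title>
      <document refid="2"/>
      <document refid="13"/>
      <document refid="3"/>
   </group>
</searching>"""

# Parse from bytes so the encoding declaration in the prolog is honoured.
root = ET.fromstring(SAMPLE_XML.encode("utf-8"))

# Map each top-level document id to its title text.
titles = {doc.get('id'): doc.find('title').text
          for doc in root.findall('document')}

# Collect (title phrase, score, refids) for each <group>.
clusters = []
for group in root.findall('group'):
    phrase = group.find('title/phrase').text
    refids = [doc.get('refid') for doc in group.findall('document')]
    clusters.append((phrase, group.get('score'), refids))

for phrase, score, refids in clusters:
    # Some refids point at documents that are missing from this trimmed sample.
    known = [titles[r] for r in refids if r in titles]
    print(phrase, score, refids, known)
```

Because root.findall('document') only matches direct children of the root, the document elements nested inside each group (the ones carrying refid) are not mixed into titles.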

<group>blabla its contents<tag inside it>blabla</tag inside it>etc.</group>

lastthingyoufound.find(name='phrase')