Scraping an XML file with Python

python xml csv parsing

I have been trying to scrape an XML file to copy the contents of 2 tags (Code and Source only). The XML file looks like this:

<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument>
      <Code>27BA1</Code>
      <Source>YYY</Source>
    </Instrument>
    <Instrument>
      <Code>28BA1</Code>
      <Source>XXX</Source>
    </Instrument>
      <Code>29BA1</Code>
      <Source>XXX</Source>
    </Instrument>
      <Code>30BA1</Code>
      <Source>DDD</Source>
    </Instrument>
  </Instruments>
</Series>

I'm only getting the first Code. Below is my code. Can anyone help?

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("data.xml")
csv_fname = "data.csv"
root = tree.getroot()

f = open(csv_fname, 'w')
csvwriter = csv.writer(f)
count = 0
head = ['Code', 'Source']

csvwriter.writerow(head)

for time in root.findall('Instruments'):
    row = []
    job_name = time.find('Instrument').find('Code').text
    row.append(job_name)
    job_name_1 = time.find('Instrument').find('Source').text
    row.append(job_name_1)
    csvwriter.writerow(row)
f.close()
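A likely cause of the "only the first Code" symptom: `root.findall('Instruments')` matches the single `<Instruments>` element, so the loop body runs once, and `.find('Instrument')` returns only the first `<Instrument>` child. Below is a minimal sketch of a fix with the standard library's `ElementTree`. It assumes the XML has first been made well-formed (each `<Code>`/`<Source>` pair inside its own `<Instrument>`). The helper names `instruments_to_rows` and `write_csv` are hypothetical, and the data is inlined as a string only to keep the example self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed well-formed version of the sample: every pair sits in its own <Instrument>.
XML_TEXT = """<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument><Code>27BA1</Code><Source>YYY</Source></Instrument>
    <Instrument><Code>28BA1</Code><Source>XXX</Source></Instrument>
    <Instrument><Code>29BA1</Code><Source>XXX</Source></Instrument>
    <Instrument><Code>30BA1</Code><Source>DDD</Source></Instrument>
  </Instruments>
</Series>"""


def instruments_to_rows(root):
    """Return one [Code, Source] row per <Instrument> element (hypothetical helper)."""
    rows = []
    # The path 'Instruments/Instrument' matches every Instrument child, not just the first.
    for instrument in root.findall('Instruments/Instrument'):
        rows.append([instrument.findtext('Code'), instrument.findtext('Source')])
    return rows


def write_csv(rows, out):
    """Write a header plus all rows to a file-like object (hypothetical helper)."""
    writer = csv.writer(out)
    writer.writerow(['Code', 'Source'])
    writer.writerows(rows)


root = ET.fromstring(XML_TEXT)
rows = instruments_to_rows(root)
buf = io.StringIO()
write_csv(rows, buf)
print(buf.getvalue())
```

To produce the real `data.csv`, you could parse the file with `ET.parse("data.xml").getroot()` and pass `write_csv` a file opened with `open("data.csv", "w", newline="")`, which is the mode the `csv` module documentation recommends. If `<Instrument>` elements could appear at other depths, `root.iter('Instrument')` would find them wherever they are.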

If you can run XSLT against your document (I assume you can), a different approach makes this very simple:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text>Code,Source</xsl:text><xsl:text>&#xa;</xsl:text>
    <xsl:apply-templates select="//Instrument"/>
  </xsl:template>
  <xsl:template match="Instrument">
<xsl:value-of select="Code"/>,<xsl:value-of select="Source"/><xsl:text>&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>

To run this from Python, I think you'd need something like the approach suggested in:

I don't use Python, so I don't know whether that is right.
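Python's standard library has no XSLT processor, but the third-party `lxml` package does, through `etree.XSLT`. The following is a minimal sketch, assuming `lxml` is installed and the input XML has been made well-formed. It applies the stylesheet above to an inlined sample and reads the text output with `str(result)`. Both documents are passed as bytes because `lxml` rejects `str` input that carries an XML encoding declaration.

```python
from lxml import etree

# The stylesheet from this answer (bytes, so the encoding declaration is allowed).
XSLT_SRC = b"""<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>Code,Source</xsl:text><xsl:text>&#xa;</xsl:text>
    <xsl:apply-templates select="//Instrument"/>
  </xsl:template>
  <xsl:template match="Instrument">
    <xsl:value-of select="Code"/>,<xsl:value-of select="Source"/><xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""

# Assumed well-formed input (each pair wrapped in its own <Instrument>).
XML_SRC = b"""<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument><Code>27BA1</Code><Source>YYY</Source></Instrument>
    <Instrument><Code>28BA1</Code><Source>XXX</Source></Instrument>
    <Instrument><Code>29BA1</Code><Source>XXX</Source></Instrument>
    <Instrument><Code>30BA1</Code><Source>DDD</Source></Instrument>
  </Instruments>
</Series>"""

# Compile the stylesheet once; the resulting object is callable on parsed documents.
transform = etree.XSLT(etree.XML(XSLT_SRC))
result = transform(etree.XML(XML_SRC))

# With <xsl:output method="text"/>, str() of the result is the generated CSV text.
csv_text = str(result)
print(csv_text, end="")
```

To work from files instead, `etree.parse("data.xml")` could replace the inlined input, and `csv_text` could be written to `data.csv` with an ordinary `open(..., "w")`.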

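Oops, I also forgot to mention that your XML document is invalid: lines 11 and 14 are missing the opening <Instrument> element. Adding them where they belong makes the document transform correctly.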

The XML file you gave in your post is invalid. You can check it by pasting the file here

I think the valid XML should be:

<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument>
      <Code>27BA1</Code>
      <Source>YYY</Source>
    </Instrument>
    <Instrument>
      <Code>28BA1</Code>
      <Source>XXX</Source>
    </Instrument>
    <Instrument>
      <Code>29BA1</Code>
      <Source>XXX</Source>
    </Instrument>
    <Instrument>
      <Code>30BA1</Code>
      <Source>DDD</Source>
    </Instrument>
  </Instruments>
</Series>
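As an alternative to an online validator, the standard library can report the first well-formedness error directly. The following is a minimal sketch: `check_well_formed` is a hypothetical helper, and the two strings are the XML as posted in the question and the corrected version above. `ElementTree` raises `ET.ParseError` for malformed input. The error's message includes the line and column, which are also available as the `position` attribute.

```python
import xml.etree.ElementTree as ET

# The XML exactly as posted in the question (two opening <Instrument> tags missing).
POSTED = """<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument>
      <Code>27BA1</Code>
      <Source>YYY</Source>
    </Instrument>
    <Instrument>
      <Code>28BA1</Code>
      <Source>XXX</Source>
    </Instrument>
      <Code>29BA1</Code>
      <Source>XXX</Source>
    </Instrument>
      <Code>30BA1</Code>
      <Source>DDD</Source>
    </Instrument>
  </Instruments>
</Series>"""

# The corrected structure from this answer, compacted onto fewer lines.
CORRECTED = """<Series xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RunDate>2018-06-12</RunDate>
  <Instruments>
    <Instrument><Code>27BA1</Code><Source>YYY</Source></Instrument>
    <Instrument><Code>28BA1</Code><Source>XXX</Source></Instrument>
    <Instrument><Code>29BA1</Code><Source>XXX</Source></Instrument>
    <Instrument><Code>30BA1</Code><Source>DDD</Source></Instrument>
  </Instruments>
</Series>"""


def check_well_formed(text):
    """Return (True, '') if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # str(err) names the problem and its location; err.position is (line, column).
        return False, str(err)
    return True, ""


for name, text in [("as posted", POSTED), ("corrected", CORRECTED)]:
    ok, message = check_well_formed(text)
    print(name, "OK" if ok else "INVALID: " + message)
```

For the posted version, the parser stops at the first stray `</Instrument>` with a "mismatched tag" error. That is the same problem the answers point out.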

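To print the values in the Code and Source tags:

from lxml import etree

root = etree.parse('data.xml').getroot()
instruments = root.find('Instruments')
# findall returns every <Instrument> child, not only the first one
for instrument in instruments.findall('Instrument'):
    code, source = instrument.find('Code'), instrument.find('Source')
    # Python 3 print call; the original "print (a), (b)" only printed a
    print(code.text, source.text)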

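Hi, I don't know how to do that. Any guidance would be appreciated. Thanks.

You didn't specify the language or environment you're using. I don't recognize the language in your question, so I also don't know what you're using to execute it. The best way to run a stylesheet against your document depends on your tools; could you specify them in the question? Thanks.

I don't think this is what I'm looking for. I'm just looking for someone to look at my Python code and tell me what I'm doing wrong. I don't want to use XSLT. Thanks for your help.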