
Python: read a CSV file and replace XML tags

python xml parsing


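I want to read a CSV file and replace tags in an XML file with the values from the CSV file's second column. The values of the "name" tags are in the first column.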

A         |    B

Value1    |    ValueX
Value2    |    ValueX
Value3    |    ValueY

The XML structure looks like this:

<products>
   <product>
      <name>Value1</name>
   </product>
   <product>
      <name>Value2</name>
   </product>
   <product>
      <name>Value3</name>
   </product>
</products>

How can I do this?

OK, here is my solution:

import lxml.etree as ET

# Column A (text of each <name> element) and column B (replacement tag name)
arr = ["Value1", "Value2", "Value3"]
arr2 = ["ValueX", "ValueX", "ValueY"]

with open('file.xml', 'rb+') as f:
    tree = ET.parse(f)
    root = tree.getroot()
    for i, item in enumerate(arr):
        # Rename every <name> element whose text matches the column A value
        for elem in root.findall('.//name'):
            if elem.text == item:
                elem.tag = arr2[i]

    # Overwrite the file in place with the modified tree
    f.seek(0)
    f.write(ET.tostring(tree, encoding='UTF-8', xml_declaration=True))
    f.truncate()

I am using arrays here; the values from the file could be copied into them. For large files this needs better code.
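One way to avoid the nested loop for larger files is to load the CSV into a dictionary and walk the XML once. The following is only a minimal sketch using the standard library's csv and xml.etree.ElementTree modules, so it does not depend on lxml. The function names load_mapping and rename_name_tags are illustrative, and it assumes a comma-delimited file with column A in the first field and column B in the second:

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_mapping(csvfile):
    """Map column A (text of <name>) to column B (new tag name)."""
    mapping = {}
    for row in csv.reader(csvfile):
        if len(row) >= 2:  # skip blank or short rows
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_name_tags(root, mapping):
    """Rename each <name> element whose text is a key in mapping."""
    renamed = 0
    # Materialize the list first so tags are not changed mid-iteration
    for elem in list(root.iter('name')):
        new_tag = mapping.get((elem.text or '').strip())
        if new_tag:
            elem.tag = new_tag
            renamed += 1
    return renamed


# In-memory stand-ins for file.csv and file.xml (illustrative data)
csv_text = "Value1,ValueX\nValue2,ValueX\nValue3,ValueY\n"
xml_text = (
    "<products>"
    "<product><name>Value1</name></product>"
    "<product><name>Value2</name></product>"
    "<product><name>Value3</name></product>"
    "</products>"
)

mapping = load_mapping(io.StringIO(csv_text))
root = ET.fromstring(xml_text)
count = rename_name_tags(root, mapping)
result = ET.tostring(root, encoding='unicode')
print(result)
```

With real files, the same functions could be fed from open('file.csv', newline='') and ET.parse('file.xml'), and the tree written back with tree.write('final.xml', encoding='UTF-8', xml_declaration=True).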

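Consider using XSLT, a special-purpose declarative language for restructuring XML files. As with most other general-purpose languages (including ASP, C#, Java, PHP, Perl, VB), an XSLT 1.0 processor is available for Python, specifically in the third-party lxml module.

For your purposes, you can dynamically build an XSLT string to use for the transformation. The only loop needed is the one over the CSV data: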

import csv
import lxml.etree as ET

# READ IN CSV DATA AND APPEND TO LIST
csvdata = []
with open('file.csv', 'r') as csvfile:
    readCSV = csv.reader(csvfile)
    for line in readCSV:
        csvdata.append(line)

# DYNAMICALLY CREATE XSLT STRING
xsltstr = '''<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
            <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
            <xsl:strip-space elements="*"/>

              <!-- Identity Transform -->
              <xsl:template match="@*|node()">
                <xsl:copy>
                  <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
              </xsl:template>

        '''

for i in range(len(csvdata)):
    xsltstr = xsltstr + \
              '''<xsl:template match="name[.='{0}']">
                  <xsl:element name="{1}">
                     <xsl:apply-templates />
                  </xsl:element>
              </xsl:template>

              '''.format(*csvdata[i])

xsltstr = xsltstr + '</xsl:transform>'

# PARSE ORIGINAL FILE AND XSLT STRING
dom = ET.parse('jolly.xml')
xslt = ET.fromstring(xsltstr)

# TRANSFORM XML
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT FINAL XML (PRETTY PRINT)
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)

with open('final.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

Output

<?xml version='1.0' encoding='UTF-8'?>
<products>
  <product>
    <ValueX>Value1</ValueX>
  </product>
  <product>
    <ValueY>Value2</ValueY>
  </product>
  <product>
    <ValueZ>Value3</ValueZ>
  </product>
</products>



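Have you looked at the csv and ElementTree modules in the Python documentation? What code have you written?

Good, you have some code. What is the problem with it?

Thanks for your help. I am getting the following error:

    Traceback (most recent call last):
      File "readCSVReplaceTags.py", line 2, in <module>
        import lxml.etree as ET
    ImportError: No module named lxml.etree

I have installed lxml but it does not work. Is there another module I can use to do the same thing?

You do not have lxml installed. "I have installed lxml but it does not work"? Try reinstalling with pip install lxml; you also need libxml2-dev and libxslt1-dev.

I am using Mac OS X 10.11.

    Traceback (most recent call last):
      File "readCSVReplaceTags.py", line 11, in <module>
        for line in readCSV:
      File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 33: invalid continuation byte

Your CSV file contains special characters (accents, foreign-language text, etc.) and may need an explicit encoding. Post the actual data so we can look at it. Also note that XML tag names must not contain spaces or start with a digit, so check column B.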
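Both points from the comments can be checked before any transformation is attempted. The following is a rough sketch, not a definitive fix: it decodes the CSV bytes with an explicit encoding (latin-1 is only a guess, chosen because byte 0xdf is "ß" in that encoding; the real file's encoding is unknown) and screens column B with a simplified, ASCII-only pattern for XML element names. The helper names read_rows and invalid_tag_names and the pattern itself are illustrative, and the pattern is deliberately stricter than the full XML name grammar:

```python
import csv
import io
import re

# Simplified check: a letter or underscore first, then letters, digits, '.', '-', '_'
NAME_RE = re.compile(r'^[A-Za-z_][A-Za-z0-9._-]*$')


def read_rows(raw_bytes, encoding='latin-1'):
    """Decode CSV bytes with an explicit encoding and return non-empty rows."""
    text = raw_bytes.decode(encoding)
    return [row for row in csv.reader(io.StringIO(text, newline='')) if row]


def invalid_tag_names(rows):
    """Return the column B values that would not be usable as XML tag names."""
    return [row[1] for row in rows
            if len(row) >= 2 and not NAME_RE.match(row[1])]


# Illustrative bytes containing 0xdf, which is not valid UTF-8 in this position
raw = 'Value1,ValueX\nStra\xdfe,Value Y\nValue3,3Value\n'.encode('latin-1')

rows = read_rows(raw)
bad = invalid_tag_names(rows)
print(bad)
```

When reading from disk, the equivalent would be open('file.csv', 'r', encoding='latin-1', newline=''), with the encoding replaced by whatever the file actually uses.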
