Python: split a text document by variable and name new files based on a specific string

python xml batch-file

I have a text document containing elements such as a header and a body. For example:

Project.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE datafile PUBLIC"">

<datafile>
    <header>
        <name>name</name>
        <description>description</description>
    </header>
    <master name="Project1">
        <description>Project1</description>
        <slave name="random information"/>
        <slave name="random information"/>
    </master>
    <master name="Project2">
        <description>Project2</description>
        <slave name="random information"/>
        <slave name="random information"/>
        <slave name="random information"/>
        <slave name="random information"/>
    </master>
    <master name="Project3">
        <description>Project3</description>
        <slave name="random information"/>
        <slave name="random information"/>
    </master>
</datafile>



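I am trying to copy the header, replacing the value of its <name> element with the name of the respective master (for example, Project1), then cut and paste the corresponding <master> block after it, and append the closing tag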

</datafile>

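These three parts are merged into a new document. The file name should be taken from the name attribute of the <master> tag; in this case it is "Project1".

So basically, the output would be three files like these: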

Project1.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE datafile PUBLIC"">

<datafile>
    <header>
        <name>Project1</name>
        <description>description</description>
    </header>
    <master name="Project1">
        <description>Project1</description>
        <slave name="random information"/>
        <slave name="random information"/>
    </master>
</datafile>


项目1
描述
项目1

Project2.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE datafile PUBLIC"">

<datafile>
    <header>
        <name>Project2</name>
        <description>description</description>
    </header>
    <master name="Project2">
        <description>Project2</description>
        <slave name="random information"/>
        <slave name="random information"/>
        <slave name="random information"/>
        <slave name="random information"/>
    </master>
</datafile>



Project3.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE datafile PUBLIC"">

<datafile>
    <header>
        <name>Project3</name>
        <description>description</description>
    </header>
    <master name="Project3">
        <description>Project3</description>
        <slave name="random information"/>
        <slave name="random information"/>
    </master>
</datafile>



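The project name values can also contain spaces.

Unfortunately, I don't have any real code so far. I only know how to find and copy specific parts using Notepad++, but that's it.

So I would greatly appreciate your help. The method doesn't matter to me; it can be a batch file, Python, or anything else. Thanks!

Oh, and if this could work as a loop over multiple documents, that would be perfect :)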

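SO is not a free coding service. You must show your own effort in order to get any help. However, I made an exception in this case.

The batch file below splits the file as you requested, for one .xml file. Modifying this code to process several *.xml files is very simple, but you will have to research that yourself (hint: it requires a for command combined with a dir command).

EDIT: I added the new requirement of inserting each project name in the <name> tag.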

@echo off
setlocal EnableDelayedExpansion

call :SplitFile < Project.xml
goto :EOF


:SplitFile

rem First part: extract the header until first "<master" line
del Header.xml 2>NUL
:Header
set "line="
set /P "line="
if not defined line echo/>> Header.xml & goto Header
if "!line:master=!" equ "!line!" (
   if "!line:name=!" neq "!line!" (
      for /F "tokens=1-4 delims=<>" %%a in ("!line!") do set "line=%%a<%%b>^!file^!<%%d>"
   )
   echo !line!>> Header.xml
   goto Header
)

rem Second part: extract each "<master" section into its own file
set "files="
:Master0
for /F "tokens=2 delims==>" %%a in ("!line!") do set "file=%%~a"
set "files=%files% %file%"
(
for /F "delims=" %%a in (Header.xml) do echo/%%a
echo !line!
) > %file%.xml
:Master1
set "line="
set /P "line="
if not defined line echo/>> %file%.xml & goto Master1
if "!line:master=!" equ "!line!" echo !line!>> %file%.xml & goto Master1
echo !line!>> %file%.xml
:Master2
set "line="
set /P "line="
if not defined line goto Master2
if "!line:master=!" neq "!line!" goto Master0

rem Third part: add the last line to all files
for %%a in (%files%) do echo !line!>> %%a.xml
del Header.xml

exit /B
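
For readers who would rather stay in Python, the line-based idea used by the batch file above can be sketched roughly as follows. This is only a sketch under stated assumptions: every <master ...> opening tag and every </master> closing tag sits on its own line, the header's <name> element fits on one line, and the input contains at least one master. The function name split_lines is hypothetical. Because lines are copied verbatim, the XML declaration and the DOCTYPE line are carried over unchanged, and master names containing spaces need no special handling.

```python
import re


def split_lines(text):
    """Split XML text into {master name: document text}, working line by line."""
    lines = text.splitlines()
    # Index of every line that opens a <master ...> block
    starts = [i for i, line in enumerate(lines) if "<master" in line]
    # Index of the last closing tag; everything after it is the footer
    last_end = max(i for i, line in enumerate(lines) if "</master>" in line)
    header = lines[:starts[0]]
    footer = lines[last_end + 1:]
    result = {}
    for n, start in enumerate(starts):
        # A block runs up to the next opening tag, or up to the footer
        end = starts[n + 1] if n + 1 < len(starts) else last_end + 1
        name = re.search(r'name="([^"]*)"', lines[start]).group(1)
        # Swap the text of the header's <name> element for the master name;
        # a function is used as replacement so backslashes in the name stay literal
        new_header = [
            re.sub(r"<name>.*?</name>", lambda m: "<name>" + name + "</name>", line)
            for line in header
        ]
        result[name] = "\n".join(new_header + lines[start:end] + footer) + "\n"
    return result
```

Each value of the returned dictionary can then be written to a file named after its key plus ".xml".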

I found a similar question here and made use of its answer.

I modified the code to my requirements:
import xml.etree.ElementTree as ET

# Load the xml
doc = ET.parse(r"C:\1\Project.xml")
root = doc.getroot()
# Get the header element
header = root.find("header")
# loop over the masters and create the new xml file
for master in root.findall('master'):
    top = ET.Element(root.tag)
    top.append(header)
    top.append(master)
    out_master = ET.ElementTree(top)
    # the output file name is the name attribute of the master
    out_path = "%s.xml" % master.attrib["name"]
    out_master.write(out_path, encoding='UTF-8', xml_declaration=True)

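This generates the split XML files with the correct file names taken from master.attrib["name"]. Now I just need to find a way to replace the <name> value in the header, which should also use the master.attrib["name"] value.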
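
One possible way to fill in that missing step with ElementTree is sketched below. The helper name header_for is hypothetical, and the sketch assumes the header always contains a <name> child. It copies the header rather than editing it in place, so the original tree keeps its placeholder value and every output file gets its own header.

```python
import copy
import xml.etree.ElementTree as ET


def header_for(header, master_name):
    """Return a copy of <header> whose <name> text is master_name."""
    # deepcopy leaves the header in the source tree untouched
    new_header = copy.deepcopy(header)
    new_header.find("name").text = master_name
    return new_header
```

In the loop above, top.append(header) would then become top.append(header_for(header, master.attrib["name"])).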
EDIT: I will paste here the code from Parfait that I am using:
import os 
import lxml.etree as et

dir = "C:\\1\\"

xslstr = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/datafile">
    <xsl:copy>
         <xsl:apply-templates select="header"/>
         <xsl:copy-of select="master[{0}]"/>      
    </xsl:copy>
  </xsl:template>

  <xsl:template match="header">
    <xsl:copy>
        <name><xsl:value-of select="ancestor::datafile/master[{0}]/@name"/></name>
       <xsl:copy-of select="description"/> 
    </xsl:copy>
  </xsl:template>      
</xsl:stylesheet>'''

# LOOP THROUGH FILES IN DIRECTORY
for f in os.listdir(dir):
    if f.endswith('.xml'):

        for i in range(1,len(doc.xpath('//master'))+1):
            # PARSE XML
            doc = et.parse(os.path.join(dir, f))

            # PARSE XSLT (same xslstr as above -no need to loop it)
            xsl = et.fromstring(xslstr.format(i))

            # TRANSFORM INPUT TO OUTPUT
            transform = et.XSLT(xsl)
            result = transform(doc)

            # SAVE OUTPUT
            outfile = doc.xpath('//master[{}]/@name'.format(i))[0] + '.xml'
            with open(os.path(dir, outfile), 'wb') as f:
                f.write(result)

This results in:
C:\1>C:\1\strip.py
Traceback (most recent call last):
  File "C:\1\strip.py", line 29, in <module>
    for i in range(1,len(doc.xpath('//machine'))+1):
NameError: name 'doc' is not defined
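
The NameError is raised because the inner loop calls doc.xpath(...) before doc has been assigned; et.parse(...) only runs inside that loop. Two further problems would likely surface once that is fixed: os.path is a module, not a function (os.path.join was presumably intended), and the with statement rebinds f, which is also the outer loop variable. A minimal sketch of the corrected control flow follows. It uses the standard library's xml.etree.ElementTree instead of lxml and XSLT, so it is a swapped-in technique and not Parfait's code. The names split_directory and out_dir are hypothetical, and writing to a separate output folder is an assumption made so that generated files are not picked up as inputs on a later run.

```python
import copy
import os
import xml.etree.ElementTree as ET


def split_directory(in_dir, out_dir):
    """Split every .xml file in in_dir into one file per <master> in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for fname in sorted(os.listdir(in_dir)):
        if not fname.endswith(".xml"):
            continue
        # Parse first: the tree must exist before its masters can be visited
        root = ET.parse(os.path.join(in_dir, fname)).getroot()
        for master in root.findall("master"):
            name = master.attrib["name"]
            top = ET.Element(root.tag)
            # Give each output its own header carrying the master name
            header = copy.deepcopy(root.find("header"))
            header.find("name").text = name
            top.append(header)
            top.append(master)
            # os.path.join builds the output path
            out_path = os.path.join(out_dir, name + ".xml")
            # "out" avoids reusing the loop variable as the file handle
            with open(out_path, "wb") as out:
                ET.ElementTree(top).write(out, encoding="UTF-8", xml_declaration=True)
            written.append(out_path)
    return written
```

Called as split_directory("C:\\1", "C:\\1\\out"), this would process every .xml file in the folder from the question. Note that ElementTree does not write the DOCTYPE line back out, so it would have to be added separately if it is required.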
