Python text file creation issue: new lines created where there is no true end of line


python xml regex csv cygwin


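I am importing into Excel some text data from a set of files I created with Python (converting metadata/XML records to text). It mostly works, except that new lines are inserted where the text is simply part of a paragraph. This is a problem in the file-creation process.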

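Is it possible to clean the data automatically so that each record stays on the same row until it reaches an escape/newline character?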

Since this site does not allow attachments, here are some examples:

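anz*_log.txt: the raw text file, using '^' as the delimiter. I could force it to add another character at the end of each known row, if Excel could use that to start a new row only where that character appears.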

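anz*_xml.xls: the Excel import, with the raw imported data on the (*log) worksheet and a cleaned-up version where I used formulas to get the values right.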

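rowChar_anz*log.txt: the raw text file with ':;:' at the start of each line to show that it should be a new row (the same as anz*_log.txt, but with a row delimiter added).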

This is only a test dataset; I need to run it on 1000 files. See rows 9, 13 and 54 for the problem.
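Before fixing anything across 1000 files, it may help to list which rows are broken. A minimal sketch, assuming every genuine record line starts with the ':;:' marker (as in rowChar_anz*log.txt); `find_broken_lines`, `report` and the glob pattern are hypothetical names, not part of the original code.

```python
import glob

START = ':;:'  # start-of-record marker that processxml() writes


def find_broken_lines(lines, start=START):
    """Return the 1-based numbers of lines that do not begin with *start*.

    Each such line is the tail of the previous record, split off by a
    newline that was embedded in an XML text value.
    """
    return [n for n, line in enumerate(lines, start=1)
            if not line.startswith(start)]


def report(pattern):
    """Print the broken line numbers for every log file matching *pattern*."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            broken = find_broken_lines(f)
        if broken:
            print(path, broken)


if __name__ == '__main__':
    # Hypothetical pattern: point it at wherever the logs are written.
    report('rowChar_anz*log.txt')
```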

I could use Python (or, if necessary, Cygwin/sed) to

find the "start of row" string ':;:' and the "end of row" string ';:;'

if neither is present, append the line to the previous line
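The post-processing step above can be sketched in Python. This is a minimal sketch, assuming only the start marker ':;:' is written (which is what the code below does); since every record begins with it, an end marker such as ';:;' is not strictly needed. `merge_continuations` and `clean_file` are hypothetical names, and joining with a space (rather than nothing) is a choice made here so that words on either side of the break are not fused.

```python
START = ':;:'  # start-of-record marker that processxml() writes


def merge_continuations(lines, start=START, joiner=' '):
    """Append every line that lacks the start marker to the previous record."""
    records = []
    for line in lines:
        line = line.rstrip('\r\n')
        if line.startswith(start) or not records:
            records.append(line)  # a new record (or stray text before the first one)
        else:
            records[-1] += joiner + line  # continuation of the previous record
    return records


def clean_file(src, dst):
    """Write a copy of *src* to *dst* with one record per line."""
    with open(src) as f:
        records = merge_continuations(f)
    with open(dst, 'w') as out:
        for record in records:
            out.write(record + '\n')
```

One caveat: a continuation line whose text happens to begin with ':;:' would be taken as a new record.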

Or (ideally) can this be done while the file is being created, with the code below? Perhaps using re.compile (as in the linked example)?
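To do the cleanup at write time with `re.compile`, as suggested above, one option is a precompiled pattern that collapses each line break, together with the whitespace around it, into a single space. A minimal sketch; `LINEBREAKS` and `one_line` are hypothetical names, and the assumption is that it would be applied to `child.text` before the entry is written. Unlike plain string formatting, it also strips leading/trailing whitespace and turns missing text (`None`) into an empty string instead of the word "None".

```python
import re

# Any run of whitespace that contains at least one line break.
LINEBREAKS = re.compile(r'\s*[\r\n]+\s*')


def one_line(text):
    """Collapse embedded line breaks (and surrounding indentation) to one space."""
    if text is None:  # elements with no text content
        return ''
    return LINEBREAKS.sub(' ', text).strip()


print(one_line("Line one\n    line two"))  # Line one line two
```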

Found the answer... just add .replace('\n', '') to the command that writes each entry. I should have thought of this hours ago.
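The effect of that fix can be seen on a small record. This sketch uses the standard-library ElementTree so it runs without lxml; the `descript`/`abstract` tags in `SAMPLE` are hypothetical placeholders, `entry` is a hypothetical helper, and the format string is the one the script uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record: the abstract text spans two lines.
SAMPLE = """<anzmeta><descript><abstract>First paragraph.
Second paragraph.</abstract></descript></anzmeta>"""


def entry(child, separator='^'):
    """Format one child as 'tag::text' with embedded newlines removed."""
    return "{0.tag}::{0.text}".format(child).replace('\n', '') + separator


root = ET.fromstring(SAMPLE)
for element in root.iter():
    for child in element:
        print(entry(child))
# descript::None^
# abstract::First paragraph.Second paragraph.^
```

Note that replacing with '' joins the last word of one line to the first word of the next ("paragraph.Second"); `.replace('\n', ' ')` keeps them apart. Elements without text are still written as "None".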

#-------------------------------------------------------------------------------
# Name:        Convert xml data to csv with anzlic tagged data kept separate
# Purpose:  Also has an excel template to convert the data into standard columns
#
# Author:      georgec@atgis.com.au
#
# Created:     05/03/2013
# Copyright:   (c) ATGIS. georgec 2013
# Licence:     Creative Commons
#-------------------------------------------------------------------------------

# Only os is needed at module level; lxml is imported inside processxml()
import os

SourceDIR=r'L:\Vector_Data'
rootDir=os.getcwd()
log_name='vector'
x=0

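def locatexml(SourceDIR, x, rootDir):
    """Walk SourceDIR, process every .xml file and log any that fail."""
    xmllist = []
    for root, dirs, files in os.walk(SourceDIR, topdown=False):
        for fl in files:
            currentFile = os.path.join(root, fl)
            if fl.endswith('.xml'):
                xmllist.append(currentFile)
                print(currentFile)
                x += 1
                try:
                    processxml(currentFile, x, rootDir)
                except Exception as err:
                    print("Issue with file: " + currentFile + " (" + str(err) + ")")
                    # 'with' closes the log (the old log.close had no parentheses)
                    with open(os.path.join(rootDir, log_name + 'issue_xml_log.txt'), 'a') as log:
                        log.write(str(x) + '^' + currentFile + '\n')

    print("finished")
    # currentFile is no longer returned: it was undefined when no files were found
    return xmllist, x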

def processxml(currentFile, x, rootDir):
    from lxml import etree
    separator = '^'
    # Let lxml open the file itself so it can honour the XML encoding declaration
    tree = etree.parse(currentFile)
    # Every tag in document order (same traversal as before, so the
    # column layout expected by the Excel template does not change)
    xmltaglist = [tagn.tag for tagn in tree.iter()]
    if 'anzmeta' in tree.getroot().tag:
        logfile = log_name + 'anzmeta_xml_log.txt'
    else:
        print(currentFile + " not an anzlic metadata file...logging separately")
        logfile = log_name + 'non_anzmeta_xml_log.txt'
    # 'with' guarantees the log is closed (the old log.close had no parentheses)
    with open(os.path.join(rootDir, logfile), 'a') as log:
        # ':;:' marks the start of each record
        log.write(':;:' + separator + str(x) + separator + currentFile + separator)
        for xmltag in xmltaglist:
            for element in tree.iter(xmltag):
                for child in element:
                    print("{0.tag}: {0.text}".format(child))
                    # The fix: drop newlines embedded in the text so that each
                    # XML record stays on a single line of the log
                    entry = "{0.tag}::{0.text}".format(child).replace('\n', '')
                    log.write(entry + separator)
        log.write('\n')

if __name__ == '__main__':
    locatexml(SourceDIR, x, rootDir)