File processing in Python

I am processing a text file with Python. I have a text file (ctl_Files.txt) with the following, or similar, content:

------------------------
Changeset: 143
User: Sarfaraz
Date: Tuesday, April 05, 2011 5:34:54 PM

Comment:
  Initial add, all objects.

Items:
  add $/Systems/DB/Expences/Loader
  add $/Systems/DB/Expences/Loader/AAA.txt
  add $/Systems/DB/Expences/Loader/BBB.txt
  add $/Systems/DB/Expences/Loader/CCC.txt  

Check-in Notes:
  Code Reviewer:
  Performance Reviewer:
  Reviewer:
  Security Reviewer:
------------------------
Changeset: 145
User: Sarfaraz
Date: Thursday, April 07, 2011 5:34:54 PM

Comment:
  edited objects.

Items:
  edit $/Systems/DB/Expences/Loader
  edit $/Systems/DB/Expences/Loader/AAA.txt
  edit $/Systems/DB/Expences/Loader/AAB.txt  

Check-in Notes:
  Code Reviewer:
  Performance Reviewer:
  Reviewer:
  Security Reviewer:
------------------------
Changeset: 147
User: Sarfaraz
Date: Wednesday, April 06, 2011 5:34:54 PM

Comment:
  Initial add, all objects.

Items:
  delete, source rename $/Systems/DB/Expences/Loader/AAA.txt;X34892
  rename                $/Systems/DB/Expences/Loader/AAC.txt.

Check-in Notes:
  Code Reviewer:
  Performance Reviewer:
  Reviewer:
  Security Reviewer:
------------------------
To process this file, I wrote the following code:

#Tags - used for spliting the information

tag1 = 'Changeset:'
tag2 = 'User:'
tag3 = 'Date:'
tag4 = 'Comment:'
tag5 = 'Items:'
tag6 = 'Check-in Notes:'

#opening and reading the input file
#In path to input file use '\' as escape character
with open ("C:\\Users\\md_sarfaraz\\Desktop\\ctl_Files.txt", "r") as myfile:
    val=myfile.read().replace('\n', ' ')

#counting the occurence of any one of the above tag
#As count will be same for all the tags
occurence = val.count(tag1)

#initializing row variable
row=""

#passing the count - occurence to the loop
for count in  range(1, occurence+1):
   row += ( (val.split(tag1)[count].split(tag2)[0]).strip() + '|' \
    + (val.split(tag2)[count].split(tag3)[0]).strip() + '|' \
    + (val.split(tag3)[count].split(tag4)[0]).strip() + '|' \
    + (val.split(tag4)[count].split(tag5)[0]).strip() + '|' \
    + (val.split(tag5)[count].split(tag6)[0]).strip() + '\n')

#opening and writing the output file
#In path to output file use '\' as escape character
file = open("C:\\Users\\md_sarfaraz\\Desktop\\processed_ctl_Files.txt", "w+") 
file.write(row)
file.close()
and got the following result/file (processed_ctl_Files.txt):

However, I want a result like this:

143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader   
                                                                          add $/Systems/DB/Expences/Loader/AAA.txt   
                                                                          add $/Systems/DB/Expences/Loader/BBB.txt   
                                                                          add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader   
                                                                 edit $/Systems/DB/Expences/Loader/AAA.txt   
                                                                 edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Rascal/Expences/AAA.txt;X34892   
                                                                            rename                $/Systems/DB/Rascal/Expences/AAC.txt.
Alternatively, it would be great to get a result like this:

143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader   
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/AAA.txt   
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/BBB.txt   
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader   
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader/AAA.txt   
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Rascal/Expences/AAA.txt;X34892   
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|rename                $/Systems/DB/Rascal/Expences/AAC.txt.

Please let me know how I can do this. Also, I am very new to Python, so if I have written poor or redundant code, please excuse it and help me improve it.

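I would start by extracting the values into variables. Then build a prefix from the first few tags. You can count the number of characters in the prefix and use that as the padding width. As you go through the items, append the first item to the prefix, and append every other item to padding made of the required number of spaces.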

# keywords used in the tag "Items: "
keywords = ['add', 'delete', 'edit', 'source', 'rename']

# val, row and tag1..tag6 are defined as in the question's code
for cs in val.split(tag1)[1:]:
    changeset = cs.split(tag2)[0].strip()
    user = cs.split(tag2)[1].split(tag3)[0].strip()
    date = cs.split(tag3)[1].split(tag4)[0].strip()
    comment = cs.split(tag4)[1].split(tag5)[0].strip()
    items = cs.split(tag5)[1].split(tag6)[0].strip().split()
    prefix = '{0}|{1}|{2}|{3}|'.format(changeset, user, date, comment)
    padding = ' ' * len(prefix)
    i = 0
    while i < len(items):
        # the first item goes right after the prefix, later items after the padding
        pref = prefix if i == 0 else padding
        # collect the leading keywords ("delete," carries a trailing comma)
        words = []
        while i < len(items) and items[i].rstrip(',') in keywords:
            words.append(items[i])
            i += 1
        # the token after the keywords is the path
        if i < len(items):
            words.append(items[i])
            i += 1
        row += pref + ' '.join(words) + '\n'

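This seems to do what you want, but I am not sure it is the best solution. Maybe it would be better to process the file line by line and print the values straight to the stream?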
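The line-by-line idea could look roughly like the sketch below. It is not code from the thread: the function name `write_rows` and the constants are made up for illustration, it assumes every changeset is well formed (header lines before the items), and it writes the alternative format from the question, with the prefix repeated on every item row.

```python
import io

SEPARATOR = "------------------------"
HEADER_TAGS = ("Changeset:", "User:", "Date:")

def write_rows(lines, out):
    """Read the log one line at a time and write one row per item to `out`."""
    fields = {}      # header values of the current changeset
    section = None   # "comment", "items", or None
    for raw in lines:
        line = raw.strip()
        if line.startswith(SEPARATOR):
            fields, section = {}, None        # a new changeset starts
        elif line.startswith(HEADER_TAGS):
            key, value = line.split(":", 1)   # split on the first colon only
            fields[key] = value.strip()
        elif line == "Comment:":
            section = "comment"
        elif line == "Items:":
            section = "items"
        elif line == "Check-in Notes:":
            section = None                    # reviewer lines are ignored
        elif not line:
            continue                          # blank lines carry no data
        elif section == "comment":
            fields["Comment"] = line          # assumption: one-line comments
        elif section == "items":
            row = [fields["Changeset"], fields["User"], fields["Date"],
                   fields.get("Comment", ""), line]
            out.write("|".join(row) + "\n")

# Usage with a shortened copy of the sample data from the question
sample = """------------------------
Changeset: 143
User: Sarfaraz
Date: Tuesday, April 05, 2011 5:34:54 PM

Comment:
  Initial add, all objects.

Items:
  add $/Systems/DB/Expences/Loader
  add $/Systems/DB/Expences/Loader/AAA.txt

Check-in Notes:
  Code Reviewer:
------------------------
"""
out = io.StringIO()
write_rows(sample.splitlines(), out)
print(out.getvalue(), end="")
```

With real files, an open file object can be passed for both arguments, since iterating over a text file yields its lines and nothing is held in memory beyond the current changeset's header.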

You can use a regular expression to search for "add", "edit", and so on.

import re 

#Tags - used for spliting the information 
tag1 = 'Changeset:'
tag2 = 'User:'
tag3 = 'Date:'
tag4 = 'Comment:'
tag5 = 'Items:'
tag6 = 'Check-in Notes:'

#opening and reading the input file
#In path to input file use '\' as escape character
with open ("wibble.txt", "r") as myfile:
    val=myfile.read().replace('\n', ' ') 

#counting the occurence of any one of the above tag
#As count will be same for all the tags
occurence = val.count(tag1)

#initializing row variable
row=""

prevlen = 0

#passing the count - occurence to the loop
for count in  range(1, occurence+1):
   row += ( (val.split(tag1)[count].split(tag2)[0]).strip() + '|' \
    + (val.split(tag2)[count].split(tag3)[0]).strip() + '|' \
    + (val.split(tag3)[count].split(tag4)[0]).strip() + '|' \
    + (val.split(tag4)[count].split(tag5)[0]).strip() + '|' )

   distance = len(row) - prevlen
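   # strip so the first item stays on the prefix line; (a|b) matches whole keywords
   items_text = val.split(tag5)[count].split(tag6)[0].strip()
   row += re.sub(r"\s\s+(edit|add|delete|rename)", "\n" + " " * distance + r"\1", items_text) + '\n'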
   prevlen = len(row)

#opening and writing the output file
#In path to output file use '\' as escape character
file = open("wobble.txt", "w+")
file.write(row)
file.close()
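One detail worth checking when matching keywords with `re`: square brackets form a character class that matches a single character, while alternation inside a group matches whole words. The short sketch below illustrates the difference; the sample strings and the `$/x/...` paths are invented for the demonstration.

```python
import re

items = "add $/x/Loader   add $/x/Loader/AAA.txt   rename $/x/Loader/BBB.txt"

# A character class matches ONE character from the set.
class_hits = re.findall(r"\s\s+([add]|[rename])", items)
print(class_hits)   # ['a', 'r']

# Alternation inside a group matches the whole keyword.
word_hits = re.findall(r"\s\s+(add|rename)", items)
print(word_hits)    # ['add', 'rename']

# The class form also fires on unrelated words that start with one of its letters.
noisy = "add $/x/Loader   notes here"
class_false_positive = bool(re.search(r"\s\s+([add]|[rename])", noisy))
word_false_positive = bool(re.search(r"\s\s+(add|rename)\b", noisy))
print(class_false_positive, word_false_positive)   # True False
```

The class form can appear to work on the sample data because every item there starts with a letter that is in one of the classes, but it would also split on any other word beginning with one of those letters.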

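This solution is not as short as the regex answer, and probably not as efficient, but it should be easy to understand. It also makes the parsed data easier to work with, because each section's data is stored in a dictionary.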

    ctl_file = "ctl_Files.txt" # path of source file
    processed_ctl_file = "processed_ctl_Files.txt" # path of destination file

    #Tags - used for spliting the information
    changeset_tag = 'Changeset:'
    user_tag = 'User:'
    date_tag = 'Date:'
    comment_tag = 'Comment:'
    items_tag = 'Items:'
    checkin_tag = 'Check-in Notes:'

    section_separator = "------------------------"
    changesets = []

    #open and read the input file
    with open(ctl_file, 'r') as read_file:
        first_section = True
        changeset_dict = {}
        items = []
        comment_stage = False
        items_stage = False
        checkin_dict = {}
        # Read one line at a time
        for line in read_file:
            # Check which tag matches the current line and store the data to matching key in the dictionary
            if changeset_tag in line:
                changeset = line.split(":")[1].strip()
                changeset_dict[changeset_tag] = changeset
            elif user_tag in line:
                user = line.split(":")[1].strip()
                changeset_dict[user_tag] = user
            elif date_tag in line:
                date = line.split(":", 1)[1].strip()  # split on the first colon only; the time contains colons
                changeset_dict[date_tag] = date
            elif comment_tag in line:
                comment_stage = True
            elif items_tag in line:
                items_stage = True
            elif checkin_tag in line:
                pass                        # not implemented due to example file not containing any data
            elif section_separator in line: # new section
                if first_section:
                    first_section = False
                    continue
                tmp = changeset_dict
                changesets.append(tmp)          
                changeset_dict = {}
                items = []
                # Set stages to false just in case
                items_stage = False
                comment_stage = False
            elif not line.strip():  # empty line
                if items_stage:
                    changeset_dict[items_tag] = items
                    items_stage = False
                comment_stage = False
            else:
                if comment_stage:
                    changeset_dict[comment_tag] = line.strip()  # Only works for one line comment  
                elif items_stage:
                    items.append(line.strip())

    #open and write to the output file
    with open(processed_ctl_file, 'w') as write_file:
        for changeset in changesets:        
            row = "{0}|{1}|{2}|{3}|".format(changeset[changeset_tag], changeset[user_tag], changeset[date_tag], changeset[comment_tag])
            distance = len(row)
            items = changeset[items_tag]
            join_string = "\n" + distance * " "
            items_part = str.join(join_string, items)
            row += items_part + "\n"
            write_file.write(row)

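Also, try to use variable names that describe their contents. Names like tag1, tag2, etc. say little about what the variables hold. This makes the code hard to read, especially as the script grows. Readability may not seem important most of the time, but when you revisit old code, it takes much longer to work out what code with non-descriptive variable names does.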

You remove all the newlines even though you want them in the output? Don't do that; parse the input in a way that doesn't require it.
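One way to follow that advice, sketched under assumptions: keep the text as read, split it on the separator line, and take the items from the lines that are already there. The helper names `parse_changesets` and `format_aligned` are hypothetical, and the sketch assumes every block contains all six tags.

```python
import re

SEPARATOR = "------------------------"

def parse_changesets(text):
    """Split the log on the separator line and pull the fields out of each block."""
    records = []
    for block in text.split(SEPARATOR):
        if "Changeset:" not in block:
            continue  # skip the empty pieces before the first and after the last separator
        record = {}
        for key in ("Changeset", "User", "Date"):
            # MULTILINE lets ^ and $ match at line boundaries inside the block;
            # assumes the tag is present, otherwise re.search returns None
            match = re.search(r"^" + key + r":[ \t]*(.*)$", block, re.MULTILINE)
            record[key] = match.group(1).strip()
        comment = block.split("Comment:", 1)[1].split("Items:", 1)[0]
        items = block.split("Items:", 1)[1].split("Check-in Notes:", 1)[0]
        record["Comment"] = " ".join(comment.split())
        # the newlines are still there, so each item is simply one line
        record["Items"] = [ln.strip() for ln in items.splitlines() if ln.strip()]
        records.append(record)
    return records

def format_aligned(record):
    """Put the first item after the prefix and pad later items to the same column."""
    prefix = "{Changeset}|{User}|{Date}|{Comment}|".format(**record)
    return prefix + ("\n" + " " * len(prefix)).join(record["Items"])

# Usage with a shortened copy of the sample data from the question
log_text = """------------------------
Changeset: 145
User: Sarfaraz
Date: Thursday, April 07, 2011 5:34:54 PM

Comment:
  edited objects.

Items:
  edit $/Systems/DB/Expences/Loader
  edit $/Systems/DB/Expences/Loader/AAA.txt

Check-in Notes:
  Code Reviewer:
------------------------
"""
parsed = parse_changesets(log_text)
for record in parsed:
    print(format_aligned(record))
```

Because the items are kept as a list, the same records can be written in either of the two layouts the question asks for.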