File processing in Python

python file-io


I am using Python to process a text file. I have a text file (ctl_Files.txt) with the following content, or something similar:

------------------------
Changeset: 143
User: Sarfaraz
Date: Tuesday, April 05, 2011 5:34:54 PM

Comment:
  Initial add, all objects.

Items:
  add $/Systems/DB/Expences/Loader
  add $/Systems/DB/Expences/Loader/AAA.txt
  add $/Systems/DB/Expences/Loader/BBB.txt
  add $/Systems/DB/Expences/Loader/CCC.txt  

Check-in Notes:
  Code Reviewer:
  Performance Reviewer:
  Reviewer:
  Security Reviewer:
------------------------
Changeset: 145
User: Sarfaraz
Date: Thursday, April 07, 2011 5:34:54 PM

Comment:
  edited objects.

Items:
  edit $/Systems/DB/Expences/Loader
  edit $/Systems/DB/Expences/Loader/AAA.txt
  edit $/Systems/DB/Expences/Loader/AAB.txt  

Check-in Notes:
  Code Reviewer:
  Performance Reviewer:
  Reviewer:
  Security Reviewer:
------------------------
Changeset: 147
User: Sarfaraz
Date: Wednesday, April 06, 2011 5:34:54 PM

Comment:
  Initial add, all objects.

Items:
  delete, source rename $/Systems/DB/Expences/Loader/AAA.txt;X34892
  rename                $/Systems/DB/Expences/Loader/AAC.txt.

Check-in Notes:
  Code Reviewer:
  Performance Reviewer:
  Reviewer:
  Security Reviewer:
------------------------

To process this file, I wrote the following code:

#Tags - used for splitting the information

tag1 = 'Changeset:'
tag2 = 'User:'
tag3 = 'Date:'
tag4 = 'Comment:'
tag5 = 'Items:'
tag6 = 'Check-in Notes:'

#opening and reading the input file
#In path to input file use '\' as escape character
with open ("C:\\Users\\md_sarfaraz\\Desktop\\ctl_Files.txt", "r") as myfile:
    val=myfile.read().replace('\n', ' ')

#counting the occurrences of any one of the above tags
#As the count will be the same for all the tags
occurence = val.count(tag1)

#initializing row variable
row=""

#passing the count - occurence to the loop
for count in  range(1, occurence+1):
   row += ( (val.split(tag1)[count].split(tag2)[0]).strip() + '|' \
    + (val.split(tag2)[count].split(tag3)[0]).strip() + '|' \
    + (val.split(tag3)[count].split(tag4)[0]).strip() + '|' \
    + (val.split(tag4)[count].split(tag5)[0]).strip() + '|' \
    + (val.split(tag5)[count].split(tag6)[0]).strip() + '\n')

#opening and writing the output file
#In path to output file use '\' as escape character
file = open("C:\\Users\\md_sarfaraz\\Desktop\\processed_ctl_Files.txt", "w+") 
file.write(row)
file.close()

and got the result file processed_ctl_Files.txt.

However, I want a result like this:

143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader   
                                                                          add $/Systems/DB/Expences/Loader/AAA.txt   
                                                                          add $/Systems/DB/Expences/Loader/BBB.txt   
                                                                          add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader   
                                                                 edit $/Systems/DB/Expences/Loader/AAA.txt   
                                                                 edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Expences/Loader/AAA.txt;X34892
                                                                            rename                $/Systems/DB/Expences/Loader/AAC.txt.

Or, it would be great if we could get a result like this:

143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader   
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/AAA.txt   
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/BBB.txt   
143|Sarfaraz|Tuesday, April 05, 2011 5:34:54 PM|Initial add, all objects.|add $/Systems/DB/Expences/Loader/CCC.txt
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader   
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader/AAA.txt   
145|Sarfaraz|Thursday, April 07, 2011 5:34:54 PM|edited objects.|edit $/Systems/DB/Expences/Loader/AAB.txt
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|delete, source rename $/Systems/DB/Expences/Loader/AAA.txt;X34892
147|Sarfaraz|Wednesday, April 06, 2011 5:34:54 PM|Initial add, all objects.|rename                $/Systems/DB/Expences/Loader/AAC.txt.

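Please let me know how I can do this. Also, I am very new to Python, so please excuse any bad or redundant code, and help me improve it.

I would start by extracting the values into variables, then build a prefix from the first few tags. You can count the characters in the prefix and use that number as padding. When you go through the items, append the first item to the prefix, and append every other item to padding made of the required number of spaces.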

# keywords used in the "Items:" section
keywords = ['add', 'delete', 'edit', 'source', 'rename']

# loop over the text that follows each "Changeset:" tag
# (val, row and tag1..tag6 are defined as in the question's code)
for cs in val.split(tag1)[1:]:
    changeset = cs.split(tag2)[0].strip()
    user = cs.split(tag2)[1].split(tag3)[0].strip()
    date = cs.split(tag3)[1].split(tag4)[0].strip()
    comment = cs.split(tag4)[1].split(tag5)[0].strip()
    items = cs.split(tag5)[1].split(tag6)[0].strip().split()
    notes = cs.split(tag6)

    prefix = '{0}|{1}|{2}|{3}'.format(changeset, user, date, comment)
    space_count = len(prefix)

    i = 0
    while i < len(items):
        # if we are printing the first item, add it to the other text
        if i == 0:
            pref = prefix
        # otherwise create padding from spaces
        else:
            pref = ' ' * space_count

        # add all keywords
        words = ''
        for j in range(i, len(items)):
            # strip a trailing comma so that "delete," still counts as a keyword
            if items[j].rstrip(',') in keywords:
                words += ' ' + items[j]
            else:
                break

        if i >= len(items):
            break

        row += '{0}|{1} {2}\n'.format(pref, words, items[j])
        i += j - i + 1  # increase by the number of keywords + the param


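This seems to be what you want, but I am not sure it is the best solution. Perhaps it would be better to process the file line by line and print the values directly to a stream?

You can use a regular expression to search for 'add', 'edit', and so on.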

import re

#Tags - used for splitting the information
tag1 = 'Changeset:'
tag2 = 'User:'
tag3 = 'Date:'
tag4 = 'Comment:'
tag5 = 'Items:'
tag6 = 'Check-in Notes:'

#opening and reading the input file
#In path to input file use '\' as escape character
with open("wibble.txt", "r") as myfile:
    val = myfile.read().replace('\n', ' ')

#counting the occurrences of any one of the above tags
#As the count will be the same for all the tags
occurence = val.count(tag1)

#initializing row variable
row = ""
prevlen = 0

#passing the count - occurence to the loop
for count in range(1, occurence+1):
    row += ( (val.split(tag1)[count].split(tag2)[0]).strip() + '|' \
        + (val.split(tag2)[count].split(tag3)[0]).strip() + '|' \
        + (val.split(tag3)[count].split(tag4)[0]).strip() + '|' \
        + (val.split(tag4)[count].split(tag5)[0]).strip() + '|' )
    #width of the prefix just written; used to indent the remaining items
    distance = len(row) - prevlen
    #strip first so that the first item stays on the same line as the prefix
    items_text = val.split(tag5)[count].split(tag6)[0].strip()
    #two or more whitespace characters before a keyword mark the start of the next item
    row += re.sub(r"\s\s+(edit|add|delete|rename)", "\n" + " "*distance + r"\1", items_text) + '\n'
    prevlen = len(row)

#opening and writing the output file
#In path to output file use '\' as escape character
with open("wobble.txt", "w") as file:
    file.write(row)
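
A related way to use a regular expression here is to split the flattened item text into a list instead of substituting into it. The sketch below is only an illustration: `split_items` and `ITEM_START` are names made up for this example, and it assumes the item keywords are limited to add, edit, delete and rename, as in the sample file.

```python
import re

# Hypothetical pattern: two or more whitespace characters that are
# immediately followed by one of the keywords seen in the sample file.
ITEM_START = re.compile(r"\s{2,}(?=(?:add|edit|delete|rename)\b)")


def split_items(items_text):
    """Split flattened item text into one string per item."""
    return ITEM_START.split(items_text.strip())


# Item text of changeset 147 after the newlines were replaced by spaces
flattened = ("   delete, source rename $/Systems/DB/Expences/Loader/AAA.txt;X34892"
             "   rename                $/Systems/DB/Expences/Loader/AAC.txt.  ")

for item in split_items(flattened):
    print(item)
```

Each element of the returned list can then be appended either to the prefix or to the padding.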

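This solution is not as short as the regular-expression answer, and probably not as efficient, but it should be easy to understand. It also makes the parsed data easier to work with, because the data of each section is stored in a dictionary.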

ctl_file = "ctl_Files.txt"  # path of source file
processed_ctl_file = "processed_ctl_Files.txt"  # path of destination file

# Tags - used for splitting the information
changeset_tag = 'Changeset:'
user_tag = 'User:'
date_tag = 'Date:'
comment_tag = 'Comment:'
items_tag = 'Items:'
checkin_tag = 'Check-in Notes:'

section_separator = "------------------------"

changesets = []

# open and read the input file
with open(ctl_file, 'r') as read_file:
    first_section = True
    changeset_dict = {}
    items = []
    comment_stage = False
    items_stage = False
    checkin_dict = {}

    # Read one line at a time
    for line in read_file:
        # Check which tag matches the current line and store the data
        # under the matching key in the dictionary.
        # split(":", 1) splits at the first colon only, so the colons
        # inside the time of day stay part of the date value.
        if changeset_tag in line:
            changeset = line.split(":", 1)[1].strip()
            changeset_dict[changeset_tag] = changeset
        elif user_tag in line:
            user = line.split(":", 1)[1].strip()
            changeset_dict[user_tag] = user
        elif date_tag in line:
            date = line.split(":", 1)[1].strip()
            changeset_dict[date_tag] = date
        elif comment_tag in line:
            comment_stage = True
        elif items_tag in line:
            items_stage = True
        elif checkin_tag in line:
            pass  # not implemented due to example file not containing any data
        elif section_separator in line:  # new section
            if first_section:
                first_section = False
                continue
            tmp = changeset_dict
            changesets.append(tmp)
            changeset_dict = {}
            items = []
            # Set stages to false just in case
            items_stage = False
            comment_stage = False
        elif not line.strip():  # empty line
            if items_stage:
                changeset_dict[items_tag] = items
                items_stage = False
            comment_stage = False
        else:
            if comment_stage:
                changeset_dict[comment_tag] = line.strip()  # Only works for one line comment
            elif items_stage:
                items.append(line.strip())

# open and write to the output file
with open(processed_ctl_file, 'w') as write_file:
    for changeset in changesets:
        row = "{0}|{1}|{2}|{3}|".format(changeset[changeset_tag], changeset[user_tag], changeset[date_tag], changeset[comment_tag])
        distance = len(row)
        items = changeset[items_tag]
        join_string = "\n" + distance * " "
        items_part = join_string.join(items)
        row += items_part + "\n"
        write_file.write(row)

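Also, try to use variable names that describe their contents. Names like tag1, tag2, and so on say little about what the variables hold. This makes the code hard to read, especially as the script grows. Readability may not seem important most of the time, but when you revisit old code, working out what non-descriptive variables do takes much longer.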
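
Because each parsed changeset ends up in a dictionary, the alternative layout from the question (the full prefix repeated on every item row) should need only a different writing step. The following is a minimal sketch, assuming the dictionaries use the tag strings as keys as in the code above; the function name `rows_with_repeated_prefix` and the sample data are made up for illustration.

```python
def rows_with_repeated_prefix(changesets):
    """Return one 'changeset|user|date|comment|item' row per item."""
    rows = []
    for changeset in changesets:
        prefix = "|".join([
            changeset['Changeset:'],
            changeset['User:'],
            changeset['Date:'],
            changeset['Comment:'],
        ])
        for item in changeset['Items:']:
            rows.append(prefix + "|" + item)
    return rows


# Hypothetical sample in the same shape as the parsed data
sample = [{
    'Changeset:': '145',
    'User:': 'Sarfaraz',
    'Date:': 'Thursday, April 07, 2011 5:34:54 PM',
    'Comment:': 'edited objects.',
    'Items:': [
        'edit $/Systems/DB/Expences/Loader',
        'edit $/Systems/DB/Expences/Loader/AAA.txt',
    ],
}]

for row in rows_with_repeated_prefix(sample):
    print(row)
```

Writing the returned rows to the output file, one per line, gives the second layout requested in the question.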
You strip out all the newlines even though you want them in the output? Don't do that; parse the input in a way that does not require it.
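
One way to follow this advice is to keep the line structure, read the input one line at a time, and write each changeset as soon as its closing separator is reached. The sketch below is an illustration only: the names `iter_changesets` and `write_aligned` are invented here, it assumes that section headers start in column 0, that every changeset has Changeset, User, Date, Comment and Items sections and ends with a separator line, and it reads from an in-memory sample instead of ctl_Files.txt.

```python
import io
import sys

SEPARATOR = "------------------------"


def iter_changesets(lines):
    """Yield one dict per changeset, reading the input line by line."""
    record, section = {}, None
    for raw in lines:
        line = raw.rstrip()
        if line.startswith(SEPARATOR):
            # A separator closes the current changeset, if there is one
            if record:
                yield record
            record, section = {}, None
        elif line and not line[0].isspace():
            # Header line such as "User: Sarfaraz" or "Items:"
            section, _, value = line.partition(":")
            record[section] = [value.strip()] if value.strip() else []
        elif line.strip() and section is not None:
            # Indented line that belongs to the current section
            record[section].append(line.strip())


def write_aligned(records, stream):
    """Write each changeset with later items aligned under the first."""
    for rec in records:
        prefix = "|".join([
            rec["Changeset"][0],
            rec["User"][0],
            rec["Date"][0],
            " ".join(rec["Comment"]),
        ]) + "|"
        padding = "\n" + " " * len(prefix)
        stream.write(prefix + padding.join(rec["Items"]) + "\n")


sample = """\
------------------------
Changeset: 145
User: Sarfaraz
Date: Thursday, April 07, 2011 5:34:54 PM

Comment:
  edited objects.

Items:
  edit $/Systems/DB/Expences/Loader
  edit $/Systems/DB/Expences/Loader/AAA.txt

Check-in Notes:
  Code Reviewer:
------------------------
"""

write_aligned(iter_changesets(io.StringIO(sample)), sys.stdout)
```

To process the real file, an open file object can be passed to `iter_changesets` in place of the `io.StringIO` sample, and an output file opened for writing in place of `sys.stdout`.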