Pick parts from a txt file with Python and copy them to another file

python file

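I'm having trouble. I need to read a .txt file that contains a series of records, check the records, and copy the ones I want to a new file. The file content looks like this. This is only a sample; the original file has more than 30,000 lines: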

AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460 
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file

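Records whose line starting with 03000 carries the characters "TO" must be written to the new file. For the sample above, the new file should look like this: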

AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460 
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file

I have searched many times and found nothing that helps. Thanks, everyone!

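# Copy only the lines that start with "03000" and contain "TO".
# The with-statement closes both files, so the output is flushed to disk.
with open("file.txt", "r") as file, open("newFile.txt", "w") as newFile:
    newFile.writelines(
        line for line in file
        if line.startswith("03000") and "TO" in line)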
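One caveat with this check: `"TO" in line` is true when the letters appear anywhere in the line, including a trailing comment. A minimal sketch of a stricter test, assuming the fields are always separated by `|` as in the sample (the helper name `is_to_line` and the `#TODO` line are made up for illustration):

```python
def is_to_line(line):
    """Return True if the first field is 03000 and the second field is TO."""
    fields = line.split("|")
    # A usable line has at least two fields: the record type and the code.
    return len(fields) >= 2 and fields[0] == "03000" and fields[1] == "TO"


sample = [
    "00000|46|150 #begin register\n",
    "03000|TO|460 \n",
    "03000|SP|467 #TODO\n",   # contains "TO", but only in the comment
    "99999|35|436 #end register\n",
]
kept = [line for line in sample if is_to_line(line)]
```

Comparing whole fields also avoids matching a line that merely begins with the digits 03000 inside a longer first field.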

If you also need the previous and the next line, you have to iterate over the lines in triads. First define:

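def gen_triad(lines, prev=None):
    """Yield (previous, current, next) for every line that has a successor."""
    lines = iter(lines)          # accept lists as well as file objects
    current = next(lines, None)
    if current is None:          # empty input: nothing to yield
        return
    for after in lines:
        yield prev, current, after
        prev, current = current, after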

Then, as before:

with open("file.txt", "r") as inFile, open("newFile.txt", "w") as outFile:
    outFile.writelines(''.join(triad) for triad in gen_triad(inFile)
                       if triad[1].startswith("03000") and "TO" in triad[1])
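The triad approach can be tried on the sample data without any files. The sketch below is a self-contained variant under a few assumptions: `pick_with_context` is a hypothetical helper, the first and last lines are taken to be the file header and footer, and a matching 03000 line is never the first line of the file.

```python
def gen_triad(lines, prev=None):
    """Yield (previous, current, next) for every line that has a successor."""
    lines = iter(lines)
    current = next(lines, None)
    if current is None:
        return
    for after in lines:
        yield prev, current, after
        prev, current = current, after


def pick_with_context(lines):
    """Keep the header, each 03000|TO line with its two neighbours, and the footer."""
    lines = list(lines)
    if not lines:
        return []
    out = [lines[0]]                      # assumed file header (AAAAA line)
    for prev, current, after in gen_triad(lines):
        if current.startswith("03000") and "TO" in current:
            out.extend([prev, current, after])
    out.append(lines[-1])                 # assumed file footer (ZZZZZ line)
    return out


sample = """AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
""".splitlines(keepends=True)

result = pick_with_context(sample)
```

Here `result` holds the header, the two "TO" records, and the footer. The triads alone do not cover the header and footer, which is why the helper adds them separately.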

For those who prefer a more compact representation:

import re

with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    destination_file.writelines(this_line for this_line in source_file 
                                if re.match("^03000.*TO", this_line))

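This seems to work. The other answers appear to write out only the records that contain "03000|TO|", but you also have to write out the record before and the record after it.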

import sys

# ---------------------------------------------------------------
# input file: the name comes from the first command-line argument
file_name = sys.argv[1]
file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name
file = open(file_path, "r")
# ---------------------------------------------------------------
# output file: same name with '.out' appended
output_file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name + '.out'
output_file = open(output_file_path, "w")

# ---------------------------------------------------------------
# process file

temp = ''            # lines read since the last decision (for example the 00000 line)
temp_out = ''        # text that will be written to the output file
good_write = False   # previous line was a 03000|TO line: keep the next line
bad_write = False    # previous line was another 03000 line: skip the next line
for line in file:
    if line[:5] == 'AAAAA':
        temp_out += line
    elif line[:5] == 'ZZZZZ':
        temp_out += line
    elif good_write:
        temp += line
        temp_out += temp
        temp = ''
        good_write = False
    elif bad_write:
        bad_write = False
        temp = ''
    elif line[:5] == '03000':
        if line[6:8] != 'TO':
            temp = ''
            bad_write = True
        else:
            good_write = True
            temp += line
            temp_out += temp
            temp = ''
    else:
        temp += line

output_file.write(temp_out)
output_file.close()
file.close()

Output:

AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460 
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file

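I know this solution may be a bit long, but it is easy to understand and seems like an intuitive approach. I have checked it with the data you provided and it works fine.

If you need more explanation of the code, let me know and I will add it.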

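I found the tips from Joran Beasley and elyase very interesting, but they only fetch the content of the 03000 line. I want the content from the 00000 line through the 99999 line. I did manage to do it, but I am not happy with the result and would like something cleaner. Here is how I did it: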

file = open(url, 'r')   # url holds the path of the input file
newFile = open("newFile.txt", 'w')
lines = file.readlines()
file.close()
lineTemp = []
state = None
for line in lines:
    lineTemp.append(line)
    if line[0:5] == '03000':
        # offsets as posted; in the sample shown above the field is line[6:8]
        state = line[21:23]
    if line[0:5] == '99999':
        if state == 'TO':
            newFile.writelines(lineTemp)
        lineTemp = []   # start collecting the next record
newFile.close()

Any suggestions?

Thanks, everyone.
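One possibly cleaner way to do this is to collect each record between its `00000` and `99999` lines and decide once per record whether to keep it. The sketch below is only an illustration: `filter_records` and its `code` parameter are hypothetical names, and the state field is read by splitting on `|` as in the sample, not from fixed column offsets.

```python
def filter_records(lines, code="TO"):
    """Yield header/footer lines and each complete record whose 03000 line has `code`."""
    record = None   # None while outside a record, else the lines collected so far
    keep = False    # True once the current record has a matching 03000 line
    for line in lines:
        fields = line.split("|")
        if fields[0] == "00000":            # start of a record
            record, keep = [line], False
        elif record is None:                # outside any record: header or footer
            yield line
        else:
            record.append(line)
            if fields[0] == "03000" and len(fields) > 1 and fields[1] == code:
                keep = True
            elif fields[0] == "99999":      # end of the record
                if keep:
                    yield from record
                record = None


sample = """AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
""".splitlines(keepends=True)

result = list(filter_records(sample))
```

With real files, passing the open input file to `filter_records` and handing the generator to `writelines` on the output file should be enough. The input is then streamed line by line, which suits a file of 30,000 lines or more, and records with extra lines between `00000` and `99999` are handled the same way.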

To get the whole record, from the 00000 line through the 99999 line, and not just the 03000 line, a regular expression can match the three lines together:

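import re

# Either a full record (00000, 03000|TO, 99999 on consecutive lines),
# or the file header (AAAAA), or the file footer (ZZZZZ).
# Raw strings keep the regex escapes (\| and \d) intact.
pat = (r'^00000\|\d+\|\d+.*\n'
       r'^03000\|TO\|\d+.*\n'
       r'^99999\|\d+\|\d+.*\n'
       r'|'
       r'^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*')
rag = re.compile(pat, re.MULTILINE)

with open('fifi.txt', 'r') as f,\
     open('newfifi.txt', 'w') as g:
    g.write(''.join(rag.findall(f.read())))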

For files that have extra lines between the lines starting with 00000, 03000 and 99999, I did not find anything simpler than this:

import re

# Group 1: a whole record, from its 00000 line to the first following 99999 line.
# Group 2: the file header (AAAAA) or the file footer (ZZZZZ).
pat = (r'(^00000\|\d+\|\d+.*\n'
       r'(?:.*\n)+?'
       r'^99999\|\d+\|\d+.*\n)'
       r'|'
       r'(^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*)')
rag = re.compile(pat, re.MULTILINE)

# A record is kept only if it contains a 03000|TO line.
pit = (r'^00000\|.+?^03000\|TO\|\d+.+?^99999\|')
rig = re.compile(pit, re.DOTALL | re.MULTILINE)

def yi(text):
    for g1, g2 in rag.findall(text):
        if g2:
            yield g2
        elif rig.match(g1):
            yield g1

with open('fifi.txt', 'r') as f,\
     open('newfifi.txt', 'w') as g:
    g.write(''.join(yi(f.read())))
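The two-pattern approach can be checked in memory on a record that has an extra line. The sketch below repeats the patterns as raw strings; the `01000` line and the `sample` text are invented for this check and do not come from the original file.

```python
import re

# Group 1: a whole record, from its 00000 line to the first following 99999 line.
# Group 2: the file header (AAAAA) or the file footer (ZZZZZ).
rag = re.compile(
    r'(^00000\|\d+\|\d+.*\n'
    r'(?:.*\n)+?'
    r'^99999\|\d+\|\d+.*\n)'
    r'|'
    r'(^AAAAA\|\d+\|\d+.*\n'
    r'|'
    r'^ZZZZZ\|\d+\|\d+.*)',
    re.MULTILINE)

# A record is kept only if it contains a 03000|TO line.
rig = re.compile(r'^00000\|.+?^03000\|TO\|\d+.+?^99999\|',
                 re.DOTALL | re.MULTILINE)


def yi(text):
    for g1, g2 in rag.findall(text):
        if g2:
            yield g2
        elif rig.match(g1):
            yield g1


sample = (
    "AAAAA|12|120 #begin file\n"
    "00000|46|150 #begin register\n"
    "01000|XX|001 extra line\n"      # invented extra line inside a record
    "03000|TO|460\n"
    "99999|35|436 #end register\n"
    "00000|46|316 #begin register\n"
    "03000|SP|467\n"
    "99999|33|130 #end register\n"
    "ZZZZZ|15|111 #end file\n"
)
result = ''.join(yi(sample))
```

The first record is kept together with its extra line and the "SP" record is dropped. Note that the footer alternative does not consume the final newline, so the output ends without one.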

Does it have to be Python? In a pinch, these shell commands do the same thing:

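head -n 1 inputfile.txt > outputfile.txt
# --no-group-separator (GNU grep) suppresses the "--" lines printed between separate matches
grep --no-group-separator -C 1 "^03000|TO" inputfile.txt >> outputfile.txt
tail -n 1 inputfile.txt >> outputfile.txt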

Note that he wants to write the first and the last line, as well as the line before and the line after each match. Read carefully before answering.

Well, he said that records starting with 03000 and carrying the characters "TO" must be written to a new file. If that is not what he wants, he should state more clearly what he wants...

I would expect you to read the whole question before answering. Can't you see his expected output?

If you use readlines, you pick up only one line at a time. If you use read, it gets everything, or at most X bytes.

Very nice, Joran Beasley, but I don't want to pick the content of just one line; I want a group of lines, as in the example. I tried to change the code you posted, but I couldn't, because I have not used lambda functions yet. Thank you very much.

See the comment I posted before yours.

If needed, I can modify my answer to include the previous and the next line. I see that your code does not include them; do you also need the line starting with "99999"?

@elyase Don't just say you can modify it. Do it if the person asking needs it!

Well, I actually did modify it. I just wasn't sure what he wanted, because his code contradicts his example.

I can't believe wrong answers get that many upvotes... while my answer is correct and did not get a single upvote!!! In fact, I got a downvote. Great way to run a forum!!! (sarcastic)

@IcyFlame I don't know why you were downvoted. By the way, I also received 3 downvotes. But why do you think my answer is wrong?

What do you think of my solution?