Python: reading blocks of lines from a file


I have a script that produces the following output (I also save this output to a file, f1=20141202.194812_carStatus):

I used:
exec_cmd('cat ' + f1 + ' | grep -e "CarModel =" -e "Owner_Info.User_ref ="')
but I also need the first line of each block (actually the second line):

TM 05157414.06: Processing...
What I am trying to do is parse each block and get these variable values:

TM 05120970.01 -> car_number = 05120970.01

Owner_Info.User_ref = crossi14 -> owner_user = crossi14

CarModel = Nissan Micra -> car_model = Nissan Micra
With this information I will add some defaults, such as:

priority = Unknown
I need to pass these variables as input to another script called insert_owner_car.pl:

 insert_owner_car.pl -id 05120970.01 -o owner_user="crossi14",car_model="Nissan Micra",priority="Unknown"
This is what I have managed so far, but it is not usable because I cannot get the values mentioned above:

#!/usr/bin/python

import itertools, commands, datetime, os, re, sys, time

inFile = open("/tmp/20141202.194812_carStatus")
outFile = open("result.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("----------------------------------------------"):
        keepCurrentSet = False
    if keepCurrentSet:
        parts = line.split(" = ")[1:]
        part=','.join(parts)
        print part
#outFile.write(parts)   
    if line.startswith("----------------------------------------------"):
        keepCurrentSet = True
inFile.close()
outFile.close()
I don't know how to get 05120970.01, or how to collect all the variables of a block so that I can use them as input to the other script.

PS: I am using Python 2.5.1.

You can use a generator to process the file in chunks:

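import re
import subprocess

def open_chunk(readfunc, delimiter, chunksize=1024):
    """
    Yield the pieces of a stream separated by the regex `delimiter`.
    readfunc(chunksize) should return a string ('' at end of file).
    """
    remainder = ''
    for chunk in iter(lambda: readfunc(chunksize), ''):
        # The last piece may be incomplete, so keep it and prepend it
        # to the next chunk that is read.
        pieces = re.split(delimiter, remainder + chunk)
        for piece in pieces[:-1]:
            yield piece
        remainder = pieces[-1]
    if remainder:
        yield remainder

filename = "/tmp/20141202.194812_carStatus"
f = open(filename, 'r')
for chunk in open_chunk(f.read, delimiter=r'-{45,}'):
    # Drop surrounding whitespace, plus any dashes left over if a
    # delimiter run happened to be split across two reads.
    chunk = chunk.strip().lstrip('-').strip()
    if chunk:
        lines = chunk.splitlines()
        # 'TM 05120970.01: Processing...' -> '05120970.01'
        firstline = lines[0]
        car_number = firstline.split()[1][:-1]
        owner_user = car_model = None   # reset for every block
        for line in lines[1:]:
            if 'Owner_Info.User_ref = ' in line:
                owner_user = line.split(" = ")[1]
            elif 'CarModel = ' in line:
                car_model = line.split(" = ")[1]
        cmd = ['insert_owner_car.pl'
               , '-id'
               , car_number
               , '-o'
               , 'owner_user="%s"' % (owner_user, )
               , 'car_model="%s"' % (car_model, )
               , 'priority="Unknown"']
        print(' '.join(cmd))
        # subprocess.call(cmd)
f.close()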
This prints:

insert_owner_car.pl -id 05120970.01 -o owner_user="crossi14" car_model="Nissan Micra" priority="Unknown"
insert_owner_car.pl -id 05157414.06 -o owner_user="yumiao12" car_model="Toyota Avensis" priority="Unknown"
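The command printed above passes owner_user, car_model and priority to insert_owner_car.pl as three separate arguments, while the command line in the question uses a single comma-separated value after -o. If the Perl script expects the question's form, a small variation can build that one argument instead. This is only a sketch: build_insert_cmd is a hypothetical helper name, and whether insert_owner_car.pl wants the double quotes kept is an assumption. When cmd is handed to subprocess.call as a list, no shell is involved, so the quotes reach the script literally instead of being stripped as they would be on a shell command line.

```python
import subprocess


def build_insert_cmd(car_number, owner_user, car_model, priority="Unknown"):
    """Build the argument list for insert_owner_car.pl (hypothetical helper).

    The -o value is ONE argument: the key="value" pairs joined by commas,
    matching the command line in the question.
    """
    options = ",".join([
        'owner_user="%s"' % owner_user,
        'car_model="%s"' % car_model,
        'priority="%s"' % priority,
    ])
    return ["insert_owner_car.pl", "-id", car_number, "-o", options]


cmd = build_insert_cmd("05120970.01", "crossi14", "Nissan Micra")
print(" ".join(cmd))
# A list means no shell: the double quotes reach the script literally.
# subprocess.call(cmd)
```

The joined string printed here has the same shape as the target command line in the question.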

If the data file is small, you can slurp the whole file into one string and then use re.split to split it into chunks:

In [37]: import re

In [38]: re.split(r'-{45,}', open('data').read())
Out[38]: 
['\n\n',
 '\nTM 05120970.01: Processing...\nTM 05120970: Processing...\nTM 05120970: current status Open\nTM 05120970: Owner_Info.User_ref = crossi14\nTM 05120970: Owner_Info.Email = Criss.Rossi@gmail.com\nTM 05120970: CarModel = Nissan Micra\n',
 '\nTM 05157414.06: Processing...\nTM 05157414: Processing...\nTM 05157414: current status Open\nTM 05157414: Owner_Info.User_ref = yumiao12\nTM 05157414: Owner_Info.Email = Yu.Miao@gmail.com\nTM 05157414: CarModel = Toyota Avensis\n',
 '\n']
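Here is a sketch of the slurp-and-split approach carried through to the three values. It uses an abridged copy of the two blocks shown in Out[38] (the status and e-mail lines are left out), and 46 dashes per delimiter line is an assumption; the pattern accepts any run of 45 or more. With a real file, text would come from open(...).read().

```python
import re

# Abridged sample built from the Out[38] result above; 46 dashes per
# delimiter line is an assumption (the pattern accepts 45 or more).
sep = '-' * 46
text = ('\n\n' + sep + '\n'
        'TM 05120970.01: Processing...\n'
        'TM 05120970: Owner_Info.User_ref = crossi14\n'
        'TM 05120970: CarModel = Nissan Micra\n' + sep + '\n'
        'TM 05157414.06: Processing...\n'
        'TM 05157414: Owner_Info.User_ref = yumiao12\n'
        'TM 05157414: CarModel = Toyota Avensis\n' + sep + '\n')

# Split on the dash runs and drop the whitespace-only pieces that come
# before the first and after the last delimiter.
chunks = [c.strip() for c in re.split(r'-{45,}', text) if c.strip()]

rows = []
for chunk in chunks:
    # Assumes every block contains all three fields; otherwise re.search
    # returns None and .group() raises AttributeError.
    car_number = re.match(r'TM\s+([^\s:]+):', chunk).group(1)
    owner_user = re.search(r'Owner_Info\.User_ref = (.*)', chunk).group(1)
    car_model = re.search(r'CarModel = (.*)', chunk).group(1)
    rows.append((car_number, owner_user, car_model))

for row in rows:
    print('%s %s %s' % row)
```

Because "." does not match a newline, each (.*) group stops at the end of its own line.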

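re.split can be used in place of open_chunk above. The advantage of open_chunk is that it also works on very large files, where reading the whole file into one string and splitting it into a list may not be practical.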
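Since the delimiter in this file is always a complete line of dashes, another way to keep memory use flat on large files is to read line by line and collect the lines between delimiter lines. This is a sketch rather than part of the answer: iter_blocks is a hypothetical name, and the delimiter is assumed to be a line of 45 or more dashes. Working on whole lines also avoids the case where open_chunk reads only part of a dash run and splits on it early. The demo uses io.StringIO (Python 2.6+ and 3.x); with a real file, pass the open file object instead.

```python
import io
import re

# A delimiter is a line of 45 or more dashes (assumed format).
DELIM = re.compile(r'-{45,}\s*$')


def iter_blocks(lines, delim=DELIM):
    """Yield one list of stripped, non-blank lines per block."""
    block = []
    for line in lines:
        if delim.match(line):
            if block:
                yield block
            block = []
        elif line.strip():
            block.append(line.strip())
    if block:  # catch-all: last block not followed by a delimiter
        yield block


sep = '-' * 46
sample = io.StringIO(
    '\n\n' + sep + '\n'
    'TM 05120970.01: Processing...\n'
    'TM 05120970: CarModel = Nissan Micra\n' + sep + '\n'
    'TM 05157414.06: Processing...\n'
    'TM 05157414: CarModel = Toyota Avensis\n')

for block in iter_blocks(sample):
    print(block[0])  # the 'TM <car number>: Processing...' line
```

Only one block is held in memory at a time, and each block's first line is the one that carries the full car number.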

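You should use the re module to extract the relevant information: it is standard, simple and robust. You can also output the collected information each time a block delimiter is reached, and add a catch-all at the end of the file for the last block.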
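To see what the three patterns in the script below capture, here they are applied to single lines taken from the sample data. They are written as raw strings so that \s reaches the regex engine unchanged, and the literal dot in Owner_Info.User_ref is escaped. Note that rnum also matches the second line of a block, but without the .01 suffix, which is why the script applies it only to the first line after a delimiter.

```python
import re

rnum = re.compile(r'\s*TM\s+([^\s:]+):.*')
ruser = re.compile(r'.*Owner_Info\.User_ref\s*=\s*(.*)')
rmodel = re.compile(r'.*CarModel\s*=\s*(.*)')

first = 'TM 05120970.01: Processing...'
second = 'TM 05120970: Processing...'
print(rnum.match(first).group(1))    # 05120970.01
print(rnum.match(second).group(1))   # 05120970 (no .01 suffix)
print(ruser.match('TM 05120970: Owner_Info.User_ref = crossi14').group(1))  # crossi14
print(rmodel.match('TM 05120970: CarModel = Nissan Micra').group(1))        # Nissan Micra
```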

The script would be:

import re

# Raw strings so that '\s' reaches the regex engine unchanged.
rnum = re.compile(r'\s*TM\s+([^\s:]+):.*')
ruser = re.compile(r'.*Owner_Info\.User_ref\s*=\s*(.*)')
rmodel = re.compile(r'.*CarModel\s*=\s*(.*)')


def display(out, num, user, model):
    # Same console output on Python 2 and 3 (print(a, b, c) shows a tuple on Python 2)
    print('%s %s %s' % (num, user, model))
    out.write('insert_owner_car.pl -id %s -o owner_user="%s",car_model="%s",priority="Unknown"\n' % (num, user, model))

inFile = open("/tmp/20141202.194812_carStatus")
outFile = open("result.txt", "w")
firstOfBlock = False
carnum = user = model = None
for line in inFile:
    if line.startswith("--------------------------------"):
        # Delimiter: flush the previous block, then expect a new first line
        firstOfBlock = True
        if carnum is not None:
            display(outFile, carnum, user, model)
            carnum = user = model = None
    else:
        if firstOfBlock:
            # First line of a block: 'TM 05120970.01: Processing...'
            m = rnum.match(line)
            if m is not None:
                carnum = m.group(1)
                firstOfBlock = False
        else:
            line = line.strip()
            m = ruser.match(line)
            if m is not None:
                user = m.group(1)
            else:
                m = rmodel.match(line)
                if m is not None:
                    model = m.group(1)

# Catch-all: the last block may not be followed by a delimiter
if carnum is not None:
    display(outFile, carnum, user, model)
    carnum = None

inFile.close()
outFile.close()
With the current example data, the output is:

05120970.01 crossi14 Nissan Micra
05157414.06 yumiao12 Toyota Avensis
and result.txt is:

insert_owner_car.pl -id 05120970.01 -o owner_user="crossi14",car_model="Nissan Micra",priority="Unknown"
insert_owner_car.pl -id 05157414.06 -o owner_user="yumiao12",car_model="Toyota Avensis",priority="Unknown"

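Looks good, but I can't use the with open part... File "./test8.py", line 20, with open('data_File/tmp/20141202.194812_carStatus') as f: ^ SyntaxError: invalid syntax
Oh, I forgot you are using Python 2.5. To use the with statement, you need to add from __future__ import with_statement at the top of your code. Here is a …
Yes, it still doesn't work. I did: from __future__ import with_statement import re import subprocess import Six
Please edit your question to include the full traceback.
Same error, meaning the __future__ import isn't doing what it should. $ ./test10.py File "./test10.py", line 21 with open("/tmp/20141202.194812_carStatus", 'r') as f: ^ SyntaxError: invalid syntax $ ./test11.py Traceback (most recent call last): File "./test11.py", line 25, in ? carnum = m.group(1) AttributeError: 'NoneType' object has no attribute 'group'
@KayNix: Sorry, I made a bad copy and paste. Fixed.
Thanks, it works :D Now I want to classify the car models as luxury, compact, small, ... but if I can't manage it, I'll come back :D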