Python: extract strings from each line of a text file and save the output as CSV rows

I am trying to extract the following fields from a text file:

srcintf, dstintf, srcaddr, dstaddr, action, schedule, service, logtraffic

and save the values to a CSV file, with each policy on its own row.

The input file looks like this:

edit 258
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "all"
    set dstaddr "10.2.22.1/32"
    set action accept
    set schedule "always"
    set service "selling_soft_01"
    set logtraffic all
next
edit 184
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "10.1.1.1/32"
    set schedule "always"
    set service "HTTPS"
    set logtraffic all
next
edit 124
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "172.16.77.1/32"
    set schedule "always"
    set service "ping"
    set logtraffic all
    set nat enable
next

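This is my first time programming (as you can probably tell from my code), but maybe it gives you a better idea of what I am trying to do. Please see the code below: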

import csv

text_file = open("fwpolicy.txt", "r")

lines = text_file.readlines()

mycsv = csv.writer(open('output.csv', 'w'))

mycsv.writerow(['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat'])

n = 0
for line in lines: 
    n = n + 1
n = 0
for line in lines: 
    n = n + 1
    if "set srcintf" in line:
            srcintf = line
    else    srcintf = 'not set'
    if "set dstintf" in line:            
        dstintf = line
    else    dstintf  = 'not set'
    if "set srcaddr" in line:           
        srcaddr = line
    else    srcaddr = 'not set'
    if "set dstaddr" in line:
            dstaddr = line
    else    dstaddr = 'not set'
    if "set action" in line:            
        action = line
    else    action = 'not set'
    if "set schedule" in line:
            schedule = line
    else    schedule = 'not set'
    if "set service" in line:
            service = line
    else    service = 'not set'
    if "set logtraffic" in line:
            logtraffic = line
    else    logtraffic = 'not set'
    if "set nat" in line:
            nat = line
    else    nat = 'not set'            

        mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])

Expected result (CSV file):

Actual result:

Traceback (most recent call last):
  File "parse.py", line 45, in <module>
    mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])
NameError: name 'srcintf' is not defined

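You are trying to write a row to the CSV for every line in the file. You should only write the row when you see the word next, so check for it before writing. That way all the fields of a policy have been collected by the time its row is written.

Once you get that far, you will notice that you have set each value to the whole line rather than to the value that follows the keyword. For example, with the line

 set srcintf "Untrust"

your code

 if "set srcintf" in line: srcintf = line
 else: srcintf = 'not set'

gives srcintf the value

 set srcintf "Untrust"

Try splitting the string to get the actual value, something like this: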

import csv

with open("fwpolicy.txt", "r") as text_file, open("output.csv", "w", newline="") as out_file:
    mycsv = csv.writer(out_file)
    mycsv.writerow(['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
                    'service', 'logtraffic', 'nat'])
    for line in text_file:
        if "edit" in line:
            # Start of a policy: reset every column to its default.
            [srcintf, dstintf, srcaddr, dstaddr, schedule,
             service, logtraffic, nat] = ['not set'] * 8
        elif "next" in line:
            # End of a policy: write the collected values as one row.
            mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule,
                            service, logtraffic, nat])
        elif "set srcintf" in line:
            # The value is the third word; strip the surrounding quotes.
            srcintf = line.split()[2].strip('"')
        elif "set dstintf" in line:
            dstintf = line.split()[2].strip('"')
        elif "set srcaddr" in line:
            srcaddr = line.split()[2].strip('"')
        elif "set dstaddr" in line:
            dstaddr = line.split()[2].strip('"')
        elif "set action" in line:
            # Collected but not written: 'action' is not one of the CSV columns.
            action = line.split()[2].strip('"')
        elif "set schedule" in line:
            schedule = line.split()[2].strip('"')
        elif "set service" in line:
            service = line.split()[2].strip('"')
        elif "set logtraffic" in line:
            logtraffic = line.split()[2].strip('"')
        elif "set nat" in line:
            nat = line.split()[2].strip('"')

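The important thing is to populate all the values for a row, and to write the row only once you have them.

The repetition could be made more concise, but hopefully this helps with the concept of a state machine: look at where you are in the file to decide whether to collect a value, start a new batch, or write a row.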
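As a rough illustration of how that repetition might be collapsed, here is a minimal sketch of the same state machine using one dictionary per policy instead of eight variables. The helper name parse_policies and the inline sample text are assumptions made for the example, not part of the original answer.

```python
import csv
import io

FIELDS = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
          'service', 'logtraffic', 'nat']


def parse_policies(lines, default='not set'):
    """Yield one dict per 'edit ... next' block (hypothetical helper)."""
    row = None  # None means we are currently outside any policy block
    for line in lines:
        words = line.split()
        if not words:
            continue  # blank line: nothing to do
        if words[0] == 'edit':
            # Start a new batch with every column set to the default.
            row = dict.fromkeys(FIELDS, default)
        elif words[0] == 'next' and row is not None:
            # The block is complete: hand the row to the caller.
            yield row
            row = None
        elif words[0] == 'set' and row is not None and len(words) >= 3:
            # Collect a value, but only for the columns we care about.
            if words[1] in FIELDS:
                row[words[1]] = words[2].strip('"')


sample = '''edit 184
    set srcintf "Untrust"
    set dstaddr "10.1.1.1/32"
    set action accept
next
'''

rows = list(parse_policies(sample.splitlines()))

# Write to an in-memory buffer here; a real script would open output.csv instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The three branches correspond to the three decisions described above: start a new batch on edit, collect a value on set, and write a row on next.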

Here is my approach:

import csv

fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat']

defaults = {'srcintf' : "not set", 'dstintf': "not set", 'srcaddr': "not set", 
            'dstaddr': "not set", 'schedule': "not set", 'service': "not set", 
            'logtraffic': "not set", 'nat': "not set"}

with open("structured_content.txt", "r") as text_file:
    content = text_file.read()

with open('output.csv', 'w', newline='') as out_file:
    mycsv = csv.DictWriter(out_file, fieldnames)
    mycsv.writeheader()
    for block in content.split("next"):
        csv_row = {}
        for p in [s.strip() for s in block.replace("\n", " ").split("set")]:
            s = p.split()
            if len(s) == 2:
                csv_row[s[0]] = s[1].strip('"')  # n.b. this includes "action" and "edit" fields, which need stripping out
        if not csv_row:
            continue  # nothing was parsed, e.g. the empty block after the last "next"
        csv_write_row = {}
        for k, v in csv_row.items():
            print("key=", k, "value=", v)
            if k in fieldnames:  # a filter to only include fields in the "fieldnames" list
                print(k, " is in the list - attach its value to the output dictionary")
                csv_write_row[k] = v
        for k, v in defaults.items():
            if k not in csv_write_row:  # pad out the output row with any default values not lifted from the file
                print(k, " is not in the list - write a default out")
                csv_write_row[k] = v
        mycsv.writerow(csv_write_row)

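My aim is to take advantage of the file's structure, using split to break the text into repeating blocks. Converting the file to CSV is then just a matter of lining up the blocks (and nested blocks) with the CSV format.

csv.DictWriter provides a useful interface for saving the content row by row.

If you want defaults for values that are not present, I would use a dictionary whose keys are the field names and whose values are the defaults for missing fields. You can then "wash" the prepared csv_write_row with those defaults in case the fields are absent.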
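As a possible shortcut for the filtering and padding steps, csv.DictWriter itself accepts restval (the value written for missing columns) and extrasaction='ignore' (silently drop keys that are not in fieldnames). The sketch below assumes a shortened column list and a hand-written parsed dictionary purely for illustration. Note that restval is a single default for every column, so per-column defaults would still need a dictionary like the one described above.

```python
import csv
import io

# Shortened column list, assumed for the example.
fieldnames = ['srcintf', 'dstintf', 'nat']

# What a parsed block might look like, including keys we do not want to write.
parsed = {'edit': '184', 'srcintf': 'Untrust', 'dstintf': 'Trust', 'action': 'accept'}

out = io.StringIO()  # in-memory stand-in for the output file
writer = csv.DictWriter(out, fieldnames=fieldnames,
                        restval='not set',      # written for any missing column
                        extrasaction='ignore')  # drop keys not in fieldnames
writer.writeheader()
writer.writerow(parsed)

result = out.getvalue()
print(result)
```

Here the edit and action keys are dropped and the missing nat column is filled with the default, without any explicit loop over the row.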

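Here is one way to do it:

import pandas as pd

keys = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat']

with open("fwpolicy.txt", "r") as text_file:
    lines = text_file.readlines()

records = []
record = dict()  # fields collected for the policy currently being read
for line in lines:

    found_key = [key for key in keys if key in line]

    if len(found_key) > 0:
        # Drop the quotes, then take everything after "set <key>".
        value = line.strip().replace('"', '').split(" ")[2:]
        record[found_key[0]] = value[0]

    if 'next' in line:
        # End of a policy: store the record and start a fresh one.
        records.append(record)
        record = dict()

# columns=keys fixes the column order; fields a policy never set are left empty.
pd.DataFrame(records, columns=keys).to_csv('output.csv', index=False)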

Here is how to do it with DictWriter:

import csv

with open("fwpolicy.txt", "r") as text_file, open('output.csv', 'w', newline='') as out_file:

    fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
                  'service', 'logtraffic', 'nat']

    # extrasaction='ignore' drops keys such as 'action' that are not columns;
    # QUOTE_NONE writes the values exactly as they appear in the input, quotes included.
    mycsv = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction='ignore',
                           quotechar=None, quoting=csv.QUOTE_NONE)
    mycsv.writeheader()

    row = {}
    for line in text_file:
        words = line.strip().split(maxsplit=2)
        if not words:
            continue  # skip blank lines
        if 'set' == words[0] and len(words) == 3:
            row[words[1]] = words[2]
        elif 'next' == words[0]:
            print(row)
            mycsv.writerow(row)
            row = {}
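
The code above keeps the double quotes from the input, because str.split leaves them in the value. If unquoted values are preferred, one option (an assumption on my part, not something the answer uses) is shlex.split from the standard library, which understands quoting and would also keep a quoted value containing spaces together. A minimal sketch, with parse_set_line as a hypothetical helper name:

```python
import shlex


def parse_set_line(line):
    """Return (key, value) for a 'set' line, or None for any other line."""
    # shlex.split removes the surrounding quotes and keeps quoted text as one token.
    # It raises ValueError if a line contains an unbalanced quote.
    words = shlex.split(line)
    if len(words) >= 3 and words[0] == 'set':
        # Join any further tokens so that multi-valued settings are not lost.
        return words[1], ' '.join(words[2:])
    return None


for text in ['    set srcintf "Untrust"',
             '    set dstaddr "10.2.22.1/32"',
             '    set logtraffic all',
             'next']:
    print(repr(text), '->', parse_set_line(text))
```

With values parsed this way, the default quoting of csv.DictWriter could be used instead of QUOTE_NONE, since the values no longer carry their own quotes.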

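Please show real, correctly indented code. This does not even have the colon after the else statement! (Use copy/paste and Ctrl-K to format the code…)

Thanks. I will look into "split". Hopefully I can get it right.

I have added a version similar to your code, but with proper splitting. Thomas's dictionary version above is more concise.

Thank you, Thomas! I mean, I probably understand 30% of the code, but it works. Now I am going to study what you did... Thanks again.

Just making an edit here to implement what I said about using a dictionary of defaults for the default values. The advantage is that in future, if you want to change them, it is just an edit to the dict rather than to the code. And you're welcome!

Hi Thomas. I am still working through your code and I have a question... I hope you can help me. Is there a way to see what is happening inside csv_write_row and debug from there? I mean, I tried changing the "k, v" variables and then running the script again to see what changed, but I would like to know whether there is a way to debug live, step by step. Is there a tool that can help me do this? (Maybe an IDE can do it for me? Have you used one?) Thanks to you and to everyone else who helped me.

Sure. It is not an IDE per se, but that is what I write code in when I am experimenting. I think some IDEs will give you stepping and breakpoints, but I tend to insert judiciously placed print statements (see the edit) to run my code and then check that what is printed matches what I expect. Each k, v block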