How to run this Python 2.7 script on multiple files in the same directory

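This script currently pulls a specific type of IP address out of a file and formats the results as CSV.

How can I change it so that it looks at all the files in its directory (the same directory as the script) and creates a single new output file? This is my first week learning Python, so please keep it as simple as possible.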

    #!/usr/bin/python

    # Extract IP address from file 

    #import modules
    import re

    # Open Source File
    infile = open('stix1.xml', 'r')
    # Open output file
    outfile = open('ExtractedIPs.csv', 'w') 
    # Create a list
    BadIPs = []

    #search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue

        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)

    #tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to a file        
    outfile.write(data)

    infile.close()
    outfile.close()


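import sys

Turn your current code into a function, e.g. def extract(filename).

Call the script with all the file names:

python myscript.py file1 file2 file3

Inside the script, loop over the file names:

for filename in sys.argv[1:]:

and call the function inside the loop: extract(filename)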
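The steps above could be put together roughly as follows. This is only a sketch: the names IP_PATTERN and main, and the out_name parameter, are made up for illustration, and the regex is the one from the question. Unlike the original script, it collects the IPs from all files first and writes the output file once at the end.

```python
import re
import sys

# Same pattern as in the question: an indicator title of the form "IP: a.b.c.d".
IP_PATTERN = re.compile(
    r"<indicator:Title>IP: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")


def extract(filename):
    """Return every IP found in the indicator titles of one file."""
    found = []
    with open(filename) as infile:
        for line in infile:
            # findall returns an empty list for lines without a match
            found.extend(IP_PATTERN.findall(line))
    return found


def main(filenames, out_name="ExtractedIPs.csv"):
    """Collect the IPs from all given files and write them as one CSV line."""
    all_ips = []
    for filename in filenames:
        all_ips.extend(extract(filename))
    with open(out_name, "w") as outfile:
        outfile.write(", ".join(all_ips))
    return all_ips


# Only run when file names were actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

On most shells you can let the shell expand the file list for you, e.g. python myscript.py *.xml.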
I think you want to look at glob.glob:
it returns a list of the files that match a given pattern.
Then you can do something like this:
import glob
import re

def do_something_with(f):
    # Open source file
    infile = open(f, 'r')
    # Open output file in append mode ('a') so every input file adds to it
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []

    ### rest of your code
    .
    .
    outfile.write(data)

    infile.close()
    outfile.close()

for f in glob.glob("*.xml"):
    do_something_with(f)


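You can get a list of all the XML files like this:
import os

# Python 2.7 requires the path argument; '.' is the current directory
filenames = [nm for nm in os.listdir('.') if nm.endswith('.xml')]

Then iterate over all the files:
for fn in filenames:
    with open(fn) as infile:
        for ln in infile:
            pass  # do your thing

The with statement ensures that the file is closed once you are done with it.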
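Here is a small sketch of the same idea wrapped in functions. The names xml_files and count_lines and the directory parameter are just illustrative choices, not part of the answer above. Note that '.' means the current working directory, which is not necessarily the folder the script lives in.

```python
import os


def xml_files(directory="."):
    """Return the sorted paths of all .xml files directly inside directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".xml")
    )


def count_lines(path):
    """Count the lines of a file; the with block closes it afterwards."""
    with open(path) as infile:
        return sum(1 for _ in infile)
```

If you want the folder that contains the script rather than the current working directory, you could pass os.path.dirname(os.path.abspath(__file__)) as directory.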
Assuming you want to append all the output to the same file, the script would look like this:
#!/usr/bin/python
import glob   
import re

for infileName in glob.glob("*.xml"):
    # Open Source File
    infile = open(infileName, 'r')
    # Append to file
    outfile = open('ExtractedIPs.csv', 'a') 
    # Create a list
    BadIPs = []

    #search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue

        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)

    #tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
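    # Write IPs to a file, one line per input file, so that the
    # output of different files does not run together
    if data:
        outfile.write(data + '\n')

    infile.close()
    outfile.close()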

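I needed to do this and also descend into subdirectories. You need to import os and os.path, and then you can use a function like this:
import os  # importing os also makes os.path available

def recursive_glob(rootdir='.', suffix=()):
    """ recursively traverses the directory tree from rootdir, returns
        full paths and file names for files with a suffix in the tuple """
    pathlist = []
    filelist = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith(suffix):
                pathlist.append(os.path.join(looproot, filename))
                filelist.append(filename)
    return pathlist, filelist

Pass the function the top-level directory you want to start from and the suffix (or a tuple of suffixes) of the file type you are looking for. This was written and tested on Windows, but I believe it will work on other operating systems too, as long as your files have extensions.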
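As a variation on the function above, here is a sketch that yields the full paths one at a time instead of building two lists. The name iter_files and the default suffix are assumptions made for this example. It relies on str.endswith accepting a tuple of suffixes (a list would raise a TypeError).

```python
import os


def iter_files(rootdir=".", suffixes=(".xml",)):
    """Yield the full path of every file under rootdir that ends in one of suffixes."""
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            # endswith accepts a tuple and matches any of its entries
            if filename.endswith(suffixes):
                yield os.path.join(dirpath, filename)
```

You would then loop over it directly, e.g. for path in iter_files('.', ('.xml',)): and process each path inside the loop.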
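If all the files in the current folder are relevant, you can use os.listdir(). If not, say you only want the .xml files, then use glob.glob("*.xml"). But the overall program can be improved, roughly as follows:
#import modules
import csv
import glob
import re

# the regex from the question
reg = r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
pat = re.compile(reg)
# on Python 2.7, open the CSV with "wb" to avoid blank rows on Windows
with open("out.csv", "w") as fw:
    writer = csv.writer(fw)
    for f in glob.glob("*.xml"):  # or os.listdir('.') if every file is relevant
        with open(f) as fr:
            # skip blank lines
            lines = (line for line in fr if not line.isspace())
            # genex for all ip in that file
            ips = (ip for line in lines for ip in pat.findall(line))
            # one row per input file
            writer.writerow(list(ips))

You may have to adapt it to your exact needs. But the idea of this version is that it has far fewer side effects, uses much less memory, and needs much less code.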
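To make the csv-based approach easier to try out, here is a hedged sketch that wraps it in functions. The names open_csv_for_writing and write_ip_rows and their parameters are invented for this example. The version check is there because the csv module wants the output file opened in binary mode on Python 2 and in text mode with newline="" on Python 3.

```python
import csv
import glob
import re
import sys

# The pattern from the question: indicator titles of the form "IP: a.b.c.d".
IP_PATTERN = re.compile(
    r"<indicator:Title>IP: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")


def open_csv_for_writing(path):
    """Open path the way the csv module expects on this Python version."""
    if sys.version_info[0] < 3:
        return open(path, "wb")          # Python 2: binary mode
    return open(path, "w", newline="")   # Python 3: text mode, no newline translation


def write_ip_rows(pattern="*.xml", out_name="out.csv"):
    """Write one CSV row of IPs per matching file; return the number of rows."""
    rows = 0
    with open_csv_for_writing(out_name) as fw:
        writer = csv.writer(fw)
        for name in sorted(glob.glob(pattern)):
            with open(name) as fr:
                ips = [ip for line in fr for ip in IP_PATTERN.findall(line)]
            if ips:  # skip files without any matching title
                writer.writerow(ips)
                rows += 1
    return rows
```

Calling write_ip_rows() with no arguments would process every .xml file in the current working directory and overwrite out.csv there.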