
Python script to search for results and export them to a .csv file


I'm trying to do the following in Python, along with some bash scripting, unless there is an easier way to do it in Python.

I have a log file with data that looks like this:

16:14:59.027003 - WARN - Cancel Latency: 100ms - OrderId: 311yrsbj - On Venue: ABCD
16:14:59.027010 - WARN - Ack Latency: 25ms - OrderId: 311yrsbl - On Venue: EFGH
16:14:59.027201 - WARN - Ack Latency: 22ms - OrderId: 311yrsbn - On Venue: IJKL
16:14:59.027235 - WARN - Cancel Latency: 137ms - OrderId: 311yrsbp - On Venue: MNOP
16:14:59.027256 - WARN - Cancel Latency: 220ms - OrderId: 311yrsbr - On Venue: QRST
16:14:59.027293 - WARN - Ack Latency: 142ms - OrderId: 311yrsbt - On Venue: UVWX
16:14:59.027329 - WARN - Cancel Latency: 134ms - OrderId: 311yrsbv - On Venue: YZ  
16:14:59.027359 - WARN - Ack Latency: 75ms - OrderId: 311yrsbx - On Venue: ABCD
16:14:59.027401 - WARN - Cancel Latency: 66ms - OrderId: 311yrsbz - On Venue: ABCD
16:14:59.027426 - WARN - Cancel Latency: 212ms - OrderId: 311yrsc1 - On Venue: EFGH
16:14:59.027470 - WARN - Cancel Latency: 89ms - OrderId: 311yrsf7 - On Venue: IJKL  
16:14:59.027495 - WARN - Cancel Latency: 97ms - OrderId: 311yrsay - On Venue: IJKL
I need to extract the last entry from each line, then search every line for each unique entry and export the results to a .csv file.

I used the following bash script to get each unique entry:

cat LogFile_`date +%Y%m%d`.msg.log | awk '{print $14}' | sort | uniq

Based on the data above, the bash script returns the following results:

ABCD
EFGH
IJKL
MNOP
QRST
UVWX
YZ
Now I want to search (or grep) the same log file for each of those results and return the top ten. I have another bash script to do this, but how do I do it with a FOR loop? For x, where x = each entry above:

grep x LogFile_`date +%Y%m%d`.msg.log | awk '{print $7}' | sort -nr | uniq | head -10

Then return the results to a .csv file. The results should look like this (each field in a separate column):


I'm a beginner with Python and haven't done much coding since college (13 years ago). Any help would be greatly appreciated. Thanks.
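For comparison, the two bash pipelines above map fairly directly onto a single pass in Python. A rough, hypothetical sketch (assuming whitespace-separated fields, as the awk field numbers imply, and a log file name passed on the command line):

import sys
from collections import defaultdict

# Collect every latency value (field 7) per venue (field 14), as in the awk commands.
latencies_by_venue = defaultdict(list)
with open(sys.argv[1]) as logfile:
    for line in logfile:
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        latencies_by_venue[fields[13]].append(fields[6])

# For each venue, print the ten largest distinct latencies,
# mirroring "sort -nr | uniq | head -10".
for venue in sorted(latencies_by_venue):
    top_ten = sorted(set(latencies_by_venue[venue]),
                     key=lambda ms: int(ms.rstrip('ms')), reverse=True)[:10]
    print(venue, top_ten)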

Say you've got the file open. What you want to do is keep a record of how many times each entry appears, i.e. each entry will map to one or more timings:

from collections import defaultdict

entries = defaultdict(list)
for line in your_file:
    # Parse the line and return the 'ABCD' part and time
    column_a, timing = parse(line)
    entries[column_a].append(timing)
Once that's done, you'll have a dictionary like this:

{ 'ABCD': ['30ms', '25ms', '12ms'],
  'EFGH': ['12ms'],
  'IJKL': ['2ms', '14ms'] }
What you want to do now is transform this dictionary into another data structure sorted by the len of its values (i.e. the lists). For example:

In [15]: sorted(((k, v) for k, v in entries.items()), 
                key=lambda i: len(i[1]), reverse=True)
Out[15]: 
[('ABCD', ['30ms', '25ms', '12ms']),
 ('IJKL', ['2ms', '14ms']),
 ('EFGH', ['12ms'])]

Of course, this is just illustrative; you will probably want to collect more data in the original for loop.
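The parse() call in the loop above is just a placeholder. A minimal sketch of one way to write it, assuming every line follows the " - "-separated format shown in the question:

def parse(line):
    # '16:14:59.027003 - WARN - Cancel Latency: 100ms - OrderId: 311yrsbj - On Venue: ABCD'
    fields = line.strip().split(' - ')
    venue = fields[-1].replace('On Venue: ', '')   # e.g. 'ABCD'
    timing = fields[2].split(': ')[1]              # e.g. '100ms'
    return venue, timing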

Maybe not as concise as you expected... but I think this solves your problem. I've added some try/except handling to deal better with real data:

import re
import os
import csv
import collections

# find all log files in the current directory; of course this pattern could be
# more sophisticated, but that's not our focus here
log_pattern = re.compile(r"LogFile_date[0-9]{8}\.msg\.log")
logfiles = [f for f in os.listdir('./') if log_pattern.match(f)]

# top n
nhead = 10
# used to parse useful fields
extract_pattern = re.compile(
    r'.*Cancel Latency: ([0-9]+ms) - OrderId: ([0-9a-z]+) - On Venue: ([A-Z]+)')
# container for final results
res = collections.defaultdict(list)

# parse out all interesting fields
for logfile in logfiles:
    with open(logfile, 'r') as logf:
        for line in logf:
            try:  # in case of blank line or line with no such fields.
                latency, orderid, venue = extract_pattern.match(line).groups()
            except AttributeError:
                continue
            res[venue].append((orderid, latency))

# write to csv
with open('res.csv', 'w') as resf:
    resc = csv.writer(resf, delimiter=' ')
    for venue in sorted(res):  # sort by Venue (iterkeys() is Python 2 only)
        entries = res[venue]
        entries.sort()  # sort by OrderId
        for i in range(0, nhead):
            try:
                resc.writerow([venue, entries[i][0], 'Cancel ' + entries[i][1]])
            except IndexError:  # nhead can not be satisfied
                break
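As a follow-up to the comments below: if you want each field in its own spreadsheet column with a header row, one option is to drop the delimiter=' ' argument so the writer emits standard comma-separated rows, and to write a header first. A hedged sketch reusing the res and nhead variables built above (not part of the original answer):

# Variation on the CSV-writing step: comma-separated output with a header row,
# so each field lands in its own column when opened in a spreadsheet.
with open('res.csv', 'w') as resf:
    resc = csv.writer(resf)                         # default delimiter is ','
    resc.writerow(['Venue', 'OrderId', 'Latency'])  # header row
    for venue in sorted(res):                       # alphabetical by venue
        for orderid, latency in sorted(res[venue])[:nhead]:
            resc.writerow([venue, orderid, latency])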

How does your output correspond to your input?

Might be something simple, but I'm getting an error: with open(logfile, 'r') as logf:  ^ SyntaxError: invalid syntax

Thanks for the help, Francis Chan. This works. Is there a way to write each field to a separate column in the .csv file, each with its own header? Right now it writes all 4 fields into the same column (column A). Also, I'd like to sort alphabetically by venue and then by the fourth field in descending order (63ms, 64ms, 63ms, 62ms... etc.)? Thanks again for your help. Also, I should have given a better example of my log file. There are two different types of "Latency", but I only showed one type, "Cancel". It is actually either "Cancel" or "Ack". How can I include the correct word that precedes "Latency"?

16:14:59.027003 - WARN - Ack Latency: 22ms - OrderId: 311yrsbj - On Venue: IJKL
16:14:59.027010 - WARN - Cancel Latency: 22ms - OrderId: 311yrsbl - On Venue: EFGH
16:14:59.027201 - WARN - Ack Latency: 22ms - OrderId: 311yrsbn - On Venue: IJKL
16:14:59.027235 - WARN - Cancel Latency: 22ms - OrderId: 311yrsbp - On Venue: MNOP
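Regarding the last comment: one way to handle both "Ack" and "Cancel" lines is to capture the word before "Latency" as an extra group instead of hard-coding "Cancel" in the pattern. A small, hypothetical sketch of that tweak (not from the original answer):

import re

# Capture the latency type ('Ack' or 'Cancel') as an additional field.
extract_pattern = re.compile(
    r'.*(Ack|Cancel) Latency: ([0-9]+ms) - OrderId: ([0-9a-z]+) - On Venue: ([A-Z]+)')

line = '16:14:59.027003 - WARN - Ack Latency: 22ms - OrderId: 311yrsbj - On Venue: IJKL'
ltype, latency, orderid, venue = extract_pattern.match(line).groups()
# ltype == 'Ack', latency == '22ms', orderid == '311yrsbj', venue == 'IJKL'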