Error when splitting a block of data with Python

I have a parsed file and need to split the data by LogType. Here is my data:

===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started

===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0

The code I applied gives wrong results when splitting the data. Here is the code:

import re

def parse_container(text, full_text_lines, filter_log_types=None, filter_content_types=None):
    results = {}

    # The first line is stored as the id; rest is the block of data mentioned above.
    first, rest = text.split('\n', 1)
    results['id'] = first
    # Intended to split before every line that starts with "LogType:".
    all_log_types = re.compile('^(?=LogType:)', flags=re.MULTILINE).split(rest)
    print(all_log_types)

The result I get is:

['========================================================================\nLogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:\n\nLogType:stderr\nLog Upload Time :Thu Jun 25 12:24:52 +0100 2020\nLogLength:3000\nLog Contents:\n20/06/25 12:19:33 INFO datasources.FileScanRDD \n20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.\n20/06/21 12:19:40 INFO eas\n20/06/25 12:20:41 WARN Warning as the node is accessed without started\n \n']
['========================================================================\nLogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:\n\n']

The output I need:

['========================================================================\n', 'LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:\n\n', 'LogType:stderr\nLog Upload Time :Thu Jun 25 12:24:52 +0100 2020\nLogLength:3000\nLog Contents:\n20/06/25 12:19:33 INFO datasources.FileScanRDD \n20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.\n20/06/21 12:19:40 INFO eas\n20/06/25 12:20:41 WARN Warning as the node is accessed without started\n \n']

['========================================================================\n', 'LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:\n\n']
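
In my output, you can see that each LogType is only preceded by \n inside one long string, but I need the data split at each LogType into separate, comma-separated list elements.

In the expected output, you can see that the data has been split by LogType.

I am using Python 2.6.6. Please help me solve this. Thanks a lot.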
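
A likely cause of the unsplit result, offered as an assumption since the thread never confirms it: the pattern ^(?=LogType:) only ever matches an empty string, and re.split did not split on empty matches before Python 3.7, so on Python 2.6.6 the text comes back in one piece. The sketch below sidesteps that by locating the matches with re.finditer and slicing the text by hand. The helper name split_before and the shortened sample are made up for illustration.

```python
import re

def split_before(pattern, text):
    """Cut text in front of every match of pattern, keeping every character."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    # Boundaries: start of text, each match position, end of text.
    bounds = [0] + [s for s in starts if s > 0] + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

# Shortened, hypothetical stand-in for the `rest` variable in the question.
rest = (
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "LogLength:0\n"
    "Log Contents:\n"
    "\n"
    "LogType:stderr\n"
    "LogLength:3000\n"
    "Log Contents:\n"
    "20/06/25 12:19:33 INFO datasources.FileScanRDD\n"
)

log_type_start = re.compile(r'^LogType:', re.MULTILINE)
parts = split_before(log_type_start, rest)
print(parts)
```

Because the pieces are plain slices, joining them back together reproduces the original text, including the separator line and every \n.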

If a file contains multiple logs, try the following:

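import re

# `text` is assumed to hold the full contents of the log file.
# re.compile() is used because re.split() only accepts a flags
# argument from Python 2.7 onwards.
logs = re.compile('^=', re.MULTILINE).split(text)

for log in logs:
    if len(log) > 0:
        # Split once, at the end of the '=====' separator line.
        first, rest = log.split('=\n', 1)
        print('first', first)
        print('rest', rest)
        print("\n\n")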

Output:

first =================================================================================
rest LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started



first =================================================================================
rest LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
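
Building on the idea above, here is a minimal sketch that goes one level further: it drops the ===== separator lines entirely and then pulls the LogType sections out of each log with re.findall rather than re.split. It assumes that a line made up only of = characters is always a separator and never log content. The names SEPARATOR, SECTION and split_logs, and the shortened sample, are hypothetical.

```python
import re

# A whole line of '=' characters, including its newline.
SEPARATOR = re.compile(r'^=+\n', re.MULTILINE)
# From a "LogType:" line up to the next "LogType:" line or the end of the chunk.
SECTION = re.compile(r'^LogType:.*?(?=^LogType:|\Z)', re.MULTILINE | re.DOTALL)

def split_logs(text):
    """Return one list of LogType sections per log in text."""
    chunks = [chunk for chunk in SEPARATOR.split(text) if chunk.strip()]
    return [SECTION.findall(chunk) for chunk in chunks]

sample = (
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "Log Contents:\n"
    "\n"
    "LogType:stderr\n"
    "Log Contents:\n"
    "20/06/25 12:19:33 INFO datasources.FileScanRDD\n"
    "\n"
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "LogLength:0\n"
)

logs = split_logs(sample)
for sections in logs:
    print(sections)
```

The separator pattern consumes real characters, so this split behaves the same on old and new Python versions.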

Based on your question, you can use this:

text = text.replace('=', '')  # remove every '=' character
all_log_types = text.split('\n\n')  # splitting based on an empty line
print(all_log_types)
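
Two things are worth watching with the snippet above. replace('=', '') also deletes any = that appears inside the log contents, and each separator line turns into an empty line, so the blocks around it end up three newlines apart and the piece after the separator starts with a stray \n. A possible variant, sketched under the assumption that separators are lines consisting only of = characters, removes just those lines and splits on runs of blank lines. The function name and the sample (including its key=value line) are invented for illustration.

```python
import re

def split_on_blank_lines(text):
    """Remove '=====' separator lines, then split on one or more blank lines."""
    without_separators = re.compile(r'^=+\n', re.MULTILINE).sub('', text)
    pieces = re.split(r'\n{2,}', without_separators)
    # Trim leftover newlines and drop pieces that are empty.
    return [piece.strip('\n') for piece in pieces if piece.strip()]

sample = (
    "=====\n"
    "LogType:stdout\n"
    "key=value\n"
    "\n"
    "LogType:stderr\n"
    "LogLength:3\n"
    "\n"
    "=====\n"
    "LogType:stdout\n"
    "LogLength:0\n"
)

all_log_types = split_on_blank_lines(sample)
print(all_log_types)
```

Like the original snippet, this still cuts a section in two if its log contents contain a blank line, so it only suits files where blank lines appear between sections and nowhere else.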

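We can easily split the logs in Python using a regular expression. The following code splits the logs wherever either of two conditions is met:

Condition 1: multiple occurrences of = followed by \n

Condition 2: two consecutive occurrences of \n

filter removes all the empty strings returned by split and returns a filter object (in Python 3), which is then converted to a list.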

import re

text = """===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started

===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
"""


output = list(filter(None, re.compile('[=]+.\n|\n\n').split(text)))

print(output)

Output:

['LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:', 'LogType:stderr\nLog Upload Time :Thu Jun 25 12:24:52 +0100 2020\nLogLength:3000\nLog Contents:\n20/06/25 12:19:33 INFO datasources.FileScanRDD\n20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.\n20/06/21 12:19:40 INFO eas\n20/06/25 12:20:41 WARN Warning as the node is accessed without started', 'LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\n']
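
Once the text is split into sections like the ones in the output above, the sections can be grouped or filtered by their LogType. The sketch below is only a guess at what the filter_log_types parameter in the question's function signature is meant to do; the function name sections_by_type and the sample list are hypothetical.

```python
import re

def sections_by_type(sections, filter_log_types=None):
    """Group sections by LogType, optionally keeping only the requested types."""
    selected = {}
    for section in sections:
        match = re.match(r'LogType:(.*)', section)
        if match is None:
            continue  # not a LogType section, e.g. a separator line
        log_type = match.group(1).strip()
        if filter_log_types is None or log_type in filter_log_types:
            selected.setdefault(log_type, []).append(section)
    return selected

sections = [
    "LogType:container-localizer-syslog\nLogLength:0\n",
    "LogType:stderr\nLogLength:3000\n",
    "LogType:container-localizer-syslog\nLogLength:0\n",
]

everything = sections_by_type(sections)
only_stderr = sections_by_type(sections, filter_log_types=["stderr"])
print(only_stderr)
```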

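Thank you, Donald! But I still get the same output. Is there anything wrong with this: all_log_types = re.compile("^(?=LogType:)", flags=re.MULTILINE).split(rest)

@Lekshmi, the second line looks correct too. I tried it and got the result you want. I have added something to my answer that I hope will help narrow down the problem.

As you said, I think my text source is the problem. Is there any way to fix this? I have to do this for the given file. I have added some details to the question to make it clearer. Or could you suggest some way to resolve this error?

I added another string to match in the example above, "=$", which should find the last "=" at the end of the first line and split there. See if that helps.

Thanks a lot, Syed. I get part of the output, but the first line "=========================" is still attached to my first LogType. How can I remove it?

If you don't need the "==", you can use this: text = text.replace('=', '') to remove the "=", then all_log_types = text.split('\n\n') and print(all_log_types).

It gives the correct output when I have only one block, but it fails when there are many blocks, as in my question.

I have a similar problem in another area. Could you help me with it? Here is the link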