Splitting data with Python and assigning the sorted data to columns for viewing in Excel

Hi, I have a set of data in a text file that looks like the following (dummy data replacing the school data):

01-01-1998 00:00:00 AM  GP: D(B):1234 to time difference. Hourly Avg:-3 secs
01-01-1998 00:00:12 AM  GP: D(A): 2345 to time difference. Hourly Avg:0 secs
01-01-1998 00:08:08 AM  SYS: The Screen Is now minimised.
01-01-1998 00:09:10 AM  00:09:10 AM SC: Findcorrect: W. D:1. Count one two three four five.       #there are somehow some glitch in the system showing 2 timestamp
01-01-1998 00:14:14 AM  SC: D1 test. Old:111, New:222, Calculated was 123, out of 120 secs.    
01-01-1998 01:06:24 AM  ET: Program Disconnected event.
I would like to tidy the data into the following form:

[['Timestamp','System','Di','Message']    #  <-- header
['01-01-1998 00:00:00 AM', 'GP:','D(B):','1234 to time difference. Hourly Avg:-3 secs'],
['01-01-1998 00:00:12 AM', 'GP:','D(A):', '2345 to time difference. Hourly Avg:0 secs'],
['01-01-1998 00:08:08 AM', 'SYS:','','The Screen Is now minimised.'],   #<-- with a blank
['01-01-1998 00:09:10 AM', 'SC:','','Findcorrect: HW. D:1. Count one two three four five.'],
['01-01-1998 00:14:14 AM', 'SC:','D1','test. Old:111, New:222, Calculated was 123, out of 120 secs.' ],
['01-01-1998 01:06:24 AM', 'ET:','', 'Program Disconnected event.']]
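Because of my limited Python knowledge, the code is not fully developed yet, and I would greatly appreciate any guidance or examples! One question to consider: should I use pandas/DataFrame, or can I do this without pandas?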

Edit: the first line of data has been updated to "D(B)1234"; there should not be any space between the number and D(B).

The code that cleans up this messy data uses partly regular expressions and partly plain string manipulation.

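Because inner commas in the text need to be masked (for example in the line containing Old:111, New:222,), the cleaned CSV is written with the csv module: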

Create the demo file:

with open("data.txt","w") as w:
    w.write("""01-01-1998 00:00:00 AM  GP: D(B): 1234 to time difference. Hourly Avg:-3 secs
01-01-1998 00:00:12 AM  GP: D(A): 2345 to time difference. Hourly Avg:0 secs
01-01-1998 00:08:08 AM  SYS: The Screen Is now minimised.
01-01-1998 00:09:10 AM  00:09:10 AM SC: Findcorrect: W. D:1. Count one two three four five.       #there are somehow some glitch in the system showing 2 timestamp
01-01-1998 00:14:14 AM  SC: D1 test. Old:111, New:222, Calculated was 123, out of 120 secs.    
01-01-1998 01:06:24 AM  ET: Program Disconnected event.""")
Parse it and write it out:

import re

def parseLine(line):
    # get the timestamp
    ts = re.match(r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} +(?:AM|PM)",line)

    # get all but the timestamp - cleaning the double-time issue
    cleaned = re.sub(r"^\d{2}-\d{2}-\d{4} (\d{2}:\d{2}:\d{2} (AM|PM) +)+","", line)

    # split cleaned part based on occurrence of ["D(A)", "D(B)", "D1", "D2"]
    if any(k in cleaned.split(":")[1] for k in ["D(A)", "D(B)", "D1", "D2"]):
        system, di, msg = cleaned.split(" ", maxsplit = 2)
    else:
        di = ""
        system, msg = cleaned.split(":", maxsplit = 1)

    # return each line as list of cleaned stuff:
    return [ts[0].strip() ,system.strip(), di.strip(), msg.strip()]

# fixed header, lines will be appended   
p = [['Timestamp','System','Di','Message']]

with open("data.txt","r") as r:
    for l in r:
        l = l.strip()
        p.append(parseLine(l))

import csv
with open("c.csv","w",newline="") as w:
    writer = csv.writer(w,quoting=csv.QUOTE_ALL)
    writer.writerows(p)
Read and print the written file:

with open("c.csv") as r:
    print(r.read())
File contents (quoted CSV; otherwise text such as
st. Old:111, New:222, Calculated was 123,
would break your format):

"Timestamp","System","Di","Message"
"01-01-1998 00:00:00 AM","GP:","D(B):","1234 to time difference. Hourly Avg:-3 secs"
"01-01-1998 00:00:12 AM","GP:","D(A):","2345 to time difference. Hourly Avg:0 secs"
"01-01-1998 00:08:08 AM","SYS","","The Screen Is now minimised."
"01-01-1998 00:09:10 AM","SC","","Findcorrect: W. D:1. Count one two three four five.       #there are somehow some glitch in the system showing 2 timestamp"
"01-01-1998 00:14:14 AM","SC:","D1","test. Old:111, New:222, Calculated was 123, out of 120 secs."
"01-01-1998 01:06:24 AM","ET","","Program Disconnected event."
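
To illustrate why the quoting matters, here is a minimal sketch that writes one row from the output above into an in-memory buffer (io.StringIO is used here only for the demonstration, it is not part of the answer's code) and compares a naive split on commas with reading the row back through csv.reader.

```python
import csv
import io

# One row from the output above; the message itself contains commas.
row = ["01-01-1998 00:14:14 AM", "SC:", "D1",
       "test. Old:111, New:222, Calculated was 123, out of 120 secs."]

# Write the row with every field quoted, as the answer's code does.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)

# A naive split on "," also cuts inside the message field ...
naive_fields = buf.getvalue().strip().split(",")

# ... while csv.reader honours the quotes and restores the four fields.
parsed_row = next(csv.reader(io.StringIO(buf.getvalue())))

print(len(naive_fields), "pieces from a naive split")
print(len(parsed_row), "fields from csv.reader")
```

Spreadsheet programs generally read quoted fields the same way csv.reader does, so the message stays in a single column.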

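Hello, thank you for your guidance. I just wanted to check something with you: the data shown above is only part of my test data, and when I test with my actual data it raises a "'NoneType' object is not subscriptable" error. I did some troubleshooting and realized that the error occurs when the hour part of the time has a single digit. I tried changing the digit pattern to "\d{2,}", but it still does not work. Please advise, thanks! I have also updated the question, since I found an error in my test data!

@Thanksforelping It seems your daytime is my sleeping time. You can change the pattern to
\d{1,2}
so that it accepts 1 to 2 digits;
\d{2,}
accepts 2 or more digits. The tool I use to test regular expressions can even explain them to you in "plain" text. The error in the data should not affect this: for that part, a space or no space should not matter.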
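
The \d{1,2} suggestion can be sketched as a variant of the parser. This is not the answer's code: the names LINE_RE, DI_RE and parse_line_tolerant are hypothetical, the system tag is assumed to be uppercase letters followed by a colon, and only the four Di markers listed in the answer are assumed to occur. It returns None for lines that do not match instead of failing when indexing a missing match, which is the likely source of the "'NoneType' object is not subscriptable" error.

```python
import re

# Hypothetical pattern: date, a time with 1 or 2 hour digits, any repeated
# timestamps (the "double time" glitch), the system tag, then the rest.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4}) +"
    r"(?P<time>\d{1,2}:\d{2}:\d{2} +(?:AM|PM))"
    r"(?: +\d{1,2}:\d{2}:\d{2} +(?:AM|PM))*"
    r" +(?P<system>[A-Z]+:) *"
    r"(?P<rest>.*)$"
)

# Di markers from the answer's list; the colon and a following space are optional,
# so both "D(B): 1234" and "D(B):1234" are handled.
DI_RE = re.compile(r"^(?P<di>(?:D\([AB]\)|D[12]):?) *(?P<msg>.*)$")


def parse_line_tolerant(line):
    """Return [timestamp, system, di, message], or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # let the caller decide how to handle unparsable lines
    ts = f"{m['date']} {m['time']}"
    d = DI_RE.match(m["rest"])
    if d:
        return [ts, m["system"], d["di"], d["msg"]]
    return [ts, m["system"], "", m["rest"]]


print(parse_line_tolerant("01-01-1998 1:06:24 AM  ET: Program Disconnected event."))
print(parse_line_tolerant("01-01-1998 00:00:00 AM  GP: D(B):1234 to time difference. Hourly Avg:-3 secs"))
```

In the reading loop, rows for which the function returns None can be skipped or logged rather than appended.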