Python: plotting a DataFrame containing datetimes into a single timeline

Tags: python, pandas, dataframe, pandas-groupby

I'm trying to parse a log file (specifically, one from a Gradle build) that looks like this:

21:51:38.991 [DEBUG] [TestEventLogger] cha.LoginTest4 STARTED
21:51:39.054 [DEBUG] [TestEventLogger] cha.LoginTest2 STARTED
21:51:40.068 [DEBUG] [TestEventLogger] cha.LoginTest4 PASSED
21:51:40.101 [DEBUG] [TestEventLogger] cha.LoginTest2 PASSED
21:51:40.366 [DEBUG] [TestEventLogger] cha.LoginTest1 STARTED
21:51:40.413 [DEBUG] [TestEventLogger] cha.LoginTest3 STARTED
21:51:50.435 [DEBUG] [TestEventLogger] cha.LoginTest1 PASSED
21:51:50.463 [DEBUG] [TestEventLogger] cha.LoginTest3 PASSED
21:51:50.484 [DEBUG] [TestEventLogger] Gradle Test Run :test PASSED
21:51:38.622 [DEBUG] [TestEventLogger] Gradle Test Run :test STARTED
into a chart that shows a timeline of the events. Something like this:

n |  ======= 
a |   === 
m |       == 
e |    ======= 
  |______________
     time
So far I have parsed the log and put the relevant "events" into a DataFrame (sorted by timestamp).

Because I need the start and end time for each "name", I did a groupby. The groups I get look like this:

group                 timestamp            name
0       1900-01-01 21:51:38.991  cha.LoginTest4
0       1900-01-01 21:51:40.068  cha.LoginTest4
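There are always two rows: the first is the start time and the last is the end time. I was able to use hlines to show a timeline for each group. However, I'd like to get all of the groups into the same plot so I can compare their start/end times against each other. I'd still like to use groupby, because it gets me the start/end times together with the "name" in just a few lines of code.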
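To make the groupby step concrete, here is a minimal sketch of collapsing each name into one start/end row. The `events` list is copied from the log excerpt above, and the `spans`, `start`, and `end` names are my own choices for illustration, not something from the original code.

```python
import pandas as pd

# Sample events taken from the log excerpt (timestamp, name).
events = [
    ("21:51:38.991", "cha.LoginTest4"),
    ("21:51:39.054", "cha.LoginTest2"),
    ("21:51:40.068", "cha.LoginTest4"),
    ("21:51:40.101", "cha.LoginTest2"),
]
df = pd.DataFrame(events, columns=["timestamp", "name"])

# A time-only format leaves the date at the default 1900-01-01.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%H:%M:%S.%f")

# One row per name: earliest timestamp as start, latest as end.
# sort=False keeps the names in order of first appearance.
spans = df.groupby("name", sort=False)["timestamp"].agg(start="min", end="max")
print(spans)
```

Using min/max instead of iloc[0]/iloc[1] means the result does not depend on the rows already being sorted by timestamp.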

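So far I have only managed to show a separate plot for each group without errors, not all of the groups together. Here is what I did to show each plot: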

for name, group in df.groupby('name', sort=False):

    group.amin = group['timestamp'].iloc[0] # assume sorted order
    group.amax = group['timestamp'].iloc[1]

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax = ax.xaxis_date()
    ax = plt.hlines(group.index, dt.date2num(group.amin), dt.date2num(group.amax))

    plt.show()
Solved. Full source:

import os
import re
import pandas as pd
from pandas import Timestamp
import matplotlib.pyplot as plt
import matplotlib.dates as dt
import warnings
from random import random
from matplotlib.pyplot import text
from datetime import datetime
import numpy as np

warnings.simplefilter(action='ignore', category=FutureWarning) # https://stackoverflow.com/a/46721064

'''
The log contents are not guaranteed to be in order. Multiple processes are dumping contents into a single file.
Contents from a single process will be in order.
'''

def main():

    log_file_path = "gradle-4.2.test.debug.log"

    # regex to get test and task log events
    test_re = re.compile(r'^(\S+) \[DEBUG\] \[TestEventLogger\] (\S+[^:>]) (STARTED|PASSED|FAILED)$')
    task_re = re.compile(r'^(\S+) \[DEBUG\] \[TestEventLogger\] Gradle Test Run [:](\S+) (STARTED|PASSED|FAILED)$')

    rows = []  # (timestamp, name, type) tuples, one per matching log line
    with open(log_file_path, "r") as file:
        for line in file:
            # try the test pattern first, then fall back to the task pattern
            rows.extend(test_re.findall(line) or task_re.findall(line))

    # build the frame once; DataFrame.append was removed in pandas 2.0
    df = pd.DataFrame(rows, columns=['timestamp','name','type'])
    df.drop('type', axis=1, inplace=True) # don't need this col
    df['timestamp'] = pd.to_datetime(df.timestamp, format="%H:%M:%S.%f") # pandas datetime
    df =  df.sort_values('timestamp')  # sort by  pandas datetime

    print ("log events parsed, sorted and ungrouped:\n", df)

    fig, ax = plt.subplots()
    ax.xaxis_date()

    # Customize the major grid
    ax.minorticks_on()
    ax.grid(which='major', linestyle='-', linewidth='0.2', color='gray')

    i = 0 # y-coord will be loop iteration

    # Groupby name. Because the df was previously sorted, the tuple will be sorted order (first event, second event)
    # Give each group an hline.
    for name, group in df.groupby('name', sort=False):
        i += 1

        assert group['timestamp'].size == 2 # make sure we have a start & end time for each test/task
        start = group['timestamp'].iloc[0] # assume sorted order
        end = group['timestamp'].iloc[1]
        assert start < end # make sure start/end times are in order

        if '.' in name: # assume '.' indicates a JUnit test, not a task
            color = [(random(),random(),random())]
            linestyle = 'solid'
            ax.text(start, (i + 0.05), name, color='blue') # add name to x, y+.05 to hline
        else: # a task.
            color = 'black'
            linestyle = 'dashed'
            ax.text(start, (i + 0.05), name + ' (Task)', color='red') # add name to x, y+.05 to hline

        ax.hlines(i, dt.date2num(start), dt.date2num(end), linewidth = 6, color=color, linestyle=linestyle)

    # Turn off y ticks. These are just execution order (numbers won't make sense).
    plt.setp(ax.get_yticklabels(), visible=False)
    ax.yaxis.set_tick_params(size=0)
    ax.yaxis.tick_left()

    plt.title('Timeline of Gradle Task and Test Execution')
    plt.xlabel('Time')
    plt.ylabel('Execution Order')
    plt.show()
#    plt.savefig('myfig')


if __name__ == '__main__':
    main()
So, how can I get this grouped DataFrame full of timestamps into a single chart that shows the start/end timelines?


It seems I ran into one problem or another with regexes, DataFrames, datetimes, and so on, but I think I ended up with a nice, clean solution...

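I can't test this right now, sorry, but this (or something close to it) should help: create a single figure before the plotting loop, then plot each group's data onto the same axes.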

fig, ax = plt.subplots()
ax.xaxis_date()
for name, group in df.groupby('name', sort=False):

    group.amin = group['timestamp'].iloc[0] # assume sorted order
    group.amax = group['timestamp'].iloc[1]

    ax.hlines(group.index, dt.date2num(group.amin), dt.date2num(group.amax))

plt.show()
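
For anyone who wants to try the idea without the log file, here is a self-contained sketch of the same approach. It assumes the sample events from the question, uses the loop counter as the y-coordinate so each name gets its own row, and selects the headless Agg backend only so it runs without a display; the variable names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Sample events taken from the question's log excerpt (timestamp, name).
events = [
    ("21:51:38.622", "test"),
    ("21:51:38.991", "cha.LoginTest4"),
    ("21:51:39.054", "cha.LoginTest2"),
    ("21:51:40.068", "cha.LoginTest4"),
    ("21:51:40.101", "cha.LoginTest2"),
    ("21:51:40.366", "cha.LoginTest1"),
    ("21:51:40.413", "cha.LoginTest3"),
    ("21:51:50.435", "cha.LoginTest1"),
    ("21:51:50.463", "cha.LoginTest3"),
    ("21:51:50.484", "test"),
]
df = pd.DataFrame(events, columns=["timestamp", "name"])
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%H:%M:%S.%f")
df = df.sort_values("timestamp")

# Create the figure and axes once, outside the loop.
fig, ax = plt.subplots()
ax.xaxis_date()  # called for its side effect only; it returns None

# Draw one horizontal line per name, all on the same axes.
for row, (name, group) in enumerate(df.groupby("name", sort=False), start=1):
    start = group["timestamp"].iloc[0]   # first event, since df is sorted
    end = group["timestamp"].iloc[-1]    # last event
    ax.hlines(row, mdates.date2num(start), mdates.date2num(end), linewidth=6)
```

In an interactive session, calling plt.show() once after the loop should display all five timelines in a single figure.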

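My first thought for this problem was to use plt.barh, but I must admit I struggled with the datetime/time topic for a while before the result came out the way I wanted.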

Anyway, here is the result of that idea:

Assume the following DataFrame as a starting point:

df
Out: 
      timestamp            name
0  21:51:38.622            test
1  21:51:38.991  cha.LoginTest4
2  21:51:39.054  cha.LoginTest2
3  21:51:40.068  cha.LoginTest4
4  21:51:40.101  cha.LoginTest2
5  21:51:40.366  cha.LoginTest1
6  21:51:40.413  cha.LoginTest3
7  21:51:50.435  cha.LoginTest1
8  21:51:50.463  cha.LoginTest3
9  21:51:50.484            test
First, I group by name and create a new DataFrame containing the start and duration data in matplotlib.dates format:

grpd = df.groupby('name')
plot_data = pd.DataFrame({'start': dt.date2num(pd.to_datetime(grpd.min().timestamp)), 'stop':  dt.date2num(pd.to_datetime(grpd.max().timestamp))}, grpd.min().index)
To start from zero, subtract the first start time (still adding 1, because that is how matplotlib.dates starts).
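
The code for this step is not shown here, so the following is only a sketch of how the duration column and the shifted start could be computed from the description. It assumes the sample DataFrame above; the two assignments at the end are my reconstruction, not the answer's verbatim code. As far as I know, the + 1 offset mattered for older Matplotlib versions whose date numbers began at 1, while newer versions (3.3 and later) default to a 1970-01-01 epoch and should work with or without it.

```python
import matplotlib.dates as mdates
import pandas as pd

# Sample events matching the DataFrame shown above (timestamp, name).
events = [
    ("21:51:38.622", "test"),
    ("21:51:38.991", "cha.LoginTest4"),
    ("21:51:39.054", "cha.LoginTest2"),
    ("21:51:40.068", "cha.LoginTest4"),
    ("21:51:40.101", "cha.LoginTest2"),
    ("21:51:40.366", "cha.LoginTest1"),
    ("21:51:40.413", "cha.LoginTest3"),
    ("21:51:50.435", "cha.LoginTest1"),
    ("21:51:50.463", "cha.LoginTest3"),
    ("21:51:50.484", "test"),
]
df = pd.DataFrame(events, columns=["timestamp", "name"])
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%H:%M:%S.%f")

# Start and stop per name, converted to Matplotlib date numbers (days as floats).
grpd = df.groupby("name")["timestamp"]
plot_data = pd.DataFrame(
    {
        "start": mdates.date2num(grpd.min().to_numpy()),
        "stop": mdates.date2num(grpd.max().to_numpy()),
    },
    index=grpd.min().index,
)

# Assumed reconstruction: bar length in days, then shift so the earliest start is 1.
plot_data["duration"] = plot_data["stop"] - plot_data["start"]
plot_data["start"] = plot_data["start"] - plot_data["start"].min() + 1
```

With these columns in place, plot_data.duration and plot_data.start are available for a barh call.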

Based on this DataFrame, plotting horizontal bars over time is easy:

fig, ax = plt.subplots(figsize=(8,4))
ax.xaxis_date()
ax.barh(plot_data.index, plot_data.duration, left=plot_data.start, height=.4)
plt.tight_layout()

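I get: AttributeError: 'NoneType' object has no attribute 'hlines'. It looks like ax does not get a usable type from ax=ax.xaxis_date().

@patronising_bofh, I forgot to change the fig, ax setup line. I've edited the answer; try it now.

OK, great. I was able to get it working. I updated my original post with the correct full source. Thanks!

Hello. This solution looks good too. However, I have already accepted Peter Leimbigler's solution. Thanks.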