Python: unable to parse a custom log file format with pandas read_csv

I am trying to find and plot the hourly error rate in a custom log file that looks like this:

<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: DeploymentMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1]  --Transport info: HTTP method: POST ,URL: https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:02:14 GMT , Content-Type=text/html   <Ref: WKWLUTVWBDJ2GAGOIRZL5VHMK3ZTJIWX>
<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> <e646de45-4c09-4a6e-a5d1-3552db8fc0dc-0000000b> Failed to get business interface of sdpinternal.messaging.management.em.ServerTargetImpl, class will not be monitored.  <Ref: N2IOE3DZNWKAYSYLJTN5AWRQJP7DKMTS>
<2019-12-19T16:04:14.745+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: HostMetric[1] DeploymentMetric[1] JVMMetric[1] InfrastructureMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1] ThreadPoolMetric[1] AppServerMetric[1] ConnectionPoolMetric[1]  --Transport info: HTTP method: POST ,URL: https://omc-19b0be25fd5248588f0631a820a43c88.api.omc.ocp.oraclecloud.com/static/data.storage/apm_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:04:14 GMT , Content-Type=text/html   <Ref: PU5HERXNVLSVAG33LIOPJVKYSZJC4R73>
<2019-12-19T16:04:14.753+0000> WARNING <clientToServer.transport> Error connecting to https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric  <Ref: NL6XDJAALZ23BM4PPRRFWRFFBC6KLYSE>
I want to plot the number of "500 Internal Server Error" occurrences per hour. I tried to parse this log into a pandas DataFrame as follows:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in newer pandas versions


tmp=u"""<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: DeploymentMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1]  --Transport info: HTTP method: POST ,URL: https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:02:14 GMT , Content-Type=text/html   <Ref: WKWLUTVWBDJ2GAGOIRZL5VHMK3ZTJIWX>
<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> <e646de45-4c09-4a6e-a5d1-3552db8fc0dc-0000000b> Failed to get business interface of sdpinternal.messaging.management.em.ServerTargetImpl, class will not be monitored.  <Ref: N2IOE3DZNWKAYSYLJTN5AWRQJP7DKMTS>
<2019-12-19T16:04:14.745+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: HostMetric[1] DeploymentMetric[1] JVMMetric[1] InfrastructureMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1] ThreadPoolMetric[1] AppServerMetric[1] ConnectionPoolMetric[1]  --Transport info: HTTP method: POST ,URL: https://omc-19b0be25fd5248588f0631a820a43c88.api.omc.ocp.oraclecloud.com/static/data.storage/apm_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:04:14 GMT , Content-Type=text/html   <Ref: PU5HERXNVLSVAG33LIOPJVKYSZJC4R73>
<2019-12-19T16:04:14.753+0000> WARNING <clientToServer.transport> Error connecting to https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric  <Ref: NL6XDJAALZ23BM4PPRRFWRFFBC6KLYSE>"""

df = pd.read_csv(StringIO(tmp), comment=' --', sep='0> ', names=['Time','Text'])
indexNames = df[ (df['Time'].str.startswith(' --')) ].index
df.drop(indexNames , inplace=True)

# remove < by strip and convert column Time to_datetime:
df.Time = pd.to_datetime(df.Time.str.strip('<'), format='%Y-%m-%dT%H:%M:%S.%f+0000')
df.Text = df.Text.str.strip()

print (df)
print (df.dtypes)

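The affected lines do not start with any whitespace. Replace startswith(' --') with startswith('--').

On the other hand, the comment=' --' argument to pd.read_csv() does not work. According to the documentation:

comment: str, optional

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character.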
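Because comment accepts only a single character, one way around read_csv is to group the physical lines into records yourself. The following is a minimal sketch under stated assumptions, not a definitive parser: it assumes every record starts with a "<timestamp+0000> LEVEL " prefix as in the sample log, and that any other line continues the previous record. The names RECORD_START and parse_records are hypothetical, and the sample lines are shortened versions of the lines in the question.

```python
import re
from datetime import datetime

# Assumption: a record starts with "<timestamp+0000> LEVEL "; any other line
# is a continuation of the previous record.
RECORD_START = re.compile(
    r"^<(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\+0000> (\w+) "
)


def parse_records(lines):
    """Group physical lines into (timestamp, level, text) records."""
    records = []
    for line in lines:
        match = RECORD_START.match(line)
        if match:
            timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            records.append([timestamp, match.group(2), line[match.end():].strip()])
        elif records:
            # Continuation line such as " --ServiceInfo: ...": append it to
            # the text of the record that precedes it.
            records[-1][2] += " " + line.strip()
    return [tuple(record) for record in records]


# Shortened versions of the log lines shown in the question.
sample = [
    "<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK "
    "--Transport content: <title>500 Internal Server Error</title>",
    " --ServiceInfo: PostDataService ,response status: 500",
    "<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> Failed to get business interface",
]

records = parse_records(sample)
for timestamp, level, text in records:
    print(timestamp, level, text[:60])
```

The resulting list of tuples can be passed to pd.DataFrame(records, columns=['Time', 'Level', 'Text']) if a DataFrame is needed; the column names here are only a suggestion.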

A log file naturally contains end-of-line information. So if you have access to the log file, I suggest processing it directly, line by line:

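from datetime import datetime

import pandas as pd

errors = []
with open("log.txt", "r") as log:
    for line in log:
        # Only the first line of each record carries the error text and the timestamp.
        if "500 Internal Server Error" in line:
            # The first whitespace-separated token is the timestamp, e.g. <2019-12-19T16:02:14.776+0000>
            errors.append(datetime.strptime(line.strip().split()[0], '<%Y-%m-%dT%H:%M:%S.%f+0000>'))

df = pd.DataFrame({'Time': errors})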
Test

log = [
"""<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>""",
"""--ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: DeploymentMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1]  --Transport info: HTTP method: POST ,URL: https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:02:14 GMT , Content-Type=text/html   <Ref: WKWLUTVWBDJ2GAGOIRZL5VHMK3ZTJIWX>""",
"""<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> <e646de45-4c09-4a6e-a5d1-3552db8fc0dc-0000000b> Failed to get business interface of sdpinternal.messaging.management.em.ServerTargetImpl, class will not be monitored.  <Ref: N2IOE3DZNWKAYSYLJTN5AWRQJP7DKMTS>""",
"""<2019-12-19T16:04:14.745+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>""",
"""--ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: HostMetric[1] DeploymentMetric[1] JVMMetric[1] InfrastructureMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1] ThreadPoolMetric[1] AppServerMetric[1] ConnectionPoolMetric[1]  --Transport info: HTTP method: POST ,URL: https://omc-19b0be25fd5248588f0631a820a43c88.api.omc.ocp.oraclecloud.com/static/data.storage/apm_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:04:14 GMT , Content-Type=text/html   <Ref: PU5HERXNVLSVAG33LIOPJVKYSZJC4R73>""",
"""<2019-12-19T16:04:14.753+0000> WARNING <clientToServer.transport> Error connecting to https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric  <Ref: NL6XDJAALZ23BM4PPRRFWRFFBC6KLYSE>"""
]

errors = []

#with open("log.txt", "r") as log:
for line in log:
    if "500 Internal Server Error" in line:
        errors.append(datetime.strptime(line.strip().split()[0], '<%Y-%m-%dT%H:%M:%S.%f+0000>'))

df = pd.DataFrame({'Time': errors})
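
The DataFrame above holds only the error timestamps, so the hourly count the question asks for still has to be computed. A minimal sketch, assuming the two timestamps that the test above would extract from the sample log (they are hard-coded here so the block runs on its own), and using the variable name hourly purely for illustration:

```python
from datetime import datetime

import pandas as pd

# Timestamps of the two "500 Internal Server Error" records in the sample log.
errors = [
    datetime(2019, 12, 19, 16, 2, 14, 776000),
    datetime(2019, 12, 19, 16, 4, 14, 745000),
]
df = pd.DataFrame({"Time": errors})

# Give every error a weight of 1, index by time, then sum per hourly bin.
# resample also emits empty hours (with a count of 0) between the first and last bin.
hourly = pd.Series(1, index=df["Time"]).resample("h").sum()
print(hourly)
```

With matplotlib installed, hourly.plot(kind='bar') should draw the counts per hour; that call is left out of the block so it runs without a plotting backend.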