Python: How to read AWS CloudTrail JSON logs into a DataFrame

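I have a problem: my virtual machine suddenly went down. I'm using JupyterLab running on Anaconda3 to load data into pandas. After the VM came back up, I found that my code no longer works for some reason. Here is my code: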

import glob
import os

import pandas as pd

awsc = pd.DataFrame()
json_pattern = os.path.join('logs_old/AWSCloudtrailLog/','*')
file_list = glob.glob(json_pattern)
for file in file_list:
    data = pd.read_json(file, lines=True)
    awsc = awsc.append(data, ignore_index = True)
awsc = pd.concat([awsc, pd.json_normalize(awsc['userIdentity'])], axis=1).drop('userIdentity', 1)
awsc.rename(columns={'type':'userIdentity_type',
                     'principalId':'userIdentity_principalId',
                     'arn':'userIdentity_arn',
                     'accountId':'userIdentity_accountId',
                     'accessKeyId':'userIdentity_accessKeyId',
                     'userName':'userIdentity_userName',}, inplace=True)
When I run the code, it gives me the following KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'userIdentity'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-9-efd1d2e600a5> in <module>
      1 # unpack nested json
      2 
----> 3 awsc = pd.concat([awsc, pd.json_normalize(awsc['userIdentity'])], axis=1).drop('userIdentity', 1)
      4 awsc.rename(columns={'type':'userIdentity_type',
      5                      'principalId':'userIdentity_principalId',

~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:

KeyError: 'userIdentity'
Output of the DataFrame awsc when running print(awss.info()) or print(awsc.info()):

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrameNone
Is there any way to fix this? Is the problem coming from pandas or from Anaconda?

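Using the OP's code
  • The DataFrame is not being built correctly, which is why awsc is empty
  • Without seeing the files, it's impossible to know whether pd.read_json(file, lines=True) is the correct way to read them
  • pd.json_normalize(awsc['userIdentity']) works on a column of dicts. However, the column is most likely strings.
    • If the dicts are of type str, use ast.literal_eval to convert them to dict type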
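The last point can be illustrated with a small, hypothetical frame in which userIdentity holds the string form of a Python dict (the rows, including the user "Bob", are made up). This minimal sketch parses only values that are still str with ast.literal_eval, so cells that are already dicts pass through untouched, then flattens the dicts with pd.json_normalize and prefixes the new columns:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample: userIdentity stored as the *string* form of a dict,
# e.g. after the data went through a CSV round-trip.
awsc = pd.DataFrame({
    "eventName": ["StartInstances", "StopInstances"],
    "userIdentity": [
        "{'type': 'IAMUser', 'userName': 'Alice'}",
        "{'type': 'IAMUser', 'userName': 'Bob'}",
    ],
})

# Parse only values that are still strings; real dicts are left as-is.
awsc["userIdentity"] = awsc["userIdentity"].apply(
    lambda v: literal_eval(v) if isinstance(v, str) else v
)

# json_normalize expects dict records; prefix the new columns so they
# cannot collide with top-level CloudTrail fields.
normalized = pd.json_normalize(awsc["userIdentity"].tolist())
normalized = normalized.add_prefix("userIdentity_")

awsc = awsc.drop(columns="userIdentity").join(normalized)
print(awsc.columns.tolist())
```

Using add_prefix here replaces the manual rename mapping; it is a design choice for the sketch, not something the question requires.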
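import glob
import os
from ast import literal_eval

import pandas as pd

# list of files, built the same way as in the question
file_list = glob.glob(os.path.join('logs_old/AWSCloudtrailLog/', '*'))

# add the dataframes to a list
awsc_list = list()

# iterate through the list of files and append each dataframe to awsc_list
for file in file_list:
    awsc_list.append(pd.read_json(file, lines=True))

# concat the files into a single dataframe
awsc = pd.concat(awsc_list).reset_index(drop=True)

# if the userIdentity column contains str type, convert it to dict type
awsc.userIdentity = awsc.userIdentity.apply(literal_eval)

# normalize userIdentity
normalized = pd.json_normalize(awsc['userIdentity'], sep='_')

# join awsc with normalized and drop the userIdentity column
awsc = awsc.join(normalized).drop(columns='userIdentity')

# rename the columns
awsc.rename(columns={'type': 'userIdentity_type',
                     'principalId': 'userIdentity_principalId',
                     'arn': 'userIdentity_arn',
                     'accountId': 'userIdentity_accountId',
                     'accessKeyId': 'userIdentity_accessKeyId',
                     'userName': 'userIdentity_userName'}, inplace=True)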
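The empty awsc in the question (0 entries) suggests the glob matched no files at all. One plausible cause, which is an assumption because the question does not show the working directory, is that the notebook started in a different directory after the VM restart, so the relative pattern 'logs_old/AWSCloudtrailLog/*' matches nothing. The OP's append loop then silently yields an empty DataFrame, whereas the concat-based code above fails with a ValueError ("No objects to concatenate") on an empty list. The sketch below uses a hypothetical helper, load_cloudtrail_frames, that keeps the question's lines=True assumption but fails early with a clearer message, and shows that indexing an empty DataFrame reproduces the same KeyError:

```python
import glob
import os

import pandas as pd


def load_cloudtrail_frames(directory):
    """Hypothetical helper: read every file in `directory` as JSON Lines
    (the question's assumption) and fail early if nothing matches."""
    pattern = os.path.join(directory, "*")
    file_list = glob.glob(pattern)
    if not file_list:
        # An empty file list is what leaves awsc with 0 entries downstream.
        raise FileNotFoundError(
            f"No files matched {pattern!r} (cwd is {os.getcwd()!r})"
        )
    return pd.concat(
        [pd.read_json(f, lines=True) for f in file_list], ignore_index=True
    )


# An empty DataFrame has no 'userIdentity' column, which reproduces the
# KeyError from the question.
empty = pd.DataFrame()
try:
    empty["userIdentity"]
except KeyError as err:
    print("KeyError:", err)
```

Printing os.getcwd() in the error message is the quickest way to confirm or rule out the working-directory explanation.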
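New code with sample data
  • The keys already have the correct names, so renaming isn't required
  • Using .json_normalize to read the logs also normalizes 'userIdentity', so a second step isn't needed
  • Also see
test2.json
  • One JSON object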
{
  "Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
      "type": "IAMUser",
      "principalId": "EX_PRINCIPAL_ID",
      "arn": "arn:aws:iam::123456789012:user/Alice",
      "accessKeyId": "EXAMPLE_KEY_ID",
      "accountId": "123456789012",
      "userName": "Alice"
    },
    "eventTime": "2014-03-06T21:22:54Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StartInstances",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "205.251.233.176",
    "userAgent": "ec2-api-tools 1.6.12.2",
    "requestParameters": {
      "instancesSet": {
        "items": [{
          "instanceId": "i-ebeaf9e2"
        }]
      }
    },
    "responseElements": {
      "instancesSet": {
        "items": [{
          "instanceId": "i-ebeaf9e2",
          "currentState": {
            "code": 0,
            "name": "pending"
          },
          "previousState": {
            "code": 80,
            "name": "stopped"
          }
        }]
      }
    }
  }]
}
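A self-contained way to see why no renaming is needed: normalizing an inline, trimmed copy of the test2.json record with record_path='Records' and sep='_' already produces prefixed names such as userIdentity_userName, while list-valued fields such as requestParameters_instancesSet_items are left as lists. The string below is an abbreviated version of the sample, not a complete CloudTrail record:

```python
import json

import pandas as pd

# Abbreviated copy of the test2.json record (same key names, fewer fields).
raw = """
{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {"type": "IAMUser", "userName": "Alice"},
    "eventName": "StartInstances",
    "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}}
}]}
"""

data = json.loads(raw)

# record_path="Records" turns each element of the Records array into a row;
# nested dicts are flattened and their keys joined with sep="_".
awsc = pd.json_normalize(data, record_path="Records", sep="_")
print(sorted(awsc.columns))
```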

Thank you very much for the code! I just
 <class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrameNone
import json
import pandas as pd

# create a list to add dataframes to
awsc_list = list()

# list of files
files_list = ['test.json', 'test2.json']

# read the files
for file in files_list:
    with open(file, 'r', encoding='utf-8') as f:
        data = json.loads(f.read())
    
    # normalize the file and append it to the list of dataframe
    awsc_list.append(pd.json_normalize(data, 'Records', sep='_'))
    
# concat the files into a single dataframe
awsc = pd.concat(awsc_list).reset_index(drop=True)

# display(awsc)
  eventVersion             eventTime        eventSource       eventName  awsRegion  sourceIPAddress                                                                                 userAgent userIdentity_type userIdentity_principalId                      userIdentity_arn userIdentity_accessKeyId userIdentity_accountId userIdentity_userName requestParameters_instancesSet_items                                                                                                 responseElements_instancesSet_items requestParameters_force userIdentity_sessionContext_attributes_mfaAuthenticated userIdentity_sessionContext_attributes_creationDate requestParameters_keyName responseElements_keyName                              responseElements_keyFingerprint responseElements_keyMaterial
0          1.0  2014-03-06T21:22:54Z  ec2.amazonaws.com  StartInstances  us-east-2  205.251.233.176                                                                    ec2-api-tools 1.6.12.2           IAMUser          EX_PRINCIPAL_ID  arn:aws:iam::123456789012:user/Alice           EXAMPLE_KEY_ID           123456789012                 Alice       [{'instanceId': 'i-ebeaf9e2'}]    [{'instanceId': 'i-ebeaf9e2', 'currentState': {'code': 0, 'name': 'pending'}, 'previousState': {'code': 80, 'name': 'stopped'}}]                     NaN                                                     NaN                                                 NaN                       NaN                      NaN                                                          NaN                          NaN
1          1.0  2014-03-06T21:01:59Z  ec2.amazonaws.com   StopInstances  us-east-2  205.251.233.176                                                                    ec2-api-tools 1.6.12.2           IAMUser          EX_PRINCIPAL_ID  arn:aws:iam::123456789012:user/Alice           EXAMPLE_KEY_ID           123456789012                 Alice       [{'instanceId': 'i-ebeaf9e2'}]  [{'instanceId': 'i-ebeaf9e2', 'currentState': {'code': 64, 'name': 'stopping'}, 'previousState': {'code': 16, 'name': 'running'}}]                   False                                                     NaN                                                 NaN                       NaN                      NaN                                                          NaN                          NaN
2          1.0  2014-03-06T17:10:34Z  ec2.amazonaws.com   CreateKeyPair  us-east-2     72.21.198.64  EC2ConsoleBackend, aws-sdk-java/Linux/x.xx.fleetxen Java_HotSpot(TM)_64-Bit_Server_VM/xx           IAMUser          EX_PRINCIPAL_ID  arn:aws:iam::123456789012:user/Alice           EXAMPLE_KEY_ID           123456789012                 Alice                                  NaN                                                                                                                                 NaN                     NaN                                                   false                                2014-03-06T15:15:06Z                 mykeypair                mykeypair  30:1d:46:d0:5b:ad:7e:1b:b6:70:62:8b:ff:38:b5:e9:ab:5d:b8:21       <sensitiveDataRemoved>
3          1.0  2014-03-06T21:22:54Z  ec2.amazonaws.com  StartInstances  us-east-2  205.251.233.176                                                                    ec2-api-tools 1.6.12.2           IAMUser          EX_PRINCIPAL_ID  arn:aws:iam::123456789012:user/Alice           EXAMPLE_KEY_ID           123456789012                 Alice       [{'instanceId': 'i-ebeaf9e2'}]    [{'instanceId': 'i-ebeaf9e2', 'currentState': {'code': 0, 'name': 'pending'}, 'previousState': {'code': 80, 'name': 'stopped'}}]                     NaN                                                     NaN                                                 NaN                       NaN                      NaN                                                          NaN                          NaN
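The ..._instancesSet_items columns in this output still hold lists of dicts. If those also need flattening (an optional extra step, not part of the answer above), one approach is DataFrame.explode followed by pd.json_normalize. A minimal sketch on a hypothetical two-row frame shaped like that output, assuming pandas 1.1 or later for explode(ignore_index=True):

```python
import pandas as pd

# Hypothetical frame shaped like the output above: one list-of-dicts column,
# with NaN for events that have no instances (like the CreateKeyPair row).
col = "requestParameters_instancesSet_items"
awsc = pd.DataFrame({
    "eventName": ["StartInstances", "CreateKeyPair"],
    col: [[{"instanceId": "i-ebeaf9e2"}], float("nan")],
})

# explode() yields one row per list element; scalar NaN rows stay as-is.
exploded = awsc.explode(col, ignore_index=True)

# Normalize only dict cells; NaN cells become {} and so an all-NaN row.
items = pd.json_normalize(
    [v if isinstance(v, dict) else {} for v in exploded[col]]
).add_prefix(col + "_")

# Both frames share a fresh RangeIndex, so join lines rows up one-to-one.
flat = exploded.drop(columns=col).join(items)
print(flat)
```

Because json_normalize also accepts sep, the same pattern applied to responseElements_instancesSet_items with sep="_" would flatten the nested currentState and previousState dicts into separate columns.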