Python JSON to DF

I have a dataset from Azure Firewall (firewall logs) that is stored in JSON format in Blob storage. The JSON looks like this:

{ "category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1551130Z", "resourceId": "/SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SEA-DEV", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}
{ "category": "AzureFirewallNetworkRule", "time": "2021-01-31T00:00:00.1268490Z", "resourceId": "/SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDERS/MICROSOFT.NETWORK/AZUREFIREWALLS/SEA-DEV", "operationName": "AzureFirewallNetworkRuleLog", "properties": {"msg":"UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}
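There are millions of lines per day to go through in order to group source IPs by the ports they were allowed or denied on, so I thought analyzing this data in a Jupyter Notebook (JN) would be feasible.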

Problem:

I tried the code below, but I ran into a problem when trying to flatten "properties"; I want the "msg" field.

Error:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-61-3500c0d62d55> in <module>
      7 # load data using Python JSON module
      8 with open('FWLog/FWLog2.json','r') as f:
----> 9     data = json.loads(f.read())
     10 # Flatten data
     11 df_nested_list = pd.json_normalize(data, record_path =['properties'])

~\anaconda3\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    355             parse_int is None and parse_float is None and
    356             parse_constant is None and object_pairs_hook is None and not kw):
--> 357         return _default_decoder.decode(s)
    358     if cls is None:
    359         cls = JSONDecoder

~\anaconda3\lib\json\decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 386)

You can use lines=True in pd.read_json:

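import pandas as pd
# lines=True parses every line of the file as a separate JSON record
df = pd.read_json("your_file.txt", lines=True)
# Expand the nested "properties" dicts into columns and join them with the remaining columns
df_final = pd.concat([pd.DataFrame(df.pop("properties").to_list()), df], axis=1)
print(df_final)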
Prints:

                                                 msg                  category                          time                                         resourceId                operationName
0  TCP request from 172.16.1.218:54652 to 172.17....  AzureFirewallNetworkRule  2021-01-31T00:00:00.1551130Z  /SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDER...  AzureFirewallNetworkRuleLog
1  UDP request from 172.16.1.218:53067 to 8.8.8.8...  AzureFirewallNetworkRule  2021-01-31T00:00:00.1268490Z  /SUBSCRIPTIONS/RESOURCEGROUPS/SEA-DEV/PROVIDER...  AzureFirewallNetworkRuleLog
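
With msg available as a column, the grouping described in the question (source IPs against the ports they were allowed or denied on) could be sketched as follows. This is only an illustration and not part of the original answer: the regular expression assumes every message follows the layout of the two sample lines, and the column names (protocol, src_ip, src_port, dst_ip, dst_port, action) are made up here.

```python
import pandas as pd

# Two sample messages copied from the log lines in the question.
df_final = pd.DataFrame({
    "msg": [
        "TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow",
        "UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow",
    ]
})

# Assumed layout: "<PROTO> request from <ip>:<port> to <ip>:<port>. Action: <Action>"
pattern = (
    r"(?P<protocol>\w+) request from "
    r"(?P<src_ip>[\d.]+):(?P<src_port>\d+) to "
    r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\. "
    r"Action: (?P<action>\w+)"
)

# str.extract returns one column per named group; rows that do not match become NaN.
parsed = df_final["msg"].str.extract(pattern)

# Count requests per source IP, destination port and action.
summary = (
    parsed.groupby(["src_ip", "dst_port", "action"])
    .size()
    .reset_index(name="count")
)
print(summary)
```

Messages in other layouts would come back as NaN rows from str.extract, so it is probably worth checking parsed.isna().any(axis=1) on real logs before trusting the counts.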

The file contains multiple JSON objects, so the error is raised when the JSON is loaded. Load it line by line instead:

import json
import pandas as pd

# load data using Python JSON module
with open('test_json.json') as f:
    data = [json.loads(line) for line in f]
# Flatten data
pd.DataFrame([j['properties'] for j in data])

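Output:

msg
0   TCP request from 172.16.1.218:54652 to 172.17....
1   UDP request from 172.16.1.218:53067 to 8.8.8.8...

Comments:

There might be some issue with the JSON file format.

Right. Technically, this is not a JSON file. A valid JSON file must contain a single top-level value (typically one object or array). What you have is a set of individual JSON records. Fortunately, pandas can handle this, as @Andrej's answer shows.

Thank you, mate, this is exactly what I was looking for. That "lines=True" solved the whole mystery, and the second line, starting with "df_final", finished off the result. :-)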
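
The point made in the comments, that the file is a sequence of separate JSON records rather than one JSON document, can be reproduced with the standard library alone. The sketch below uses two records trimmed down to the category and properties fields (an assumption made to keep the example short) and shows that parsing the whole text fails with the same "Extra data" message, while parsing each line succeeds.

```python
import json

# Two records in the same newline-delimited layout as the log file,
# trimmed to two fields for brevity.
text = (
    '{"category": "AzureFirewallNetworkRule", "properties": '
    '{"msg": "TCP request from 172.16.1.218:54652 to 172.17.1.219:8080. Action: Allow"}}\n'
    '{"category": "AzureFirewallNetworkRule", "properties": '
    '{"msg": "UDP request from 172.16.1.218:53067 to 8.8.8.8:53. Action: Allow"}}\n'
)

# Parsing the whole text at once fails as soon as the first object ends
# and more non-whitespace content follows.
error_message = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    error_message = exc.msg
    print(f"{exc.msg}: line {exc.lineno} column {exc.colno}")

# Parsing line by line works, because each line is a complete JSON document.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print([record["properties"]["msg"] for record in records])
```

The reported position (line 2, column 1) is where the second record begins, which matches the location in the traceback from the question.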