
pandas | Reading a JSON file with list/array-like fields into Boolean columns


Below is a JSON string containing a list of objects, each of which embeds another list:

[
  {
    "name": "Alice",
    "hobbies": [
      "volleyball",
      "shopping",
      "movies"
    ]
  },
  {
    "name": "Bob",
    "hobbies": [
      "fishing",
      "movies"
    ]
  }
]
With pandas.read_json(), this becomes a DataFrame like the following:

  name      hobbies
  --------------------------------------
1 Alice     [volleyball, shopping, movies]
2 Bob       [fishing, movies]
However, I would like to flatten the lists into Boolean columns, like this:

  name      volleyball  shopping    movies  fishing 
  ----------------------------------------------------
1 Alice     True        True        True    False
2 Bob       False       False       True    True
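That is, when the list contains a value, the field in the corresponding column is filled with the Boolean value True; otherwise it is filled with False.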

I also looked at pandas.io.json.json_normalize(), but that does not seem to support this idea either. Is there any built-in way (in Python 3 or pandas) to do this?


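(Note: I realize that you could write your own code to "normalize" the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel that way, and it would probably be inefficient.)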
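For comparison, the hand-written normalization mentioned in the note above might look like the following minimal sketch. It is not part of the original post: the sample data is copied from the question, and the names `all_hobbies`, `rows`, and `flat` are illustrative assumptions.

```python
import pandas as pd

# Sample records copied from the question.
data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

# Collect every hobby that occurs in any record, keeping first-seen order.
all_hobbies = list(dict.fromkeys(h for rec in data for h in rec["hobbies"]))

# Build one flat dict per record: the name plus one True/False entry per hobby.
rows = [
    {"name": rec["name"], **{h: h in rec["hobbies"] for h in all_hobbies}}
    for rec in data
]

flat = pd.DataFrame(rows)
print(flat)
```

This makes two passes over the records in plain Python, which is likely fine for small inputs but is exactly the kind of manual work the question hopes to avoid.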

You can do something like this:

In [56]: data = [
   ....:   {
   ....:     "name": "Alice",
   ....:     "hobbies": [
   ....:       "volleyball",
   ....:       "shopping",
   ....:       "movies"
   ....:     ]
   ....:   },
   ....:   {
   ....:     "name": "Bob",
   ....:     "hobbies": [
   ....:       "fishing",
   ....:       "movies"
   ....:     ]
   ....:   }
   ....: ]

In [57]: df = pd.io.json.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})

In [59]: df['count'] = 1

In [60]: df
Out[60]:
        hobby   name  count
0  volleyball  Alice      1
1    shopping  Alice      1
2      movies  Alice      1
3     fishing    Bob      1
4      movies    Bob      1

In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0
Or, better still:

In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)

In [89]: r
Out[89]:
hobby  fishing  movies  shopping  volleyball
name
Alice      0.0     1.0       1.0         1.0
Bob        1.0     1.0       0.0         0.0
Let's generate the list of "Boolean" columns dynamically:

In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']

In [91]: r = r[cols_boolean].astype(bool)

In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice   False   True     True       True
Bob      True   True    False      False
You can cast to bool this way, using astype(bool) as shown above.
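As an alternative to the pivot-table route, here is a hedged sketch that is not from the original answer. It assumes a pandas version that provides `DataFrame.explode` (0.25 or later), expands the list column to one row per hobby, and then cross-tabulates names against hobbies. The variable names `long` and `result` are illustrative.

```python
import pandas as pd

# Sample records copied from the question.
data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

df = pd.DataFrame(data)

# One row per (name, hobby) pair; reset the index so that it is unique again.
long = df.explode("hobbies").reset_index(drop=True)

# Count the occurrences of each hobby per name, then turn counts into True/False.
result = pd.crosstab(long["name"], long["hobbies"]).astype(bool).reset_index()
print(result)
```

The hobby columns come out in sorted order rather than in order of first appearance, so reorder them afterwards if the layout from the question matters.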


Thank you. In the meantime I solved this by iterating over the dictionaries in the list. I will study your code.
Use df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0: 'hobby'}) instead, because of this warning:
:20: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
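In line with the FutureWarning quoted above, the steps from the answer can be collected into a standalone script that calls `pandas.json_normalize` directly. This is a sketch that assumes pandas 1.0 or later, where that top-level name is available. The sample data is copied from the question.

```python
import pandas as pd

# Sample records copied from the question.
data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

# One row per hobby, carrying the owner's name along as metadata.
# The hobby strings land in a column labelled 0, which is renamed here.
df = pd.json_normalize(data, "hobbies", ["name"]).rename(columns={0: "hobby"})

# Helper column so that the pivot has a value to aggregate.
df["count"] = 1

# Wide table: 1.0 where a person has the hobby, NaN (then 0) where not.
r = df.pivot_table(index="name", columns="hobby", values="count").fillna(0)

# Cast the 0/1 values to False/True.
r = r.astype(bool)
print(r)
```

Here `name` ends up as the index of `r`; call `r.reset_index()` if it should be an ordinary column, as in the table from the question.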