pandas | Reading a JSON file with list/array-like fields into Boolean columns
Below is a JSON string containing a list of objects, each of which embeds another list:
[
  {
    "name": "Alice",
    "hobbies": [
      "volleyball",
      "shopping",
      "movies"
    ]
  },
  {
    "name": "Bob",
    "hobbies": [
      "fishing",
      "movies"
    ]
  }
]
With pandas.read_json(), this becomes the following DataFrame:
name hobbies
--------------------------------------
1 Alice [volleyball, shopping, movies]
2 Bob [fishing, movies]
However, I would like to flatten the lists into Boolean columns, like this:
name volleyball shopping movies fishing
----------------------------------------------------
1 Alice True True True False
2 Bob False False True True
That is, when a list contains a value, the field in the corresponding column is set to the Boolean True; otherwise it is set to False.
I also looked into pandas.io.json.json_normalize(), but it does not seem to support this idea either. Is there a built-in way (in Python 3 or pandas) to do this?
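(Note: I realize I could write my own code to "normalize" the dictionary objects before loading the whole list into a DataFrame, but that way I may be reinventing the wheel, and it is probably inefficient.)
You can do the following: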
In [55]: import pandas as pd
In [56]: data = [
   ....:     {
   ....:         "name": "Alice",
   ....:         "hobbies": [
   ....:             "volleyball",
   ....:             "shopping",
   ....:             "movies"
   ....:         ]
   ....:     },
   ....:     {
   ....:         "name": "Bob",
   ....:         "hobbies": [
   ....:             "fishing",
   ....:             "movies"
   ....:         ]
   ....:     }
   ....: ]
In [57]: df = pd.io.json.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
In [59]: df['count'] = 1
In [60]: df
Out[60]:
hobby name count
0 volleyball Alice 1
1 shopping Alice 1
2 movies Alice 1
3 fishing Bob 1
4 movies Bob 1
In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby fishing movies shopping volleyball
name
Alice 0.0 1.0 1.0 1.0
Bob 1.0 1.0 0.0 0.0
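The session above can be sketched as a standalone script. This version assumes pandas 1.0 or later, where the function is exposed as pd.json_normalize, and passes fill_value=0 to pivot_table in place of the separate fillna(0) call; the names long_df and wide are only illustrative.

```python
import pandas as pd

data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

# One row per (hobby, name) pair. The records are plain strings,
# so their column comes back unnamed (labelled 0) and is renamed here.
long_df = pd.json_normalize(data, record_path="hobbies", meta=["name"])
long_df = long_df.rename(columns={0: "hobby"})

# Helper column that gives pivot_table something to aggregate.
long_df["count"] = 1

# Spread the hobbies into columns; missing combinations become 0.
wide = long_df.pivot_table(
    index="name", columns="hobby", values="count", fill_value=0
)
print(wide)
```

The result is indexed by name, with one numeric column per hobby.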
Or, better:
In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
In [89]: r
Out[89]:
hobby fishing movies shopping volleyball
name
Alice 0.0 1.0 1.0 1.0
Bob 1.0 1.0 0.0 0.0
Let's generate the list of "boolean" columns dynamically:
In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']
In [91]: r = r[cols_boolean].astype(bool)
In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice False True True True
Bob True True False False
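If the helper count column feels awkward, a similar Boolean table can probably be built with DataFrame.explode and pd.crosstab instead. This is a sketch of an alternative technique, not the code from the answer above; it assumes pandas 0.25 or later (for explode), and the names exploded, flags and result are illustrative. It also moves name back into an ordinary column so the layout matches the one asked for in the question.

```python
import pandas as pd

data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

df = pd.DataFrame(data)

# One row per (name, hobby) pair.
exploded = df.explode("hobbies")

# Count the pairs, then turn the counts into True/False flags.
flags = pd.crosstab(exploded["name"], exploded["hobbies"]).astype(bool)

# Make "name" a regular column and drop the "hobbies" label on the column axis.
result = flags.reset_index()
result.columns.name = None
print(result)
```

Note that crosstab sorts the hobby columns alphabetically, so the column order differs from the order of first appearance shown in the question.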
You can cast to bool in the following way:
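One way such a cast might look is sketched below, using Series.str.join and Series.str.get_dummies followed by astype(bool). This is an assumption about the general approach rather than a reconstruction of any particular answer, and it relies on no hobby name containing the "|" separator; flags and result are illustrative names.

```python
import pandas as pd

data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

df = pd.DataFrame(data)

# Join each list into a single "a|b|c" string, split it into 0/1 indicator
# columns, then cast those indicators to bool.
# Assumption: no hobby name contains the "|" separator.
flags = df["hobbies"].str.join("|").str.get_dummies(sep="|").astype(bool)

# Put the name column back alongside the flags (aligned on the row index).
result = df[["name"]].join(flags)
print(result)
```

This keeps one row per person throughout, so no pivot step is needed.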
Thank you. In the meantime I solved the problem by iterating over the dictionaries in the list. I will look into your code.
Use df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0: 'hobby'}) instead, as the original call gives :20: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead