Python: group objects by attribute and combine the remaining column into a list, getting unhashable type: 'list'
I have an object:
obj = [
{"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":1000}},
{"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":2000}},
{"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":1000}},
{"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":2000}}
]
I want to group by mode, items and value, and merge the people values into a single list.
So the result I want is:
resObj = [
{"mode":1,"items":[{"id":1}],"people":[{"id":8888},{"id":9999}],"value":{"v":1000}},
{"mode":1,"items":[{"id":1}],"people":[{"id":8888},{"id":9999}],"value":{"v":2000}}
]
When I do this:
>>> obj = [{"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":1000}},{"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":2000}},{"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":1000}},{"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":2000}}]
>>> import pandas as pd
>>> df = pd.DataFrame(obj)
>>> df.groupby(['items','mode','value'])['people'].apply(list)
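I get TypeError: unhashable type: 'list'.
This is expected, since people is a list, but how can I achieve what I want? Another problem is that items is also a list, and I have read that groupby does not work with unhashable types.
Is there a way to achieve the transformation I need?
Edit: I also tried: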
>>> df['items']=df['items'].apply(lambda x: tuple(x))
>>> df['people']=df['people'].apply(lambda x: tuple(x))
>>> df.groupby(['items','mode','value'])['people'].apply(list)
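But now I get unhashable type: 'dict'.

You should avoid complex data such as list/dict/tuple in DataFrame cells. That said, with your data, you can pull the scalar values out of the nested lists and dicts and group on those: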
(df.groupby([
'mode',
df['items'].apply(lambda x: x[0]['id']),
df['value'].apply(lambda x: x['v'])
], as_index=False).agg({'people':'sum',
'items':'first',
'value':'first'})
.to_dict(orient='records')
)
Output:
[{'mode': 1, 'people': [{'id': 8888}, {'id': 9999}], 'items': [{'id': 1}], 'value': {'v': 1000}},
{'mode': 1, 'people': [{'id': 8888}, {'id': 9999}], 'items': [{'id': 1}], 'value': {'v': 2000}}]
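The snippet above groups on x[0]['id'], so it only looks at the first entry of each items list. If items can hold more than one dict, one possible variation is to serialize each cell to a JSON string and group on that instead. This is only a sketch under that assumption: the helper as_key and the columns _items and _value are illustrative names, not part of pandas or of the original answer.

```python
import json
import pandas as pd

obj = [
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 8888}], "value": {"v": 1000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 8888}], "value": {"v": 2000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 9999}], "value": {"v": 1000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 9999}], "value": {"v": 2000}},
]
df = pd.DataFrame(obj)


def as_key(cell):
    """Hypothetical helper: turn a JSON-like cell into a hashable string."""
    return json.dumps(cell, sort_keys=True)


# Add hashable helper columns next to the original nested ones.
keyed = df.assign(_items=df["items"].map(as_key), _value=df["value"].map(as_key))

records = []
# sort=False keeps the groups in order of first appearance.
for _, group in keyed.groupby(["mode", "_items", "_value"], sort=False):
    first = group.iloc[0]
    records.append({
        "mode": int(first["mode"]),
        "items": first["items"],
        # Flatten the per-row people lists into one list for the group.
        "people": [person for cell in group["people"] for person in cell],
        "value": first["value"],
    })

print(records)
```

Iterating over the groups is slower than a vectorized aggregation, but it sidesteps any question of how a particular pandas version aggregates list-valued cells.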
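
You cannot group by columns that contain lists or dicts, because those types are not hashable. So the people column is actually not the problem; the items and value columns are. The simplest solution is to convert them to hashable values, such as strings or numbers, that can be used for grouping. This example shows how to do that: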
df['_items'] = df['items'].apply(lambda item: ",".join([str(x['id']) for x in item]))
df['_value'] = df['value'].apply(lambda value: value['v'])
print(df.groupby(['_items','mode','_value'])['people'].sum())
Output:
_items  mode  _value
1       1     1000      [{'id': 8888}, {'id': 9999}]
              2000      [{'id': 8888}, {'id': 9999}]
Name: people, dtype: object
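
For comparison, the same "group on a hashable key" idea can be sketched without pandas, using a plain dict. This assumes, as in the question's data, that every entry in items has an "id" and every value has a "v"; the function name group_people is hypothetical.

```python
obj = [
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 8888}], "value": {"v": 1000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 8888}], "value": {"v": 2000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 9999}], "value": {"v": 1000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 9999}], "value": {"v": 2000}},
]


def group_people(rows):
    """Hypothetical helper: merge 'people' for rows sharing mode, items and value."""
    groups = {}  # regular dicts keep insertion order (Python 3.7+)
    for row in rows:
        # A tuple of ints is hashable, unlike the original list of dicts.
        key = (
            row["mode"],
            tuple(item["id"] for item in row["items"]),
            row["value"]["v"],
        )
        if key not in groups:
            groups[key] = {
                "mode": row["mode"],
                "items": row["items"],
                "people": [],
                "value": row["value"],
            }
        groups[key]["people"].extend(row["people"])
    return list(groups.values())


res_obj = group_people(obj)
print(res_obj)
```

Using a tuple of ids rather than a comma-joined string keeps the ids as integers, so the key stays unambiguous even if the ids are ever something other than integers.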