Python: group objects by attribute and combine the remaining column into a list, which gives unhashable type: 'list'


I have an object:

obj = [
    {"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":1000}},
    {"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":2000}},
    {"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":1000}},
    {"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":2000}}
]
I want to group by mode, items, and value, and merge the people values into a single list.

So the result I want is:

resObj = [
    {"mode":1,"items":[{"id":1}],"people":[{"id":8888},{"id":9999}],"value":{"v":1000}},
    {"mode":1,"items":[{"id":1}],"people":[{"id":8888},{"id":9999}],"value":{"v":2000}}
]
When I do this:

>>> obj = [{"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":1000}},{"mode":1,"items":[{"id":1}],"people":[{"id":8888}],"value":{"v":2000}},{"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":1000}},{"mode":1,"items":[{"id":1}],"people":[{"id":9999}],"value":{"v":2000}}]

>>> import pandas as pd
>>> df = pd.DataFrame(obj)
>>> df.groupby(['items','mode','value'])['people'].apply(list)
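I get unhashable type: 'list'

This is expected, because people is a list, but how can I achieve what I want? Another problem is that items is also a list, and I have been reading that groupby does not work with unhashable types.

Is there a way to achieve the transformation I need?

Edit: I also tried: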

>>> df['items']=df['items'].apply(lambda x: tuple(x))
>>> df['people']=df['people'].apply(lambda x: tuple(x))
>>> df.groupby(['items','mode','value'])['people'].apply(list)
But now I get unhashable type: 'dict'.
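
The second error can be reproduced without pandas. The sketch below is an illustration only: `is_hashable` and the variable names are hypothetical helpers, not part of the question. It shows that `tuple()` converts only the outer list, so the dicts inside still cannot be hashed, while converting the dicts themselves into tuples of pairs does produce a hashable value.

```python
# A groupby key must be hashable. tuple() converts only the outer list;
# the dicts stored inside it remain unhashable.
row_items = [{"id": 1}]


def is_hashable(value):
    """Return True if hash(value) succeeds (hypothetical helper for this demo)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Outer list turned into a tuple, inner dicts untouched.
as_tuple = tuple(row_items)

# Inner dicts also turned into tuples of (key, value) pairs.
as_nested_tuple = tuple(tuple(sorted(d.items())) for d in row_items)

print(is_hashable(row_items))        # False: a list
print(is_hashable(as_tuple))         # False: a tuple that contains a dict
print(is_hashable(as_nested_tuple))  # True: tuples of ints and strings only
```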

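You should avoid complex data such as lists, dicts, or tuples in DataFrame cells. That said, with your data you can unpack the dictionaries inside the lists and group by the extracted values: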

(df.groupby([
    'mode',
    df['items'].apply(lambda x: x[0]['id']),
    df['value'].apply(lambda x: x['v'])
], as_index=False).agg({'people':'sum',
                        'items':'first',
                        'value':'first'})
   .to_dict(orient='records')
)
Output:

[{'mode': 1, 'people': [{'id': 8888}, {'id': 9999}], 'items': [{'id': 1}], 'value': {'v': 1000}}, 
 {'mode': 1, 'people': [{'id': 8888}, {'id': 9999}], 'items': [{'id': 1}], 'value': {'v': 2000}}]
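
Since nested data in DataFrame cells is awkward, one possible alternative is to do the grouping in plain Python before (or instead of) building a DataFrame. This is a minimal sketch and not the method used in the answer above: `group_key`, `groups`, and `res_obj` are hypothetical names, and it assumes every record is JSON-serializable so that `json.dumps(..., sort_keys=True)` can serve as a hashable grouping key.

```python
import json

obj = [
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 8888}], "value": {"v": 1000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 8888}], "value": {"v": 2000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 9999}], "value": {"v": 1000}},
    {"mode": 1, "items": [{"id": 1}], "people": [{"id": 9999}], "value": {"v": 2000}},
]


def group_key(record):
    """Build a hashable string key from the fields to group by (assumed helper)."""
    fields = {name: record[name] for name in ("mode", "items", "value")}
    return json.dumps(fields, sort_keys=True)


groups = {}
for record in obj:
    key = group_key(record)
    if key not in groups:
        # Copy the record but start with an empty people list for the group.
        groups[key] = {**record, "people": []}
    # Append this record's people to the group's combined list.
    groups[key]["people"].extend(record["people"])

res_obj = list(groups.values())
print(res_obj)
```

Because dicts preserve insertion order, the groups come out in the order their keys first appear in `obj`.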

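You cannot group by columns that contain lists or dicts, because those values are not hashable. So the people column is not actually the problem; the items and value columns are. The simplest solution is to convert them to strings (or other hashable scalars) that can be used for grouping.

This example shows how to do that: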

df['_items'] = df['items'].apply(lambda item: ",".join([str(x['id']) for x in item]))
df['_value'] = df['value'].apply(lambda value: value['v'])
print(df.groupby(['_items','mode','_value'])['people'].sum())
Output:

_items  mode  _value
1       1     1000      [{'id': 8888}, {'id': 9999}]
              2000      [{'id': 8888}, {'id': 9999}]
Name: people, dtype: object
