How to group by a DataFrame's cells that contain lists in Python?

python pandas list

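I am using Python and Pandas, and I am trying to find an efficient way to sum DataFrame values across different rows based on a list of IDs rather than a single unique ID.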

df:

Name  -  ID  - Related IDs          - Value
z     -  123 - ['aaa','bbb','ccc']  -  10
w     -  456 - ['aaa']              -  20
y     -  789 - ['ggg','hhh','jjj']  -  50
x     -  012 - ['jjj','hhh']        -  60
r     -  015 - ['hhh']              -  15

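One option would be to explode each row by the elements of its list, but that may duplicate the values to be summed, and it may not be an efficient solution in terms of time and resources.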
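
To make the double-counting concern concrete, here is a minimal sketch that assumes pandas 0.25 or later (for `DataFrame.explode`) and rebuilds the sample frame from the question. The names `exploded`, `per_id`, and `naive` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question (ID column omitted for brevity).
df = pd.DataFrame({
    "Name": ["z", "w", "y", "x", "r"],
    "Related IDs": [["aaa", "bbb", "ccc"], ["aaa"],
                    ["ggg", "hhh", "jjj"], ["jjj", "hhh"], ["hhh"]],
    "Value": [10, 20, 50, 60, 15],
})

# One row per (original row, related ID) pair; the original index is repeated.
exploded = df.explode("Related IDs")

# Total Value per individual ID.
per_id = exploded.groupby("Related IDs")["Value"].sum()

# Naively adding the per-ID totals back up per original row counts a row's
# Value once for every ID it shares with its group.
naive = exploded["Related IDs"].map(per_id).groupby(level=0).sum()
print(naive.tolist())  # [50, 30, 285, 235, 125], not [30, 30, 125, 125, 125]
```

Exploding on its own is therefore not enough: the rows still have to be assigned to groups of transitively connected IDs before summing.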

```python
f = {'Sum': 'sum'}

df = df.groupby(['Related IDs']).agg(f) 
# This does not work: it compares whole lists against each other
# rather than matching on the individual elements inside them

df = df.reset_index()
```
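
As a hedged illustration of why the attempt above cannot work as-is: lists are unhashable, so they cannot serve as group keys. Turning each list into a string (an assumed workaround, with the hypothetical names `key` and `exact`) makes the grouping run, but it only groups rows whose lists are identical.

```python
import pandas as pd

df = pd.DataFrame({
    "Related IDs": [["aaa", "bbb", "ccc"], ["aaa"],
                    ["ggg", "hhh", "jjj"], ["jjj", "hhh"], ["hhh"]],
    "Value": [10, 20, 50, 60, 15],
})

# Lists are mutable and therefore unhashable, which is what breaks groupby.
try:
    hash(df["Related IDs"].iloc[0])
    list_hashable = True
except TypeError:
    list_hashable = False
print(list_hashable)  # False

# A sorted, joined string is hashable, so it can be used as a group key ...
key = df["Related IDs"].map(lambda ids: "|".join(sorted(ids)))
exact = df.groupby(key)["Value"].transform("sum")

# ... but rows only fall into the same group when their lists are identical,
# so every row here stays alone and the sums equal the original values.
print(exact.tolist())  # [10, 20, 50, 60, 15]
```

Exact-match keys do not capture "shares at least one ID", which is why a connected-components approach is needed.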

What I expect is a new column "Sum" that adds up the "Value" of all rows sharing one or more Related IDs, as follows:

Name  -  ID  - Related IDs          - Value - Sum
z     -  123 - ['aaa','bbb','ccc']  -  10  -  30
w     -  456 - ['aaa']              -  20  -  30
y     -  789 - ['ggg','hhh','jjj']  -  50  -  125
x     -  012 - ['jjj','hhh']        -  60  -  125
r     -  015 - ['hhh']              -  15  -  125

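Try `df['Sum'] = df.groupby(['Related IDs'])['Value'].transform('sum')`

Trying the below, I get an unhashable type error: `df = pd.DataFrame({'Related IDs': [['aaa','bbb','ccc'], ['aaa'], ['ddd','eee','fff']], 'Val': [400, 100, 60]}); print(df); df['Sum'] = df.groupby(['Related IDs'])['Val'].transform('sum')`

@jezrael The dup question doesn't solve this problem. A `list` is not hashable, so it can't be used as a `groupby` key.

@QuangHoang - let's go to answer ;) Thanks for the comment.

How about label-encoding the lists? Or turning them into strings?

Use `networkx` with `connected_components`: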

```python
import ast
import networkx as nx
from itertools import combinations, chain

# if necessary (the lists are stored as strings), convert them to lists
df['Related IDs'] = df['Related IDs'].apply(ast.literal_eval)

# create edges (an edge can only connect two nodes)
L2_nested = [list(combinations(l, 2)) for l in df['Related IDs']]
L2 = list(chain.from_iterable(L2_nested))
print(L2)
# [('aaa', 'bbb'), ('aaa', 'ccc'), ('bbb', 'ccc'),
#  ('ggg', 'hhh'), ('ggg', 'jjj'), ('hhh', 'jjj'), ('jjj', 'hhh')]

# create the graph from the edges
G = nx.Graph()
G.add_edges_from(L2)
connected_comp = nx.connected_components(G)

# map each ID to the number of its connected component
node2id = {x: cid for cid, c in enumerate(connected_comp) for x in c}

# create groups by mapping the first value of column Related IDs
groups = [node2id.get(x[0]) for x in df['Related IDs']]
print(groups)
# [0, 0, 1, 1, 1]

# write the per-group sum to a new column
df['Sum'] = df.groupby(groups)['Value'].transform('sum')
print(df)
#   Name   ID      Related IDs  Value  Sum
# 0    z  123  [aaa, bbb, ccc]     10   30
# 1    w  456            [aaa]     20   30
# 2    y  789  [ggg, hhh, jjj]     50  125
# 3    x   12       [jjj, hhh]     60  125
# 4    r   15            [hhh]     15  125
```
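
If `networkx` is not available, the same connected-components idea can be sketched with a small union-find in plain Python. This is an alternative technique, not the answer's method. The function name `component_sums` is hypothetical, and the sketch assumes every row has at least one related ID.

```python
def component_sums(related_ids, values):
    """Return, for each row, the total value of its connected group of IDs."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the way directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    related_ids = [list(ids) for ids in related_ids]
    values = list(values)

    # Link all IDs of a row to the row's first ID (assumes non-empty lists).
    for ids in related_ids:
        find(ids[0])  # also registers IDs that appear alone in a single row
        for other in ids[1:]:
            union(ids[0], other)

    # Accumulate each row's value on the root of its group.
    totals = {}
    for ids, value in zip(related_ids, values):
        root = find(ids[0])
        totals[root] = totals.get(root, 0) + value

    return [totals[find(ids[0])] for ids in related_ids]


sums = component_sums(
    [["aaa", "bbb", "ccc"], ["aaa"], ["ggg", "hhh", "jjj"], ["jjj", "hhh"], ["hhh"]],
    [10, 20, 50, 60, 15],
)
print(sums)  # [30, 30, 125, 125, 125]
```

With a DataFrame this could be used as `df['Sum'] = component_sums(df['Related IDs'], df['Value'])`. Unlike a lookup that only knows IDs appearing in an edge, a row whose single ID occurs nowhere else still gets its own group here.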