Python GroupBy aggregate: collect_list / collect_set
I am trying to get functionality in pandas similar to Spark's GroupBy with collect_list or collect_set.
import pandas as pd
(pd.DataFrame
(
{
'professorid' : [1,2,3,4,5,1,2],
'studentid': ['a','b','c', 'd','e','b','b']
}
)
.groupby
(
'professorid'
)
.agg
(
num_students = ('studentid' , 'count'),
studentids = ('studentid' , lambda x: x.unique().tolist())
)
)
This raises an error:
KeyError: "['studentid'] not in index"
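The result I am looking for is one row per professorid, with the student count and the collected student IDs.
How can I get this result?

I had been searching for this for a long time using terms like "unexplode" and "the opposite of explode", because that is essentially what this is. Thanks.

You don't need a lambda; you can use unique: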
import pandas as pd
(pd.DataFrame
(
{
'professorid' : [1,2,3,4,5,1,2],
'studentid': ['a','b','c', 'd','e','b','b']
}
)
.groupby
(
'professorid'
)
.agg
(
num_students = ('studentid' , 'count'),
studentids = ('studentid' , 'unique')
)
)
             num_students studentids
professorid
1                       2     [a, b]
2                       2        [b]
3                       1        [c]
4                       1        [d]
5                       1        [e]
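
Note that 'unique' drops duplicates, so it is closer to Spark's collect_set than to collect_list. A minimal sketch of both variants follows, assuming a pandas version that supports named aggregation (0.25 or later). The output column names studentid_list and studentid_set are made up for illustration, and passing the built-in list and set as aggregation functions is one possible approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "professorid": [1, 2, 3, 4, 5, 1, 2],
        "studentid": ["a", "b", "c", "d", "e", "b", "b"],
    }
)

result = df.groupby("professorid").agg(
    num_students=("studentid", "count"),
    # Rough analogue of collect_list: keeps duplicates, in row order.
    studentid_list=("studentid", list),
    # Rough analogue of collect_set: drops duplicates, order not guaranteed.
    studentid_set=("studentid", set),
)

print(result)
```

For professorid 2, the list variant keeps both occurrences of 'b', while the set variant (like 'unique') keeps only one.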