Python pandas: dict from groupby.value_counts()

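I have a pandas DataFrame df with the columns user and product. It records which users bought which products, including repeat purchases of the same product: if user 1 bought product 23 three times, df contains the entry 23 three times for user 1. For each user, I am only interested in the products that this user bought three or more times. So I run s = df.groupby('user').product.value_counts() and then filter with s = s[s > 2] to discard the products that were bought less often. s then looks like this: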

user     product
3        39190         9
         47766         8
         21903         8
6        21903         5
         38293         5
11       8309          7
         27959         7
         14947         5
         35948         4
         8670          4
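To make the setup reproducible, here is a minimal sketch that builds a small df and applies the same two steps. The user and product IDs below are hypothetical toy values, not the data from the question, and the column is selected with ['product'] rather than attribute access.

```python
import pandas as pd

# Hypothetical toy data: one row per purchase, repeat purchases included.
df = pd.DataFrame({
    "user":    [3, 3, 3, 3, 6, 6, 6, 6, 6, 6],
    "product": [39190, 39190, 39190, 47766, 21903, 21903, 21903, 38293, 38293, 38293],
})

# Count how often each user bought each product; the result has a (user, product) MultiIndex.
s = df.groupby("user")["product"].value_counts()

# Keep only products bought three or more times by that user.
s = s[s > 2]
print(s)
```

Here user 3's single purchase of 47766 is dropped by the filter, while the other three (user, product) pairs survive with a count of 3.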

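After filtering, I no longer care about the frequencies (the right-hand column).

How can I build a dict of the form user: product from s, mapping each user to the products that survived the filter? I can't access the individual columns/index levels of the series.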

Option 0

# name='count' keeps the counts column from clashing with the 'product' index level
s.reset_index(name='count').groupby('user')['product'].apply(list).to_dict()

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}
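The name= argument in reset_index matters because, depending on the pandas version, the Series returned by groupby(...).value_counts() can share its name with one of its index levels (older versions name it after the counted column, product; pandas 2.x names it count). The sketch below builds such a Series by hand, with hypothetical values, to show that a plain reset_index() then raises a ValueError while reset_index(name='count') works.

```python
import pandas as pd

# Hypothetical filtered counts. The Series is deliberately named "product",
# the same as its second index level, as older pandas versions produce.
idx = pd.MultiIndex.from_tuples(
    [(3, 39190), (3, 47766), (6, 21903)], names=["user", "product"]
)
s = pd.Series([9, 8, 5], index=idx, name="product")

try:
    s.reset_index()  # the counts column and the "product" level would collide
    conflict = False
except ValueError:
    conflict = True

# Giving the counts column its own name avoids the clash.
result = s.reset_index(name="count").groupby("user")["product"].apply(list).to_dict()
print(conflict, result)
```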

Option 1

s.groupby(level='user').apply(lambda x: x.loc[x.name].index.tolist()).to_dict()

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}
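In the lambda above, x is the slice of s for one user and still carries the full (user, product) index, while x.name is that user's key, so x.loc[x.name] leaves only the product level. A possibly simpler variation, which is my substitution and not part of the answer, drops the user level directly with Series.droplevel. The sketch uses a hand-built s with hypothetical values.

```python
import pandas as pd

# Hypothetical filtered counts with the same (user, product) index layout as s.
idx = pd.MultiIndex.from_tuples(
    [(3, 39190), (3, 47766), (6, 21903), (6, 38293)], names=["user", "product"]
)
s = pd.Series([9, 8, 5, 5], index=idx)

# For each user's slice, drop the "user" level so only product IDs remain.
result = (
    s.groupby(level="user")
     .apply(lambda x: x.droplevel("user").index.tolist())
     .to_dict()
)
print(result)
```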

Option 2

from collections import defaultdict

d = defaultdict(list)
# each index entry of s is a (user, product) tuple
for user, product in s.index:
    d[user].append(product)

dict(d)

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}
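For reuse, the counting, filtering and dict-building steps can be wrapped in one function. This is only a sketch: the helper name frequent_products and its min_count parameter are hypothetical, and min_count=3 corresponds to the s > 2 filter from the question because the counts are integers.

```python
from collections import defaultdict

import pandas as pd


def frequent_products(df: pd.DataFrame, min_count: int = 3) -> dict:
    """Hypothetical helper: map each user to the products they bought at least min_count times."""
    counts = df.groupby("user")["product"].value_counts()
    counts = counts[counts >= min_count]
    result = defaultdict(list)
    # Only the (user, product) index pairs are needed; the counts are discarded.
    for user, product in counts.index:
        result[int(user)].append(int(product))
    return dict(result)


purchases = pd.DataFrame({"user": [1, 1, 1, 2], "product": [23, 23, 23, 5]})
print(frequent_products(purchases))  # {1: [23]}
```

Users with no product above the threshold do not appear in the result at all, which matches the behaviour of the three options.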

Thanks, that did the trick! In Option 0 I had to pass a new column name to reset_index(), otherwise I got a naming conflict (as mentioned above).