Python pandas groupby and eliminate duplicates

I want to visit as few stores as possible to buy my products. How can I do this? I have a list of stores that sell specific products:

import numpy as np
import pandas as pd
wanted_Products = pd.DataFrame({'p': [1, 2, 3, 4, 5, 6, 7]})
stores = pd.DataFrame({'Store': np.repeat(np.arange(1, 5), 4),
                       'Product': [1, 2, 3, 5, 0, 2, 3, 4, 0, 6, 7, 8, 0, 1, 2, 6]})
# 1 if the product is wanted, else 0
stores['Wanted'] = stores['Product'].isin(wanted_Products['p']).astype(int)

     Store  Product  Wanted
0       1        1       1
1       1        2       1
2       1        3       1
3       1        5       1
4       2        0       0
5       2        2       1
6       2        3       1
7       2        4       1
8       3        0       0
9       3        6       1
10      3        7       1
11      3        8       0
12      4        0       0
13      4        1       1
14      4        2       1
15      4        6       1

# Group products per store and count how many wanted products each store has
w = stores.groupby('Store', as_index=False).agg(list)
w['Number_wanted'] = stores.groupby('Store')['Wanted'].sum().to_numpy()

      Store  Product        Wanted         Number_wanted  ?Products_wanted?
0      1  [1, 2, 3, 5]  [1, 1, 1, 1]              4            [1,2,3,5]
1      2  [0, 2, 3, 4]  [0, 1, 1, 1]              3            [2,3,4]
2      3  [0, 6, 7, 8]  [0, 1, 1, 0]              2            [6,7]
3      4  [0, 1, 2, 6]  [0, 1, 1, 1]              3            [1,2,6]

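How can I get the wanted products themselves, without the unwanted ones, into a new column (Products_wanted)? When I use isin() I only get True/False (or 1/0 if I use astype(int)), not the actual product numbers.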
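For the narrower part of the question (getting the product numbers rather than 1/0), one option is to use the boolean mask from isin() to filter the rows before aggregating. This is a minimal sketch assuming the frames from the question; the names `mask` and `products_wanted` are illustrative only. It just lists what each store could supply, duplicates across stores included. It does not choose the fewest stores.

```python
import numpy as np
import pandas as pd

wanted_Products = pd.DataFrame({'p': [1, 2, 3, 4, 5, 6, 7]})
stores = pd.DataFrame({'Store': np.repeat(np.arange(1, 5), 4),
                       'Product': [1, 2, 3, 5, 0, 2, 3, 4, 0, 6, 7, 8, 0, 1, 2, 6]})

# Boolean mask: True where the product is on the wanted list
mask = stores['Product'].isin(wanted_Products['p'])

# Keep only the wanted rows, then collect the product numbers per store
products_wanted = stores[mask].groupby('Store')['Product'].agg(list)

w = stores.groupby('Store', as_index=False)['Product'].agg(list)
# Align on Store; a store selling no wanted products would get NaN here
w['Products_wanted'] = w['Store'].map(products_wanted)
print(w)
```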

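One approach is to keep track of all the products available in a store, take them, and then mark those products as "taken" so that you don't pick the same products at the next store.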

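So initially you have wanted_Products = [1,2,3,4,5,6,7]. Since store 1 sells [1,2,3,5], you pick those and return them as the products to get from store 1, then mark all of them as "taken" by replacing their values in wanted_Products with some other value, e.g. -1 (or any value you like that signals they were taken). Now wanted_Products = [-1, -1, -1, 4, -1, 6, 7]. Since -1 means already bought, you can only buy products [4, 6, 7] at the next store. Repeating the same logic for all stores gives you the products to get from each one, without any duplicates:

def get_products(possible, wanted):
    # True where a still-wanted product is sold by this store
    mask = np.isin(wanted, possible)
    available = wanted[mask]
    # Mark them as taken so later stores don't offer them again
    wanted[mask] = -1
    return available

w = stores.groupby('Store', as_index=False)['Product'].agg(list)
# One writable copy is shared by every call, so taken products are remembered across stores
w['Products to get'] = w.Product.apply(get_products, args=(wanted_Products['p'].to_numpy(copy=True),))

Output:

>>> w
   Store       Product Products to get
0      1  [1, 2, 3, 5]    [1, 2, 3, 5]
1      2  [0, 2, 3, 4]             [4]
2      3  [0, 6, 7, 8]          [6, 7]
3      4  [0, 1, 2, 6]              []

To respect your optimization criterion (always buy from the store that has the most products from your list), the per-store product lists have to be re-sorted at every iteration: each time you decide to take a set of products from a given store, the remaining lists need to be cleaned up (already-bought products removed) and re-ordered by length. As a technical note, I'm converting the lists to sets. Since you don't want duplicates, this is fine, and it gives us set operations: intersection (which wanted products a given store has) and difference (removing bought products from the wanted list). The code isn't very elegant, but I included plenty of comments:

import numpy as np
import pandas as pd

stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                       'Product': [1,2,3,5,0,2,3,4,0,6,7,8,0,1,2,6]})
# stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
#                        'Product': [0,2,7,6,0,2,4,8,1,2,7,6,1,2,3,5]})

w = stores.groupby('Store', as_index=False).agg(list)
# Object column so each cell can hold a set (NaN = nothing to buy there)
w['Products to get'] = np.nan
w['Products to get'] = w['Products to get'].astype('object')

wanted_Products = [1,2,3,4,5,6,7]
wanted = set(wanted_Products)

# Work on a copy so that w itself is not modified
tmp = w[['Store', 'Product']].copy()
# Stop when everything is bought or no stores are left
while wanted and not tmp.empty:
    # Remove unwanted and already-bought products (set intersection)
    tmp['Product'] = tmp.Product.apply(lambda x: set(x) & wanted)
    # Sort on length of product sets, largest first
    tmp['lengths'] = tmp.Product.map(len)
    tmp = tmp.sort_values(by='lengths', ascending=False).drop(columns='lengths')
    # Get products from this store, remove them from the wanted set (set difference)
    get = tmp.loc[tmp.index[0], 'Product'] & wanted
    wanted -= get
    # Update Products to get for this store (tmp keeps w's index labels)
    w.at[tmp.index[0], 'Products to get'] = get
    # Remove the largest product set, work on the remaining ones
    tmp = tmp.iloc[1:].copy()

Here is the output:

In [71]: w
Out[71]:
   Store       Product Products to get
0      1  [1, 2, 3, 5]    {1, 2, 3, 5}
1      2  [0, 2, 3, 4]             {4}
2      3  [0, 6, 7, 8]          {6, 7}
3      4  [0, 1, 2, 6]             NaN

If stores 3 and 4 have more of the products, it still works:

stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                       'Product': [0,2,7,6,0,2,4,8,1,2,7,6,1,2,3,5]})

The output is:

In [76]: w
Out[76]:
   Store       Product Products to get
0      1  [0, 2, 7, 6]             NaN
1      2  [0, 2, 4, 8]             {4}
2      3  [1, 2, 7, 6]    {1, 2, 6, 7}
3      4  [1, 2, 3, 5]          {3, 5}

What have you tried so far? What went wrong with your attempt? Please provide a good starting point.

I used a whitelist: stores = (stores.assign(key=lambda x: x["Store"].sub(0)).merge(wanted_Products, left_on="key", right_index=True, how="left").drop("key", axis="columns")). First, I made the wanted products a panda

@JohannaMarklund You should definitely include your comment above in the question itself. It shows people that you made a good-faith effort to solve the problem, and it can help others build on your attempt when answering.

This almost works, but it doesn't work if most of the products are in store 3 or 4.

I don't understand. Can you give an example where it doesn't work?

If you swap the products of stores 1 and 3, then products 2 and 3 are taken from store 2 (after the swap), even though more products could be collected from store 3.

I used your as_index to get a nice DataFrame. Thanks, that was also a problem I had.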
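Neither approach above is guaranteed to find the fewest stores in general. The first takes stores in their listed order. The second is the greedy heuristic for set cover, which can end up using more stores than the optimum. When there are only a handful of stores, checking store combinations from smallest to largest with itertools.combinations returns a true minimum. This is a minimal sketch. The helper name `fewest_stores` is hypothetical, and the search is exponential in the number of stores, so it is only practical for small inputs.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def fewest_stores(stores, wanted):
    """Hypothetical helper: brute-force a smallest set of stores covering `wanted`.

    Returns (stores_tuple, {store: products_to_buy}), or None if some wanted
    product is not sold by any store.
    """
    wanted = set(wanted)
    # Map each store to the set of wanted products it sells
    offers = {store: set(group) & wanted
              for store, group in stores.groupby('Store')['Product']}
    names = list(offers)
    # Try all combinations of 1 store, then 2 stores, and so on
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            if set().union(*(offers[s] for s in combo)) >= wanted:
                # Assign each product to the first store in the combo that sells it
                plan, remaining = {}, set(wanted)
                for s in combo:
                    plan[s] = offers[s] & remaining
                    remaining -= plan[s]
                return combo, plan
    return None


stores = pd.DataFrame({'Store': np.repeat(np.arange(1, 5), 4),
                       'Product': [1, 2, 3, 5, 0, 2, 3, 4, 0, 6, 7, 8, 0, 1, 2, 6]})
combo, plan = fewest_stores(stores, [1, 2, 3, 4, 5, 6, 7])
print(combo, plan)
```

For the question's data this also finds that three stores are needed, so the greedy answer happens to be optimal there. The brute force only pays off when greedy picks a large store early that forces extra visits later.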