Python pandas groupby and eliminate duplicates (tags: python, pandas, pandas-groupby)


I want to visit as few stores as possible to buy my products. How can I do that? I have a list of stores that sell specific products.

import numpy as np
import pandas as pd

wanted_Products = pd.DataFrame({'p':[1,2,3,4,5,6,7]})
stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                       'Product': [1,2,3,5,0,2,3,4,0,6,7,8,0,1,2,6]})
# Flag each row with 1 if the product is wanted, 0 otherwise
stores['Wanted'] = stores.Product.isin(wanted_Products.p).astype(int)

     Store  Product  Wanted
0       1        1       1
1       1        2       1
2       1        3       1
3       1        5       1
4       2        0       0
5       2        2       1
6       2        3       1
7       2        4       1
8       3        0       0
9       3        6       1
10      3        7       1
11      3        8       0
12      4        0       0
13      4        1       1
14      4        2       1
15      4        6       1

# Group products per store into lists
w = stores.groupby('Store', as_index=False).agg(list)
# Count how many wanted products each store has (groupby sorts by Store, matching the row order of w)
w['Number_wanted'] = stores.groupby('Store')['Wanted'].sum().to_numpy()

   Store       Product        Wanted  Number_wanted  ?Products_wanted?
0      1  [1, 2, 3, 5]  [1, 1, 1, 1]              4       [1, 2, 3, 5]
1      2  [0, 2, 3, 4]  [0, 1, 1, 1]              3          [2, 3, 4]
2      3  [0, 6, 7, 8]  [0, 1, 1, 0]              2             [6, 7]
3      4  [0, 1, 2, 6]  [0, 1, 1, 1]              3          [1, 2, 6]

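How can I get the products I want into a new column (Products_wanted), without the unwanted products? When I use isin() I only get True/False (or 1/0 if I use astype(int)), not the actual product numbers.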
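The column asked about here can be sketched by using the isin() result as a row filter before grouping, instead of storing it as 1/0. This is only an illustration under the question's own data; the intermediate names `mask` and `products_wanted` are made up for the example.

```python
import numpy as np
import pandas as pd

wanted_Products = pd.DataFrame({'p': [1, 2, 3, 4, 5, 6, 7]})
stores = pd.DataFrame({'Store': np.repeat(np.arange(1, 5), 4),
                       'Product': [1, 2, 3, 5, 0, 2, 3, 4, 0, 6, 7, 8, 0, 1, 2, 6]})

# isin() returns a boolean mask; use it to keep only the wanted rows
mask = stores['Product'].isin(wanted_Products['p'])
# One list of wanted products per store, indexed by store id
products_wanted = stores[mask].groupby('Store')['Product'].agg(list)

w = stores.groupby('Store', as_index=False).agg(list)
# Look up each store id; a store with no wanted products would get NaN
w['Products_wanted'] = w['Store'].map(products_wanted)
print(w)
```

Note that this keeps every wanted product a store sells, so the same product can still appear under several stores.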

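One approach is to go through the stores, take every wanted product a store has available, and then mark those products as "taken" so that you don't pick the same products again at the next store.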

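So initially you have wanted_Products = [1,2,3,4,5,6,7]. Since store 1 offers [1,2,3,5], you select and return these as the products to get from store 1, then mark them all as "taken" by replacing those values in wanted_Products with some other value, e.g. -1 (or any other value you like that indicates they have been taken).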

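Now wanted_Products = [-1,-1,-1,4,-1,6,7]. Every -1 marks a product that has already been bought, so at the following stores you can only pick from [4,6,7]. Repeating the same logic for every store gives you the products to get from each one, without any duplicates: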

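def get_products(possible, wanted):
    # Positions in `wanted` whose value is sold by this store
    i = np.where(np.isin(wanted, possible))
    available = wanted[i]
    # Mark these products as taken (in place) so later stores skip them
    wanted[i] = -1
    return available.tolist()

w = stores.groupby('Store', as_index=False).agg(list)
# Pass a 1-D copy of the wanted products; get_products mutates it across calls
w['Products to get'] = w.Product.apply(get_products, args=(wanted_Products['p'].to_numpy().copy(),))
Output: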

>>> w
   Store       Product Products to get
0      1  [1, 2, 3, 5]    [1, 2, 3, 5]
1      2  [0, 2, 3, 4]             [4]
2      3  [0, 6, 7, 8]          [6, 7]
3      4  [0, 1, 2, 6]              []
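Because the stores are processed in row order, the result depends on that order. The sketch below is a plain-Python restatement of the same pass (not the code above; `sequential_pick` and the `swapped` data are made up for illustration) that swaps the product lists of stores 1 and 3 to show the effect.

```python
def sequential_pick(store_products, wanted):
    """Visit stores in the given order and take every still-needed product."""
    remaining = set(wanted)
    picks = {}
    for store, products in store_products.items():
        take = sorted(remaining.intersection(products))
        picks[store] = take
        remaining.difference_update(take)  # these are now "taken"
    return picks

wanted = [1, 2, 3, 4, 5, 6, 7]
original = {1: [1, 2, 3, 5], 2: [0, 2, 3, 4], 3: [0, 6, 7, 8], 4: [0, 1, 2, 6]}
# Hypothetical variant: product lists of stores 1 and 3 swapped
swapped = {1: [0, 6, 7, 8], 2: [0, 2, 3, 4], 3: [1, 2, 3, 5], 4: [0, 1, 2, 6]}

print(sequential_pick(original, wanted))
print(sequential_pick(swapped, wanted))
```

With the swapped data, products 2 and 3 are picked up at store 2 simply because it comes first, even though store 3 sells more of the wanted products.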

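To respect your optimization criterion (always take from the store that has the largest number of products on your list), the per-store product lists need to be re-sorted on every iteration: each time you decide to take a set of products from a given store, the remaining lists need to be cleaned (already purchased products removed) and re-sorted by length.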

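As a technical note, I'm converting the lists to sets. Since you don't want duplicates this is fine, and it gives us set operations: intersection (to check which wanted products a given store has) and difference (to remove purchased products from the wanted list).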
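A minimal illustration of those two set operations on the first two stores of the example data (the variable names are only for this sketch):

```python
wanted = {1, 2, 3, 4, 5, 6, 7}
store_1 = set([1, 2, 3, 5])
store_2 = set([0, 2, 3, 4])

# Intersection: which wanted products does store 1 sell?
available = store_1 & wanted
# Difference: drop what was just bought from the wanted set
wanted = wanted - available
# Store 2 now only contributes what is still missing
next_available = store_2 & wanted

print(available, wanted, next_available)
```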

The code isn't very elegant, but I've included quite a few comments:

stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                   'Product': [1,2,3,5,0,2,3,4,0,6,7,8,0,1,2,6]})
# stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
#                    'Product': [0,2,7,6,0,2,4,8,1,2,7,6,1,2,3,5]})

w = stores.groupby('Store', as_index=False).agg(list)
w['Products to get'] = np.nan
w['Products to get'] = w['Products to get'].astype('object')

wanted_Products = [1,2,3,4,5,6,7]
wanted = set(wanted_Products)

# Work on a copy so the assignments below don't write into a view of w
tmp = w[['Store', 'Product']].copy()
# Stop once everything is bought or no stores are left to check
while len(wanted) > 0 and len(tmp) > 0:
    # Remove unwanted products (set intersection)
    tmp['Product'] = tmp.Product.apply(lambda x: set(x) & wanted)

    # Sort on length of product sets
    tmp['lengths'] = tmp.Product.str.len()
    tmp = tmp.sort_values(by='lengths', ascending=False).drop(columns='lengths')

    # Get products from this store, remove them from wanted set
    get = tmp.loc[tmp.index[0], 'Product'] & wanted
    wanted -= get

    # Update Products to get for this store
    row = w[w['Store'] == tmp.loc[tmp.index[0], 'Store']]
    w.at[row.index[0], 'Products to get'] = get

    # Remove the largest product set, work on the remaining ones
    tmp = tmp.iloc[1:].copy()
Here is the output:

In [71]: w
Out[71]: 
   Store       Product Products to get
0      1  [1, 2, 3, 5]    {1, 2, 3, 5}
1      2  [0, 2, 3, 4]             {4}
2      3  [0, 6, 7, 8]          {6, 7}
3      4  [0, 1, 2, 6]             NaN
It still works when stores 3 and 4 have more of the products:

stores = pd.DataFrame({'Store': np.repeat(np.arange(1,5),4),
                   'Product': [0,2,7,6,0,2,4,8,1,2,7,6,1,2,3,5]})
The output is:

In [76]: w
Out[76]: 
   Store       Product Products to get
0      1  [0, 2, 7, 6]             NaN
1      2  [0, 2, 4, 8]             {4}
2      3  [1, 2, 7, 6]    {1, 2, 6, 7}
3      4  [1, 2, 3, 5]          {3, 5}
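The same "largest remaining set first" idea can also be sketched without a DataFrame. This is a plain-Python greedy pass, not the code above; the function name `greedy_store_picks` and its return shape are assumptions made for the example. A greedy choice like this is a heuristic and is not guaranteed to give the smallest possible number of stores in every case.

```python
def greedy_store_picks(store_products, wanted):
    """Repeatedly pick the store that covers the most still-needed products."""
    remaining = set(wanted)
    candidates = {store: set(products) for store, products in store_products.items()}
    picks = {}
    while remaining and candidates:
        # Store with the largest overlap with what is still needed
        best = max(candidates, key=lambda s: len(candidates[s] & remaining))
        take = candidates.pop(best) & remaining
        if not take:
            break  # no remaining store sells anything still needed
        picks[best] = take
        remaining -= take
    # `remaining` is non-empty if some product is sold nowhere
    return picks, remaining

wanted = [1, 2, 3, 4, 5, 6, 7]
first = {1: [1, 2, 3, 5], 2: [0, 2, 3, 4], 3: [0, 6, 7, 8], 4: [0, 1, 2, 6]}
second = {1: [0, 2, 7, 6], 2: [0, 2, 4, 8], 3: [1, 2, 7, 6], 4: [1, 2, 3, 5]}

print(greedy_store_picks(first, wanted))
print(greedy_store_picks(second, wanted))
```

Stores that end up with nothing to buy are simply absent from the returned dict, and ties between equally good stores go to the one listed first.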

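What have you tried so far? What went wrong with your attempts? Please provide a good starting point.

I whitelist with stores = (stores.assign(key=lambda x: x["Store"].sub(0)).merge(wanted_Products, left_on="key", right_index=True, how="left").drop("key", axis="columns")). First, I turned wanted_Products into a pandas

@JohannaMarklund You should definitely include your comment above in the question itself. It shows people that you made a genuine effort to solve the problem, and it can help others build on what you tried when writing an answer.

This almost works, but it won't work if most of the products are in store 3 or 4.

I didn't follow you. Can you give an example where it doesn't work?

If you swap the products of stores 1 and 3, then products 2 and 3 are taken at store 2 (after the swap), even though more products could be collected at store 3.

I used your as_index to get a nice DataFrame. Thanks, that was another problem I was having.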