Python: pivot table of new and old customers

python pandas numpy

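I want to build a pivot that shows how many new and old customers made a purchase in each month. My data (df) looks like this:

  customer  purchase_id  payment_status  price  currency  payment_date
1 Andy                6  REPAID            100  GBP       2020-04-16
2 Randy              10  IN_PROGRESS     10000  SEK       2020-04-17

Expected output:

       new_customers    old_customers
Jan          1                3
Feb          5                2

What I am stuck on is how to integrate df2 with df, given that in df a customer can appear multiple times, with different purchase ids, prices and payment dates. My code so far looks like this: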

import numpy as np
import pandas as pd

df['year'] = df['payment_date'].dt.year
df['month'] = df['payment_date'].dt.month
# purchases per customer
df2 = pd.DataFrame(df.groupby("customer", sort=False)["purchase_id"].count())
df2 = df2.reset_index()
df2.columns = ['merchant_code', 'number_of_purchase']
# more than one purchase overall counts as an old customer
df2['repeat_customer'] = np.where(df2['number_of_purchase'] > 1, 'old_customers', 'new_customers')

But feel free to change my code; the output matters more.

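Your sample data does not have enough variety, so I generated a random dataset that matches its structure
Working with month starts is much simpler, so convert every payment date to the first day of its month
"New customer" needs a definition; I have inferred one from your code (a customer is new in the month of their first purchase, old afterwards)
With that definition, the count is a straightforward calculation
Finally, reshape the result into the layout you need
You may want to format the month in the output DataFrame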
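
The month-start step in the list above can be tried in isolation. The following is a minimal sketch on a few hypothetical dates (they are not taken from the question's data); it compares the "subtract day minus one" approach with a route through monthly periods, which should give the same timestamps.

```python
import pandas as pd

# Hypothetical dates, chosen only to illustrate the conversion.
dates = pd.Series(pd.to_datetime(["2020-01-05", "2020-01-26", "2020-02-09", "2020-03-01"]))

# Subtract (day of month - 1) days so every date lands on the first of its month.
ms_by_offset = dates - pd.to_timedelta(dates.dt.day - 1, unit="D")

# Alternative route: convert to monthly periods, then back to a start-of-month timestamp.
ms_by_period = dates.dt.to_period("M").dt.to_timestamp()

print(ms_by_offset.dt.strftime("%Y-%m-%d").tolist())
print((ms_by_offset == ms_by_period).all())
```

Either form works as a grouping key; the period-based one may read more clearly when the intent is "group by calendar month".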

df

  customer  purchase_id payment_status  price currency payment_date
0     sunt            5         REPAID   9228      SEK   2020-01-05
1  tenetur            6    IN_PROGRESS   1458      SEK   2020-01-12
2   maxime            7    IN_PROGRESS   9798      GBP   2020-01-19
3     sunt            1    IN_PROGRESS   2418      SEK   2020-01-26
4   maxime            8    IN_PROGRESS   6608      GBP   2020-02-02
5     sunt            4    IN_PROGRESS   2341      GBP   2020-02-09
6      rem            7         REPAID   8961      GBP   2020-02-16
7     quae            2         REPAID   7068      GBP   2020-02-23
8   maxime            1    IN_PROGRESS   4872      SEK   2020-03-01
9  tenetur            1         REPAID   2860      GBP   2020-03-08

For reference, the final step the question was aiming for:

df.groupby(["year","month", "repeat_customers"])["repeat_customers"].count()
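
A groupby of this shape needs the new/old flag on every purchase row, which is the "integrate df2 with df" part of the question. The sketch below shows one way that could be done with a merge. The data is hypothetical, the flag column is named repeat_customer (singular, as in the question's code), and the rule is the question's own (more than one purchase overall means old), which is not the same as the first-purchase-month definition used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical purchases, for illustration only.
df = pd.DataFrame({
    "customer": ["Andy", "Randy", "Andy", "Sandy"],
    "purchase_id": [6, 10, 11, 12],
    "payment_date": pd.to_datetime(["2020-01-16", "2020-01-17", "2020-02-03", "2020-02-20"]),
})
df["year"] = df["payment_date"].dt.year
df["month"] = df["payment_date"].dt.month

# Purchases per customer, flagged with the question's rule.
df2 = (df.groupby("customer", sort=False)["purchase_id"].count()
         .rename("number_of_purchase").reset_index())
df2["repeat_customer"] = np.where(df2["number_of_purchase"] > 1,
                                  "old_customers", "new_customers")

# Join the per-customer flag back onto every purchase row.
merged = df.merge(df2, on="customer", how="left")

# Count purchase rows (not unique customers) per year, month and flag.
counts = (merged.groupby(["year", "month", "repeat_customer"])["customer"].count()
                .unstack("repeat_customer", fill_value=0))
print(counts)
```

With this rule a customer who buys twice is counted as old in every month, including the month of their first purchase, so the result can differ from the answer's output.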

import numpy as np
import pandas as pd
d = pd.date_range("01-Jan-2020", periods=10, freq="W")
c = ['tenetur', 'quae', 'rem', 'maxime', 'sunt']
df = pd.DataFrame({"customer":np.random.choice(c, len(d)),
             "purchase_id":np.random.randint(1,10, len(d)),
             "payment_status":np.random.choice(["REPAID","IN_PROGRESS"],len(d)),
             "price":np.random.randint(100,10000, len(d)),
             "currency":np.random.choice(["GBP","SEK"],len(d)),
             "payment_date":d})

# only interested with month start
df2 = (df.assign(ms=df.payment_date - pd.to_timedelta(df.payment_date.dt.day-1, "d"),
           # find first time a customer made a purchase
          fms=lambda dfa: dfa.groupby("customer")["ms"].transform("first"),
           # if month of purchase and first month customer made a purchase are same, new ...
          new_customer=lambda dfa: np.where(dfa.ms==dfa.fms, "new_customers", "old_customers")
         )
 # with the prep it's a simple count
 .groupby(["ms","new_customer"])["customer"].count()
 # format the results
 .to_frame().unstack(1).fillna(0).droplevel(0, axis=1).rename_axis("", axis=1)
)
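
The result is indexed by month-start timestamps, while the expected output in the question uses month names. A minimal sketch of that last formatting step follows; the frame here is a hypothetical stand-in with the same shape as the result, filled with the numbers from the question's expected output.

```python
import pandas as pd

# Hypothetical stand-in for the pivoted result: month starts as the index.
result = pd.DataFrame(
    {"new_customers": [1, 5], "old_customers": [3, 2]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-02-01"], name="ms"),
)

# Swap the timestamps for abbreviated month names.
# "%b" follows the active locale; "Jan"/"Feb" assumes the default C/English locale.
formatted = result.copy()
formatted.index = result.index.strftime("%b")
print(formatted)
```

If the data spans more than one year, a format such as "%Y-%m" avoids merging the same month from different years under one label.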