Python: find the first non-zero value and count the columns after it

python pandas

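I have a large dataset of daily transactions per customer id; a sample is shown below. I am trying to calculate each customer's lifetime, i.e. the number of days since the customer's first purchase. In the context of the sample data, I want to count the days from the first purchase up to 2018-04-14: for id 1052, the length is 44 days.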

trans_date  2018-03-01  2018-03-02  2018-03-03 ... 2018-04-12   2018-04-13  
   id                                                                                   
  1052         30.00       0.0         0.0     ...    14.00          0.0        
  1053          0.0        0.0         10.64   ...    0.0            0.0        
  1054          0.0        8.40        0.0     ...    0.0            9.10       
  1055          0.0        0.0         0.0     ...    0.0            0.0        
  1056          0.0        0.0         0.0     ...    42.21          0.0

I know how to find the first non-zero value in a row, but using the column index trans_date to compute the recency is where I am stuck. How can I do this?

Try this:

import numpy as np
import pandas as pd

# Convert the column names to dates
dates = pd.to_datetime(df.columns)

def customer_lifetime(row):
    # Indices of the days where the customer has made purchase
    purchases = np.where(row > 0)[0]

    if len(purchases) == 0:
        # The customer never bought anything
        return 0
    else:
        # The index of the first purchase date
        i = purchases[0]

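    # The "current" date is the last column (2018-04-13 in the sample),
    # so the result is one day less than counting up to 2018-04-14
    j = len(dates) - 1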

    # The customer lifetime is the difference between the
    # current date and first purchase date
    return (dates[j] - dates[i]) / pd.Timedelta(days=1)

df['CustomerLifetime'] = df.apply(customer_lifetime, axis=1)

Result:

      2018-03-01  2018-03-02  2018-03-03  2018-04-12  2018-04-13  CustomerLifetime
id                                                                                
1052        30.0         0.0        0.00       14.00         0.0              43.0
1053         0.0         0.0       10.64        0.00         0.0              41.0
1054         0.0         8.4        0.00        0.00         9.1              42.0
1055         0.0         0.0        0.00        0.00         0.0               0.0
1056         0.0         0.0        0.00       42.21         0.0               1.0
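The function above measures the lifetime up to the last column (2018-04-13), which is why id 1052 shows 43 rather than the 44 days asked for in the question. Below is a minimal sketch of one way to count up to an explicit reference date instead. The names `ref_date` and `lifetime_until` are hypothetical, the DataFrame holds only the columns visible in the sample, and returning NaN for customers without purchases is an assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the columns shown there)
df = pd.DataFrame(
    {
        "2018-03-01": [30.0, 0.0, 0.0, 0.0, 0.0],
        "2018-03-02": [0.0, 0.0, 8.4, 0.0, 0.0],
        "2018-03-03": [0.0, 10.64, 0.0, 0.0, 0.0],
        "2018-04-12": [14.0, 0.0, 0.0, 0.0, 42.21],
        "2018-04-13": [0.0, 0.0, 9.1, 0.0, 0.0],
    },
    index=pd.Index([1052, 1053, 1054, 1055, 1056], name="id"),
)

dates = pd.to_datetime(df.columns)
ref_date = pd.Timestamp("2018-04-14")  # assumed reference date (hypothetical name)


def lifetime_until(row, dates, ref_date):
    """Days from the first purchase in `row` up to `ref_date` (hypothetical helper)."""
    # Positions of the days on which the customer bought something
    purchases = np.flatnonzero(row.to_numpy() > 0)
    if len(purchases) == 0:
        # Assumption: no purchase means the lifetime is undefined
        return np.nan
    return (ref_date - dates[purchases[0]]).days


# Compute into a separate Series so the original columns stay untouched
lifetime = df.apply(lifetime_until, axis=1, args=(dates, ref_date))
```

Keeping the result in a separate Series avoids mixing a derived column into the date columns; it can still be attached with `df.assign(CustomerLifetime=lifetime)` if a single table is preferred.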

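This uses idxmax after checking for non-zero values. We need an extra check for rows that are all 0, because idxmax would return df.columns[0] for such a row, which makes no sense. Here I chose to use where to set it to NaN.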

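import pandas as pd
# Uncomment if the column labels are strings rather than datetimes
#df.columns = pd.to_datetime(df.columns)

my_date = pd.to_datetime('2018-04-14')
# Boolean mask: True where a purchase was made
u = df.ne(0)

# idxmax gives the first True column per row; where() blanks out rows with no purchase
# (axis is passed by keyword because any() no longer accepts it positionally in pandas 2.x)
(my_date - u.idxmax(axis=1).where(u.any(axis=1))).dt.days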
#id
#1052    44.0
#1053    42.0
#1054    43.0
#1055     NaN
#1056     2.0
#dtype: float64
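To see why the where guard matters, here is a small sketch on two hypothetical rows (the ids and amounts are made up for illustration): one customer with a purchase and one with none.

```python
import pandas as pd

# Two hypothetical customers: id 1 bought on the second day, id 2 never bought
df = pd.DataFrame(
    {"2018-03-01": [0.0, 0.0], "2018-03-02": [5.0, 0.0]},
    index=pd.Index([1, 2], name="id"),
)
df.columns = pd.to_datetime(df.columns)

u = df.ne(0)

# For the all-False row, idxmax falls back to the first column label
first = u.idxmax(axis=1)

# where() keeps the date only for rows with at least one purchase; others become NaT
guarded = first.where(u.any(axis=1))
```

Without the guard, id 2 would appear to have made its first purchase on 2018-03-01, and the subtraction would report a lifetime for a customer who never bought anything.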

Do you want to keep the current shape of the DataFrame?