Python: applying a function to a column that uses the value of the previous column

python pandas

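I have to build a time series from column values in order to compute each customer's recency.

The formula I have to use is R(t) = 0 if the customer bought something in that month, otherwise R(t-1) + 1.

I managed to compute this dataframe: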

      CustomerID  -1    0    1    2    3    4    5    6    7    8    9   10   11   12
0          17850   0  0.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
1          13047   0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0  1.0
2          12583   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3          14688   0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
4          15311   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
...          ... ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
3750       15471   0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0
3751       13436   0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0
3752       15520   0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0
3753       14569   0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0
3754       12713   0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  0.0

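Here a value is 0 if the customer bought something in that month and 1 otherwise. The column names denote time periods, and column "-1" serves as a dummy column.

How can I replace the value in each column with 0 if the current value is 0, and otherwise with the previous column's value + 1?

For example, the final result for the second customer should be 0 1 0 1 0 1 0 1 2

I know how to apply a function to a column, but I don't know how to make that function use the value from the previous column.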

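Are you set on the column structure? In time series it is common to use rows instead, e.g. a dataframe with the columns CustomerID and hasBoughtThisMonth. You can then easily add a Recency column with pandas transform().

I can't comment on the question directly yet.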

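Edit: here is another approach. As an example I take two customers and randomly draw whether they bought something in each month.

Basically, you can pivot the table and use groupby + cumsum to get the result. Note that this way I avoid your dummy column.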

import pandas as pd
import numpy as np

np.random.seed(1)

# Example data: two customers, twelve months each, with a random 0/1 flag per month
df = pd.DataFrame({'CustomerID': [1]*12 + [2]*12,
                   'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]*2,
                   'hasBoughtThisMonth': np.random.randint(2, size=24)})

# Label runs of identical consecutive values: the label increases whenever the flag changes
contiguous_groups = df['hasBoughtThisMonth'].diff().ne(0).cumsum()

# Within each (customer, run) group the cumulative sum counts consecutive 1s and stays 0 for 0s
df['Recency'] = df.groupby(by=['CustomerID', contiguous_groups],
                           as_index=False)['hasBoughtThisMonth'].cumsum().reset_index(drop=True)

The result is:

    CustomerID  Month  hasBoughtThisMonth  Recency
0            1      1                   1        1
1            1      2                   1        2
2            1      3                   0        0
3            1      4                   0        0
4            1      5                   1        1
5            1      6                   1        2
6            1      7                   1        3
7            1      8                   1        4
8            1      9                   1        5
9            1     10                   0        0
10           1     11                   0        0
11           1     12                   1        1
12           2      1                   0        0
13           2      2                   1        1
14           2      3                   1        2
15           2      4                   0        0
16           2      5                   0        0
17           2      6                   1        1
18           2      7                   0        0
19           2      8                   0        0
20           2      9                   0        0
21           2     10                   1        1
22           2     11                   0        0
23           2     12                   0        0
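
For the wide table in the question, where 0 means a purchase and 1 means no purchase, the same run-counting idea might look like the sketch below. The small `wide` frame is made up from the first two rows of the question's table, truncated to a few months, and the names `long`, `NoPurchase`, and `run_id` are illustrative assumptions rather than part of the answer above.

```python
import pandas as pd

# Hypothetical sample in the question's layout: months "-1" to "4" only.
wide = pd.DataFrame(
    {"CustomerID": [17850, 13047],
     "-1": [0, 0], "0": [0, 0], "1": [1, 1], "2": [1, 0], "3": [1, 0], "4": [1, 1]}
)

# Reshape to one row per (customer, month); 1 means "no purchase this month".
long = wide.melt(id_vars=["CustomerID"], var_name="Month", value_name="NoPurchase")
long["Month"] = long["Month"].astype(int)
long = long.sort_values(["CustomerID", "Month"]).reset_index(drop=True)

# A new run starts whenever the flag differs from the same customer's previous month.
changed = long["NoPurchase"].ne(long.groupby("CustomerID")["NoPurchase"].shift())
run_id = changed.cumsum().rename("run_id")

# Inside a run of 1s the cumulative sum counts 1, 2, 3, ...; inside a run of 0s it stays 0.
long["Recency"] = long.groupby(["CustomerID", run_id])["NoPurchase"].cumsum()

# Back to the wide layout: one row per customer, one column per month.
recency = long.pivot(index="CustomerID", columns="Month", values="Recency")
print(recency)
```

Reshaping costs a melt and a pivot, but the recurrence itself is then expressed without any explicit loop.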

It would be easier if you first set CustomerID as the index and transposed the dataframe, then applied a custom function, i.e. something like:

df.T.apply(custom_func)
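
One possible shape for `custom_func`, offered as a sketch and not as the answerer's actual code: after transposing, each column holds one customer's months in order, so the function can walk down the column and carry the previous result along. The sample frame and the body of `custom_func` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample in the question's layout (0 = bought, 1 = did not buy).
df = pd.DataFrame(
    {"CustomerID": [17850, 13047],
     "-1": [0, 0], "0": [0, 0], "1": [1, 1], "2": [1, 0], "3": [1, 0], "4": [1, 1]}
).set_index("CustomerID")


def custom_func(months):
    """Return R(t) for one customer: 0 on a purchase, otherwise R(t-1) + 1."""
    recency = []
    previous = 0  # result carried over from the preceding month
    for flag in months:
        previous = 0 if flag == 0 else previous + 1
        recency.append(previous)
    return pd.Series(recency, index=months.index)


# Transpose so each customer becomes a column, apply, then transpose back.
result = df.T.apply(custom_func).T
print(result)
```

Keeping the running value in a local variable is what lets each step see the previous result instead of the previous input.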

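Just use the apply function to iterate through the columns or rows of the dataframe and do the operation:

def apply_function(row):
    # Walk the row left to right: the first item is kept as is,
    # a 0 stays 0, and any other value becomes the previous result + 1
    result = []
    for i, item in enumerate(row):
        result.append(item if i == 0 or item == 0 else result[-1] + 1)
    return result

new_df = df.apply(apply_function, axis=1, result_type='expand')
new_df.columns = df.columns  # restore the original column names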
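
Because each column depends only on the column before it, another option is to loop over the columns once and update all customers at the same time with `numpy.where`. This is a sketch under the assumption that `CustomerID` is the index and every remaining column holds 0/1 flags in chronological order; the sample data and the names `flags`, `previous`, and `recency_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the question's layout (0 = bought, 1 = did not buy).
df = pd.DataFrame(
    {"CustomerID": [17850, 13047, 15311],
     "-1": [0, 0, 0], "0": [0, 0, 0], "1": [1, 1, 0],
     "2": [1, 0, 0], "3": [1, 0, 0], "4": [1, 1, 0]}
).set_index("CustomerID")

flags = df.to_numpy()
recency = np.zeros_like(flags)
previous = np.zeros(len(df), dtype=flags.dtype)  # R(t-1) for every customer

for j in range(flags.shape[1]):
    # 0 where the customer bought in this month, otherwise last month's value + 1
    previous = np.where(flags[:, j] == 0, 0, previous + 1)
    recency[:, j] = previous

recency_df = pd.DataFrame(recency, index=df.index, columns=df.columns)
print(recency_df)
```

The Python-level loop runs once per month, not once per customer, which should matter when there are thousands of customers and only a handful of periods.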

Could you elaborate on "0 if the current value is 0, otherwise the previous column's value + 1"?

@ShubhamSharma f(x) := 0 if x == 0, otherwise f(x-1) + 1

Can't I simply set axis=1 to work on the rows? The problem of making a row's or column's value depend on the previous row's or column's value would still remain. I edited my post with an example.

But how do I make the function take two consecutive values?