Python: apply a hierarchy across multiple columns, changing a variable number of column values based on other columns

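I have a dataset containing customer IDs and indicator columns named "WEEK1", "WEEK2", and so on. The value is 1 if the customer signed up in that week and 0 otherwise, as shown below: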

ID WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
1   0     0     1     0     1
2   0     0     0     0     1
3   1     0     1     0     1
4   0     0     0     0     0
5   1     1     1     1     1
6   1     0     0     0     0
7   0     1     1     1     0
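What I want to do is find the first week in which each customer signed up, keep that week's indicator at 1, and set all of that customer's other week indicators to 0. The expected output is: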

ID WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
1   0     0     1     0     0  ## WEEK5 is changed to 0 here
2   0     0     0     0     1  ## nothing changed
3   1     0     0     0     0  ## WEEK3 and WEEK5 are changed to 0
4   0     0     0     0     0
5   1     0     0     0     0
6   1     0     0     0     0
7   0     1     0     0     0
So, for each customer ID, we find the first week whose value is 1 and then set the values of all later weeks to 0.

So far I have tried if-else, writing out each condition one by one, like this:

if df['WEEK1'] == 1:
    df['WEEK2'] = 0
    df['WEEK3'] = 0
    df['WEEK4'] = 0
    df['WEEK5'] = 0
elif df['WEEK2'] == 1:
    df['WEEK3'] = 0
    df['WEEK4'] = 0
    df['WEEK5'] = 0
... and so on
Using if-else worked for me when there were only 5 week columns, but I now have data with 52 week columns and cannot find any alternative to if-else.

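So anything that can impose this hierarchy on these 5 columns, and also scales to a variable number of columns such as 52 or 104, would be very helpful.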

Use:

#if first column is not index
df = df.set_index('ID')
df = df.where(df.shift(axis=1).eq(1).cumsum(axis=1).eq(0), 0)
print (df)
    WEEK1  WEEK2  WEEK3  WEEK4  WEEK5
ID                                   
1       0      0      1      0      0
2       0      0      0      0      1
3       1      0      0      0      0
4       0      0      0      0      0
5       1      0      0      0      0
6       1      0      0      0      0
7       0      1      0      0      0
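For reference, the one-liner above can be wrapped in a small function and run against the question's sample data. This is only a sketch: the function name keep_first_signup and the variable seen_before are made up for illustration, and the mask is written as "an earlier column held a 1" rather than the equivalent "cumulative sum equals 0".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6, 7],
        "WEEK1": [0, 0, 1, 0, 1, 1, 0],
        "WEEK2": [0, 0, 0, 0, 1, 0, 1],
        "WEEK3": [1, 0, 1, 0, 1, 0, 1],
        "WEEK4": [0, 0, 0, 0, 1, 0, 1],
        "WEEK5": [1, 1, 1, 0, 1, 0, 0],
    }
).set_index("ID")


def keep_first_signup(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep the first 1 in each row and zero every later column.

    Hypothetical helper name; the logic is the shift/eq/cumsum/where
    chain from the answer.
    """
    # True in a column once any earlier column in the same row held a 1.
    seen_before = frame.shift(axis=1).eq(1).cumsum(axis=1).gt(0)
    # Keep values where no earlier 1 exists, replace the rest with 0.
    return frame.where(~seen_before, 0)


result = keep_first_signup(df)
print(result)
```

Nothing in the function refers to column names or to the number of columns, so the same call should work unchanged for 52 or 104 week columns, provided they are ordered from earliest to latest.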
Details and explanation

First, shift the values one column to the right:

print (df.shift(axis=1))
    WEEK1  WEEK2  WEEK3  WEEK4  WEEK5
ID                                   
1     NaN    0.0    0.0    1.0    0.0
2     NaN    0.0    0.0    0.0    0.0
3     NaN    1.0    0.0    1.0    0.0
4     NaN    0.0    0.0    0.0    0.0
5     NaN    1.0    1.0    1.0    1.0
6     NaN    1.0    0.0    0.0    0.0
7     NaN    0.0    1.0    1.0    1.0
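Next, compare with 1. This matters if the columns can hold values other than 1 and 0. If you skip the comparison, fill the leading NaN first (for example with shift(axis=1, fill_value=0)), because NaN never compares equal to 0 in the final check: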

print (df.shift(axis=1).eq(1))
    WEEK1  WEEK2  WEEK3  WEEK4  WEEK5
ID                                   
1   False  False  False   True  False
2   False  False  False  False  False
3   False   True  False   True  False
4   False  False  False  False  False
5   False   True   True   True   True
6   False   True  False  False  False
7   False  False   True   True   True
Then take the cumulative sum along each row with cumsum(axis=1) and compare the result with 0:

print (df.shift(axis=1).eq(1).cumsum(axis=1).eq(0))
    WEEK1  WEEK2  WEEK3  WEEK4  WEEK5
ID                                   
1    True   True   True  False  False
2    True   True   True   True   True
3    True  False  False  False  False
4    True   True   True   True   True
5    True  False  False  False  False
6    True  False  False  False  False
7    True   True  False  False  False
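The cumulative-sum step is described above without its intermediate result. The sketch below prints it for three of the question's rows (IDs 1, 3 and 5). The variable names are illustrative, and the astype(int) cast is an addition that only makes the running count explicit.

```python
import pandas as pd

# Rows for customers 1, 3 and 5 from the question.
df = pd.DataFrame(
    [[0, 0, 1, 0, 1], [1, 0, 1, 0, 1], [1, 1, 1, 1, 1]],
    index=pd.Index([1, 3, 5], name="ID"),
    columns=["WEEK1", "WEEK2", "WEEK3", "WEEK4", "WEEK5"],
)

# True where the previous week held a 1 (WEEK1 has no previous week).
shifted_is_one = df.shift(axis=1).eq(1)

# Running count, per row, of how many earlier weeks held a 1.
running = shifted_is_one.astype(int).cumsum(axis=1)
print(running)

# A count of 0 means "no signup before this week", so the value is kept.
mask = running.eq(0)
print(mask)
```

The count stays at 0 up to and including the first signup week and is positive afterwards, which is why comparing it with 0 gives the keep mask.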
Finally, DataFrame.where sets the values to 0 wherever the mask is False.
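To check that this last step scales to the 52-column case raised in the question, here is a rough sketch on synthetic data. The random frame, the seed, the 100-row size and the variable names are all assumptions for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n_weeks = 52
cols = [f"WEEK{i}" for i in range(1, n_weeks + 1)]

# Synthetic 0/1 indicators for 100 hypothetical customers.
wide = pd.DataFrame(rng.integers(0, 2, size=(100, n_weeks)), columns=cols)
wide.index = pd.RangeIndex(1, 101, name="ID")
wide.iloc[0] = 0  # one customer who never signed up

# Same chain as in the answer: build the mask, then apply it with where.
mask = wide.shift(axis=1).eq(1).cumsum(axis=1).eq(0)
result = wide.where(mask, 0)

# Each row keeps at most one 1 ...
assert result.sum(axis=1).le(1).all()

# ... and it sits in the first week where the input had a 1.
has_signup = wide.eq(1).any(axis=1)
first_in = wide.idxmax(axis=1)[has_signup]
first_out = result.idxmax(axis=1)[has_signup]
assert (first_in == first_out).all()
```

The two assertions state the requirement from the question directly: one surviving 1 per customer at most, located in that customer's first signup week.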