Python Pandas.DataFrame: efficient way to add a "seconds since last event" column


I have a Pandas.DataFrame whose default index represents seconds, and I want to add a column of "seconds elapsed since the last event". Concretely, given

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

then I would like to get

|    |   0 |    x |
|---:|----:|-----:|
|  0 |   0 | <NA> |
|  1 |   0 | <NA> |
|  2 |   0 |    0 |
|  3 |   0 |    1 |
|  4 |   0 |    2 |
|  5 |   0 |    0 |
|  6 |   0 |    1 |
Obviously, to make this work for the first event I need to write

df["x"] = pd.Series(range(5+2)).shift(2)

The trouble is that when I then execute

df["x"] = pd.Series(range(2+5)).shift(5)

I get

|    |   0 |   x |
|---:|----:|----:|
|  0 |   0 | nan |
|  1 |   0 | nan |
|  2 |   0 | nan |
|  3 |   0 | nan |
|  4 |   0 | nan |
|  5 |   0 |   0 |
|  6 |   0 |   1 |
i.e. the previous values have been overwritten. Is there a way to assign the new values without the NaNs clobbering the existing values? Then I could do something like

for i in event:
    df["x"] = pd.Series(range(len(df))).shift(i)
Or is there a more efficient approach?
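For what it's worth, one way to do the non-clobbering assignment asked about here (a sketch of my own, not taken from the answers) is Series.combine_first, which takes values from the new series where they are defined and falls back to the existing column elsewhere:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

df["x"] = np.nan
for i in event:
    shifted = pd.Series(range(len(df))).shift(i)
    # combine_first prefers `shifted` where it is non-NaN, so the leading
    # NaNs of a later shift no longer wipe out earlier events' counts
    df["x"] = shifted.combine_first(df["x"])
```

Note this relies on the events being processed in increasing order, so that at overlapping rows the later event's smaller counts win.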

For the record, here is my naive code. It works, but it is inefficient and poorly designed:

def seconds_since_last(df, event):
    c = 1000000  # large sentinel meaning "no event seen yet"
    df["x"] = c
    if event:
        idx = 0
        for i in df.itertuples():
            if idx < len(event) and i.Index == event[idx]:
                c = 0
                idx += 1
            df.loc[i.Index, "x"] = c
            c += 1
    return df
Let's try this:

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

df.loc[event, 0] = 1
df = df.replace(0, np.nan)

grp = df[0].cumsum().ffill()
df['x'] = df.groupby(grp).cumcount().mask(grp.isna())
df
Output:

|    |   0 |   x |
|---:|----:|----:|
|  0 | nan | nan |
|  1 | nan | nan |
|  2 |   1 |   0 |
|  3 | nan |   1 |
|  4 | nan |   2 |
|  5 |   1 |   0 |
|  6 | nan |   1 |
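To see what the groupby is doing, it can help to print the intermediate grp labels (same toy df as above): cumsum over the event markers, forward-filled, tags each row with the number of events seen so far, so all rows from one event up to the next share a group label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]
df.loc[event, 0] = 1
df = df.replace(0, np.nan)

# each event row bumps the running count; ffill propagates that count
# forward, so the rows following an event all carry the same label
grp = df[0].cumsum().ffill()
print(grp.tolist())  # [nan, nan, 1.0, 1.0, 1.0, 2.0, 2.0]
```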

IIUC, you can do a double grouping:

s = df.index.isin(event).cumsum()
# or equivalently
# s = df.loc[event, 0].reindex(df.index).isna().cumsum()

df['x'] = np.where(s > 0, df.groupby(s).cumcount(), np.nan)
Output:

|    |   0 |   x |
|---:|----:|----:|
|  0 | nan | nan |
|  1 | nan | nan |
|  2 |   1 |   0 |
|  3 | nan |   1 |
|  4 | nan |   2 |
|  5 |   1 |   0 |
|  6 | nan |   1 |
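Here too, printing the intermediate label series makes the trick visible (same toy df assumed): isin marks the event rows and cumsum turns the marks into group numbers, with 0 meaning "before the first event".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

# 0 before the first event, then +1 at every event row
s = df.index.isin(event).cumsum()
print(s)  # [0 0 1 1 1 2 2]

# cumcount restarts at 0 inside each group; np.where blanks out group 0
df["x"] = np.where(s > 0, df.groupby(s).cumcount(), np.nan)
```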

Thanks! For a "seconds until the next event" column I deduced from your answer: t = df[::-1].index.isin(event).cumsum()[::-1]; df['y'] = np.where(t > 0, df.groupby(t).cumcount(ascending=False), np.nan)
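A runnable version of that comment (same toy df as in the answers): reversing before the cumulative sum makes the count run from the end, so each group is a run of rows that ends at the next event, and cumcount(ascending=False) counts down to 0 at the event row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

# count events from the end; reversing back aligns the labels with the rows
t = df[::-1].index.isin(event).cumsum()[::-1]

# t == 0 marks rows after the last event, which have no "next event"
df["y"] = np.where(t > 0, df.groupby(t).cumcount(ascending=False), np.nan)
print(df["y"].tolist())  # [2.0, 1.0, 0.0, 2.0, 1.0, 0.0, nan]
```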