Python Pandas DataFrame: efficient way to add a "seconds since last event" column
I have a Pandas DataFrame whose standard index represents seconds, and I want to add a column "x" with the number of seconds elapsed since the last event. Specifically, given
event = [2, 5]
and

df = pd.DataFrame(np.zeros((7, 1)))
I would like to get:
| | 0 | x |
|---:|----:|-----:|
| 0 | 0 | <NA> |
| 1 | 0 | <NA> |
| 2 | 0 | 0 |
| 3 | 0 | 1 |
| 4 | 0 | 2 |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
Obviously, to make this work for the first event I can write df["x"] = pd.Series(range(5+2)).shift(2)
Worse, when I then execute df["x"] = pd.Series(range(2+5)).shift(5) I get:
| | 0 | x |
|---:|----:|----:|
| 0 | 0 | nan |
| 1 | 0 | nan |
| 2 | 0 | nan |
| 3 | 0 | nan |
| 4 | 0 | nan |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
That is, the values from the previous event have been overwritten. Is there a way to assign the new values without the NaNs clobbering the existing ones?
Then I could do something like:
for i in event:
    df["x"] = pd.Series(range(len(df))).shift(i)
Or is there a more efficient way?
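One way to make a loop like that avoid the overwriting problem is to let each shifted series fall back to the column's existing values via fillna; a minimal sketch, assuming the all-zeros frame from above:

```python
import pandas as pd

df = pd.DataFrame({0: [0.0] * 7})
event = [2, 5]

df["x"] = float("nan")
for i in sorted(event):
    # where the new shift is NaN, fillna keeps whatever an earlier
    # event already wrote; elsewhere the later event's count wins
    df["x"] = pd.Series(range(len(df))).shift(i).fillna(df["x"])

print(df["x"].tolist())  # [nan, nan, 0.0, 1.0, 2.0, 0.0, 1.0]
```

Iterating events in ascending order matters here: a later event overwrites an earlier one only on the rows at or after it.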
For the record, here is my naive code. It works, but it is inefficient and badly designed:
c = 1000000
df["x"] = c
if event:
    idx = 0
    for i in df.itertuples():
        print(i)
        if idx < len(event) and i.Index == event[idx]:
            c = 0
            idx += 1
        df.loc[i.Index, "x"] = c
        c += 1
return df
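For reference, the naive loop can be wrapped in a hypothetical helper (named `seconds_since` here, not from the original) and run on the example data; note that rows before the first event get the 1000000 sentinel rather than <NA>:

```python
import numpy as np
import pandas as pd

def seconds_since(df, event):
    # naive row-by-row loop: reset the counter at each event
    c = 1000000
    df["x"] = c
    if event:
        idx = 0
        for row in df.itertuples():
            if idx < len(event) and row.Index == event[idx]:
                c = 0
                idx += 1
            df.loc[row.Index, "x"] = c
            c += 1
    return df

df = seconds_since(pd.DataFrame(np.zeros((7, 1))), [2, 5])
print(df["x"].tolist())  # [1000000, 1000001, 0, 1, 2, 0, 1]
```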
Let's try this:
df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]
df.loc[event, 0] = 1
df = df.replace(0, np.nan)
grp = df[0].cumsum().ffill()
df['x'] = df.groupby(grp).cumcount().mask(grp.isna())
df
Output:
| | 0 | x |
|---:|----:|----:|
| 0 | nan | nan |
| 1 | nan | nan |
| 2 | 1 | 0 |
| 3 | nan | 1 |
| 4 | nan | 2 |
| 5 | 1 | 0 |
| 6 | nan | 1 |
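Note that the approach above overwrites column 0 with the event markers. A variant of the same cumsum/ffill/cumcount idea that keeps column 0 intact, as a sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

# build the group labels in a separate series instead of column 0
mark = pd.Series(np.nan, index=df.index)
mark.loc[event] = 1
grp = mark.cumsum().ffill()  # NaN before the first event, then 1, 2, ...
df["x"] = df.groupby(grp).cumcount().mask(grp.isna())

print(df["x"].tolist())
```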
IIUC, you can do it with a cumsum-based grouping:
s = df.index.isin(event).cumsum()
# or equivalently
# s = df.loc[event, 0].reindex(df.index).notna().cumsum()
df['x'] = np.where(s > 0, df.groupby(s).cumcount(), np.nan)
Output:
|    |   0 |   x |
|---:|----:|----:|
|  0 |   0 | nan |
|  1 |   0 | nan |
|  2 |   0 |   0 |
|  3 |   0 |   1 |
|  4 |   0 |   2 |
|  5 |   0 |   0 |
|  6 |   0 |   1 |
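One caveat to both answers: cumcount counts rows, not index units. If the seconds index has gaps, "seconds since the last event" is better taken from the index itself, e.g. by carrying the last event time forward and subtracting. A sketch, using a hypothetical gapped index:

```python
import numpy as np
import pandas as pd

# seconds index with a gap between 3 and 6
df = pd.DataFrame(np.zeros((5, 1)), index=[0, 1, 2, 3, 6])
event = [2]

# timestamp of the most recent event, carried forward, then subtracted
last = pd.Series(
    np.where(df.index.isin(event), df.index, np.nan), index=df.index
).ffill()
df["x"] = df.index.to_series() - last

print(df["x"].tolist())  # [nan, nan, 0.0, 1.0, 4.0]
```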
Thanks! For a "time until the next event" column, I deduced from your answer: t = df[::-1].index.isin(event).cumsum()[::-1]; df['y'] = np.where(t > 0, df.groupby(t).cumcount(ascending=False), np.nan)
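The comment's "time until the next event" idea, reconstructed as a runnable sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

# mark events on the reversed index, cumsum, then flip the labels back
t = df[::-1].index.isin(event).cumsum()[::-1]
df["y"] = np.where(t > 0, df.groupby(t).cumcount(ascending=False), np.nan)

print(df["y"].tolist())  # [2.0, 1.0, 0.0, 2.0, 1.0, 0.0, nan]
```

Rows after the last event have no "next event", so they get NaN, mirroring the NaNs before the first event in the forward version.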