Python: iterating over ISO weeks
I have a DataFrame with date, value, and isoweek fields, as shown below:
date | value | isoweek
-----------------------------
2018-04-01 | 5 | 2018-13
2018-04-10 | 10 | 2018-15
2018-05-01 | 10 | 2018-18
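where isoweek is the year-week of the corresponding date. My goal is to iterate over the isoweeks, find the ones that do not exist in the data, and insert a row with a value of 0 into the DataFrame for each of them.
The expected output looks like this: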
date | value | isoweek
-----------------------------
2018-04-01 | 5 | 2018-13
NaN | 0 | 2018-14
2018-04-10 | 10 | 2018-15
NaN | 0 | 2018-16
NaN | 0 | 2018-17
2018-05-01 | 10 | 2018-18
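How can I iterate over the original DataFrame and find all the isoweeks that are missing from the data?

This may be a bit verbose, but you can try resampling after converting the isoweeks to datetime: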
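import pandas as pd
# Parse each label as the Sunday of that week ("-0" selects Sunday via %w)
s = pd.to_datetime(df['isoweek'] + "-0", format='%Y-%W-%w')
# Weekly bins; weeks without data become all-NaN rows
u = df.set_index(s).resample("W").first()
# Index.weekofyear was removed in pandas 2.0; isocalendar() replaces it
iso = u.index.isocalendar()
iso_week = iso['year'].astype(str) + '-' + iso['week'].astype(str)
u['isoweek'] = u['isoweek'].fillna(iso_week)
out = u.fillna({"value": 0}).reset_index(drop=True)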
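One caveat about the `'%Y-%W-%w'` format used above: `%W` counts weeks from the first Monday of the year, which matches ISO week numbering in 2018 but not in every year. The standard-library sketch below illustrates the difference; `iso_label_to_monday` is a hypothetical helper name, and it assumes labels of the form `YYYY-WW` and Python 3.8 or later.

```python
from datetime import date, datetime


def iso_label_to_monday(label):
    """Return the Monday of the ISO week named by a 'YYYY-WW' label (hypothetical helper)."""
    year, week = (int(part) for part in label.split("-"))
    # date.fromisocalendar takes ISO year, ISO week, ISO weekday (1 = Monday)
    return date.fromisocalendar(year, week, 1)


# In 2018 both schemes agree, because 1 January 2018 is a Monday.
w_2018 = datetime.strptime("2018-13-1", "%Y-%W-%w").date()
iso_2018 = iso_label_to_monday("2018-13")

# In 2019 they differ: %W week 1 starts on the first Monday (7 January 2019),
# while ISO week 1 is the week containing the first Thursday (it starts 31 December 2018).
w_2019 = datetime.strptime("2019-01-1", "%Y-%W-%w").date()
iso_2019 = iso_label_to_monday("2019-1")
```

If the data spans years in which 1 January falls on a Tuesday, Wednesday, or Thursday, parsing by ISO year and week is likely the safer choice.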
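You can try using apply:
import numpy as np
import pandas as pd

def func(group):
    # Assumes func is applied per group, so group.name is the group key (the year)
    year = group.name
    weeks = group['isoweek'].str.split('-').str[1].astype(int)
    # Week numbers between the first and last observed week that have no row
    missing = sorted(set(range(weeks.min(), weeks.max())) - set(weeks))
    # DataFrame.append was removed in pandas 2.0, so build the filler rows and concat
    filler = pd.DataFrame({'date': np.nan, 'value': 0,
                           'isoweek': [f"{year}-{w}" for w in missing]})
    out = pd.concat([group, filler], ignore_index=True)
    return out.sort_values(by='isoweek').reset_index(drop=True)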
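Note that the function above sorts by the isoweek string. That works for the sample data, where every week number has two digits, but unpadded single-digit weeks would sort after two-digit ones. A minimal sketch of a numeric sort key, assuming unpadded `YYYY-W` labels (`isoweek_key` is a hypothetical name):

```python
def isoweek_key(label):
    """Split a 'YYYY-WW' label into an (int year, int week) tuple for sorting."""
    year, week = label.split("-")
    return int(year), int(week)


labels = ["2018-13", "2018-9", "2018-10"]

# Plain string sorting compares character by character, so "9" lands after "13"
lexicographic = sorted(labels)

# Sorting on the numeric key restores chronological order
chronological = sorted(labels, key=isoweek_key)
```

Recent pandas versions also accept a `key` argument in `sort_values`, which could apply the same idea to the isoweek column.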
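You can use pd.date_range to generate a list of weekly dates from the start date to the end date:
import pandas as pd

# Assumes df['date'] is already datetime64; freq='W' yields one Sunday per week
dates = pd.Series(pd.date_range(start=df['date'].min(), end=df['date'].max(), freq='W'))
isoweeks = (dates.dt.isocalendar().year.astype(str) + '-' + dates.dt.isocalendar().week.astype(str)).tolist()
max_isoweek = str(df['date'].max().isocalendar()[0]) + '-' + str(df['date'].max().isocalendar()[1])
# The week containing the last date can fall after the final Sunday, so add it if missing
if max_isoweek not in isoweeks:
    isoweeks.append(max_isoweek)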
This gives you all the ISO weeks between the start date and the end date.
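For comparison, the same list of weeks can be produced with the standard library alone by stepping seven days at a time from the Monday of the first week, which avoids the separate check for the last week. This is only a sketch; `iso_weeks_between` is a hypothetical helper, and it emits unpadded week numbers to match the labels in the question.

```python
from datetime import date, timedelta


def iso_weeks_between(start, end):
    """List 'YYYY-WW' labels for every ISO week that overlaps [start, end]."""
    # Rewind to the Monday of the week containing the start date
    current = start - timedelta(days=start.isoweekday() - 1)
    labels = []
    while current <= end:
        iso_year, iso_week, _ = current.isocalendar()
        labels.append(f"{iso_year}-{iso_week}")
        current += timedelta(days=7)
    return labels


weeks = iso_weeks_between(date(2018, 4, 1), date(2018, 5, 1))

# Weeks present in the sample data from the question
present = {"2018-13", "2018-15", "2018-18"}
missing = [w for w in weeks if w not in present]
```

With the sample dates, `weeks` runs from `2018-13` through `2018-18`, and `missing` holds the three weeks that need a zero-valued row.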
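Then you can merge df with a helper DataFrame to get what you need:
# A right merge keeps every isoweek from the helper frame, including those absent from df
df = df.merge(pd.DataFrame({'isoweek': isoweeks}), how='right')
# Assign the result back instead of calling fillna with inplace=True on a column
df['value'] = df['value'].fillna(0)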
Great, thanks, this works!
           date  value  isoweek
0 0  2018-04-01      5  2018-13
  1         NaN      0  2018-14
  2  2018-04-10     10  2018-15
  3         NaN      0  2018-16
  4         NaN      0  2018-17
  5  2018-05-01     10  2018-18
# print(df)
        date  value  isoweek
0 2018-04-01    5.0  2018-13
1        NaT    0.0  2018-14
2 2018-04-10   10.0  2018-15
3        NaT    0.0  2018-16
4        NaT    0.0  2018-17
5 2018-05-01   10.0  2018-18