Python: replace NaN with 0 in a DataFrame when the column name falls within a date range
I have a DataFrame like this:
time A      time B      2017-11  2017-12  2018-01  2018-02
2017-01-24  2020-01-01  NaN      NaN      NaN      NaN
2016-11-28  2020-01-01  NaN      4.0      2.0      2.0
2017-03-18  2017-12-21  NaN      NaN      NaN      NaN
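I want to replace every NaN with 0 when the column name falls between time A and time B. For example, the third row covers 2017-03-18 to 2017-12-21, so in that row any NaN in a column whose name falls within this range should become 0; everything else stays unchanged. I hope that is clear. Thanks.

Try the following code: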
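# some_date and somedate are placeholders for the lower and upper bounds
newdf = df[(df.date > some_date) & (df.date < somedate)]
newdf = newdf.fillna(0)  # fillna returns a new frame, so assign the result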
Maybe not the best solution, but it works.
Here is my test sample:
import numpy as np
import pandas as pd

d = pd.DataFrame([
    {"time A": "2017-01-24", "time B": np.nan, "2016-11": np.nan, "2016-12": np.nan, "2017-01": np.nan, "2017-02": np.nan},
    {"time A": "2016-11-28", "time B": np.nan, "2016-11": np.nan, "2016-12": 4, "2017-01": 2, "2017-02": 2},
    {"time A": "2016-12-18", "time B": "2017-01-01", "2016-11": np.nan, "2016-12": np.nan, "2017-01": np.nan, "2017-02": np.nan},
])
# Assign the result back; inplace=True on a column selection triggers warnings in newer pandas
d["time B"] = d["time B"].fillna("2020-01-01")
d = d.set_index(["time A", "time B"])
Initial table:
time A      time B      2016-11  2016-12  2017-01  2017-02
2017-01-24  2020-01-01  NaN      NaN      NaN      NaN
2016-11-28  2020-01-01  NaN      4.0      2.0      2.0
2016-12-18  2017-01-01  NaN      NaN      NaN      NaN
Stacked (long) form after the fill step described below, rows for the first index pair:
   time A      time B      month       value
0  2017-01-24  2020-01-01  2016-11-01  NaN
1  2017-01-24  2020-01-01  2016-12-01  NaN
2  2017-01-24  2020-01-01  2017-01-01  NaN
3  2017-01-24  2020-01-01  2017-02-01  0.0
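It looks like time A is an opening date and time B is a closing date, or something like that. So, for convenience, I filled the missing time B values with an arbitrary future date, e.g. '2020-01-01'.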
I don't like working with pivoted tables, so I usually stack the frame and format the date columns:
# dropna=False keeps the NaN cells as rows; on pandas >= 2.1, d.stack(future_stack=True) keeps them too
d_stack = d.stack(dropna=False).reset_index()
d_stack.columns = ["time A", "time B", "month", "value"]
for col in ["time A", "time B"]:
    d_stack[col] = pd.to_datetime(d_stack[col], format="%Y-%m-%d")
# "%Y-%m" labels are parsed as the first day of each month
d_stack["month"] = pd.to_datetime(d_stack["month"], format="%Y-%m")
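One detail of the parsing above is worth spelling out: a "%Y-%m" label becomes the first day of that month, so the month containing time A falls outside the range unless time A is itself the 1st. If whole-month granularity is what you want instead (an assumption about the intent, not something the question states), a minimal sketch is to compare monthly periods; the variable names here are only illustrative:

```python
import pandas as pd

time_a = pd.Timestamp("2017-03-18")
time_b = pd.Timestamp("2017-12-21")

# "%Y-%m" parses to the first day of the month (2017-03-01 here).
month_start = pd.to_datetime("2017-03", format="%Y-%m")
by_day = time_a <= month_start <= time_b

# Comparing monthly periods treats each month as a single unit.
month = pd.Period("2017-03", freq="M")
by_month = time_a.to_period("M") <= month <= time_b.to_period("M")

print(by_day, by_month)
```

With day-level comparison the 2017-03 column is excluded for this row; with period comparison it is included.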
Now it is more convenient to fill the missing values:
def fill_existing(x):
    # Replace NaN with 0 only when the month lies within [time A, time B]
    if (x["time A"] <= x["month"] <= x["time B"] and
            np.isnan(x["value"])):
        return 0
    else:
        return x["value"]

d_stack["value"] = d_stack.apply(fill_existing, axis=1)
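The row-wise apply above can also be written as a vectorized mask, which tends to be faster on large frames. This is a minimal sketch rather than the answer's own code; long_df is a hypothetical stand-in for d_stack with the same four columns:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format sample mirroring the stacked layout.
long_df = pd.DataFrame({
    "time A": pd.to_datetime(["2016-12-18", "2016-12-18", "2016-12-18", "2016-11-28"]),
    "time B": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-01", "2020-01-01"]),
    "month": pd.to_datetime(["2016-12-01", "2017-01-01", "2017-02-01", "2016-12-01"]),
    "value": [np.nan, np.nan, np.nan, 4.0],
})

# True where the month lies inside [time A, time B], inclusive on both ends.
in_range = (long_df["time A"] <= long_df["month"]) & (long_df["month"] <= long_df["time B"])

# Replace NaN with 0 only on in-range rows; all other values are left untouched.
long_df["value"] = long_df["value"].mask(in_range & long_df["value"].isna(), 0)

print(long_df)
```

Only the second row is filled here, because 2017-01-01 is the only month start inside its range with a missing value.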
Finally, format month back to a string and return to the initial table layout:
d_stack["month"] = d_stack["month"].dt.strftime("%Y-%m")
# "first" keeps all-NaN cells as NaN; a sum would turn them into 0 on current pandas
pd.pivot_table(d_stack, columns="month", index=["time A", "time B"],
               values="value", aggfunc="first")
Result:
time A      time B      2016-12  2017-01  2017-02
2016-11-28  2020-01-01  4.0      2.0      2.0
2016-12-18  2017-01-01  NaN      0.0      NaN
2017-01-24  2020-01-01  NaN      NaN      0.0
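For comparison, the same rule can be applied directly on the wide table without stacking and pivoting. The sketch below is an alternative under stated assumptions, not part of the answer above: wide and month_cols are hypothetical names, time A and time B are ordinary columns rather than the index, and each column label is compared as the first day of its month, as in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical wide sample with the same values as the test data.
wide = pd.DataFrame({
    "time A": ["2017-01-24", "2016-11-28", "2016-12-18"],
    "time B": ["2020-01-01", "2020-01-01", "2017-01-01"],
    "2016-11": [np.nan, np.nan, np.nan],
    "2016-12": [np.nan, 4.0, np.nan],
    "2017-01": [np.nan, 2.0, np.nan],
    "2017-02": [np.nan, 2.0, np.nan],
})
month_cols = ["2016-11", "2016-12", "2017-01", "2017-02"]

# Column labels as first-of-month timestamps, shape (1, n_months).
months = pd.to_datetime(month_cols, format="%Y-%m").to_numpy()[None, :]
# Per-row bounds, shape (n_rows, 1).
start = pd.to_datetime(wide["time A"]).to_numpy()[:, None]
end = pd.to_datetime(wide["time B"]).to_numpy()[:, None]

# Broadcasting gives an (n_rows, n_months) grid of "label within range" flags.
in_range = (start <= months) & (months <= end)

# Fill only the cells that are both in range and missing.
values = wide[month_cols]
wide[month_cols] = values.mask(in_range & values.isna().to_numpy(), 0)

print(wide)
```

Unlike the pivot_table round trip, this keeps all-NaN columns such as 2016-11 and preserves the original row order.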
Please edit your post and replace the image with a sample of your data; your question is unclear.