Python: how to repeat rows over a date range and map them onto a list of dates?
I have created the following DataFrame:
df = pd.DataFrame()
df['date'] = pd.date_range(start="2019-12-01", end="2019-12-20", freq='D')
I also have the following two rows:
Line  Start       End         Amount
A     2019-12-01  2019-12-08  100
B     2019-12-06  2019-12-15  200
I would like to get the following result:
Output:
date amount line
0 2019-12-01 100 A
1 2019-12-02 100 A
2 2019-12-03 100 A
3 2019-12-04 100 A
4 2019-12-05 100 A
5 2019-12-06 300 A,B
6 2019-12-07 300 A,B
7 2019-12-08 300 A,B
8 2019-12-09 200 B
9 2019-12-10 200 B
10 2019-12-11 200 B
11 2019-12-12 200 B
12 2019-12-13 200 B
13 2019-12-14 200 B
14 2019-12-15 200 B
15 2019-12-16 0
16 2019-12-17 0
17 2019-12-18 0
18 2019-12-19 0
19 2019-12-20 0
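What can I do to achieve this? I tried using the map function, but I could not get the result.
Sorry, folks: if the two rows have an index, how can I add that column to the result?
Here is my solution using a cross merge/join: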
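# Cross join through a constant dummy key, keep the pairs whose date lies
# within [Start, End], then sum Amount per date.
# df1 holds the two rows; its Start and End columns must be datetimes.
(pd.merge(*[d.assign(dummy=1) for d in [df, df1]],
          on='dummy')
   .query('Start <= date <= End')
   .groupby('date')['Amount'].sum()
   .reindex(df['date'], fill_value=0)
   .reset_index()
)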
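For the follow-up about adding the line column, one possible extension of the cross merge is sketched below. The frame name df1 comes from the answer, but its layout (a Line column holding A and B) is an assumption taken from the table in the question, and the names pairs, active and result are only illustrative.

```python
import pandas as pd

# Calendar of dates, as in the question.
df = pd.DataFrame({"date": pd.date_range("2019-12-01", "2019-12-20", freq="D")})

# Assumed layout of the two rows (df1 and the Line column are illustrative).
df1 = pd.DataFrame({
    "Line": ["A", "B"],
    "Start": pd.to_datetime(["2019-12-01", "2019-12-06"]),
    "End": pd.to_datetime(["2019-12-08", "2019-12-15"]),
    "Amount": [100, 200],
})

# Cross join through a constant key. assign returns new frames,
# so df and df1 themselves do not gain a dummy column.
pairs = pd.merge(df.assign(dummy=1), df1.assign(dummy=1), on="dummy")

# Keep only the (date, row) pairs where the date falls inside the row's range.
active = pairs[(pairs["Start"] <= pairs["date"]) & (pairs["date"] <= pairs["End"])]

result = (
    active.groupby("date")
    .agg(amount=("Amount", "sum"), line=("Line", ",".join))
    .reindex(pd.Index(df["date"], name="date"))  # bring back dates with no active row
    .fillna({"amount": 0, "line": ""})
    .astype({"amount": int})
    .reset_index()
)
print(result)
```

Because the aggregation works on whatever rows df1 contains, the same code should apply unchanged to 100 rows. The cross join does build len(df) * len(df1) intermediate rows, so memory use grows with both sizes.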
Try this, assuming the second table is a DataFrame:
import pandas as pd
df = pd.DataFrame()
df['date'] = pd.date_range(start="2019-12-01", end="2019-12-20", freq='D')
df2 = pd.DataFrame({"Start":["2019-12-01","2019-12-06"],"End":["2019-12-08","2019-12-15"],"Amount":[100,200]})
df2["Start"] = pd.to_datetime(df2["Start"])
df2["End"] = pd.to_datetime(df2["End"])
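def f(x):
    # Amounts of the rows of df2 whose [Start, End] range contains the date x
    df_ = df2[(df2.Start <= x) & (df2.End >= x)]["Amount"]
    v = df_.values        # matching amounts
    i = df_.index.values  # index labels of the matching rows
    return v, i

# For each date, sum the matching amounts and join the matching index labels
s = df.date.apply(lambda x: pd.Series({"amount": sum(f(x)[0]), "line": ','.join(map(str, f(x)[1]))}))
df = pd.concat([df, s], axis=1)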
I believe this gives the result you want, with the data structured as follows:
import pandas as pd

df = pd.DataFrame()
df['date'] = pd.date_range(start="2019-12-01", end="2019-12-20", freq='D')
df2 = pd.DataFrame({"Start":["2019-12-01","2019-12-06"],"End":["2019-12-08","2019-12-15"],"Amount":[100,200]})
df2.End = df2.End.apply(lambda x: pd.Timestamp(x))
df2.Start = df2.Start.apply(lambda x: pd.Timestamp(x))
# One helper column per row of df2: its Amount while the date is in range, else 0
df['AM1'] = df.apply(lambda x: df2.Amount[0] if (x.date >= df2.Start[0] and x.date <= df2.End[0]) else 0, axis=1)
df['AM2'] = df.apply(lambda x: df2.Amount[1] if (x.date >= df2.Start[1] and x.date <= df2.End[1]) else 0, axis=1)
df['Amount'] = df[['AM1', 'AM2']].sum(axis=1)
# Label each date with the index of the rows of df2 that contribute to it
df['line'] = df.apply(lambda x: '0' if x.AM1 > 0 and x.AM2 == 0 else '1' if x.AM2 > 0 and x.AM1 == 0 else '' if x.AM1 == 0 and x.AM2 == 0 else '0, 1', axis=1)
df.drop(columns=['AM1', 'AM2'], inplace=True)
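The answer above hard-codes one helper column per row (AM1, AM2), which does not scale to many rows. As a rough sketch of how the same idea might generalize, the comparison can be done for all rows at once with NumPy broadcasting. This is a different technique from the answer's, and the names dates, starts, ends, mask and labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2019-12-01", "2019-12-20", freq="D")})
df2 = pd.DataFrame({
    "Start": pd.to_datetime(["2019-12-01", "2019-12-06"]),
    "End": pd.to_datetime(["2019-12-08", "2019-12-15"]),
    "Amount": [100, 200],
})

dates = df["date"].to_numpy()[:, None]      # shape (n_dates, 1)
starts = df2["Start"].to_numpy()[None, :]   # shape (1, n_rows)
ends = df2["End"].to_numpy()[None, :]       # shape (1, n_rows)

# mask[i, j] is True when date i lies inside the range of row j of df2.
mask = (starts <= dates) & (dates <= ends)

# Sum the amounts of the active rows for every date.
df["Amount"] = (mask * df2["Amount"].to_numpy()).sum(axis=1)

# Join the index labels of the active rows for every date.
labels = df2.index.astype(str).to_numpy()
df["line"] = [",".join(labels[row]) for row in mask]
print(df)
```

The mask has one cell per (date, row) pair, so this trades memory for the absence of per-row Python lambdas. It works the same way whether df2 has 2 rows or 100.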
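What if I have 100 rows? Do I have to write a loop to generate 100 DataFrames?
No, you don't need a loop. The "for d in [df, df1]" part only loops over the two DataFrames.
Sorry, I don't quite understand (pd.merge(*[d.assign(dummy=1) for d in [df, df1]], on='dummy'). Could you explain it?
It is equivalent to pd.merge(df.assign(dummy=1), df1.assign(dummy=1), on='dummy'). I was just showing off :D. Also, df.assign(dummy=1) is almost the same as df['dummy'] = 1, except that it does not actually modify df.
Sorry, if the two rows have an index, how can I add that column to the result?
What do you mean by "the two rows have an index"?
I have updated the question. Could you take a look?
Nice, happy coding.
What if the row index is changed to strings, such as A and B? In that case index.values does not seem to work, and we cannot get this result.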
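Regarding the last comment, here is a minimal sketch of the per-date lookup with string index labels. It assumes the two rows are indexed by A and B, as the Line column in the question suggests. The function name summarize and the variable result are hypothetical, not part of any answer.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2019-12-01", "2019-12-20", freq="D")})

# Same two rows, but indexed by the string labels A and B (an assumption).
df2 = pd.DataFrame(
    {
        "Start": pd.to_datetime(["2019-12-01", "2019-12-06"]),
        "End": pd.to_datetime(["2019-12-08", "2019-12-15"]),
        "Amount": [100, 200],
    },
    index=["A", "B"],
)

def summarize(day):
    """Return the total amount and the joined labels of the rows active on day."""
    active = df2.loc[(df2["Start"] <= day) & (df2["End"] >= day), "Amount"]
    return {"amount": int(active.sum()), "line": ",".join(map(str, active.index))}

# One dict per date, turned into two new columns next to the date column.
result = pd.concat([df, pd.DataFrame([summarize(day) for day in df["date"]])], axis=1)
print(result)
```

Joining the labels through map(str, ...) treats integer and string index values alike, so in this sketch the string index needs no special handling.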
Output of the f(x) answer above:
date amount line
0 2019-12-01 100 0
1 2019-12-02 100 0
2 2019-12-03 100 0
3 2019-12-04 100 0
4 2019-12-05 100 0
5 2019-12-06 300 0,1
6 2019-12-07 300 0,1
7 2019-12-08 300 0,1
8 2019-12-09 200 1
9 2019-12-10 200 1
10 2019-12-11 200 1
11 2019-12-12 200 1
12 2019-12-13 200 1
13 2019-12-14 200 1
14 2019-12-15 200 1
15 2019-12-16 0
16 2019-12-17 0
17 2019-12-18 0
18 2019-12-19 0
19 2019-12-20 0
Output of the AM1/AM2 answer above:
date Amount line
0 2019-12-01 100 0
1 2019-12-02 100 0
2 2019-12-03 100 0
3 2019-12-04 100 0
4 2019-12-05 100 0
5 2019-12-06 300 0, 1
6 2019-12-07 300 0, 1
7 2019-12-08 300 0, 1
8 2019-12-09 200 1
9 2019-12-10 200 1
10 2019-12-11 200 1
11 2019-12-12 200 1
12 2019-12-13 200 1
13 2019-12-14 200 1
14 2019-12-15 200 1
15 2019-12-16 0
16 2019-12-17 0
17 2019-12-18 0
18 2019-12-19 0
19 2019-12-20 0