How to map data into a Pandas DataFrame from another DataFrame with respect to index and columns
Suppose I have two DataFrames, as shown below.
DF1:
from datetime import date, timedelta
import pandas as pd
import numpy as np
sdate = date(2019,1,1) # start date
edate = date(2019,1,7) # end date
required_dates = pd.date_range(sdate,edate-timedelta(days=1),freq='d')
# initialize list of lists
data = [['2019-01-01', 1001], ['2019-01-03', 1121] ,['2019-01-02', 1500],
['2019-01-02', 1400],['2019-01-04', 1501],['2019-01-01', 1200],
['2019-01-04', 1201],['2019-01-04', 1551],['2019-01-05', 1400]]
# Create the pandas DataFrame
df1 = pd.DataFrame(data, columns = ['OnlyDate', 'TBID'])
df1.sort_values(by='OnlyDate',inplace=True)
df1
OnlyDate TBID
0 2019-01-01 1001
5 2019-01-01 1200
2 2019-01-02 1500
3 2019-01-02 1400
1 2019-01-03 1121
4 2019-01-04 1501
6 2019-01-04 1201
7 2019-01-04 1551
8 2019-01-05 1400
DF2:
df2 = pd.DataFrame(columns=sorted(df1['TBID'].unique()), index=required_dates)
df2
1001 1121 1200 1201 1400 1500 1501 1551
2019-01-01 NaN NaN NaN NaN NaN NaN NaN NaN
2019-01-02 NaN NaN NaN NaN NaN NaN NaN NaN
2019-01-03 NaN NaN NaN NaN NaN NaN NaN NaN
2019-01-04 NaN NaN NaN NaN NaN NaN NaN NaN
2019-01-05 NaN NaN NaN NaN NaN NaN NaN NaN
2019-01-06 NaN NaN NaN NaN NaN NaN NaN NaN
I am trying to set True (or 1) in a DF3 DataFrame at each position (date row, TBID column) that appears in df1, to get the following output:
df3 =df2.copy()
for index, row in df1.iterrows():
    df3.loc[row['OnlyDate'], row['TBID']] = 1
df3.fillna(0, inplace=True)
df3
1001 1121 1200 1201 1400 1500 1501 1551
2019-01-01 1 0 1 0 0 0 0 0
2019-01-02 0 0 0 0 1 1 0 0
2019-01-03 0 1 0 0 0 0 0 0
2019-01-04 0 0 0 1 0 0 1 1
2019-01-05 0 0 0 0 1 0 0 0
2019-01-06 0 0 0 0 0 0 0 0
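Is there a better way to do this?
Use get_dummies, then take the max per date for an indicator (always 0 or 1), or the sum if you need counts: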
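# .max(level=0) was removed in pandas 2.0; groupby(level=0).max() is the equivalent.
# dtype=int keeps 0/1 output (pandas 2.x returns booleans by default).
df = pd.get_dummies(df1.set_index('OnlyDate')['TBID'], dtype=int).groupby(level=0).max()
print(df)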
1001 1121 1200 1201 1400 1500 1501 1551
OnlyDate
2019-01-01 1 0 1 0 0 0 0 0
2019-01-02 0 0 0 0 1 1 0 0
2019-01-03 0 1 0 0 0 0 0 0
2019-01-04 0 0 0 1 0 0 1 1
2019-01-05 0 0 0 0 1 0 0 0