Python DataFrame: filling empty columns with a calculation that uses row values and column names
I have a DataFrame df, which can be created as follows:
import pandas as pd
import datetime
#create the dates to make into columns
datestart=datetime.date(2018,1,1)
dateend=datetime.date(2018,1,5)
newcols=pd.date_range(datestart,dateend).date
#create the test data
d={'name':['a','b','c','d'],'earlydate': [datetime.date(2018,1,1),datetime.date(2018,1,3),datetime.date(2018,1,4),datetime.date(2018,1,5)]}
#create initial test dataframe
df=pd.DataFrame(data=d)
#create the new dataframe with empty newcols
df=pd.concat([df,pd.DataFrame(columns=newcols)])
dresultdata={'name':['a','b','c','d'],
'earlydate': [datetime.date(2018,1,1),datetime.date(2018,1,3),datetime.date(2018,1,4),datetime.date(2018,1,5)],
datetime.date(2018,1,1):[0,-2,-3,-4], #this is the difference in days between the column name and the earlydate
datetime.date(2018,1,2):[-1,1,2,3],
datetime.date(2018,1,3):[-2,0,1,2],
datetime.date(2018,1,4):[-3,-1,0,1]}
dferesult=pd.DataFrame(data=dresultdata)
df and the expected result, dferesult, look like this:
df
Out[17]:
name earlydate 2018-01-01 ... 2018-01-03 2018-01-04 2018-01-05
0 a 2018-01-01 NaN ... NaN NaN NaN
1 b 2018-01-03 NaN ... NaN NaN NaN
2 c 2018-01-04 NaN ... NaN NaN NaN
3 d 2018-01-05 NaN ... NaN NaN NaN
[4 rows x 7 columns]
dferesult
Out[19]:
name earlydate 2018-01-01 2018-01-02 2018-01-03 2018-01-04
0 a 2018-01-01 0 -1 -2 -3
1 b 2018-01-03 -2 1 0 -1
2 c 2018-01-04 -3 2 1 0
3 d 2018-01-05 -4 3 2 1
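What I want to do is fill every empty newcol with the difference in days between the newcol name and earlydate, that is, newcolname (a date) minus earlydate (a date). I would like to do this in a "dataframe-wise" way rather than with a function, lambda, apply, or a for loop. I am fairly sure this can be done on the whole DataFrame at once instead of column by column or row by row.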
I got this working with a loop like the following:
for d in newcols:
    df.loc[:,d]=d-df.earlydate
But for a large frame (1M rows) this takes forever. Ideas are welcome!

IIUC:
i = pd.to_datetime(df.earlydate.values).values
j = pd.to_datetime(df.columns[2:]).values
df.iloc[:, 2:] = (j - i[:, None]).astype('timedelta64[D]').astype(int)
df
earlydate name 2018-01-01 2018-01-02 2018-01-03 2018-01-04 2018-01-05
0 2018-01-01 a 0 1 2 3 4
1 2018-01-03 b -2 -1 0 1 2
2 2018-01-04 c -3 -2 -1 0 1
3 2018-01-05 d -4 -3 -2 -1 0
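As a sanity check on the output above, here is a minimal, self-contained sketch that rebuilds the question's inputs and compares the broadcast subtraction with a plain Python loop. The names days, wide, looped and result are hypothetical and used only for illustration. The conversion to day resolution before subtracting is an assumption made here so the result does not depend on the datetime unit pandas returns.

```python
import datetime

import numpy as np
import pandas as pd

# Rebuild the inputs from the question.
newcols = pd.date_range(datetime.date(2018, 1, 1), datetime.date(2018, 1, 5)).date
earlydate = [datetime.date(2018, 1, 1), datetime.date(2018, 1, 3),
             datetime.date(2018, 1, 4), datetime.date(2018, 1, 5)]

# Convert both sides to day-resolution NumPy datetime arrays.
i = pd.to_datetime(earlydate).values.astype("datetime64[D]")  # shape (4,)
j = pd.to_datetime(newcols).values.astype("datetime64[D]")    # shape (5,)

# Broadcast: every column date minus every row's earlydate, in whole days.
days = (j - i[:, None]).astype(int)                           # shape (4, 5)

# Assemble the wide frame in one step instead of assigning column by column.
wide = pd.DataFrame(days, columns=newcols)
base = pd.DataFrame({"name": list("abcd"), "earlydate": earlydate})
result = pd.concat([base, wide], axis=1)

# Reference computation with plain Python date arithmetic.
looped = pd.DataFrame({d: [(d - e).days for e in earlydate] for d in newcols})

print((wide.to_numpy() == looped.to_numpy()).all())
print(result)
```

Building the integer block once and concatenating it avoids assigning into object-typed columns. Whether that is faster on a frame with a million rows would need to be measured.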
NumPy has a name, np.newaxis (numpy.newaxis), for expanding an array into another dimension. np.newaxis is simply None, so I can use None in its place. Here it turns i from an array of shape (4,) into one of shape (4, 1), which lets the subtraction use NumPy's broadcasting.
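The shape change can be seen in isolation with a short sketch. It uses plain NumPy date arrays in place of the DataFrame columns, which is an assumption made for brevity. The names col and diff are hypothetical.

```python
import numpy as np

# np.newaxis is literally an alias for None.
assert np.newaxis is None

# Stand-ins for the earlydate values (i) and the date column labels (j).
i = np.array(["2018-01-01", "2018-01-03", "2018-01-04", "2018-01-05"],
             dtype="datetime64[D]")
j = np.arange("2018-01-01", "2018-01-06", dtype="datetime64[D]")

# Indexing with None adds a trailing axis of length 1.
col = i[:, None]
print(i.shape, col.shape, j.shape)  # (4,) (4, 1) (5,)

# Shapes (5,) and (4, 1) broadcast to (4, 5): one row per earlydate,
# one column per date label.
diff = j - col
print(diff.shape)  # (4, 5)

# Without the extra axis, shapes (5,) and (4,) cannot be broadcast together.
try:
    j - i
except ValueError as exc:
    print("broadcast error:", exc)
```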