Python: assigning values to a column using a condition on a second field
I have a pandas DataFrame with dates and locations:
df1 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013',
'1-3-2013'], 'locations':['L1','L2','L3']})
and another DataFrame that holds counts of points of interest intersecting each location:
df2 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013',
'1-3-2013'], 'locations':['L1','L1','L1'], 'poi_cts':[23,12,23]})
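The dates in df2 are a subset of the dates in df1.
I want to create a column in df1 (df1['counts']) that sums poi_cts for each location/date over a given date range (for example, the 14 days before the date in df1).
I tried: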
def ct_pts(window=14):
    Date = row.Date
    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
    return cts
df1.apply(ct_pts, axis = 1)
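But this doesn't work (I'm not sure how to assign the column row by row; I saw this example somewhere, but it fails).
I could also build the column with explicit loops, but I'm struggling with that as well: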
def ct_pts():
    new = pd.DataFrame()
    for location in pd.unique(df1['locations']):
        subset = df1[df1['locations']==location]
        for date in pd.unique(df1['Date']):
            df2 = df[df['Date'] == date]
            df2['spray'] = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
            new = new.append(df2)
    return new
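That doesn't work either.
I feel like I'm missing something very simple. Is there an easy way to do this?

I am using numpy broadcasting to speed up the whole process: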
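import numpy as np
import pandas as pd
# Parse the date strings first (day-first, which matches the output below)
df1['dates'] = pd.to_datetime(df1['dates'], dayfirst=True)
df2['dates'] = pd.to_datetime(df2['dates'], dayfirst=True)
l = []
# Results are collected in group order, so df1 must already be sorted by 'locations'
for x, y in df1.groupby('locations'):
    s = df2.loc[df2.locations == x, 'dates'].values
    t = y['dates'].values
    # v[i, j] = days from df2 date j to df1 date i (positive when the POI date is earlier)
    v = (t[:, None] - s) / np.timedelta64(1, 'D')
    l.extend(np.dot((v >= 0) & (v < 14), df2.loc[df2.locations == x, 'poi_cts'].values))
df1['cts'] = l
df1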
Out[167]:
       dates locations  cts
0 2013-01-01        L1   23
1 2013-02-01        L2    0
2 2013-03-01        L3    0
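
The broadcasting step is easier to follow in isolation. The following is only a minimal sketch, not the answer's exact code: the helper name `window_sums`, its parameters, and the sample arrays are made up for illustration. It assumes both date arrays are already `datetime64`, and that the window covers the given date plus the days before it.

```python
import numpy as np

def window_sums(t, s, counts, window=14):
    """For each date in t, sum the counts whose date in s falls within the
    `window` days up to and including that date (hypothetical helper)."""
    # v[i, j] = number of days from s[j] to t[i]; shape (len(t), len(s))
    v = (t[:, None] - s) / np.timedelta64(1, 'D')
    mask = (v >= 0) & (v < window)
    # Each row of the mask selects which counts to add up for one date in t
    return mask.astype(int) @ counts

# Illustrative data only
t = np.array(['2013-01-10', '2013-02-01'], dtype='datetime64[D]')
s = np.array(['2013-01-01', '2013-01-09', '2013-01-31'], dtype='datetime64[D]')
counts = np.array([23, 12, 7])

result = window_sums(t, s, counts)
print(result)  # [35  7]
```

The intermediate array has one row per target date and one column per POI date, so memory grows with the product of the two lengths. That is presumably why the answer loops over locations instead of broadcasting across the whole frame at once.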

This might be a bit slower, but here is how to do it with apply.
Create a new column holding the start date, so that filtering is easier:
df1['dates'] = pd.to_datetime(df1['dates'])
df2['dates'] = pd.to_datetime(df2['dates'])  # df2 must be datetime too for the comparisons below
df1['start_dates'] = df1['dates'] - pd.to_timedelta(14, unit='d')
Apply the function to every row of the DataFrame:
def ct_pts(row):
    df_fil = df2[(df2['dates'] <= row['dates']) &
                 (df2['dates'] >= row['start_dates']) &
                 (df2['locations'] == row['locations'])]
    row['counts'] = sum(df_fil['poi_cts'])
    return row
df1 = df1.apply(ct_pts, axis=1)
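
If the row-wise apply turns out to be too slow, the same filter can probably be expressed as a merge followed by a groupby. This is a sketch of an alternative technique, not part of the answer above. It assumes the date strings are day-first, that the window is the 14 days up to and including each df1 date, and the intermediate names (`pairs`, `age`, `in_window`, `sums`) are invented for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L2', 'L3']})
df2 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L1', 'L1'],
                    'poi_cts': [23, 12, 23]})
# Assumption: dates are written day-first
df1['dates'] = pd.to_datetime(df1['dates'], dayfirst=True)
df2['dates'] = pd.to_datetime(df2['dates'], dayfirst=True)

# Pair every df1 row (remembered via its index) with all df2 rows at the same location
pairs = df1.reset_index().merge(df2, on='locations', suffixes=('', '_poi'))

# Keep only pairs whose POI date lies in the 14 days up to the df1 date
age = pairs['dates'] - pairs['dates_poi']
in_window = (age >= pd.Timedelta(0)) & (age < pd.Timedelta(days=14))
sums = pairs[in_window].groupby('index')['poi_cts'].sum()

# df1 rows without any matching POI row get a count of 0
df1['counts'] = sums.reindex(df1.index, fill_value=0)
print(df1)
```

The merge materialises every df1/df2 pair that shares a location, so it trades memory for speed and may not suit very large frames.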

I got my first attempt working using apply:
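def num_spray(row, window=14):
    # 'Date' must be a datetime column (it is called 'dates' in the sample frames above)
    Date = row['Date']
    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
    return cts
df1['counts'] = df1.apply(num_spray, axis=1)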
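
A variation on the same idea, sketched under some assumptions: the function below also filters on location, and the window length is passed through `DataFrame.apply`, which forwards extra keyword arguments to the function. The name `count_poi`, the choice of a half-open window (start excluded, row date included), and the sample data (all rows moved to L1 so the window length makes a visible difference) are illustrative and not from the original post.

```python
import pandas as pd

# Illustrative data: same dates as the question, but every row is at L1
df1 = pd.DataFrame({'dates': pd.to_datetime(['2013-01-01', '2013-02-01', '2013-03-01']),
                    'locations': ['L1', 'L1', 'L1']})
df2 = pd.DataFrame({'dates': pd.to_datetime(['2013-01-01', '2013-02-01', '2013-03-01']),
                    'locations': ['L1', 'L1', 'L1'],
                    'poi_cts': [23, 12, 23]})

def count_poi(row, window=14):
    """Sum poi_cts at the row's location over the `window` days ending on its date."""
    start = row['dates'] - pd.Timedelta(days=window)
    hits = df2[(df2['locations'] == row['locations'])
               & (df2['dates'] > start)
               & (df2['dates'] <= row['dates'])]
    return hits['poi_cts'].sum()

# Default 14-day window
df1['counts'] = df1.apply(count_poi, axis=1)
# Extra keyword arguments given to apply are forwarded to count_poi
df1['counts_60d'] = df1.apply(count_poi, axis=1, window=60)
print(df1)
```

Returning a scalar from the function and assigning the result of apply to a new column avoids rebuilding the whole row on every call.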