Python: assign values to a column using a condition on a second field


I have a pandas DataFrame with dates and locations:

import pandas as pd

df1 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013', 
      '1-3-2013'], 'locations':['L1','L2','L3']})

And another DataFrame that holds counts of the points of interest intersecting each location:

df2 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013', 
      '1-3-2013'], 'locations':['L1','L1','L1'], 'poi_cts':[23,12,23]})

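The dates in df2 are a subset of the dates in df1.

I want to create a column in df1 (df1['counts']) that sums poi_cts for each location/date over a specified date range (for example, within the 14 days before the date in df1).

I tried: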

def ct_pts(window=14):

    Date = row.Date

    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])

return cts

df1.apply(ct_pts, axis = 1)


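But this doesn't work (I'm not sure how to assign the column for each row; I saw this example somewhere, but it doesn't work).

I could also write this column-wise, but I'm struggling with that too: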

def ct_pts():
    new = pd.DataFrame()
    for location in pd.unique(df1['locations']):
        subset = df1[df1['locations']==location]
        for date in pd.unique(df1['Date']):
            df2 = df[df['Date'] == date]
            df2['spray'] = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
            new = new.append(df2)
    return new


This doesn't work either.

I feel like I'm missing something very simple. Is there an easy way to do this?

I am using numpy broadcasting to speed up the whole process:

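import numpy as np
import pandas as pd

# The subtraction below needs real datetimes; the output shown below
# corresponds to parsing the strings as day-month-year.
df1['dates'] = pd.to_datetime(df1['dates'], format='%d-%m-%Y')
df2['dates'] = pd.to_datetime(df2['dates'], format='%d-%m-%Y')

l=[]
# groupby yields groups in sorted key order, so this assumes df1 is sorted by locations
for x , y in df1.groupby('locations'):
    s=df2.loc[df2.locations==x,'dates'].values
    t=y['dates'].values
    # v[i, j] = days from df2 date j to df1 date i (positive when the df2 date is earlier)
    v=((t[:,None]-s)/np.timedelta64(1, 'D'))
    # keep df2 dates on the df1 date or up to 13 days before it, then sum their counts
    l.extend(np.dot(((v>=0)&(v<14)),df2.loc[df2.locations==x,'poi_cts'].values))

df1['cts']=l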
df1
Out[167]: 
       dates locations  cts
0 2013-01-01        L1   23
1 2013-02-01        L2    0
2 2013-03-01        L3    0
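
To see what the broadcasting step does, here is a small self-contained sketch on toy arrays. The data and the names `days_before`, `mask` and `window_sums` are made up for illustration, and the window is assumed to mean "the same day or up to 13 days earlier".

```python
import numpy as np

# Hypothetical toy data: two dates to score and three point dates with counts.
t = np.array(['2013-01-10', '2013-02-01'], dtype='datetime64[D]')
s = np.array(['2013-01-01', '2013-01-09', '2013-01-25'], dtype='datetime64[D]')
cts = np.array([5, 7, 11])

# t[:, None] has shape (2, 1) and s has shape (3,), so the difference
# broadcasts to a (2, 3) matrix: days_before[i, j] = t[i] - s[j] in days.
days_before = (t[:, None] - s) / np.timedelta64(1, 'D')

# True where point j falls on date i or up to 13 days before it.
mask = (days_before >= 0) & (days_before < 14)

# Multiplying the boolean matrix by the counts vector sums the
# selected counts for each row.
window_sums = np.dot(mask, cts)
print(window_sums)  # [12 11]
```

Each row of `mask` selects the points that fall inside the window for one date, so no Python-level loop over dates is needed within a location.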

This may be slightly slower, but here is how to do it with apply.

Create a new column holding the start date, to make filtering easier:
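df1['dates'] = pd.to_datetime(df1['dates'])
# df2's dates must be datetimes too, or the comparisons in the function below fail
df2['dates'] = pd.to_datetime(df2['dates'])
df1['start_dates'] = df1['dates'] - pd.to_timedelta(14, unit='d')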


Apply a function across the whole DataFrame:
def ct_pts(row):
    df_fil = df2[(df2['dates'] <= row['dates']) & (df2['dates'] >= 
                  row['start_dates']) & (df2['locations'] == row['locations'])]
    row['counts'] = sum(df_fil['poi_cts'])
    return row

df1 = df1.apply(ct_pts, axis=1)
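
If the row-wise apply turns out to be too slow, one possible alternative (not taken from the answers here) is to merge the two frames on location and filter the pairs in bulk. The sketch below assumes the date strings are day-month-year and uses the same inclusive 14-day window as the filter above; `pairs`, `in_window` and `sums` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L2', 'L3']})
df2 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L1', 'L1'],
                    'poi_cts': [23, 12, 23]})

# Assumption: the strings are day-month-year.
df1['dates'] = pd.to_datetime(df1['dates'], format='%d-%m-%Y')
df2['dates'] = pd.to_datetime(df2['dates'], format='%d-%m-%Y')

window = pd.Timedelta(days=14)

# Pair every df1 row with every df2 row at the same location;
# reset_index keeps the original df1 row label in an 'index' column.
pairs = df1.reset_index().merge(df2, on='locations', suffixes=('', '_poi'))

# Keep pairs whose df2 date lies in [date - 14 days, date].
in_window = ((pairs['dates_poi'] <= pairs['dates'])
             & (pairs['dates_poi'] >= pairs['dates'] - window))

# Sum the counts per original df1 row; rows without a match get 0.
sums = pairs[in_window].groupby('index')['poi_cts'].sum()
df1['counts'] = sums.reindex(df1.index, fill_value=0)
print(df1)
```

The merge materializes every date pair within a location, so memory use grows with the product of the two row counts per location; for very large inputs the per-location broadcasting loop may be the safer choice.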

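I got my first attempt working with apply:
def num_spray(row, window=14):
    # Note: this filters on date only, not on location.
    Date = row['Date']

    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])

    return cts

df1['counts'] = df1.apply(num_spray, axis = 1)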
