Python: keep rows from df2 for each row in df1 using a timedelta

I have two pandas DataFrames. I want to keep all rows in df2 where Type equals the Type in df1 and Date lies within one day (-1 day or +1 day) of the Date in df1. How can I do that?
df1
IBSN Type Date
0 1 X 2014-08-17
1 1 Y 2019-09-22
df2
IBSN Type Date
0 2 X 2014-08-16
1 2 D 2019-09-22
2 9 X 2014-08-18
3 3 H 2019-09-22
4 3 Y 2019-09-23
5 5 G 2019-09-22
res
IBSN Type Date
0 2 X 2014-08-16 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] - 1
1 9 X 2014-08-18 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] + 1
2 3 Y 2019-09-23 <-- keep because Type = df1[1]['Type'] AND Date = df1[1]['Date'] + 1
This should do it:
import pandas as pd
from datetime import timedelta

# create dummy data
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df1['Date'] = pd.to_datetime(df1['Date'])  # might not be necessary if your Date column already contains datetime objects
df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'], [9, 'X', '2014-08-18'], [3, 'H', '2019-09-22'], [3, 'Y', '2019-09-23'], [5, 'G', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df2['Date'] = pd.to_datetime(df2['Date'])  # might not be necessary if your Date column already contains datetime objects

# add date boundaries to the first dataframe
df1['Date_from'] = df1['Date'].apply(lambda x: x - timedelta(days=1))
df1['Date_to'] = df1['Date'].apply(lambda x: x + timedelta(days=1))

# merge the date boundaries onto df2 on 'Type'. Filter rows where Date is between
# Date_from and Date_to (inclusive). Drop the 'Date_from' and 'Date_to' columns.
df2 = df2.merge(df1.loc[:, ['Type', 'Date_from', 'Date_to']], on='Type', how='left')
df2[(df2['Date'] >= df2['Date_from']) & (df2['Date'] <= df2['Date_to'])].drop(['Date_from', 'Date_to'], axis=1)
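The `apply` calls are not strictly needed, since timedelta arithmetic is vectorised on datetime columns. A compact variant of the same merge-and-filter approach, rebuilt from the question's data (a sketch, not part of the original answer), reproduces the expected res:

```python
import pandas as pd
from datetime import timedelta

# rebuild the example data from the question
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']],
                   columns=['IBSN', 'Type', 'Date'])
df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'],
                    [9, 'X', '2014-08-18'], [3, 'H', '2019-09-22'],
                    [3, 'Y', '2019-09-23'], [5, 'G', '2019-09-22']],
                   columns=['IBSN', 'Type', 'Date'])
for df in (df1, df2):
    df['Date'] = pd.to_datetime(df['Date'])

# vectorised +/- 1 day boundaries instead of apply/lambda
bounds = df1[['Type']].assign(Date_from=df1['Date'] - timedelta(days=1),
                              Date_to=df1['Date'] + timedelta(days=1))

# left-merge the boundaries onto df2 by Type; rows whose Type is absent
# from df1 get NaT bounds and fail both comparisons, so they drop out
merged = df2.merge(bounds, on='Type', how='left')
res = (merged[(merged['Date'] >= merged['Date_from']) &
              (merged['Date'] <= merged['Date_to'])]
       .drop(columns=['Date_from', 'Date_to']))
```

Note that the left merge duplicates df2 rows if the same Type occurs more than once in df1, which matches the "for each row of df1" wording in the question.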
Assuming the Date column in both dataframes is already of dtype datetime: I would construct an IntervalIndex and assign it to the index of df1, map the Type column of df1 onto df2, and finally check equality to create a mask for slicing:
iix = pd.IntervalIndex.from_arrays(df1.Date + pd.Timedelta(days=-1),
df1.Date + pd.Timedelta(days=1), closed='both')
df1 = df1.set_index(iix)
s = df2['Date'].map(df1.Type)
df_final = df2[df2.Type == s]
Out[1131]:
IBSN Type Date
0 2 X 2014-08-16
2 9 X 2014-08-18
4 3 Y 2019-09-23
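The `map` lookup relies on IntervalIndex containment: each date in df2 resolves to the Type of the df1 interval that contains it, and dates falling in no interval map to NaN (which never compares equal, so those rows drop out). A self-contained sketch of the technique with the question's data, for reference:

```python
import pandas as pd

df1 = pd.DataFrame({'IBSN': [1, 1], 'Type': ['X', 'Y'],
                    'Date': pd.to_datetime(['2014-08-17', '2019-09-22'])})
df2 = pd.DataFrame({'IBSN': [2, 2, 9, 3, 3, 5],
                    'Type': ['X', 'D', 'X', 'H', 'Y', 'G'],
                    'Date': pd.to_datetime(['2014-08-16', '2019-09-22',
                                            '2014-08-18', '2019-09-22',
                                            '2019-09-23', '2019-09-22'])})

# index df1 by the closed +/- 1 day interval around each Date
iix = pd.IntervalIndex.from_arrays(df1.Date - pd.Timedelta(days=1),
                                   df1.Date + pd.Timedelta(days=1),
                                   closed='both')
df1 = df1.set_index(iix)

# for each date in df2, look up the Type of the interval containing it;
# keep rows whose own Type matches that lookup
s = df2['Date'].map(df1.Type)
df_final = df2[df2.Type == s]
```

This requires the intervals in df1 to be non-overlapping; with overlapping intervals the containment lookup is ambiguous and the merge-based answer above is safer.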
Yes, 2014 was a mistake. I have fixed it. Thanks!
@RubenB: in my dataset df2 has villages A, B, C, D, E twice, while in df1 I only have villages A, B and C. I want the result to be the A, B and C data from df2, repeated twice as in df2. How can we get that?
>>> import pandas as pd
>>> import numpy as np
>>> df2 = pd.read_excel("/home/desktop/desktop/df.xlsx")
>>> df1 = pd.read_excel("/home/desktop/desktop/df1.xlsx")
@Kiran: that does not seem hard, but please post it as a new, more detailed question, especially with a detailed expected output.
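For the follow-up in the comments, a plain membership filter would likely do: keeping every df2 row whose Village also appears in df1 preserves duplicates in df2. The data below is a hypothetical reconstruction of the commenter's frames (the pasted output is garbled), so column names and values are assumptions:

```python
import pandas as pd

# hypothetical reconstruction of the commenter's data
df2 = pd.DataFrame({'Village': ['A', 'B', 'C', 'D', 'E'] * 2,
                    'Area': range(10)})
df1 = pd.DataFrame({'Village': ['A', 'B', 'C']})

# keep df2 rows whose Village appears in df1; duplicates in df2 are kept
res = df2[df2['Village'].isin(df1['Village'])]
```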