Python: keep rows from df2 for each row in df1 using a timedelta

I have two pandas DataFrames. I want to keep all rows in df2 where Type equals the Type in df1 and Date lies within one day (-1 day or +1 day) of the Date in df1. How can I do that?
df1
IBSN Type Date
0 1 X 2014-08-17
1 1 Y 2019-09-22
df2
IBSN Type Date
0 2 X 2014-08-16
1 2 D 2019-09-22
2 9 X 2014-08-18
3 3 H 2019-09-22
4 3 Y 2019-09-23
5 5 G 2019-09-22
res
IBSN Type Date
0 2 X 2014-08-16 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] - 1
1 9 X 2014-08-18 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] + 1
2 3 Y 2019-09-23 <-- keep because Type = df1[1]['Type'] AND Date = df1[1]['Date'] + 1
This should do it:
import pandas as pd
from datetime import timedelta

# create dummy data
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df1['Date'] = pd.to_datetime(df1['Date'])  # might not be necessary if your Date column already contains datetime objects
df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'], [9, 'X', '2014-08-18'], [3, 'H', '2019-09-22'], [3, 'Y', '2019-09-23'], [5, 'G', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df2['Date'] = pd.to_datetime(df2['Date'])  # might not be necessary if your Date column already contains datetime objects

# add date boundaries to the first dataframe
df1['Date_from'] = df1['Date'].apply(lambda x: x - timedelta(days=1))
df1['Date_to'] = df1['Date'].apply(lambda x: x + timedelta(days=1))

# merge the date boundaries onto df2 on 'Type'. Filter rows where Date is between
# Date_from and Date_to (inclusive). Drop the 'Date_from' and 'Date_to' columns.
df2 = df2.merge(df1.loc[:, ['Type', 'Date_from', 'Date_to']], on='Type', how='left')
df2[(df2['Date'] >= df2['Date_from']) & (df2['Date'] <= df2['Date_to'])].drop(['Date_from', 'Date_to'], axis=1)
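The `apply` calls are not strictly needed, since timedelta arithmetic is vectorised on datetime columns. A compact variant of the same merge-and-filter approach, rebuilt from the question's data (a sketch, not part of the original answer), reproduces the expected res:

```python
import pandas as pd
from datetime import timedelta

# rebuild the example data from the question
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']],
                   columns=['IBSN', 'Type', 'Date'])
df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'],
                    [9, 'X', '2014-08-18'], [3, 'H', '2019-09-22'],
                    [3, 'Y', '2019-09-23'], [5, 'G', '2019-09-22']],
                   columns=['IBSN', 'Type', 'Date'])
for df in (df1, df2):
    df['Date'] = pd.to_datetime(df['Date'])

# vectorised +/- 1 day boundaries instead of apply/lambda
bounds = df1[['Type']].assign(Date_from=df1['Date'] - timedelta(days=1),
                              Date_to=df1['Date'] + timedelta(days=1))

# left-merge the boundaries onto df2 by Type; rows whose Type is absent
# from df1 get NaT bounds and fail both comparisons, so they drop out
merged = df2.merge(bounds, on='Type', how='left')
res = (merged[(merged['Date'] >= merged['Date_from']) &
              (merged['Date'] <= merged['Date_to'])]
       .drop(columns=['Date_from', 'Date_to']))
```

Note that the left merge duplicates df2 rows if the same Type occurs more than once in df1, which matches the "for each row of df1" wording in the question.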
Assuming the Date column in both dataframes is already of dtype datetime: I would construct an IntervalIndex and assign it to the index of df1, map the Type column of df1 onto df2, and finally check equality to create a mask for slicing:
iix = pd.IntervalIndex.from_arrays(df1.Date + pd.Timedelta(days=-1),
df1.Date + pd.Timedelta(days=1), closed='both')
df1 = df1.set_index(iix)
s = df2['Date'].map(df1.Type)
df_final = df2[df2.Type == s]
Out[1131]:
IBSN Type Date
0 2 X 2014-08-16
2 9 X 2014-08-18
4 3 Y 2019-09-23
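The `map` lookup relies on IntervalIndex containment: each date in df2 resolves to the Type of the df1 interval that contains it, and dates falling in no interval map to NaN (which never compares equal, so those rows drop out). A self-contained sketch of the technique with the question's data, for reference:

```python
import pandas as pd

df1 = pd.DataFrame({'IBSN': [1, 1], 'Type': ['X', 'Y'],
                    'Date': pd.to_datetime(['2014-08-17', '2019-09-22'])})
df2 = pd.DataFrame({'IBSN': [2, 2, 9, 3, 3, 5],
                    'Type': ['X', 'D', 'X', 'H', 'Y', 'G'],
                    'Date': pd.to_datetime(['2014-08-16', '2019-09-22',
                                            '2014-08-18', '2019-09-22',
                                            '2019-09-23', '2019-09-22'])})

# index df1 by the closed +/- 1 day interval around each Date
iix = pd.IntervalIndex.from_arrays(df1.Date - pd.Timedelta(days=1),
                                   df1.Date + pd.Timedelta(days=1),
                                   closed='both')
df1 = df1.set_index(iix)

# for each date in df2, look up the Type of the interval containing it;
# keep rows whose own Type matches that lookup
s = df2['Date'].map(df1.Type)
df_final = df2[df2.Type == s]
```

This requires the intervals in df1 to be non-overlapping; with overlapping intervals the containment lookup is ambiguous and the merge-based answer above is safer.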
Yes, 2014 was a mistake. I have fixed it. Thanks!
@RubenB: in my dataset df2 has villages A, B, C, D, E twice, while in df1 I only have villages A, B and C. I want the result to be the A, B and C data from df2, repeated twice as in df2. How can we get that?
>>> import pandas as pd
>>> import numpy as np
>>> df2 = pd.read_excel("/home/desktop/desktop/df.xlsx")
>>> df1 = pd.read_excel("/home/desktop/desktop/df1.xlsx")
@Kiran: that does not seem hard, but please post it as a new, more detailed question, especially with a detailed expected output.
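For the follow-up in the comments, a plain membership filter would likely do: keeping every df2 row whose Village also appears in df1 preserves duplicates in df2. The data below is a hypothetical reconstruction of the commenter's frames (the pasted output is garbled), so column names and values are assumptions:

```python
import pandas as pd

# hypothetical reconstruction of the commenter's data
df2 = pd.DataFrame({'Village': ['A', 'B', 'C', 'D', 'E'] * 2,
                    'Area': range(10)})
df1 = pd.DataFrame({'Village': ['A', 'B', 'C']})

# keep df2 rows whose Village appears in df1; duplicates in df2 are kept
res = df2[df2['Village'].isin(df1['Village'])]
```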