Python: number occurring between two specific dates
Tags: python, pandas, datetime, date-arithmetic

I have two CSV files, csv_1 and csv_2.

I want a new column, Difference, in csv_1, where:
- if the mobile number occurs in csv_2 within the date range from Book_Date to App_Date, Difference = the difference between App_Date and Occur_Date
- if it does not occur within that date range, Difference = NaN

I would also like to filter on the unique Category and Mobile_Number.

Expected output:
Mobile_Number Book_Date App_Date Difference
503477334 2018-10-12 2018-10-18 2
506002884 2018-10-12 2018-10-19 -2
501022162 2018-10-12 2018-10-16 1
503487338 2018-10-13 2018-10-13 0
506012887 2018-10-13 2018-10-21 7
503427339 2018-10-14 2018-10-17 NaN
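Edit:
What if I want to match on both the unique Category and the Mobile_Number across the two CSV files above? How can that be done?
I want the output filtered by mobile number and category.
Output: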
Category Mobile_Number Book_Date App_Date Difference
A 503477334 2018-10-12 2018-10-18 2
B 503477334 2018-10-07 2018-10-16 3
C 501022162 2018-10-12 2018-10-16 NaN
A 503487338 2018-10-13 2018-10-13 0
C 506012887 2018-10-13 2018-10-21 7
E 503427339 2018-10-14 2018-10-17 NaN
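Use map to build a new Series of Occur_Date values matched on Mobile_Number, test those values against the Book_Date and App_Date columns with between, then assign the result through the mask with numpy.where.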
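A minimal, self-contained sketch of the map-based approach described above. The sample frames are an assumption: they are reconstructed from the printed results in this thread, since the original CSV contents are not shown. The names occur_by_number, occur and in_range are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed sample data, reconstructed from the printed output in this thread.
df1 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884, 501022162, 503487338, 506012887, 503427339],
    "Book_Date": ["2018-10-12", "2018-10-12", "2018-10-12", "2018-10-13", "2018-10-13", "2018-10-14"],
    "App_Date": ["2018-10-18", "2018-10-19", "2018-10-16", "2018-10-13", "2018-10-21", "2018-10-17"],
})
df2 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884, 501022162, 503487338, 506012887],
    "Occur_Date": ["2018-10-16", "2018-10-21", "2018-10-15", "2018-10-13", "2018-10-14"],
})

# Convert the string columns to real datetimes before comparing them.
for col in ("Book_Date", "App_Date"):
    df1[col] = pd.to_datetime(df1[col])
df2["Occur_Date"] = pd.to_datetime(df2["Occur_Date"])

# Lookup Series: Mobile_Number -> Occur_Date (first occurrence per number).
occur_by_number = df2.drop_duplicates("Mobile_Number").set_index("Mobile_Number")["Occur_Date"]

# Occur_Date aligned with the rows of df1; NaT where the number is not in df2.
occur = df1["Mobile_Number"].map(occur_by_number)

# True where Occur_Date lies within [Book_Date, App_Date]; NaT yields False.
in_range = occur.between(df1["Book_Date"], df1["App_Date"])

# Day difference where the mask holds, NaN everywhere else.
df1["Difference"] = np.where(in_range, (df1["App_Date"] - occur).dt.days, np.nan)
print(df1)
```

With this data the number 506002884 gets NaN, because its Occur_Date (2018-10-21) falls after App_Date (2018-10-19).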
Edit:
You can use merge instead of map to join on two columns:
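import numpy as np
import pandas as pd
# Parse the date columns so they compare as datetimes, not strings
df1['Book_Date'] = pd.to_datetime(df1['Book_Date'])
df1['App_Date'] = pd.to_datetime(df1['App_Date'])
df2['Occur_Date'] = pd.to_datetime(df2['Occur_Date'])
# Left join on both keys keeps every row of df1; unmatched rows get NaT
df3 = df1.merge(df2, on=['Category','Mobile_Number'], how='left')
print(df3)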
Category Mobile_Number Book_Date App_Date Occur_Date
0 A 503477334 2018-10-12 2018-10-18 2018-10-16
1 B 503477334 2018-10-07 2018-10-16 2018-10-13
2 C 501022162 2018-10-12 2018-10-16 NaT
3 A 503487338 2018-10-13 2018-10-13 2018-10-13
4 C 506012887 2018-10-13 2018-10-21 2018-10-14
5 E 503427339 2018-10-14 2018-10-17 NaT
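# True where Occur_Date lies within [Book_Date, App_Date]; NaT rows are False
m = df3['Occur_Date'].between(df3['Book_Date'], df3['App_Date'])
# Day difference where the mask holds, NaN elsewhere
df3['Difference2'] = np.where(m, df3['App_Date'].sub(df3['Occur_Date']).dt.days, np.nan)
print(df3)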
Category Mobile_Number Book_Date App_Date Occur_Date Difference2
0 A 503477334 2018-10-12 2018-10-18 2018-10-16 2.0
1 B 503477334 2018-10-07 2018-10-16 2018-10-13 3.0
2 C 501022162 2018-10-12 2018-10-16 NaT NaN
3 A 503487338 2018-10-13 2018-10-13 2018-10-13 0.0
4 C 506012887 2018-10-13 2018-10-21 2018-10-14 7.0
5 E 503427339 2018-10-14 2018-10-17 NaT NaN
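Since the map version and the merge version differ only in the join keys, the two can be folded into one function. The following is a rough sketch, not the answer's own code: add_difference is a hypothetical helper, and the sample frames are assumed from the printed output above. The validate argument is an optional guard that raises an error if csv_2 has more than one row per key, which would otherwise duplicate rows of csv_1.

```python
import numpy as np
import pandas as pd


def add_difference(bookings, occurrences, keys):
    """Hypothetical helper: left-join on `keys`, then compute App_Date - Occur_Date
    in days where Occur_Date falls within [Book_Date, App_Date], NaN otherwise."""
    merged = bookings.merge(occurrences, on=keys, how="left", validate="many_to_one")
    in_range = merged["Occur_Date"].between(merged["Book_Date"], merged["App_Date"])
    days = (merged["App_Date"] - merged["Occur_Date"]).dt.days
    merged["Difference"] = np.where(in_range, days, np.nan)
    return merged


# Assumed sample data, reconstructed from the merged output shown above.
df1 = pd.DataFrame({
    "Category": ["A", "B", "C", "A", "C", "E"],
    "Mobile_Number": [503477334, 503477334, 501022162, 503487338, 506012887, 503427339],
    "Book_Date": pd.to_datetime(["2018-10-12", "2018-10-07", "2018-10-12",
                                 "2018-10-13", "2018-10-13", "2018-10-14"]),
    "App_Date": pd.to_datetime(["2018-10-18", "2018-10-16", "2018-10-16",
                                "2018-10-13", "2018-10-21", "2018-10-17"]),
})
df2 = pd.DataFrame({
    "Category": ["A", "B", "A", "C"],
    "Mobile_Number": [503477334, 503477334, 503487338, 506012887],
    "Occur_Date": pd.to_datetime(["2018-10-16", "2018-10-13", "2018-10-13", "2018-10-14"]),
})

result = add_difference(df1, df2, keys=["Category", "Mobile_Number"])
print(result)
```

Passing keys=["Mobile_Number"] to the same helper would reproduce the single-key match, provided the frame of occurrences has one row per number.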
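Comments:

pandas has the Series.between() operator. See 1229 existing questions. Also, when you read in datetime columns, it generally helps to convert them to datetime on read or afterwards; leaving them as strings is not much use.

dPac, it is very hard to understand your question. It is scattered in multiple fragments between chunks of data. Can you rewrite it to state the problem in the first paragraph? Presumably you first join csv_1 and csv_2 on Mobile_Number, then filter with .between('Book_date'...'App_date'). But where in this sequence do you want to filter by Category? This is confusing, because you say "filter based on a unique category", yet your current output has multiple results for Category == 'A', 'C', with different Book_Date, App_Date values. Also, what is Category, and where does it come from? Are you assigning arbitrary Category values to intermediate results (e.g. different combinations of App_Date and Book_Date), or does it come from somewhere else? In any case, please edit your question to restate it; it is unclear, which makes it hard to find duplicate or related questions.

I tried to edit it so that the problem is clearly stated at the top. a) Whether you mean "time range", "date range" or "datetime range", try to be consistent. b) We still do not know where Category comes from: is it from another file, or just some default assigned to temporary results? c) It is distracting when you keep referring to "column of csv_2" ... "create new column in csv_1". Why not merge the data into one dataframe at the start? (You can always write out separate sets of columns to separate CSV files with to_csv(..., columns).) ...But please tell us where Category comes from?!

It worked :D You are a beast at data wrangling! Thanks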
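The two suggestions in the comments, converting dates while reading and writing out only selected columns, could look roughly like this. It is only a sketch: the comma-separated layout and the in-memory text standing in for the file are assumptions, since the actual CSV files are not shown.

```python
import io

import pandas as pd

# Assumed file layout; io.StringIO stands in for a real path such as 'csv_1.csv'.
csv_text = """Category,Mobile_Number,Book_Date,App_Date
A,503477334,2018-10-12,2018-10-18
B,503477334,2018-10-07,2018-10-16
"""

# parse_dates converts the listed columns to datetime64 during the read,
# so no separate pd.to_datetime step is needed afterwards.
df1 = pd.read_csv(io.StringIO(csv_text), parse_dates=["Book_Date", "App_Date"])
print(df1.dtypes)

# Write out only a subset of the columns; a file path works the same way.
out = io.StringIO()
df1.to_csv(out, columns=["Mobile_Number", "App_Date"], index=False)
print(out.getvalue())
```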
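# Map-based version: match on Mobile_Number only
import numpy as np
import pandas as pd
df1['Book_Date'] = pd.to_datetime(df1['Book_Date'])
df1['App_Date'] = pd.to_datetime(df1['App_Date'])
df2['Occur_Date'] = pd.to_datetime(df2['Occur_Date'])
# Lookup Series Mobile_Number -> Occur_Date (first occurrence per number)
s1 = df2.drop_duplicates('Mobile_Number').set_index('Mobile_Number')['Occur_Date']
# Occur_Date aligned to the rows of df1; NaT where the number is absent from df2
s2 = df1['Mobile_Number'].map(s1)
# True where Occur_Date lies within [Book_Date, App_Date]
m = s2.between(df1['Book_Date'], df1['App_Date'])
# Solution without the mask: plain day difference
df1['Difference1'] = df1['App_Date'].sub(s2).dt.days
# Solution with the between test: NaN outside the range
df1['Difference2'] = np.where(m, df1['App_Date'].sub(s2).dt.days, np.nan)
print(df1)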
Mobile_Number Book_Date App_Date Difference Difference1 Difference2
0 503477334 2018-10-12 2018-10-18 2018-10-16 2.0 2.0
1 506002884 2018-10-12 2018-10-19 2018-10-21 -2.0 NaN
2 501022162 2018-10-12 2018-10-16 2018-10-15 1.0 1.0
3 503487338 2018-10-13 2018-10-13 2018-10-13 0.0 0.0
4 506012887 2018-10-13 2018-10-21 2018-10-14 7.0 7.0
5 503427339 2018-10-14 2018-10-17 NaT NaN NaN
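One thing to keep in mind: drop_duplicates('Mobile_Number') keeps only the first row per number, so if csv_2 can hold several occurrences of the same number, a later occurrence that does fall in the range would be missed. The thread does not say whether that happens. If it does, a variant along these lines may help; it assumes the earliest in-range occurrence is the one wanted, and the data here is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Mobile_Number": [503477334, 506002884],
    "Book_Date": pd.to_datetime(["2018-10-12", "2018-10-12"]),
    "App_Date": pd.to_datetime(["2018-10-18", "2018-10-19"]),
})
# Hypothetical csv_2 in which one number occurs twice.
df2 = pd.DataFrame({
    "Mobile_Number": [503477334, 503477334, 506002884],
    "Occur_Date": pd.to_datetime(["2018-10-01", "2018-10-16", "2018-10-21"]),
})

# reset_index keeps the original row labels of df1 in an 'index' column,
# so every (booking, occurrence) pair can be traced back to its booking.
pairs = df1.reset_index().merge(df2, on="Mobile_Number", how="left")

# Keep only the pairs whose occurrence lies within the booking's date range.
in_range = pairs["Occur_Date"].between(pairs["Book_Date"], pairs["App_Date"])
hits = pairs[in_range].sort_values("Occur_Date")

# Earliest in-range occurrence per booking, indexed by the original row label.
first_hit = hits.groupby("index")["Occur_Date"].first()

# Index alignment leaves NaT (and so NaN days) for bookings without any hit.
df1["Difference"] = (df1["App_Date"] - first_hit).dt.days
print(df1)
```

Here the first booking gets 2 (from the 2018-10-16 occurrence), whereas keeping only the first row per number would have picked 2018-10-01 and produced NaN.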