Python: How to slice a DataFrame based on the values of another DataFrame without using a for loop?

python pandas merge

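I have a DataFrame df1:

and a second DataFrame df2:

I need to slice df1 for each value of id_x and count the number of rows that fall within the interval dt_f:dt_l. The same has to be done again for the values of id_y. Finally, the results should be merged onto df2, giving the following DataFrame as output: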

df_result.head() = 

               dt_f       dt_l     n_x   n_y
id_y  id_x
670   715   2000-02-14 2003-09-30   8     10 
704   2963  2000-02-11 2004-01-13   13    25 
886   18350 2000-02-09 2001-09-24   32    75
1451  18159 2005-11-14 2007-03-06   48    6

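where n_x and n_y are the number of rows that fall within the interval dt_f:dt_l for each value of id_x and id_y, respectively.

Here is the for loop I used: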

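# df1 is indexed by date and has an 'id' column; df2 is indexed by (id_y, id_x)
n_x, n_y = [], []
for k, (id_y, id_x) in enumerate(df2.index):
    # Interval for the k-th (id_y, id_x) pair
    start, end = df2['dt_f'].iloc[k], df2['dt_l'].iloc[k]
    n_y.append(df1[df1.id == id_y].loc[start:end]['id'].count())
    n_x.append(df1[df1.id == id_x].loc[start:end]['id'].count())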

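Can this be done without a for loop? df1 contains roughly 30,000 rows, and I am worried that a loop will slow things down too much, since this is only a small part of the whole script.

You want something like this: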

# Merge the tables together, making sure we keep the index column
mg = df1.reset_index().merge(df2, left_on='id', right_on='id_x')

# Select only the rows that fall between the start and end dates
# (strict bounds; use >= and <= to include the endpoints, as label slicing does)
mg = mg[(mg['index'] > mg['dt_f']) & (mg['index'] < mg['dt_l'])]

# Finally, count by id_x
mg.groupby('id_x').count()

After that, you need to tidy up the columns and repeat the same steps for id_y.
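The merge, filter, and count steps for both id_x and id_y can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the sample rows, the index name `date`, and the helper `count_in_interval` are hypothetical and not part of the original question. It uses inclusive bounds via `Series.between`, and it groups by the (id_y, id_x) pair rather than by id_x alone, so that an id appearing in several pairs with different intervals is counted separately for each pair.

```python
import pandas as pd

# Hypothetical sample data: df1 is indexed by date and has an 'id' column.
df1 = pd.DataFrame(
    {"id": [715, 715, 715, 670, 670, 2963, 704]},
    index=pd.to_datetime(
        ["2000-03-01", "2001-05-10", "2004-01-01",
         "2000-06-15", "1999-12-31", "2000-02-11", "2003-07-07"]
    ),
)
df1.index.name = "date"  # assumed name; reset_index() turns it into a 'date' column

# Hypothetical df2: indexed by (id_y, id_x), one date interval per pair.
df2 = pd.DataFrame(
    {
        "dt_f": pd.to_datetime(["2000-02-14", "2000-02-11"]),
        "dt_l": pd.to_datetime(["2003-09-30", "2004-01-13"]),
    },
    index=pd.MultiIndex.from_tuples(
        [(670, 715), (704, 2963)], names=["id_y", "id_x"]
    ),
)


def count_in_interval(events, intervals, key):
    """Count rows of `events` per (id_y, id_x) pair with date in [dt_f, dt_l]."""
    pairs = intervals.reset_index()
    mg = events.reset_index().merge(pairs, left_on="id", right_on=key)
    # Inclusive on both ends, like label-based slicing on a DatetimeIndex.
    inside = mg[mg["date"].between(mg["dt_f"], mg["dt_l"])]
    return inside.groupby(["id_y", "id_x"]).size()


df_result = df2.copy()
# Pairs without any matching rows are missing from the counts; fill them with 0.
df_result["n_x"] = count_in_interval(df1, df2, "id_x").reindex(df2.index, fill_value=0)
df_result["n_y"] = count_in_interval(df1, df2, "id_y").reindex(df2.index, fill_value=0)
```

Note that the merge creates one row per (df1 row, matching df2 pair) before filtering, so memory use grows with the number of matches; for about 30,000 rows in df1 this should usually be manageable.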

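Why is n_y different from n_x? Can you show us your for loop?

Are you happy with the current answer? As a side note, you should check how to post a reproducible example. Ideally, your input would lead to your expected output, which makes it easier for people to check their answers and to understand the question.

Thanks for your comment.

Thanks! It works great! I adapted it to my code by resetting the index on the DataFrame and using mg['dates'] instead of mg['index'].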
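The last comment comes down to how `reset_index()` names the new column: it uses the index name if there is one, and falls back to `index` otherwise. A small sketch with a hypothetical two-row frame (the data and the variable names are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose DatetimeIndex is named 'dates'.
named = pd.DataFrame(
    {"id": [715, 670]},
    index=pd.DatetimeIndex(["2000-03-01", "2000-06-15"], name="dates"),
)
# Same data, but with the index name removed.
unnamed = named.rename_axis(None)

named_cols = list(named.reset_index().columns)      # ['dates', 'id']
unnamed_cols = list(unnamed.reset_index().columns)  # ['index', 'id']
```

So when the date index has a name, the filter step should refer to that column name instead of `mg['index']`.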
