Pandas: is there a vectorized way to iterate over two large DataFrames and pull values?


I have a DataFrame with 1 million (1MM) rows that looks like this:

shipname       timestamp
0                  11/1/2019
0                  11/2/2019
...                  ...
100                10/1/2018

I have a second DataFrame that holds a set of date ranges, as follows:

shipname       dateorigin     datedestination
0                10/1/2019       10/5/2019
0                10/20/2019      11/10/2019
...
99               11/1/2019       11/20/2019

I want to run a function that, if the shipname is in DataFrame 2 and the timestamp is between dateorigin and datedestination, returns the index in DF2.

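Currently I am using df.iterrows to do this, but it slows my PC down and makes Python almost unusable. Also, in some cases more than one row in DF2 may satisfy the condition (in that case I only want the first one returned). So far I have been using this code: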

for t in shipbase.itertuples():
    try:
        idx = (t.shipname== df.shipname) & (t.Timestamp >= df.DateOrigin) & (
                t.Timestamp <= df.DateDestination)
        list_index.append(df.loc[idx].index.values)
    except ValueError:
        list_index.append(np.nan)
        print(t)

Any help getting this code to run faster would be greatly appreciated. I have been trying to vectorize it, but can't think of an easy solution.

If you don't run out of memory, you can try the following:
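# Pair every row of df1 with every row of df2 that has the same shipname.
df = pd.merge(df1, df2, how='inner', on='shipname')
# If the merge succeeds but you run out of memory afterwards, free the inputs with:
# del df1, df2

# Keep only the pairs whose timestamp falls within [dateorigin, datedestination].
df = df[df['timestamp'].between(df['dateorigin'], df['datedestination'])]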
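
The merge-and-filter idea above can be sketched on toy data. The values below are made up, and the helper columns `row_id` and `df2_index` are names introduced here for illustration only. The sketch assumes the dates arrive as month/day/year strings, so it parses them first (comparing the raw strings would be lexicographic), and it copies both indexes into ordinary columns so they are still available after the merge.

```python
import pandas as pd

# Toy stand-ins for the two frames in the question (illustrative values).
df1 = pd.DataFrame({
    "shipname": [0, 0, 100],
    "timestamp": ["11/1/2019", "10/3/2019", "10/1/2018"],
})
df2 = pd.DataFrame({
    "shipname": [0, 0, 99],
    "dateorigin": ["10/1/2019", "10/20/2019", "11/1/2019"],
    "datedestination": ["10/5/2019", "11/10/2019", "11/20/2019"],
})

# Parse the strings as real dates (assumed month/day/year).
df1["timestamp"] = pd.to_datetime(df1["timestamp"], format="%m/%d/%Y")
for col in ["dateorigin", "datedestination"]:
    df2[col] = pd.to_datetime(df2[col], format="%m/%d/%Y")

# Copy each index into a column so it survives the merge.
left = df1.assign(row_id=df1.index)
right = df2.assign(df2_index=df2.index)

# One row per (df1 row, df2 row) pair that shares a shipname.
merged = left.merge(right, how="inner", on="shipname")

# Keep the pairs whose timestamp lies inside the date range (bounds inclusive).
in_range = merged["timestamp"].between(merged["dateorigin"], merged["datedestination"])
matches = merged.loc[in_range, ["row_id", "df2_index"]]
print(matches)
```

Each row of `matches` links a row of `df1` to the index of a `df2` range that contains its timestamp. Rows of `df1` with no matching range (here the one for shipname 100) simply do not appear.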


Note that pd.merge can duplicate some rows, because the shipname values do not appear to be unique in either DataFrame.
See also: merge and Series.between.
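
The question asks for only the first DF2 match per row, and for something like NaN when there is no match. One possible way to get that from the merge result is sketched below. The function name `first_match_index` and the helper columns `row_id` and `df2_index` are hypothetical, the date columns are assumed to be datetimes already, and "first" is taken to mean the smallest DF2 index label.

```python
import pandas as pd

def first_match_index(df1, df2):
    """For each df1 row, return the index of the first df2 row with the same
    shipname whose [dateorigin, datedestination] range contains the timestamp.
    Rows without any match get NaN. Hypothetical helper, not a pandas API."""
    left = df1.assign(row_id=df1.index)
    right = df2.assign(df2_index=df2.index)
    merged = left.merge(right, how="inner", on="shipname")
    hit = merged["timestamp"].between(merged["dateorigin"], merged["datedestination"])
    # Sort so that "first" is well defined, then keep one match per df1 row.
    first = (merged[hit]
             .sort_values(["row_id", "df2_index"])
             .drop_duplicates("row_id", keep="first"))
    # Align back to df1; rows with no match become NaN.
    return first.set_index("row_id")["df2_index"].reindex(df1.index)

# Illustrative data: the first row falls inside two overlapping ranges,
# the second is outside every range, the third has an unknown shipname.
df1 = pd.DataFrame({
    "shipname": [0, 0, 100],
    "timestamp": pd.to_datetime(["2019-10-03", "2019-12-25", "2018-10-01"]),
})
df2 = pd.DataFrame({
    "shipname": [0, 0, 99],
    "dateorigin": pd.to_datetime(["2019-10-01", "2019-10-02", "2019-11-01"]),
    "datedestination": pd.to_datetime(["2019-10-05", "2019-10-10", "2019-11-20"]),
})

result = first_match_index(df1, df2)
print(result)
```

The result has the same index as `df1`, so it can be assigned straight to a new column. Because of the NaN entries the values come back as floats.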