Python: iterating over DataFrames without using nested for loops

python pandas performance

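I have two DataFrames: one contains timestamps, and the other contains tweets together with the time each was posted. I am trying to assign the tweets to a tweets column in the timestamps DataFrame.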

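For a timestamp t, its row should receive every tweet posted in the interval [t-30, t+30], measured in minutes. I created a new column named tweets in the timestamps DataFrame, filled it with empty lists, and tried to assign the tweets with the following logic: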

for i in range(0, len(timestamps)):
    for j in tweet_data.date:
        # Difference in minutes between timestamp i and the tweet time j
        diff = pd.to_timedelta([pd.Timestamp(timestamps.date[i]) - pd.Timestamp(j)]).astype('timedelta64[m]')[0]
        if diff < 30 and diff >= -30:
            timestamps.iloc[i].tweets.append(tweet_data.tweet[getIndexes(tweet_data, j)])

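Here getIndexes() is used to get the index of the timestamp of the tweet to be assigned. Because both DataFrames are large and the for loops are nested, this takes a very long time to run. How can I map the tweets faster?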

Thanks in advance.
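One possible way to avoid the nested loops, offered as a sketch rather than a definitive answer: sort the tweets by time once, then use `numpy.searchsorted` to find where each window starts and ends. The function name `assign_tweets`, the sample data, and the column names `date` and `tweet` are assumptions based on the code in the question; the window bounds mirror the question's condition `-30 <= t - j < 30` and ignore its truncation to whole minutes.

```python
import numpy as np
import pandas as pd


def assign_tweets(timestamps, tweet_data, minutes=30):
    """Hypothetical helper: list of tweets per timestamp row, without nested loops."""
    # Parse and sort the tweets once, so every window is a contiguous slice.
    tweets_sorted = tweet_data.assign(
        date=pd.to_datetime(tweet_data["date"])
    ).sort_values("date")
    tweet_times = tweets_sorted["date"].to_numpy()
    tweet_texts = tweets_sorted["tweet"].to_numpy()

    centers = pd.to_datetime(timestamps["date"]).to_numpy()
    half = np.timedelta64(minutes, "m")

    # -30 <= t - j < 30 is the same as t - 30 < j <= t + 30.
    # side="right" counts the tweets at or before the given time.
    lo = np.searchsorted(tweet_times, centers - half, side="right")
    hi = np.searchsorted(tweet_times, centers + half, side="right")

    return [list(tweet_texts[a:b]) for a, b in zip(lo, hi)]


# Made-up sample data, for illustration only.
timestamps = pd.DataFrame({"date": ["2020-01-01 10:00", "2020-01-01 11:00"]})
tweet_data = pd.DataFrame({
    "date": ["2020-01-01 10:10", "2020-01-01 09:20",
             "2020-01-01 10:45", "2020-01-01 11:30"],
    "tweet": ["b", "a", "c", "d"],
})

result = assign_tweets(timestamps, tweet_data)
timestamps["tweets"] = result
```

With n timestamps and m tweets this does one sort plus two binary searches per timestamp, roughly O((n + m) log m), instead of the n × m comparisons of the nested loops. It also makes a lookup like getIndexes() unnecessary, because the slice positions already identify the tweets.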

Welcome! What you are attempting is a process that pandas calls "resampling". There are good answers in these two stackoverflow links.
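The bucketing idea behind resampling can be sketched as follows. This assumes something the question does not state: that the timestamps are evenly spaced, exactly 60 minutes apart on the full hour, so the windows never overlap. The sketch uses `dt.floor` with `groupby` rather than `DataFrame.resample` itself, the sample data is made up, and the window here is half-open, [t-30, t+30), which differs slightly from the condition in the question's code.

```python
import pandas as pd

# Made-up sample data, for illustration only.
tweet_data = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01 09:40", "2020-01-01 10:10",
                            "2020-01-01 10:29", "2020-01-01 11:05"]),
    "tweet": ["a", "b", "c", "d"],
})
timestamps = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01 10:00", "2020-01-01 11:00",
                            "2020-01-01 12:00"]),
})

# Shift by half a bin and floor to 60 minutes: every tweet in
# [t - 30 min, t + 30 min) is mapped to the full hour t.
bucket = (tweet_data["date"] + pd.Timedelta(minutes=30)).dt.floor("60min")

# Collect the tweets of each bucket into a list, keyed by the hour t.
tweets_per_hour = tweet_data.groupby(bucket)["tweet"].agg(list).to_dict()

# Timestamps with no tweets in their window get an empty list.
timestamps["tweets"] = [tweets_per_hour.get(t, []) for t in timestamps["date"]]
```

If the timestamps are irregular or closer together than 60 minutes, the windows overlap and fixed bins no longer apply; a per-window lookup is needed instead.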