Python pandas: create a column from the index of a second DataFrame

python pandas dataframe

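I am comparing two DataFrames, Small_df and Big_df. Both have a time column. The time column in Big_df is in chronological order with a 10-second time step, while the one in Small_df has no fixed time step. Some time values from Big_df appear in Small_df, sometimes more than once.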

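What I am trying to do is create a new column in Small_df that holds the index of the row in Big_df with the matching time value. Here is the structure of the two DataFrames. Note that the times are in Timestamp format.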

Small_df:

print(Small_df['Date'].head())
0   2019-05-22 15:37:05
1   2019-05-22 15:40:25
2   2019-05-22 15:40:45
3   2019-05-22 15:40:45
4   2019-05-22 15:41:55

Big_df:

print(Big_df['Date'].head())
0    2019-05-22 15:20:25
1    2019-05-22 15:20:35
2    2019-05-22 15:20:45
3    2019-05-22 15:20:55
4    2019-05-22 15:21:05

The times shown for Small_df can be found at these positions in Big_df:

print(Big_df['Date'].iloc[100:130])
100    2019-05-22 15:37:05
101    2019-05-22 15:37:15
102    2019-05-22 15:37:25
103    2019-05-22 15:37:35
104    2019-05-22 15:37:45
105    2019-05-22 15:37:55
106    2019-05-22 15:38:05
107    2019-05-22 15:38:15
108    2019-05-22 15:38:25
109    2019-05-22 15:38:35
110    2019-05-22 15:38:45
111    2019-05-22 15:38:55
112    2019-05-22 15:39:05
113    2019-05-22 15:39:15
114    2019-05-22 15:39:25
115    2019-05-22 15:39:35
116    2019-05-22 15:39:45
117    2019-05-22 15:39:55
118    2019-05-22 15:40:05
119    2019-05-22 15:40:15
120    2019-05-22 15:40:25
121    2019-05-22 15:40:35
122    2019-05-22 15:40:45
123    2019-05-22 15:40:55
124    2019-05-22 15:41:05
125    2019-05-22 15:41:15
126    2019-05-22 15:41:25
127    2019-05-22 15:41:35
128    2019-05-22 15:41:45
129    2019-05-22 15:41:55

The result I am expecting looks like this:

print(Small_df[['Date','Big_df_idx']].head())
0   2019-05-22 15:37:05   100
1   2019-05-22 15:40:25   120
2   2019-05-22 15:40:45   122
3   2019-05-22 15:40:45   122
4   2019-05-22 15:41:55   129

I can get the corresponding indexes of the matching values by doing the following:

Big_df_idx = Big_df[Big_df['Date'].isin(Small_df['Date'].astype(str).tolist())].index

print(Big_df_idx[0:10])
 Int64Index([100, 120, 122, 129, 153, 156, 159, 160, 177, 178], dtype='int64')

However, this returns each index only once, whereas I need something that accounts for the repeated values.
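A small sketch of why the `isin` approach collapses duplicates: the boolean mask filters the rows of Big_df, so each matching row appears once no matter how often its time occurs in Small_df. The sample data below is hypothetical, built to resemble the frames in the question, and both Date columns are assumed to be datetime64.

```python
import pandas as pd

# Hypothetical data: Big_df on a regular 10-second grid starting at 15:20:25.
Big_df = pd.DataFrame(
    {"Date": pd.date_range("2019-05-22 15:20:25", periods=130,
                           freq=pd.Timedelta(seconds=10))}
)
# Small_df repeats 15:40:45 twice.
Small_df = pd.DataFrame(
    {"Date": pd.to_datetime([
        "2019-05-22 15:37:05",
        "2019-05-22 15:40:45",
        "2019-05-22 15:40:45",
    ])}
)

# The mask has one entry per row of Big_df, so the result follows Big_df's
# rows, not Small_df's rows.
Big_df_idx = Big_df[Big_df["Date"].isin(Small_df["Date"])].index

print(len(Small_df), len(Big_df_idx))  # 3 rows in Small_df, only 2 indexes
```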

To perform your task, run:

pd.merge(Small_df, Big_df.reset_index().rename(
    columns={'index': 'Big_df_idx'}), how='left')

The key to success is to copy the index of Big_df into a regular column and rename it to Big_df_idx.

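Then merge this temporary DataFrame with Small_df in left mode, so that only the dates from Small_df are kept, each paired with the Big_df_idx column taken from Big_df.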
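Here is a self-contained sketch of the merge described above, using hypothetical sample data shaped like the question's. Two things in it are assumptions rather than part of the original call: `on="Date"` is passed explicitly (without `on`, `pd.merge` joins on every column name the two frames share), and only the Date column of Big_df is carried into the lookup frame. It also assumes both Date columns have the same dtype.

```python
import pandas as pd

# Hypothetical data: Big_df on a regular 10-second grid starting at 15:20:25.
Big_df = pd.DataFrame(
    {"Date": pd.date_range("2019-05-22 15:20:25", periods=130,
                           freq=pd.Timedelta(seconds=10))}
)
# Small_df has irregular steps and one repeated time.
Small_df = pd.DataFrame(
    {"Date": pd.to_datetime([
        "2019-05-22 15:37:05",
        "2019-05-22 15:40:25",
        "2019-05-22 15:40:45",
        "2019-05-22 15:40:45",
        "2019-05-22 15:41:55",
    ])}
)

# Copy Big_df's index into a regular column named Big_df_idx.
lookup = Big_df[["Date"]].reset_index().rename(columns={"index": "Big_df_idx"})

# A left merge keeps every row of Small_df, including the repeated time.
result = pd.merge(Small_df, lookup, on="Date", how="left")

print(result["Big_df_idx"].tolist())  # [100, 120, 122, 122, 129]
```

If a time in Small_df has no match in Big_df, the left merge leaves NaN in Big_df_idx for that row.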

On relatively small data, you can solve the problem with the map function instead of creating a new DataFrame object:

Small_df['id'] = Small_df['Date'].map(dict(zip(Big_df['Date'], Big_df.index)))
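A runnable sketch of the map approach on hypothetical sample data, mainly to show two behaviours worth knowing: a time with no match in Big_df maps to NaN (which turns the column into floats), and if Big_df contained duplicate dates the dict would keep only the last index for each. The fourth time in Small_df below is deliberately off the 10-second grid.

```python
import pandas as pd

# Hypothetical data: Big_df on a regular 10-second grid starting at 15:20:25.
Big_df = pd.DataFrame(
    {"Date": pd.date_range("2019-05-22 15:20:25", periods=130,
                           freq=pd.Timedelta(seconds=10))}
)
Small_df = pd.DataFrame(
    {"Date": pd.to_datetime([
        "2019-05-22 15:37:05",
        "2019-05-22 15:40:45",
        "2019-05-22 15:40:45",
        "2019-05-22 15:40:47",  # not present in Big_df
    ])}
)

# Build a Date -> index lookup; duplicate dates in Big_df would overwrite
# earlier entries, so this assumes Big_df's dates are unique.
date_to_idx = dict(zip(Big_df["Date"], Big_df.index))

# Repeated times in Small_df each get their own lookup result.
Small_df["id"] = Small_df["Date"].map(date_to_idx)

# Optional: a nullable integer dtype keeps whole numbers alongside missing values.
Small_df["id_nullable"] = Small_df["id"].astype("Int64")
```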

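How will the index values in the new column of Small_df be used?

@jeschwar I will add 1 to them to get the following time step in Big_df, then replace the Small_df time column with that updated time.

Interesting. I only had to add .astype(str) to Small_df and it worked. Thanks.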
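The follow-up step mentioned in the comments (add 1 to each stored index and take the time of that row in Big_df) could look roughly like this. It is only a sketch on hypothetical data, assuming Big_df has its default 0..n-1 index and that Small_df already has a Big_df_idx column; `reindex` is used so that an index past the end of Big_df yields NaT instead of raising.

```python
import pandas as pd

# Hypothetical data: Big_df on a regular 10-second grid starting at 15:20:25.
Big_df = pd.DataFrame(
    {"Date": pd.date_range("2019-05-22 15:20:25", periods=130,
                           freq=pd.Timedelta(seconds=10))}
)
# Small_df with the matched indexes already in place; 129 is Big_df's last row.
Small_df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-05-22 15:37:05",
        "2019-05-22 15:40:45",
        "2019-05-22 15:41:55",
    ]),
    "Big_df_idx": [100, 122, 129],
})

# Index of the following time step in Big_df.
next_idx = Small_df["Big_df_idx"] + 1

# Look up the time of that row; labels that do not exist (here 130) become NaT.
# to_numpy() drops Big_df's index so the values are assigned by position.
Small_df["Date"] = Big_df["Date"].reindex(next_idx).to_numpy()
```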