[Python][Pandas] Efficiently crossing and comparing values in two groupby columns


python pandas

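I am currently trying to cross the data of two DataFrames (df and marche_type_jointure) about bus trips, grouped by specific columns (in English: stop, direction, day type, season, year), as shown in the following code: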

df_grouped = df.groupby(['arret', 'sens', 'type_jour','saison', 'annee'])
marche_type_jointure_grouped = marche_type_jointure.groupby(['arret', 'sens', 'type_jour','saison', 'annee'])
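As a minimal sketch of what these two groupby calls produce, the example below builds two tiny frames with invented values. The rows, the names keys and common_keys, and the choice of timedeltas for times of day are assumptions for illustration, not the asker's data. It shows that both grouped objects are keyed by the same 5-tuples, so the keys present on both sides can be found with a set intersection before calling get_group, which raises a KeyError for a key that does not exist.

```python
import pandas as pd

keys = ["arret", "sens", "type_jour", "saison", "annee"]

# Toy stand-ins for the two real frames (all values are invented).
df = pd.DataFrame({
    "arret": ["A", "A", "B"],
    "sens": [1, 1, 1],
    "type_jour": ["semaine", "semaine", "samedi"],
    "saison": ["hiver", "hiver", "hiver"],
    "annee": [2018, 2018, 2018],
    "heure_arrivee_reelle": pd.to_timedelta(["07:05:00", "08:40:00", "09:15:00"]),
})
marche_type_jointure = pd.DataFrame({
    "arret": ["A", "A"],
    "sens": [1, 1],
    "type_jour": ["semaine", "semaine"],
    "saison": ["hiver", "hiver"],
    "annee": [2018, 2018],
    "heure_debut_periode": pd.to_timedelta(["06:00:00", "08:00:00"]),
    "heure_fin_periode": pd.to_timedelta(["07:59:59", "09:59:59"]),
    "temps_trajet_sur_periode": [4.0, 6.0],
})

df_grouped = df.groupby(keys)
marche_type_jointure_grouped = marche_type_jointure.groupby(keys)

# .groups maps each key tuple to the row labels belonging to that group.
common_keys = set(df_grouped.groups) & set(marche_type_jointure_grouped.groups)

for key in common_keys:
    group = df_grouped.get_group(key)
    df_mt = marche_type_jointure_grouped.get_group(key)
    print(key, len(group), len(df_mt))
```

In this toy data the group for stop "B" exists only in df, so it is skipped rather than triggering a KeyError.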

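This gives me the following DataFrames with the relevant columns (see the screenshots). You can use these DataFrames as an example (only the most important variables are shown):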

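Here are the most important variables. In df: 'sens' (direction, column 6), 'arret' (stop, column 9), 'heure_arrivee_reelle' (actual arrival time, column 13), 'type_jour' (day type, with 'vac_s' for holidays, 'samedi' for Saturday, 'dimanche' for Sunday and finally 'semaine' for the week; column 25), 'saison' (column 26, 'hiver' for winter and 'ete' for summer) and 'annee' (i.e. 2018 or 1978, referring to winter 2017-2018, i.e. January 2017 to May 2017, then September 2017 to May 2018; column 29). 'temps_trajet_mt' is empty for now, but its values will come from marche_type_jointure_grouped (.temps_trajet_sur_periode).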

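In marche_type_jointure_grouped: 'arret' (stop), and 'heure_debut_periode' and 'heure_fin_periode', which are the start and end of a specific period (on the bus line) for the time needed to go from the previous stop to the stop we are at. 'type_jour', 'saison' and 'sens' are the same as in df_grouped.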
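Since each period is described by a start and an end, checking whether an arrival time falls inside one period is a closed-interval test. A minimal sketch follows, assuming the times of day are stored as timedeltas; the sample values and the names arrivals, start, end and inside are invented for illustration.

```python
import pandas as pd

# Invented arrival times, stored as offsets from midnight.
arrivals = pd.Series(pd.to_timedelta(["07:05:00", "08:40:00", "10:30:00"]))

# One hypothetical period [start, end].
start = pd.Timedelta("06:00:00")
end = pd.Timedelta("09:59:59")

# Series.between is inclusive on both ends by default, i.e. it is
# equivalent to (arrivals >= start) & (arrivals <= end).
inside = arrivals.between(start, end)
print(inside.tolist())
```

This tests every arrival against a single period at once; matching each arrival against the period of its own group needs a join on the grouping columns as well.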

What I actually want to do is, for each "key" of the grouped DataFrames (stop, direction, day type, season and year), compare the values of 'heure_arrivee_reelle' ('real_time_of_arrival') in df_grouped with the values of 'heure_fin_periode' ('end_time_period' in English) and/or 'heure_debut_periode' ('start_time_period'), according to the following condition: "if heure_arrivee_reelle is in [heure_debut_periode, heure_fin_periode]" (as an interval, though there may be an easier condition that gives the same result), then the value of temps_trajet_mt in df_grouped becomes the corresponding value of temps_trajet_sur_periode in marche_type_jointure_grouped. Obviously, I would like to do this in an efficient way. Here is what I tried; maybe it will make clearer what I want to do:

for key, group in df_grouped:
    df_mt = marche_type_jointure_grouped.get_group(key)  # we take the corresponding group from the other dataframe
    index_har = list(group.heure_arrivee_reelle.index.values)  # list of index of actual arrival times from df_grouped referring to key
    index_hdp = list(df_mt.heure_debut_periode.index.values)  # list of index of beginning time of period from marche_type_jointure_grouped referring to key
    for i in index_hdp:
        for j in index_har:
            if df_mt.heure_fin_periode[i] >= group.heure_arrivee_reelle[j]:
                group.temps_trajet_mt[j] = df_mt.temps_trajet_sur_periode[i]
                index_har.remove(j)  # so that I do not have to compare it again
            else:
                pass
df_grouped.size().unstack()  # so that I can see the result

Thanks in advance for your help, this really matters to me, I have been working on it for more than a week! If I am not clear about what I want to do, please let me know (it is more or less a conditional join, if you like...)

Welcome to SO. Please provide a [...]. For pandas-specific advice, see [...].

I actually solved the problem: after computing the time periods, I used a merge on the data and then two filters (instead of doing the comparison, which I think is more efficient). Thanks anyway :-)

Sure, in that case you can post your own answer. It could help the wider community.
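The asker did not post the final code, so the following is only a sketch of what a "merge, then two filters" approach could look like, not the asker's actual solution. The function name assign_travel_time, the helper column row_id, the toy data and the use of timedeltas for times of day are all assumptions. It joins every trip to every period sharing the same five grouping columns, keeps the pairs where the arrival lies in the closed interval [heure_debut_periode, heure_fin_periode], and writes the matching temps_trajet_sur_periode back as temps_trajet_mt.

```python
import pandas as pd

KEYS = ["arret", "sens", "type_jour", "saison", "annee"]


def assign_travel_time(df, periods, keys=KEYS):
    """Return a copy of df with temps_trajet_mt taken from the matching period."""
    out = df.copy()
    left = out[keys + ["heure_arrivee_reelle"]].copy()
    left["row_id"] = range(len(left))  # positional id to map matches back

    # One row per (trip, period) pair that shares the same grouping key.
    merged = left.merge(periods, on=keys, how="inner")

    # Filter 1: arrival not before the period start.
    after_start = merged["heure_arrivee_reelle"] >= merged["heure_debut_periode"]
    # Filter 2: arrival not after the period end.
    before_end = merged["heure_arrivee_reelle"] <= merged["heure_fin_periode"]

    # If periods overlap, keep only the first match per trip (an assumption).
    matched = merged[after_start & before_end].drop_duplicates("row_id")
    travel = matched.set_index("row_id")["temps_trajet_sur_periode"]

    # Trips without a matching period get NaN.
    out["temps_trajet_mt"] = travel.reindex(pd.RangeIndex(len(out))).to_numpy()
    return out


# Toy data with invented values.
df = pd.DataFrame({
    "arret": ["A", "A", "B"],
    "sens": [1, 1, 1],
    "type_jour": ["semaine", "semaine", "samedi"],
    "saison": ["hiver", "hiver", "hiver"],
    "annee": [2018, 2018, 2018],
    "heure_arrivee_reelle": pd.to_timedelta(["07:05:00", "08:40:00", "09:15:00"]),
})
marche_type_jointure = pd.DataFrame({
    "arret": ["A", "A"],
    "sens": [1, 1],
    "type_jour": ["semaine", "semaine"],
    "saison": ["hiver", "hiver"],
    "annee": [2018, 2018],
    "heure_debut_periode": pd.to_timedelta(["06:00:00", "08:00:00"]),
    "heure_fin_periode": pd.to_timedelta(["07:59:59", "09:59:59"]),
    "temps_trajet_sur_periode": [4.0, 6.0],
})

result = assign_travel_time(df, marche_type_jointure)
print(result[["arret", "heure_arrivee_reelle", "temps_trajet_mt"]])
```

Compared with the nested loops in the question, this replaces the per-row Python comparisons with vectorized column operations. The trade-off is that the intermediate merged frame holds one row per trip and period pair within each key, so it can become large when a key has many periods.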