Python: how to aggregate the mean of rows when a value falls within the range of two other columns in a table
I have 2 dataframes that I want to merge. Both span 300 seconds (in the 'start' column). They are below
df_1:
df_2:
...and so on
When a df_2['start'] value lies between the df_1['start'] and df_1['stop'] values, I want to merge in the aggregated mean of df_2['confidence'] for those rows.
Ideally, it would look like this:
color start stop confidence
0 blue 2.72 2.85 .11
1 green 2.86 3.09 .72
2 blue 3.10 3.47 .22
3 green 3.48 4.69 .68
4 blue 4.70 5.97 .57
5 green 5.98 7.07 .49
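Thanks
You can use IntervalIndex to build an interval tree, then use IntervalIndex.get_indexer to get the position of the interval containing each df2['start'] value, and finally group by those positions and take the mean: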
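import pandas as pd
# one interval per row of df (right-closed by default)
idx = pd.IntervalIndex.from_arrays(df['start'], df['stop'])
# get_indexer gives, for each df2['start'], the position of its interval (-1 if none)
df.join(
    df2.groupby(idx.get_indexer(df2['start']))['confidence'].mean(), how='left')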
color start stop confidence
0 blue 2.72 2.85 0.1100
1 green 2.86 3.09 0.7150
2 blue 3.10 3.47 0.2200
3 green 3.48 4.69 0.6780
4 blue 4.70 5.97 0.5675
5 green 5.98 7.07 0.4860
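Two details of the IntervalIndex approach are worth spelling out: `IntervalIndex.from_arrays` builds right-closed intervals by default, so a value exactly equal to a row's start is not matched, and `get_indexer` returns -1 for values that fall in no interval, which would otherwise form a group of their own keyed -1. The following is a minimal self-contained sketch on made-up data (the frames and values are hypothetical, not the OP's), assuming both endpoints should count as inside the range and that the intervals do not overlap:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df1 = pd.DataFrame({
    "color": ["blue", "green", "blue"],
    "start": [0.0, 1.0, 2.0],
    "stop": [0.9, 1.9, 2.9],
})
df2 = pd.DataFrame({
    "start": [0.0, 0.5, 1.2, 2.0, 2.9, 5.0],
    "confidence": [0.25, 0.75, 0.5, 0.125, 0.375, 0.9],
})

# closed="both" is an assumption: start and stop both belong to the range.
idx = pd.IntervalIndex.from_arrays(df1["start"], df1["stop"], closed="both")

# Position of the containing interval for each df2 row; -1 means no match.
pos = idx.get_indexer(df2["start"])
matched = pos != -1

# Average confidence per interval position, ignoring unmatched rows.
means = df2.loc[matched, "confidence"].groupby(pos[matched]).mean()

# The positions line up with df1's default integer index, so a left join works.
result = df1.join(means, how="left")
print(result)
```

If df1 does not have a default 0..n-1 index, the positions no longer line up with its labels; resetting the index first is one way around that.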
IIUC, you can use pd.cut and groupby, then merge:
# bins for cut
bins = [df1.start[0]] + df1.stop.to_list()
# label the start in df2 by cuts:
s = pd.cut(df2.start, bins=bins, labels=df1.start)
# group df2 by the cuts:
new_df = df2.groupby(s).confidence.mean()
# merge
df1.merge(new_df, left_on='start', right_index=True)
This gives you:
color start stop confidence
0 blue 2.72 2.85 0.110000
1 green 2.85 3.09 0.715000
2 blue 3.09 3.47 0.220000
3 green 8.43 8.69 0.577857
4 blue 8.69 8.97 NaN
5 green 8.97 9.07 NaN
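The binning behaviour of pd.cut explains how rows get assigned here: the bin edges are the first start followed by every stop, bins are right-closed by default, and so any gap between one row's stop and the next row's start is absorbed into the later bin. A minimal sketch on made-up data (hypothetical frames, not the OP's) is below. It is a variation rather than the answer's exact code: it assumes the stop values are strictly increasing and uses `labels=False` to get integer bin positions instead of labelling bins with the start values.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df1 = pd.DataFrame({
    "color": ["blue", "green", "blue"],
    "start": [0.0, 1.0, 2.0],
    "stop": [1.0, 2.0, 3.0],
})
df2 = pd.DataFrame({
    "start": [0.0, 0.5, 1.0, 1.5, 2.5, 7.0],
    "confidence": [0.25, 0.75, 0.5, 0.5, 0.25, 0.9],
})

# Bin edges: first start, then every stop -> [0.0, 1.0, 2.0, 3.0].
bins = [df1["start"].iloc[0]] + df1["stop"].tolist()

# labels=False returns the bin position for each value (NaN if outside all bins).
# include_lowest=True lets the very first edge (0.0) fall into the first bin.
codes = pd.cut(df2["start"], bins=bins, labels=False, include_lowest=True)

# Keep only rows that landed in a bin, then average confidence per bin.
inside = codes.notna()
means = df2.loc[inside, "confidence"].groupby(codes[inside].astype(int)).mean()

# Bin positions match df1's default integer index.
result = df1.join(means, how="left")
print(result)
```

Note that the value 1.0 lands in the first bin, not the second, because the bins are closed on the right; this is the main behavioural difference from treating each row as a closed [start, stop] range.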
With the edited df1 (which makes sense, since it then matches the expected output), the same code gives:
color start stop confidence
0 blue 2.72 2.85 0.1100
1 green 2.86 3.09 0.7150
2 blue 3.1 3.47 0.2200
3 green 3.48 4.69 0.6780
4 blue 4.7 5.97 0.5675
5 green 5.98 7.07 0.4860
Does it match the OP's output?
@harvpan cs95 edited df1. I used the original df1.
Ah, I see the OP was edited. Got it, thanks for the tip.
Both are useful, but I'm accepting yours because it is more intuitive to me. Many thanks.