Python: How to flag multiple DataFrame rows based on column values

python pandas


I have a DataFrame that looks like this:

ID Reviews              Sorted  pairwise         scores
A   This is great         0     [(0, 1)]         [0.26386763883335373]
A   works well            1     []               []
B   can this be changed   0     [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]
B   how to perform that   1     [(1, 2)]         [0.03299057711398918]
B   summarize it          2     []               []

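Sorted is the position of each row among the rows that share the same ID. pairwise holds the pairwise combinations of those positions, grouped by ID. I used the pairwise combinations to compute the scores column, so each score belongs to the pair at the same index. Now I need to create a Flag column: if a score is > 0.15, the rows named in the corresponding pair should be marked "yes". For example, within ID B the only score > 0.15 is 0.36, and its pair is (0, 2), so rows 0 and 2 of group B should be marked "yes".

My expected output is: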

ID Reviews              Sorted  pairwise         scores                                    Flag
A   This is great         0     [(0, 1)]         [0.26386763883335373]                      yes
A   works well            1     []               []                                         yes
B   can this be changed   0     [(0, 1), (0, 2)] [0.1179287227608669, 0.36815020951152794]  yes
B   how to perform that   1     [(1, 2)]         [0.03299057711398918]                      No
B   summarize it          2     []               []                                         yes

I tried using np.where on the scores, but it did not work for me.

Can anyone suggest a solution or an idea?

Thanks in advance.

We can use explode, then merge:

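# One row per score; empty lists become NaN, which never passes the threshold
s = df.scores.explode()
# Keep the pairs whose score is > 0.15, then split each pair into its two row positions
s = df.set_index('ID').pairwise.explode()[(s > 0.15).values].explode()
# Left-merge on the shared ID and Sorted columns; matched rows get flag='Yes'
df = df.merge(s.to_frame('Sorted').reset_index().assign(flag='Yes'), how='left')
# Assign the result back instead of calling fillna(inplace=True) on an attribute
df['flag'] = df['flag'].fillna('No')
df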
                                      scores          pairwise Sorted ID flag
0                      [0.26386763883335373]          [(0, 1)]      0  A  Yes
1                                         []                []      1  A  Yes
2  [0.1179287227608669, 0.36815020951152794]  [(0, 1), (0, 2)]      0  B  Yes
3                      [0.03299057711398918]          [(1, 2)]      1  B   No
4                                         []                []      2  B  Yes
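
For comparison, here is a self-contained sketch of the same flagging rule written as a plain Python loop rather than explode and merge. It is not the method from the answer above. The sample data is copied from the question, and storing pairwise and scores as lists of tuples and lists of floats is an assumption based on how they print. The names THRESHOLD and flagged are only illustrative. One possible reason to prefer this form: with the merge approach, a row whose position occurs in more than one qualifying pair would be matched more than once and so duplicated, whereas a set keeps each (ID, position) only once.

```python
import pandas as pd

# Sample data copied from the question; list/tuple cell types are assumed.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "Reviews": ["This is great", "works well", "can this be changed",
                "how to perform that", "summarize it"],
    "Sorted": [0, 1, 0, 1, 2],
    "pairwise": [[(0, 1)], [], [(0, 1), (0, 2)], [(1, 2)], []],
    "scores": [[0.26386763883335373], [],
               [0.1179287227608669, 0.36815020951152794],
               [0.03299057711398918], []],
})

THRESHOLD = 0.15  # cutoff taken from the question

# Collect every (ID, position) that takes part in a pair scoring above the cutoff.
flagged = set()
for group_id, pairs, scores in zip(df["ID"], df["pairwise"], df["scores"]):
    for (i, j), score in zip(pairs, scores):
        if score > THRESHOLD:
            flagged.update({(group_id, i), (group_id, j)})

# Look up each row's (ID, Sorted) key in the set of flagged positions.
keys = zip(df["ID"], df["Sorted"].tolist())
df["flag"] = ["Yes" if key in flagged else "No" for key in keys]
print(df[["ID", "Sorted", "flag"]])
```

On the sample data this marks both rows of A and rows 0 and 2 of B, matching the expected output in the question.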


I have updated my answer, please check the update.

This edit makes more sense to me. Thank you very much. The only issue I noticed is that if the scores above have

@gamyanaidu Redone; just change s>0.15 to s