Python 3.x: get the DataFrame column header if the value is below a threshold, otherwise put 'noise'
I have a DataFrame that looks like this:
speaker Scarlett Johanson Mark Ruffalo Chris Evans
0 0.790857 1.044091 0.984198
1 0.895030 0.672590 1.072131
2 0.925493 0.078618 0.800736
3 0.296032 0.550027 0.978062
4 0.669364 0.499356 0.940024
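What I'm trying to achieve is this: if the row minimum is greater than a threshold, say 0.3, I want the value to be 'noise'; otherwise I want the value to be the name of the column that holds the minimum.
For example: row 0 -> the minimum is about 0.79, which is greater than 0.3, so the value is noise.
Row 2 -> the minimum is about 0.079, which is less than 0.3, so the value should be Mark Ruffalo.
I'm trying to get this into a new column, say 'final'.
I tried something like this: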
d['final'] = np.where(d.min(axis=1) >= 0.3, 'noise', 'no_noise')
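but I don't understand how to replace the text 'no_noise' with the column header.
Thanks in advance for your help.

Solution 1: df.idxmin
Use df.idxmin to find the label of the minimum. It returns the index of the first occurrence of the minimum over the requested axis, so with axis=1 it gives the column name of each row's minimum: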
# set speaker as index so it's out of the way
df.set_index('speaker', inplace=True)
# set your threshold
thresh = 0.3
# use np.where with `df.idxmin` as the other
df['final'] = np.where(df.min(1) > thresh, 'noise', df.idxmin(1))
>>> df
Scarlett Johanson Mark Ruffalo Chris Evans final
speaker
0 0.790857 1.044091 0.984198 noise
1 0.895030 0.672590 1.072131 noise
2 0.925493 0.078618 0.800736 Mark Ruffalo
3 0.296032 0.550027 0.978062 Scarlett Johanson
4 0.669364 0.499356 0.940024 noise
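
For anyone who wants to run Solution 1 end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and assumes that 'speaker' is just the name of the index rather than a real column; the names scores and row_min are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; 'speaker' is assumed to be the index name.
df = pd.DataFrame(
    {
        "Scarlett Johanson": [0.790857, 0.895030, 0.925493, 0.296032, 0.669364],
        "Mark Ruffalo": [1.044091, 0.672590, 0.078618, 0.550027, 0.499356],
        "Chris Evans": [0.984198, 1.072131, 0.800736, 0.978062, 0.940024],
    }
)
df.index.name = "speaker"

thresh = 0.3

# Work on an explicit selection of the numeric columns (hypothetical helper name),
# so the string column 'final' is never part of the min computation.
scores = df[["Scarlett Johanson", "Mark Ruffalo", "Chris Evans"]]
row_min = scores.min(axis=1)

# 'noise' where the row minimum is above the threshold,
# otherwise the column label of that minimum.
df["final"] = np.where(row_min > thresh, "noise", scores.idxmin(axis=1))
print(df)
```

Selecting the numeric columns once up front also makes the assignment safe to rerun, since a second pass would otherwise mix the string column 'final' with the float columns.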
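
Solution 2: np.argmin
You can use np.argmin to find the position of the minimum in each row, and index the column names with that result inside the call to np.where: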
# set speaker as index so it's out of the way
df.set_index('speaker', inplace=True)
# set your threshold
thresh = 0.3
# use np.where and np.argmin:
df['final'] = np.where(df.min(1) > thresh, 'noise', df.columns[np.argmin(df.values,1)])
>>> df
Scarlett Johanson Mark Ruffalo Chris Evans final
speaker
0 0.790857 1.044091 0.984198 noise
1 0.895030 0.672590 1.072131 noise
2 0.925493 0.078618 0.800736 Mark Ruffalo
3 0.296032 0.550027 0.978062 Scarlett Johanson
4 0.669364 0.499356 0.940024 noise
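
The same idea can be sketched purely on the underlying NumPy array, which makes the intermediate step (the column position of each row's minimum) visible. This is only an illustration under the same assumptions as the sample data in the question; the names values, pos, labels, and final are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Scarlett Johanson": [0.790857, 0.895030, 0.925493, 0.296032, 0.669364],
        "Mark Ruffalo": [1.044091, 0.672590, 0.078618, 0.550027, 0.499356],
        "Chris Evans": [0.984198, 1.072131, 0.800736, 0.978062, 0.940024],
    }
)

thresh = 0.3

values = df.to_numpy()                # plain 2-D float array
pos = values.argmin(axis=1)           # column position of each row's minimum
labels = df.columns.to_numpy()[pos]   # translate positions into column names

# 'noise' where the row minimum exceeds the threshold, else the column name.
final = np.where(values.min(axis=1) > thresh, "noise", labels)
print(pos)
print(final)
```

Like df.idxmin, np.argmin reports the first occurrence when a row has tied minima, so both solutions should agree on which column is picked.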