
Python: check whether values in multiple columns of a DataFrame exceed a constant

python python-3.x pandas


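Here is what my DataFrame (df) looks like:

I want to determine whether any of the status01 and status03 columns is at or above a constant value of 2000, and set another column (a flag) to indicate that. In the input rows above, rows 1 and 2 would satisfy the condition, but not row 3.

The solution I can think of is to filter the DataFrame into a new one that contains only the status01 and status03 columns, and then set the flag with a complicated np.where clause: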

df1 = df[[status01, status03]]
df1['more_than_2000'] = np.where((df1['status01_01'] >= 2000) | (df1['status01_02'] >= 2000) | ..., 1, 0)

Is there a better way to do this?

You can use the max() function here:

col_max = df1[['status01_01', 'status01_02', ...]].max(axis=1)
df1['more_than_2000'] = (col_max >= 2000).astype(int)
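
A minimal, self-contained sketch of the row-wise max() idea. The sample frame and its column names are made up for illustration and are not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df1 = pd.DataFrame({
    "status01_01": [0, 8327, 0],
    "status01_02": [0, 0, 0],
    "status03_01": [2500, 0, 100],
})

cols = ["status01_01", "status01_02", "status03_01"]

# Largest value in each row across the chosen columns.
col_max = df1[cols].max(axis=1)

# 1 if at least one chosen column is >= 2000 in that row, otherwise 0.
df1["more_than_2000"] = (col_max >= 2000).astype(int)
```

Comparing the row maximum against the threshold is equivalent to OR-ing the per-column comparisons, so the long np.where chain is not needed.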


Let's try:

df1['ge_2000'] = df1.filter(like='status01').max(axis=1).ge(2000).astype(int)

Output:

      id  activity_date          status01_1    status01_2    status02  status03_01    status03_02       status04    ge_2000
--  ----  -------------------  ------------  ------------  ----------  -------------  -------------  -----------  ---------
 0     1  2020-12-09 22:13:16             0             0        3560  0              nan                    nan          0
 1     1  2020-12-10 01:02:33          8327             0           0                 nan                    nan          1
 2     1  2020-12-11 01:02:33             0             0         230  0                                     nan          0

Update: regarding the additional question in the comments:

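s = df.filter(like='status')

# Group the status columns by their prefix (the part before '_') and flag each group.
# Note: groupby(axis=1) is deprecated since pandas 2.1.
df.join(s.groupby(s.columns.str.split('_').str[0], axis=1)
  .max().ge(2000).astype(int)
  .add_suffix('_ge2000')
)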
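
Since groupby(axis=1) is deprecated in recent pandas releases, here is a sketch of the same per-prefix flagging that transposes the frame and groups rows instead. The sample data is hypothetical, and the names prefix, group_max, flags, and result are introduced only for this example.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "status01_1": [0, 8327, 0],
    "status01_2": [0, 0, 0],
    "status02": [3560, 0, 230],
    "status03_01": [0, 2000, 0],
})

s = df.filter(like="status")

# Prefix of each column label, e.g. 'status01_1' -> 'status01'.
prefix = s.columns.str.split("_").str[0]

# Transpose so the status columns become rows, group those rows by prefix,
# take the max per group, then transpose back to one column per prefix.
group_max = s.T.groupby(prefix).max().T

# One 0/1 flag column per prefix, e.g. 'status01_ge2000'.
flags = group_max.ge(2000).astype(int).add_suffix("_ge2000")
result = df.join(flags)
```

The transpose assumes the status columns share a numeric dtype; with mixed dtypes the transposed frame may fall back to object dtype.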


Your data is ragged: the rows have 4, 3, and 4 data points.

Did you mean max(1)? Also, your code doesn't seem to work as the OP expects.

With like='status01', how can I also include like='status03'?

@Ram Assuming like accepts regular expressions, you could use something like: like='status0[1-3]'

@Ram See the updated answer.
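
On the regular-expression suggestion in the comments: in DataFrame.filter, the like= parameter does plain substring matching, while patterns go through the separate regex= parameter. A small sketch that selects only the status01 and status03 columns, using made-up sample data and a hypothetical selected variable:

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "status01_1": [0, 8327, 0],
    "status02": [3560, 0, 230],
    "status03_01": [0, 0, 2100],
})

# regex= matches column labels against a pattern;
# this one keeps labels starting with 'status01' or 'status03'.
selected = df.filter(regex=r"^status0[13]")

# 1 if any selected column is >= 2000 in that row, otherwise 0.
df["ge_2000"] = selected.ge(2000).any(axis=1).astype(int)
```

Here the 3560 in status02 is ignored because that column does not match the pattern.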