Python: check whether both columns in a row contain the same string

python string pandas dataframe

I have a DataFrame that looks like this:

    id      k1        k2         same
    1    re_setup    oo_setup   true
    2    oo_setup    oo_setup   true
    3    alerting    bounce     false
    4    bounce      re_oversetup   false
    5    re_oversetup    alerting   false
    6    alerting_s  re_setup   false
    7    re_oversetup    oo_setup   true
    8    alerting    bounce     false

So I need to classify the rows by whether or not both columns contain the string "setup".

A simple expected output would be:

    id      k1        k2         same
    1    re_setup    oo_setup   true
    2    oo_setup    oo_setup   true
    3    alerting    bounce     false
    4    bounce      re_setup   false
    5    re_setup    alerting   false
    6    alerting_s  re_setup   false
    7    re_setup    oo_setup   true
    8    alerting    bounce     false

I have tried the following, but when I run it I get an error about selecting multiple columns:

data['same'] = data[data['k1', 'k2'].str.contains('setup')==True]
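
The attempt above appears to fail for two separate reasons, which the following minimal sketch tries to reproduce. The small frame is a made-up subset of the question's data, and the variable names are only illustrative.

```python
import pandas as pd

# Illustrative subset of the question's data (not the full frame).
data = pd.DataFrame({
    "id": [1, 3, 4],
    "k1": ["re_setup", "alerting", "bounce"],
    "k2": ["oo_setup", "bounce", "re_setup"],
})

# data['k1', 'k2'] asks for ONE column whose label is the tuple ('k1', 'k2'),
# which does not exist here, so pandas raises a KeyError.
try:
    data["k1", "k2"]
    tuple_lookup_error = None
except KeyError as exc:
    tuple_lookup_error = type(exc).__name__

# A list of labels does select two columns, but the result is a DataFrame,
# and a DataFrame has no .str accessor.
two_cols = data[["k1", "k2"]]
has_str_accessor = hasattr(two_cols, "str")

# .str is available on a single column (a Series).
k1_has_setup = data["k1"].str.contains("setup")

print(tuple_lookup_error)
print(has_str_accessor)
print(k1_has_setup.tolist())
```

So the string test has to be applied one column at a time and the per-column results combined afterwards.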

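I think you need apply with str.contains, because str.contains works only on a Series (a single column).

Then add all to check whether every value in each row is True: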

data['same'] = data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).all(axis=1)
print (data)
   id          k1        k2   same
0   1    re_setup  oo_setup   True
1   2    oo_setup  oo_setup   True
2   3    alerting    bounce  False
3   4      bounce  re_setup  False
4   5    re_setup  alerting  False
5   6  alerting_s  re_setup  False
6   7    re_setup  oo_setup   True
7   8    alerting    bounce  False
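
For readers who want to run the apply-based approach end to end, here is a self-contained sketch. The frame is rebuilt from the expected output in the question. The helper name all_contain is hypothetical, and the regex=False and na=False arguments are assumptions added for robustness rather than part of the original answer.

```python
import pandas as pd

# Data rebuilt from the expected output shown in the question.
data = pd.DataFrame({
    "id": range(1, 9),
    "k1": ["re_setup", "oo_setup", "alerting", "bounce",
           "re_setup", "alerting_s", "re_setup", "alerting"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_setup",
           "alerting", "re_setup", "oo_setup", "bounce"],
})


def all_contain(frame, columns, needle):
    """Hypothetical helper: True where every listed column contains needle."""
    hits = frame[columns].apply(
        # regex=False treats needle as a literal string;
        # na=False counts missing values as "does not contain".
        lambda col: col.str.contains(needle, regex=False, na=False)
    )
    return hits.all(axis=1)


data["same"] = all_contain(data, ["k1", "k2"], "setup")
print(data)
```

Passing the column list as a parameter keeps the same code usable when more than two columns need to be checked.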

Or use any to check for at least one True per row:

data['same'] = data[['k1', 'k2']].applymap(lambda x: 'setup' in x).any(axis=1)
print (data)
   id          k1        k2   same
0   1    re_setup  oo_setup   True
1   2    oo_setup  oo_setup   True
2   3    alerting    bounce  False
3   4      bounce  re_setup   True
4   5    re_setup  alerting   True
5   6  alerting_s  re_setup   True
6   7    re_setup  oo_setup   True
7   8    alerting    bounce  False
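
In newer pandas releases applymap is deprecated in favour of DataFrame.map (added in pandas 2.1), so a version-tolerant sketch of the elementwise check might look like the following. The name elementwise and the three-row frame are illustrative assumptions.

```python
import pandas as pd

# Illustrative three-row frame, not the question's full data.
data = pd.DataFrame({
    "k1": ["re_setup", "bounce", "alerting"],
    "k2": ["oo_setup", "re_setup", "bounce"],
})

cols = data[["k1", "k2"]]

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise applymap.
elementwise = cols.map if hasattr(cols, "map") else cols.applymap

# Plain Python substring test applied to every cell.
hits = elementwise(lambda value: "setup" in value)

any_setup = hits.any(axis=1)  # at least one column contains "setup"
all_setup = hits.all(axis=1)  # every column contains "setup"

print(any_setup.tolist())
print(all_setup.tolist())
```

Note that the expression 'setup' in value raises a TypeError for missing values such as NaN, so this variant assumes every cell holds a string.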

Another solution, using applymap for an elementwise check:

data['same'] = data[['k1', 'k2']].applymap(lambda x: 'setup' in x).all(axis=1)
print (data)
   id          k1        k2   same
0   1    re_setup  oo_setup   True
1   2    oo_setup  oo_setup   True
2   3    alerting    bounce  False
3   4      bounce  re_setup  False
4   5    re_setup  alerting  False
5   6  alerting_s  re_setup  False
6   7    re_setup  oo_setup   True
7   8    alerting    bounce  False

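If there are only two columns, you can simply chain the conditions with & (which behaves like all) or | (which behaves like any):
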
data['same'] = data['k1'].str.contains('setup') & data['k2'].str.contains('setup')
print (data)
   id          k1        k2   same
0   1    re_setup  oo_setup   True
1   2    oo_setup  oo_setup   True
2   3    alerting    bounce  False
3   4      bounce  re_setup  False
4   5    re_setup  alerting  False
5   6  alerting_s  re_setup  False
6   7    re_setup  oo_setup   True
7   8    alerting    bounce  False
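
To make the correspondence between the operators and all/any concrete, here is a short sketch that computes both variants side by side. The four-row frame and the column names both and either are illustrative assumptions.

```python
import pandas as pd

# Illustrative frame covering the four possible combinations.
data = pd.DataFrame({
    "k1": ["re_setup", "bounce", "alerting", "re_setup"],
    "k2": ["oo_setup", "re_setup", "bounce", "alerting"],
})

# One boolean mask per column.
in_k1 = data["k1"].str.contains("setup")
in_k2 = data["k2"].str.contains("setup")

data["both"] = in_k1 & in_k2    # elementwise AND, same result as all(axis=1)
data["either"] = in_k1 | in_k2  # elementwise OR, same result as any(axis=1)

print(data)
```

Python's and/or keywords cannot be used here, because they try to reduce a whole Series to a single truth value; the & and | operators work element by element.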

Here is another, more general approach that uses a reduce operation and does not need apply:

In [114]: np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
Out[114]: array([ True,  True, False,  True,  True,  True,  True, False], dtype=bool)

Details

In [115]: df['same'] = np.logical_or.reduce(
                         [df[c].str.contains('setup') for c in ['k1', 'k2']])

In [116]: df
Out[116]:
   id            k1            k2   same
0   1      re_setup      oo_setup   True
1   2      oo_setup      oo_setup   True
2   3      alerting        bounce  False
3   4        bounce  re_oversetup   True
4   5  re_oversetup      alerting   True
5   6    alerting_s      re_setup   True
6   7  re_oversetup      oo_setup   True
7   8      alerting        bounce  False
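
The session above can be reproduced with the self-contained sketch below, which rebuilds the question's frame. The np.logical_and.reduce line is an addition for comparison (it gives the "all columns" variant) and is not part of the original answer; the column names any_setup and all_setup are illustrative.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the data shown in the question.
df = pd.DataFrame({
    "id": range(1, 9),
    "k1": ["re_setup", "oo_setup", "alerting", "bounce",
           "re_oversetup", "alerting_s", "re_oversetup", "alerting"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_oversetup",
           "alerting", "re_setup", "oo_setup", "bounce"],
})

# One boolean mask per column; the list can hold any number of columns.
masks = [df[c].str.contains("setup") for c in ["k1", "k2"]]

# OR across the masks: True if at least one column contains "setup".
df["any_setup"] = np.logical_or.reduce(masks)

# AND across the masks: True only if every column contains "setup".
df["all_setup"] = np.logical_and.reduce(masks)

print(df)
```

Because the masks are reduced as a stacked array, the same two lines work unchanged when the column list grows.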

Timings

Small

In [111]: df.shape
Out[111]: (8, 4)

In [108]: %timeit np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
1000 loops, best of 3: 421 µs per loop

In [109]: %timeit df[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(1)
1000 loops, best of 3: 2.01 ms per loop

Large

In [110]: df.shape
Out[110]: (40000, 4)

In [112]: %timeit np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
10 loops, best of 3: 59.5 ms per loop

In [113]: %timeit df[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(1)
10 loops, best of 3: 88.4 ms per loop

Would this work if there is no underscore before setup, as in my question now that I have edited it? Thanks.

Yes, it only checks for the string setup.
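
On the question raised in the comments about values without an underscore before setup: str.contains performs a substring test, so the characters surrounding the match do not matter. The sketch below illustrates this on a few made-up values; the case and regex arguments shown are optional extras and not part of the original answers.

```python
import pandas as pd

# Made-up values with and without an underscore before "setup".
s = pd.Series(["re_setup", "re_oversetup", "setup", "Setup", "set_up"])

# Plain substring test: anything containing "setup" matches.
plain = s.str.contains("setup").tolist()

# case=False makes the test case-insensitive, so "Setup" matches too.
ignore_case = s.str.contains("setup", case=False).tolist()

# regex=False treats the pattern as a literal string, which matters
# when the pattern contains characters such as "." that are special
# in regular expressions.
literal_dot = pd.Series(["a.b", "axb"]).str.contains(".", regex=False).tolist()

print(plain)
print(ignore_case)
print(literal_dot)
```

The value set_up never matches, because the underscore splits the substring being searched for.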