Python dataframe: select rows by string conditions on multiple columns

python pandas

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame([{'year':2017, 'text':'yes it is', 'label_one':'POSITIVE', 'label_two':'positive'},
                   {'year':2017, 'text':'it could be', 'label_one':'POSITIVE', 'label_two':'negative'},
                   {'year':2017, 'text':'it may be', 'label_one':'NEGATIVE', 'label_two':'positive'},
                   {'year':2018, 'text':'it has to be done', 'label_one':'POSITIVE', 'label_two':'positive'},
                   {'year':2018, 'text':'no', 'label_one':'NEGATIVE', 'label_two':'negative'},
                   {'year':2019, 'text':'you should be afraid of it', 'label_one':'POSITIVE', 'label_two':'negative'},
                   {'year':2019, 'text':'he is right', 'label_one':'POSITIVE', 'label_two':'positive'},
                   {'year':2020, 'text':'do not mind, I wil fix it', 'label_one':'NEGATIVE', 'label_two':'positive'},
                   {'year':2020, 'text':'that is a trap', 'label_one':'NEGATIVE', 'label_two':'negative'},
                   {'year':2021, 'text':'I am on my way', 'label_one':'POSITIVE', 'label_two':'positive'}])

How can I filter it so that it only contains rows where the label_one and label_two string values are both POSITIVE/positive or both NEGATIVE/negative?

I tried the following, but it does not work:

ptp = df.loc[(df['label_one'].str.startswith('P') and df['label_two'].str.startswith('p')) & (df['label_one'].str.startswith('N') and df['label_two'].str.startswith('n'))]
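That attempt fails for two separate reasons, which a minimal sketch on a hypothetical four-row frame can illustrate (the names p1, p2, n1, n2 and and_error are illustrative only). First, Python's and asks for the truth value of a whole Series, which pandas refuses with a ValueError, so the element-wise operator & is needed inside each group. Second, joining the positive group and the negative group with & keeps only rows that are both positive and negative, which is never the case, so the two groups have to be joined with | instead.

```python
import pandas as pd

# Hypothetical reduced frame covering all four label combinations.
df = pd.DataFrame({
    'label_one': ['POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEGATIVE'],
    'label_two': ['positive', 'negative', 'positive', 'negative'],
})

p1 = df['label_one'].str.startswith('P')
p2 = df['label_two'].str.startswith('p')
n1 = df['label_one'].str.startswith('N')
n2 = df['label_two'].str.startswith('n')

# `and` calls bool() on a whole Series, which pandas rejects.
try:
    p1 and p2
    and_error = None
except ValueError as exc:
    and_error = str(exc)
print(and_error)

# `&` combines the boolean Series element by element.
both_positive = p1 & p2
print(both_positive.tolist())  # [True, False, False, False]

# Joining the groups with `&` requires a row to be positive AND negative: none match.
print(((p1 & p2) & (n1 & n2)).any())  # False
# Joining them with `|` keeps rows matching either group.
print(((p1 & p2) | (n1 & n2)).tolist())  # [True, False, False, True]
```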

How about:

df[df['label_one'].str.lower() == df['label_two'].str.lower()]

This works assuming label_one and label_two only ever take the values NEGATIVE, POSITIVE, negative or positive.
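If that assumption might not hold, one possible variant is to check it explicitly as well. This is a sketch on a hypothetical frame that includes an unexpected NEUTRAL/neutral pair (not part of the question's data): the case-insensitive comparison alone would keep that row, while the extra isin check drops it.

```python
import pandas as pd

# Hypothetical frame; the NEUTRAL/neutral row is an invented "unexpected" label.
df = pd.DataFrame({
    'label_one': ['POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEUTRAL'],
    'label_two': ['positive', 'negative', 'negative', 'neutral'],
})

one = df['label_one'].str.lower()
two = df['label_two'].str.lower()

# Labels must agree (ignoring case) AND be one of the two expected values.
mask = (one == two) & one.isin(['positive', 'negative'])
result = df[mask]
print(result.index.tolist())  # [0, 2]
```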

Following your pattern, where both values start with P/p or both start with N/n, this works:

ptp = df.loc[((df['label_one'].str.startswith('P')) &
              (df['label_two'].str.startswith('p'))) |          
             ((df['label_one'].str.startswith('N')) &        
              (df['label_two'].str.startswith('n')))]

This returns only the rows whose labels are POSITIVE/positive or NEGATIVE/negative.
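Another option, not taken from either answer, is to whitelist the exact label pairs. This sketch joins the two columns with an arbitrary '/' separator (assumed not to occur in the labels) and filters with Series.isin on the combined string; it reuses the question's year and label values and leaves out the text column for brevity.

```python
import pandas as pd

# The question's years and labels (text column omitted).
df = pd.DataFrame({
    'year': [2017, 2017, 2017, 2018, 2018, 2019, 2019, 2020, 2020, 2021],
    'label_one': ['POSITIVE', 'POSITIVE', 'NEGATIVE', 'POSITIVE', 'NEGATIVE',
                  'POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE'],
    'label_two': ['positive', 'negative', 'positive', 'positive', 'negative',
                  'negative', 'positive', 'positive', 'negative', 'positive'],
})

# Exact (label_one, label_two) combinations to keep, joined with '/'.
allowed = ['POSITIVE/positive', 'NEGATIVE/negative']
pair = df['label_one'] + '/' + df['label_two']
result = df[pair.isin(allowed)]
print(result.index.tolist())  # [0, 3, 4, 6, 8, 9]
```

Unlike the startswith version, this only matches the exact spellings listed in allowed, which is stricter but has to be updated if new label values appear.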

Thank you, Paul!