Python: going through a whole column and checking whether it contains a certain str


python pandas dataframe


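I'm fairly new to Python dataframes, so this may sound very simple. I have a column in a dataframe called "body_text", and I want to check whether each row of body_text contains the word "Hello". I then want to create another column whose value is 1 if it does and 0 if it does not.

I tried using str.contains("Hello"), but that gave an error: it only selected the rows that contained "Hello" and tried to put those in another column. I looked for other solutions, which only led to more errors, with for loops and str in str checks.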

textdf = traindf[['request_title','request_text_edit_aware']]

traindf is a huge dataframe, and I extracted only two columns from it.

With textdf defined as in your question, try:

# 'Hello' in t is a substring test; t == 'Hello' would only match cells that are exactly 'Hello'
textdf['new_column'] = [1 if 'Hello' in t else 0 for t in textdf['body_text']]
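A minimal runnable sketch of the list-comprehension approach, in case it helps. The sample frame is made up for illustration (the real data is not shown in the question), and it assumes that missing values should count as "no match":

```python
import pandas as pd

# Hypothetical sample data; the real frame from the question is not shown.
textdf = pd.DataFrame({'body_text': ['Hello there', 'nothing to see', None, 'Say Hello']})

# 'Hello' in t is a substring test; the isinstance check skips missing values.
textdf['new_column'] = [
    1 if isinstance(t, str) and 'Hello' in t else 0
    for t in textdf['body_text']
]

print(textdf)
```

For large frames the vectorized str.contains is usually the more idiomatic choice, but the loop makes the logic explicit.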

If your match should be case-sensitive, use str.contains and chain .astype to cast the result to int:

df['contains_hello'] = df['body_text'].str.contains('Hello').astype(int)

If it should match case-insensitively, add the case=False parameter:

df['contains_hello'] = df['body_text'].str.contains('Hello', case=False).astype(int)
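A small sketch comparing the two variants side by side. The sample data and the second column name are assumptions made for illustration. It also passes na=False, which is not in the answer above: if the column contains missing values, str.contains returns NaN for them and the cast to int fails, so na=False treats them as "no match".

```python
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({'body_text': ['Hello world', 'hello again', None, 'goodbye']})

# Case-sensitive: only 'Hello' with a capital H matches.
df['contains_hello'] = (
    df['body_text'].str.contains('Hello', na=False).astype(int)
)

# Case-insensitive: 'hello' matches too.
df['contains_hello_any_case'] = (
    df['body_text'].str.contains('Hello', case=False, na=False).astype(int)
)

print(df)
```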

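Update: if you need to match multiple patterns, use a regex with the | ('OR') character. Depending on your requirements, you may also need the \b 'word boundary' character.

If you want to learn more about regex patterns and character classes, this is a good resource.

Example: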

You can use the get_dummies() function in pandas; see its documentation.
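The answer gives no code, so the following is only one guess at how get_dummies() might be applied here: build a boolean mask with str.contains, then one-hot encode it. The sample data and the contains_hello prefix are assumptions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'body_text': ['no match', 'Hello there', 'bye']})

# Boolean mask: True where the text contains 'Hello'.
mask = df['body_text'].str.contains('Hello')

# One-hot encode the mask; columns are named <prefix>_False and <prefix>_True.
dummies = pd.get_dummies(mask, prefix='contains_hello', dtype=int)

df['contains_hello'] = dummies['contains_hello_True']
print(df)
```

Note that the contains_hello_True column only exists when at least one row matches, so casting the mask with .astype(int) is the simpler route for a single 0/1 column.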

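Please add your attempts and the errors you got.

Hi, and welcome to the community. Please remember to format your code snippets so that this question can be answered properly. We need to know what the dataframe looks like. From the information you have given us, we don't know whether you want to search for "hello" in multiple columns, whether you need to search for a string within a string or only for "hello" on its own, etc.

How would you implement it for multiple strings? For example, I'd like it to also check for "Hi" as a string. Thanks for the answer.

@MeiTei I've updated my answer, I hope it helps.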

import pandas as pd

df = pd.DataFrame({'body_text': ['no matches here', 'Hello, this should match', 'high low - dont match', 'oh hi there - match me']})

#                      body_text
#    0           no matches here   
#    1  Hello, this should match   <--  we want to match this 'Hello'
#    2     high low - dont match   <-- 'hi' exists in 'high', but we don't want to match it
#    3    oh hi there - match me   <--  we want to match 'hi' here

df['contains_hello'] = df['body_text'].str.contains(r'Hello|\bhi\b', regex=True).astype(int)

                  body_text  contains_hello
0           no matches here               0
1  Hello, this should match               1
2     high low - dont match               0
3    oh hi there - match me               1

match = ['hello', 'hi']    
pat = '|'.join([fr'\b{x}\b' for x in match])
# '\bhello\b|\bhi\b'  -  meaning 'hello' OR 'hi'

df.body_text.str.contains(pat)
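Putting the pieces together, here is a self-contained sketch that builds the pattern from a list and writes a 0/1 column. The re.escape call, case=False, na=False, and the contains_greeting column name are additions made for illustration, not part of the answer above: re.escape guards against regex metacharacters in the search terms, and case=False lets the lowercase 'hello' match 'Hello'.

```python
import re
import pandas as pd

df = pd.DataFrame({'body_text': ['no matches here', 'Hello, this should match',
                                 'high low - dont match', 'oh hi there - match me']})

match = ['hello', 'hi']

# Wrap each term in word boundaries and join with | (OR).
pat = '|'.join(rf'\b{re.escape(x)}\b' for x in match)

# case=False so that 'hello' also matches 'Hello'; na=False treats NaN as no match.
df['contains_greeting'] = (
    df['body_text'].str.contains(pat, case=False, na=False).astype(int)
)

print(df)
```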