How to relabel rows in a pandas DataFrame using regular expressions in Python?

python regex r pandas

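I want to go through all the entries in one column and search them for a string pattern.

Sample entries in the pandas DataFrame look like this: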

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=kitty+pictures
https://search.yahoo.com/search;_ylc=X3oDMTFiN25laTRvBF9TAzIwMjM1MzgwNzUEaXRjAzEEc2VjA3NyY2hfcWEEc2xrA3NyY2h3ZWI-?p=kitty+pictures&fr=yfp-t-694
https://duckduckgo.com/?q=kitty+pictures
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=cat+pictures

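I would like to use a regular expression to find the web search engine and replace the entry with a single word. So you would use a regular expression to find

google

and replace every URL above that contains it with

google

Normally, one would try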

import re
string_example = "https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=cat+pictures"
re.search(r'google', string_example)

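However, this only finds google in the string; it does not replace the URL.

(1) How do I search every entry of a column in this DataFrame for

r'google'

and then replace that URL with "google"?

(2) How do I search only the column entries? I can't pass in one string at a time.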

Use str.contains to build a boolean mask in any of the ways shown below, pass the mask to

loc

and set those rows:

In [126]:
df = pd.DataFrame({'url':['google', 'cat', 'google cat', 'dog']})
df

Out[126]:
          url
0      google
1         cat
2  google cat
3         dog

In [127]:    
df['url'].str.contains('google')

Out[127]:
0     True
1    False
2     True
3    False
Name: url, dtype: bool

In [128]:    
df['url'].str.contains('google|cat')

Out[128]:
0     True
1     True
2     True
3    False
Name: url, dtype: bool

In [129]:
(df['url'].str.contains('google')) & (~df['url'].str.contains('cat'))

Out[129]:
0     True
1    False
2    False
3    False
Name: url, dtype: bool

You can then pass any of these conditions to loc:

In [130]:
df.loc[df['url'].str.contains('google'), 'url'] = 'yahoo'
df

Out[130]:
     url
0  yahoo
1    cat
2  yahoo
3    dog
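Applied to the URLs from the question, the same pattern might look like the sketch below. The `engines` list and the loop are illustrative assumptions, not part of the original answer; each name is used as a regular expression, which is the default behaviour of `str.contains`.

```python
import pandas as pd

# URLs copied from the question
df = pd.DataFrame({'url': [
    'https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=kitty+pictures',
    'https://search.yahoo.com/search;_ylc=X3oDMTFiN25laTRvBF9TAzIwMjM1MzgwNzUEaXRjAzEEc2VjA3NyY2hfcWEEc2xrA3NyY2h3ZWI-?p=kitty+pictures&fr=yfp-t-694',
    'https://duckduckgo.com/?q=kitty+pictures',
    'https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=cat+pictures',
]})

# Hypothetical list of labels to search for (an assumption for this sketch)
engines = ['google', 'yahoo', 'duckduckgo']

for name in engines:
    # Boolean mask: True where the URL matches the pattern
    mask = df['url'].str.contains(name)
    # Overwrite only the matching rows with the short label
    df.loc[mask, 'url'] = name

print(df)
```

Because the mask is computed over the whole column at once, there is no need to pass the strings in one by one.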

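IIUC, then

df.loc[df['url'].str.contains('google'), 'url'] = 'google'

should work.

@EdChum Of course, I'm being a fool. What about "the string contains google and cat"? Or "the string contains google but not cat"? In other words, how do I search on multiple words?

df.loc[df['url'].str.contains('google|cat'), 'url'] = 'google'

and

df.loc[(df['url'].str.contains('google')) & (~df['url'].str.contains('cat')), 'url'] = 'google'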
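As a small sketch of how these masks combine, using the answer's toy DataFrame: the variable names are made up for illustration. Note that the regex `'google|cat'` means "google or cat"; a strict "google and cat" needs two masks joined with `&`. Whitespace inside the pattern is significant, so `'google | cat'` would not be the same pattern.

```python
import pandas as pd

df = pd.DataFrame({'url': ['google', 'cat', 'google cat', 'dog']})

has_google = df['url'].str.contains('google')
has_cat = df['url'].str.contains('cat')

either = has_google | has_cat            # same rows as str.contains('google|cat')
both = has_google & has_cat              # contains google AND cat
google_not_cat = has_google & ~has_cat   # contains google but NOT cat

print(pd.DataFrame({'url': df['url'], 'either': either,
                    'both': both, 'google_not_cat': google_not_cat}))
```

The parentheses matter when the conditions are written inline, because `&`, `|` and `~` bind more tightly than comparison operators in Python.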

One last question: what about "if it contains neither google nor cat, drop the row"? I'm not sure how the negating tilde would work in that case.

That would be

~df['url'].str.contains('google|cat')
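One possible way to use that negated mask to actually drop the rows is sketched below; the names `neither`, `kept` and `dropped` are illustrative, and both variants should give the same result on this toy data.

```python
import pandas as pd

df = pd.DataFrame({'url': ['google', 'cat', 'google cat', 'dog']})

# True for rows that contain neither "google" nor "cat"
neither = ~df['url'].str.contains('google|cat')

# Variant 1: keep everything that is not flagged
kept = df[~neither]

# Variant 2: drop the flagged rows by their index labels
dropped = df.drop(df[neither].index)

print(kept)
```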

I get it now. Thanks for your patience with a n00b!