Python regex to extract the full capture group

I am trying to extract URLs, but I only get the last part, such as "com", instead of the full "amazon.com" or "google.com". I am using the following regex:

import re
import pandas as pd
data = [['website is amazon.com'], ['url is google.com']]
reviews = pd.DataFrame(data, columns = ['ALL_TEXT'])
reviews['regex_match'] = reviews['ALL_TEXT'].str.extract(r'[^@A-Z][-A-Z0-9:%_\+~#=]+\.(CO|COM|NET|ORG|GOV)\b', flags=re.IGNORECASE)
I tried wrapping the whole regex in a capture group:

reviews['regex_match'] = reviews['ALL_TEXT'].str.extract(r'([^@A-Z][-A-Z0-9:%_\+~#=]+\.(CO|COM|NET|ORG|GOV)\b)', flags=re.IGNORECASE)
But then I get the error:

Wrong number of items passed 2, placement implies 1

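The error means that you are assigning the result of Series.str.extract to a single column (reviews['regex_match']), but your regex contains two capture groups, so you are telling it to return two columns.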
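To make the two-column behavior concrete, here is a minimal sketch that reuses the question's data and its second pattern. The variable names `pattern`, `n_groups`, and `extracted` are illustrative and not part of the original post.

```python
import re
import pandas as pd

reviews = pd.DataFrame(
    [['website is amazon.com'], ['url is google.com']],
    columns=['ALL_TEXT'],
)

# The pattern from the question: an outer group wrapping an inner TLD group
pattern = r'([^@A-Z][-A-Z0-9:%_\+~#=]+\.(CO|COM|NET|ORG|GOV)\b)'

# A compiled pattern reports how many capture groups it contains
n_groups = re.compile(pattern, flags=re.IGNORECASE).groups

# str.extract returns one column per capture group
extracted = reviews['ALL_TEXT'].str.extract(pattern, flags=re.IGNORECASE)

print(n_groups)
print(extracted.shape)
```

Since the result has two columns, it cannot be assigned to the single column reviews['regex_match'].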

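You can use

>>> reviews['ALL_TEXT'].str.extract(r'(?

The error occurs because two capture groups were passed. You can use a non-capturing group, (?:...), for the extension and a single capture group for the full pattern: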

([^@A-Z][-A-Z0-9:%_+~#=]+\.(?:COM?|NET|ORG|GOV))\b
                           |__________________|
                             Non capture group
|______________________________________________|
                  Capture group
The updated code could look like this:

reviews['regex_match'] = reviews['ALL_TEXT'].str.extract(
    r'([^@A-Z][-A-Z0-9:%_+~#=]+\.(?:COM?|NET|ORG|GOV))\b',
    flags=re.IGNORECASE
)
Output

                ALL_TEXT  regex_match
0  website is amazon.com   amazon.com
1      url is google.com   google.com
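For completeness, here is a self-contained sketch of the single-capture-group approach. Two things in it are additions that the answer does not state: `expand=False` is used so that a Series is returned, and the result is passed through `.str.strip()`. The strip is there because the leading `[^@A-Z]` consumes the character before the domain (a space in this data), which the right-aligned printout hides.

```python
import re
import pandas as pd

reviews = pd.DataFrame(
    [['website is amazon.com'], ['url is google.com']],
    columns=['ALL_TEXT'],
)

# One capturing group for the whole match, non-capturing group for the TLD
pattern = r'([^@A-Z][-A-Z0-9:%_+~#=]+\.(?:COM?|NET|ORG|GOV))\b'

# With a single capture group, expand=False returns a Series
matches = reviews['ALL_TEXT'].str.extract(
    pattern, flags=re.IGNORECASE, expand=False
)

# Strip the character consumed by [^@A-Z] when it is whitespace
reviews['regex_match'] = matches.str.strip()

print(reviews)
```

If the preceding character should never be part of the result, another option is to restructure the pattern so that it sits outside the capture group. That goes beyond what the answer proposes.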