Python: How to extract strings in pandas using regular expressions?

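I have a DataFrame:

|     |  sid  | Hobby (times per month) |
|-----+-------+-------------------------|
|  0  |   3   |       swimming(4)       |
|-----+-------+-------------------------|
|  1  |   4   |       hiking(1)         |
|-----+-------+-------------------------|
|  2  |   2   |       running(12)       |
|-----+-------+-------------------------|
|  3  |   5   |       fishing(2)        |

How can I extract the strings by removing the parentheses (and the counts inside them) from the second column, like this: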

|     |  sid  | Hobby (times per month) |
|-----+-------+-------------------------|
|  0  |   3   |        swimming         |
|-----+-------+-------------------------|
|  1  |   4   |        hiking           |
|-----+-------+-------------------------|
|  2  |   2   |        running          |
|-----+-------+-------------------------|
|  3  |   5   |        fishing          |

For example, to change `swimming(4)` to `swimming`, you can use the following regular expression:

^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$
Test cases:

swimming(4)
hiking   (1 )
running ( 12 )
fishing( 2 )
hiking(1) 
Matches:

Match 1
Full match  0-11    `swimming(4)`
Group 1.    0-8 `swimming`
Match 2
Full match  12-25   `hiking   (1 )`
Group 1.    12-18   `hiking`
Match 3
Full match  26-40   `running ( 12 )`
Group 1.    26-33   `running`
Match 4
Full match  41-53   `fishing( 2 )`
Group 1.    41-48   `fishing`
Match 5
Full match  54-64   `hiking(1) `
Group 1.    54-60   `hiking`
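
The matches listed above can be reproduced in Python with the standard `re` module. This is only a minimal sketch: the variable names are illustrative and not part of the original answer, and each test case is matched as a separate string rather than as one multi-line input.

```python
import re

# Pattern from the answer above: a word, optional whitespace, a bracketed
# (possibly space-padded) number, then optional trailing whitespace.
pattern = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')

test_cases = [
    'swimming(4)',
    'hiking   (1 )',
    'running ( 12 )',
    'fishing( 2 )',
    'hiking(1) ',
]

# Group 1 holds the hobby name without the bracketed count.
hobbies = [pattern.match(s).group(1) for s in test_cases]
print(hobbies)
```

Because the pattern is anchored with `^` and `$`, a value that does not follow the `name(number)` shape would not match at all, so real code should check for `None` before calling `.group(1)`.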

You can use the `.str` accessor methods to match strings in pandas:

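df.columns = ['sid', 'Hobby']
# expand=False returns a Series; \w+ captures the leading word before the bracket
df['Hobby'] = df['Hobby'].str.extract(r'(\w+)', expand=False)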
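
To try the `.str` approach on the data from the question, something like the following should work. It is a sketch under assumptions: the DataFrame is rebuilt by hand from the question's expected output, and the `Series.str.replace` variant is an alternative added here for comparison, not something the original answer proposed.

```python
import pandas as pd

# Sample data assumed from the question's expected output.
df = pd.DataFrame({
    'sid': [3, 4, 2, 5],
    'Hobby': ['swimming(4)', 'hiking(1)', 'running(12)', 'fishing(2)'],
})

# Option 1: capture the leading run of word characters.
extracted = df['Hobby'].str.extract(r'^(\w+)', expand=False)

# Option 2: delete the trailing "(n)" part instead of capturing the name.
stripped = df['Hobby'].str.replace(r'\s*\(\s*\d*\s*\)\s*$', '', regex=True)

df['Hobby'] = extracted
print(df)
```

Option 1 stops at the first non-word character, so a hobby written as two words would be cut short; Option 2 only removes the bracketed count and leaves the rest of the text alone.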

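To apply a regular expression in pandas, you can also pass a helper function to Series.apply():

import re
import pandas as pd

# Compile once: word, optional spaces, bracketed number, optional trailing spaces
regexp_matcher = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')

def remove_brackets(string):
    # findall returns the captured group(s); an empty list means no match
    part = regexp_matcher.findall(string)
    if not part:
        return string  # leave non-matching values unchanged
    return part[0]

df = pd.DataFrame()
df['string'] = ['swimming(4)', 'swimming(4)', 'swimming(4)']
df['new_string'] = df['string'].apply(remove_brackets)

Look at apply (or is it map?). If the value always has the structure `hobby_xyz(n_times)`, you can split the string on the opening parenthesis `(` and keep only the first element.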
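
The splitting idea from the comment can be sketched without any regular expression. This assumes the delimiter is the opening parenthesis `(`; the column name `Hobby_clean` and the sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Hobby': ['swimming(4)', 'hiking   (1 )', 'running ( 12 )']})

# Split once on the opening parenthesis (assumed delimiter), keep the part
# before it, and strip whitespace left between the name and the bracket.
df['Hobby_clean'] = df['Hobby'].str.split('(', n=1).str[0].str.strip()
print(df)
```

This only holds up if every value really has the `name(count)` shape; a value without a parenthesis is returned whole, just stripped of surrounding whitespace.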