Parsing records in Python

python regex pandas

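I need to parse the data in a DataFrame column, discard everything outside the parentheses, and move the result to a new column. Ideally the parentheses would also be stripped in the new column, but I think either result would give a workable solution: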

current column                                  new column
/reports/industry(5315)/2018                    (5315)
/reports/limit/sector(139)/2017                 (139)
/reports/sector/region(147,189 and 132)/2018    (147,189 and 132)

Thanks, any direction you can give would be great.

>>> import re
>>> re.sub(r'.*(\(.*\)).*', r'\1', '/reports/industry(5315)/2018')
'(5315)'

Full example:

import re

import pandas as pd

old_col = ['/reports/industry(5315)/2018', '/reports/limit/sector(139)/2017', '/reports/sector/region(147,189 and 132)/2018']
df = pd.DataFrame(old_col, columns=['current_column'])


def grab_dat(x):
    # Keep only the parenthesised part of the string, e.g. '(5315)'
    return re.sub(r'.*(\(.*\)).*', r'\1', x)


df['new_col'] = df['current_column'].apply(grab_dat)
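
The question also asks, ideally, for the parentheses themselves to be dropped. A minimal sketch of one way to do that with the `re` module is below; the names `PAREN_CONTENT` and `grab_inner` are made up for illustration and are not part of the answer above. It assumes only the first parenthesised group in each string matters, and it returns None when a string has no parentheses (`re.sub` would instead hand back the input unchanged).

```python
import re

# Capture group sits inside the literal parentheses, so they are not kept.
# [^)]* stops at the first closing parenthesis.
PAREN_CONTENT = re.compile(r"\(([^)]*)\)")


def grab_inner(text):
    """Return the text inside the first (...) pair, or None if there is none."""
    match = PAREN_CONTENT.search(text)
    return match.group(1) if match else None


samples = [
    "/reports/industry(5315)/2018",
    "/reports/limit/sector(139)/2017",
    "/reports/sector/region(147,189 and 132)/2018",
]
results = [grab_inner(s) for s in samples]
```

With a DataFrame like the one above, the same function could be applied with df['current_column'].apply(grab_inner).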

Use a regular expression with the pandas str functions:

df['new_column'] = df['col'].str.extract(r'(?P<new_column>(?<=\().*(?=\)))', expand=False)

You can do this with a regular expression:

import pandas as pd

old_col = ['/reports/industry(5315)/2018', '/reports/limit/sector(139)/2017', '/reports/sector/region(147,189 and 132)/2018']
df = pd.DataFrame(old_col, columns=['current_column'])
df['new_column'] = df['current_column'].str.extract(r'\((.*)\)', expand=False)

The output is:

   current_column                                  new_column
0  /reports/industry(5315)/2018                    5315
1  /reports/limit/sector(139)/2017                 139
2  /reports/sector/region(147,189 and 132)/2018    147,189 and 132
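
One thing to watch with the pattern above: `.*` is greedy, so a string containing more than one parenthesised group is matched from the first `(` to the last `)`. The sketch below compares it with a `[^)]*` variant that stops at the first closing parenthesis. The second and third rows are hypothetical inputs added for illustration; they do not appear in the question.

```python
import pandas as pd

s = pd.Series([
    "/reports/industry(5315)/2018",
    "/reports/a(1)/b(2)/2018",   # hypothetical: two parenthesised groups
    "/reports/plain/2018",       # hypothetical: no parentheses at all
])

# Greedy: spans from the first "(" to the last ")".
greedy = s.str.extract(r"\((.*)\)", expand=False)

# Negated character class: stops at the first ")".
first_only = s.str.extract(r"\(([^)]*)\)", expand=False)
```

Rows without a match come back as NaN in both cases, so they can be filtered or filled afterwards.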

IIUC, use str.extract:
df.current.str.extract(r'.*\((.*)\).*', expand=True)
Out[785]: 
                 0
0             5315
1              139
2  147,189 and 132
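
The `expand` argument decides what `str.extract` returns for a single capture group: a DataFrame with a column labelled 0 (as in the output above) or a plain Series. A short sketch of the difference follows; it assumes a column named `current` to match the call above, and the variable names `as_frame` and `as_series` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"current": [
    "/reports/industry(5315)/2018",
    "/reports/limit/sector(139)/2017",
    "/reports/sector/region(147,189 and 132)/2018",
]})

# expand=True: one-column DataFrame, column labelled 0.
as_frame = df["current"].str.extract(r"\((.*)\)", expand=True)

# expand=False: a Series, convenient for assigning to a new column.
as_series = df["current"].str.extract(r"\((.*)\)", expand=False)
df["new_column"] = as_series
```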