Regex: How do I extract numbers from a Python string?

regex python-3.x string

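I need to extract the knot values from the coefficient parameter names generated by the statsmodels package and put them in their own column.

Below is a sample of the current pandas DataFrame, followed by the result I am looking for. When you fit a piecewise linear model with statsmodels, the variables are returned as patsy statements. If one knot is placed, there are two coefficients; if two knots are placed, there are three. Each variable statement ends with a number in brackets. If that number is [0], the value in the new column should be 0. If it is [1], the value should be the first number in the knots=[] part of the string. If it is [2], I need the second number in the knots=[] statement, and so on. I have tried online regex helper tools but have not made any progress.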

import pandas as pd

# Current frame: coefficient names as returned by statsmodels/patsy.
# Named `data` rather than `dict` to avoid shadowing the built-in.
data = {'index': [
    'bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[0]',
    'bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[1]',
    'bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[2]',
    'bs(np.clip(driver_age_model, 0, np.inf), degree=1, knots=[25])[0]',
    'bs(np.clip(driver_age_model, 0, np.inf), degree=1, knots=[25])[1]',
    'bs(np.clip(length_ft_model, 0, np.inf), degree=1, knots=[32])[0]',
    'bs(np.clip(length_ft_model, 0, np.inf), degree=1, knots=[32])[0]']}

df1 = pd.DataFrame.from_dict(data)

df1

# Desired result: the same names plus the extracted knot value

data2 = {'index': [
    'bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[0]',
    'bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[1]',
    'bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[2]',
    'bs(np.clip(driver_age_model, 0, np.inf), degree=1, knots=[25])[0]',
    'bs(np.clip(driver_age_model, 0, np.inf), degree=1, knots=[25])[1]',
    'bs(np.clip(length_ft_model, 0, np.inf), degree=1, knots=[32])[0]',
    'bs(np.clip(length_ft_model, 0, np.inf), degree=1, knots=[32])[0]'],
    'desired_1': [0, 10, 25, 0, 25, 0, 32]}

df2 = pd.DataFrame.from_dict(data2)
df2

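You can do it like this:

 df1.assign(desired1 = df1['index'].str.replace('.*=.', '([0, ', regex=True).apply(eval))
Out: 
                                               index  desired1
0  bs(np.clip(vehicle_age_model, 0, np.inf), degr...         0
1  bs(np.clip(vehicle_age_model, 0, np.inf), degr...        10
2  bs(np.clip(vehicle_age_model, 0, np.inf), degr...        25
3  bs(np.clip(driver_age_model, 0, np.inf), degre...         0
4  bs(np.clip(driver_age_model, 0, np.inf), degre...        25
5  bs(np.clip(length_ft_model, 0, np.inf), degree...         0
6  bs(np.clip(length_ft_model, 0, np.inf), degree...         0

The replace rewrites each string into an expression such as ([0, 10, 25])[1], which eval then evaluates. Passing regex=True is needed on recent pandas versions, where str.replace treats the pattern as a literal string by default.

That said, I would not recommend eval. ast.literal_eval is the safer choice, but it only accepts literals and rejects a subscript expression like ([0, 10, 25])[1], so with it the list and the index have to be parsed separately.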
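One way to follow the ast.literal_eval suggestion is to pull the knot list and the trailing index out with a single regex and evaluate only the list literal. This is a minimal sketch, assuming every string contains a knots=[...] list followed by a closing parenthesis and a bracketed index; the helper name knot_for_term is hypothetical and is not part of statsmodels, patsy, or pandas.

```python
import ast
import re

# Hypothetical helper for illustration only.
# Group 1 captures the list literal after "knots=", group 2 the trailing index.
_TERM = re.compile(r"knots=(\[[^\]]*\])\)\[(\d+)\]$")


def knot_for_term(term):
    """Return 0 for index 0, otherwise the knot at position index - 1."""
    match = _TERM.search(term)
    if match is None:
        raise ValueError(f"unrecognised term: {term!r}")
    knots = ast.literal_eval(match.group(1))  # parses a list literal only
    idx = int(match.group(2))
    # Prepending 0 makes index 0 map to 0 and index n map to the n-th knot.
    return ([0] + knots)[idx]
```

Applied to the frame above, this would be used as df1['index'].apply(knot_for_term). An index larger than the number of knots raises IndexError rather than returning a silent default.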
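import re

def pull_num_and_index(input_str):
    # Trailing index, e.g. the 2 in "...)[2]"
    patt = r'.*\[(\d)\]$'
    l_idx = int(re.sub(patt, r'\g<1>', input_str))
    # Contents of the list after "knots=", e.g. "10, 25"
    l_patt = r'.*knots=\[(.*)\]\).*'
    l_str = re.sub(l_patt, r'\g<1>', input_str)
    knot_list = [int(k) for k in l_str.split(',')]
    if l_idx == 0:
        return 0
    else:
        return knot_list[l_idx - 1]

df1['desired1'] = df1['index'].apply(pull_num_and_index)

The regexes are a bit odd. patt matches the last number in brackets in a capture group; that number is extracted and converted to an int. l_patt matches the list that follows knots= in a capture group, and re.sub pulls it out. The resulting string is turned into a list with str.split, and each item is converted to an int. After that the comparison is fairly straightforward.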
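To make the two substitutions described above concrete, here is a short walk-through on a single term. It is only an illustration of what each pattern captures; the variable names are chosen for this sketch, and the patterns assume the same string layout as the sample data.

```python
import re

term = "bs(np.clip(vehicle_age_model, 0, np.inf), degree=1, knots=[10, 25])[2]"

# Each pattern matches the whole string, so re.sub replaces all of it
# with the text of capture group 1.
idx_text = re.sub(r".*\[(\d)\]$", r"\g<1>", term)               # "2"
knots_text = re.sub(r".*knots=\[(.*)\]\).*", r"\g<1>", term)    # "10, 25"

# int() tolerates the leading space left over after splitting on ",".
knots = [int(k) for k in knots_text.split(",")]                 # [10, 25]

# Index 0 maps to 0; index n maps to the n-th knot.
value = 0 if int(idx_text) == 0 else knots[int(idx_text) - 1]   # 25
```

Note that \d matches a single digit, so this form assumes the trailing index never exceeds 9.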
Comments:

Your strings appear to contain expressions. Why not just run them, or evaluate them?

The package does provide the coefficients. You should take a look at this. Or do you need a regex solution? I would not recommend one.

@Onyanbu I need to pull them out so that I can join them to another table of discrete ages. That will let me join the correct coefficient by age.

Did the answer I provided not solve the problem?