Python: extract age values from text to create a new column (tags: python, regex, python-3.x, pandas)


My dataset is as follows:

import pandas as pd

df = pd.DataFrame([["Sam is 5", 2000], ["John is 3 years and 6 months", 1200], ["Jack is 4.5 years", 7000], ["Shane is 25 years old", 2000]], columns=['texts', 'amount'])

print(df)

    texts                          amount
0   Sam is 5                        2000
1   John is 3 years and 6 months    1200
2   Jack is 4.5 years               7000
3   Shane is 25 years old           2000

I want to extract the age value from df['texts'] and use it to compute a new column, df['value']:

df['value'] = df['amount'] / val

where val is the age, in years, extracted from df['texts'].
Here is my code:

val = df['texts'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)
df['value'] = df['amount'] / val
print(df)

Output:

    texts                          amount     value
0   Sam is 5                       2000     400.000000
1   John is 3 years and 6 months   1200     400.000000
2   Jack is 4.5 years              7000     1555.555556
3   Shane is 25 years old          2000     80.000000

Expected output:

    texts                          amount     value
0   Sam is 5                       2000     400.000000
1   John is 3 years and 6 months   1200     342.85
2   Jack is 4.5 years              7000     1555.555556
3   Shane is 25 years old          2000     80.000000
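
The problem with the code above is that I don't know how to convert "3 years and 6 months" into 3.5 years.

Additional information: the texts column contains age values expressed only in years and months.

Any suggestions are welcome. Thanks.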
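
The conversion being asked about is plain arithmetic: each month counts as one twelfth of a year. A minimal sketch for the John row, with the numbers read off the text by hand and hypothetical variable names:

```python
# Worked arithmetic for "John is 3 years and 6 months" (amount 1200).
years, months = 3, 6        # read off the text by hand, not parsed
age = years + months / 12   # 6 months is half a year
value = 1200 / age

print(age)              # 3.5
print(round(value, 6))  # 342.857143
```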

I believe you need:

Note: when a text contains neither "years" nor "months", the solution treats the number as years.



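3 years and 6 months is not 3.6 years. I think you should store an absolute value for each person, such as a date of birth, and calculate from that.

@DyZ It is 3.5 years.

Your expression returns 3.0 because it ignores the "6 months". You need a regex along the lines of '(\d+)(:?\.\d*)?\d+(\d*)'.

Thanks. This is what I was looking for.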
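
The comment's idea of capturing the months with a second group can be sketched with the standard re module. The pattern and the helper parse_age below are hypothetical, not the commenter's exact expression, and they assume that the years always come first and that months, when present, are followed by the word "month" or "months".

```python
import re

# Hypothetical pattern: the first number is taken as years; a second number
# is captured only when it is followed by "month" or "months".
AGE = re.compile(
    r"(?P<years>\d+(?:\.\d+)?)"
    r"(?:\D+(?P<months>\d+(?:\.\d+)?)\s*months?)?"
)

def parse_age(text):
    """Return the age in years as a float, or None if no number is found."""
    m = AGE.search(text)
    if m is None:
        return None
    years = float(m.group("years"))
    months = float(m.group("months") or 0)  # a missing group counts as 0 months
    return years + months / 12

samples = ["Sam is 5", "John is 3 years and 6 months",
           "Jack is 4.5 years", "Shane is 25 years old"]
ages = [parse_age(t) for t in samples]
print(ages)  # [5.0, 3.5, 4.5, 25.0]
```

With pandas, a function like this could be applied as df['texts'].map(parse_age). A text that gives only months would be misread as years under these assumptions.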
# extract the first number in each text
a = df['texts'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)
# extract the number that precedes "years"
b = df['texts'].str.extract(r'(\d+\.?\d*)\s+years', expand=False).astype(float)
# where there is no "years" match, fall back to the first number
y = b.combine_first(a)
print(y)
0     5.0
1     3.0
2     4.5
3    25.0
Name: texts, dtype: float64

# extract the number that precedes "months" and convert it to years
m = df['texts'].str.extract(r'(\d+\.?\d*)\s+months', expand=False).astype(float) / 12
print(m)
0    NaN
1    0.5
2    NaN
3    NaN
Name: texts, dtype: float64

# add years and months, treating missing months as 0
val = y.add(m, fill_value=0)
print(val)
0     5.0
1     3.5
2     4.5
3    25.0
Name: texts, dtype: float64

df['value'] = df['amount'] / val
print(df)
                          texts  amount        value
0                      Sam is 5    2000   400.000000
1  John is 3 years and 6 months    1200   342.857143
2             Jack is 4.5 years    7000  1555.555556
3         Shane is 25 years old    2000    80.000000
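
One case the steps above do not cover is a text that gives only months: the first number would be used as years and then added again as months. The question states that the column holds years and months, so this may never occur. The sketch below is a hedged variant that reuses the same patterns and falls back to the bare number only when neither unit is present. The row "Tom is 8 months" and the names num, first and bare are hypothetical.

```python
import pandas as pd

texts = pd.Series([
    "Sam is 5",
    "John is 3 years and 6 months",
    "Jack is 4.5 years",
    "Tom is 8 months",  # hypothetical months-only row, not in the original data
])

num = r"(\d+\.?\d*)"
first = texts.str.extract(num, expand=False).astype(float)
years = texts.str.extract(num + r"\s+years", expand=False).astype(float)
months = texts.str.extract(num + r"\s+months", expand=False).astype(float)

# Use the bare first number as years only when neither unit appears in the text.
bare = first.where(years.isna() & months.isna())

# Years (or the bare fallback) plus months expressed as a fraction of a year.
val = years.combine_first(bare).add(months / 12, fill_value=0)
print(val.tolist())
```

The result can then be used exactly as before, for example df['value'] = df['amount'] / val.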