Python: extract two numeric columns from a text column

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({"id": [1,2,3,4,5],
                "text": ["This is a ratio of 13.4/10","Favorate rate of this id is 11/9","It may not be a good looking person. But he is vary popular (15/10)","Ratio is 12/10","very popular 17/10"],
                "name":["Joe","Adam","Sara","Jose","Bob"]})
I want to split these numbers into two columns to get the following result:

df = pd.DataFrame({"id": [1,2,3,4,5],
                "text": ["This is a ratio of 13.4/10","Favorate rate of this id is 11/9","It may not be a good looking person. But he is vary popular (15/10)","Ratio is 12/10","very popular 17/10"],
                "name":["Joe","Adam","Sara","Jose","Bob"],
                "rating_nominator":[13.4,11,15,12,17],
                "rating_denominator":[10,9,10,10,10]})

Any help is much appreciated.

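The general pattern you want to match is:
(some number)/(other number)
Matching floating-point numbers is not a trivial task, but there are plenty of existing answers on Stack Overflow that cover it, so you can leverage them here.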
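To make the (some number)/(other number) idea concrete before bringing pandas in, here is a minimal sketch with Python's re module. It assumes a deliberately simple number format (unsigned digits with an optional decimal part); the names NUMBER, RATIO and find_ratio are illustrative, not taken from the answer.

```python
import re

# Assumed, deliberately simple number format: digits with an optional decimal part.
NUMBER = r"\d+(?:\.\d+)?"

# Two capturing groups separated by a literal slash: (number)/(number)
RATIO = re.compile(rf"({NUMBER})/({NUMBER})")


def find_ratio(text):
    """Return the first (numerator, denominator) pair in text as floats, or None."""
    m = RATIO.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


print(find_ratio("This is a ratio of 13.4/10"))        # (13.4, 10.0)
print(find_ratio("Favorate rate of this id is 11/9"))  # (11.0, 9.0)
print(find_ratio("no ratio here"))                     # None
```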

A fairly robust expression, adapted from an existing answer, is:
([+-]?(?:[0-9]*[.])?[0-9]+)
You can use it with str.extract and an f-string:

fpr = r'([+-]?(?:[0-9]*[.])?[0-9]+)'

res = df.text.str.extract(fr'{fpr}\/{fpr}').astype(float)
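A self-contained version of this answer, with the DataFrame copied from the question, is sketched below. It prints the composed pattern so you can see what the f-string produces. The \/ escape in the original is harmless but unnecessary, since / is not a regex metacharacter in Python, so this sketch uses a plain /.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "text": ["This is a ratio of 13.4/10",
                            "Favorate rate of this id is 11/9",
                            "It may not be a good looking person. But he is vary popular (15/10)",
                            "Ratio is 12/10",
                            "very popular 17/10"],
                   "name": ["Joe", "Adam", "Sara", "Jose", "Bob"]})

# Optional sign, optional integer part followed by a dot, then at least one digit.
fpr = r'([+-]?(?:[0-9]*[.])?[0-9]+)'

# The f-string splices the same capture group in twice around a literal slash.
pattern = fr'{fpr}/{fpr}'
print(pattern)

# str.extract returns one column per capture group (labelled 0 and 1).
res = df.text.str.extract(pattern).astype(float)
print(res)
```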

To assign it to the DataFrame:

df[['rating_nominator', 'rating_denominator']] = res
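If you would rather not depend on column order, a variation (not part of the original answer) is to name the capture groups: str.extract uses named groups (?P<name>...) as the column labels, and the result can then be joined back onto the frame by index. A minimal sketch on three of the question's rows:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3],
                   "text": ["This is a ratio of 13.4/10",
                            "Favorate rate of this id is 11/9",
                            "Ratio is 12/10"]})

# Same number pattern as above, but without its own capture group.
num = r'[+-]?(?:[0-9]*[.])?[0-9]+'

# Named groups (?P<name>...) become the column names of the extracted frame.
pattern = fr'(?P<rating_nominator>{num})/(?P<rating_denominator>{num})'

res = df['text'].str.extract(pattern).astype(float)

# join aligns on the index, so each row keeps its own extracted values.
df = df.join(res)
print(df)
```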

You can use:

df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)').astype(float)
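Run end to end on the question's data, that one-liner can be checked against the desired output. This sketch passes the pattern as a raw string (r'...'), which keeps \d and \. intact without relying on Python's handling of unrecognized escape sequences; recent Python versions warn about those in ordinary string literals.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "text": ["This is a ratio of 13.4/10",
                            "Favorate rate of this id is 11/9",
                            "It may not be a good looking person. But he is vary popular (15/10)",
                            "Ratio is 12/10",
                            "very popular 17/10"],
                   "name": ["Joe", "Adam", "Sara", "Jose", "Bob"]})

# Raw string, so the backslashes reach the regex engine unchanged.
pattern = r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)'

# The two extracted columns are assigned to the two new names by position.
df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(pattern).astype(float)
print(df[['id', 'name', 'rating_nominator', 'rating_denominator']])
```

The expected values are the ones from the desired DataFrame in the question.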
The regex
(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)
captures an integer or a float as the numerator or the denominator.
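To see how the pieces of that regex fit together, the same pattern can be written with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is a sketch for illustration; it matches the same strings as the compact one-line pattern.

```python
import re

# The one-line pattern, spread out and commented (re.VERBOSE ignores the layout).
RATIO = re.compile(r"""
    (            # group 1: numerator
      -?         #   optional leading minus sign
      \d+        #   one or more digits (integer part)
      (?:\.\d+)? #   optional non-capturing decimal part: a dot, then digits
    )
    /            # literal slash between the two numbers
    (            # group 2: denominator, same shape as group 1
      -?
      \d+
      (?:\.\d+)?
    )
""", re.VERBOSE)

for s in ["13.4/10", "11/9", "-12.24/-13.5", "1/-1.2"]:
    print(s, "->", RATIO.search(s).groups())
```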

(edit: the regex in the linked answer covers more cases. I made some assumptions, for example that the numbers never carry a unary + sign.)
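The effect of those assumptions is easy to check with plain re on two hypothetical inputs: a leading + sign and a number written without an integer part. Because search moves forward to the first position where the pattern matches, the stricter pattern silently drops the + in +3/4 (harmless here) and reads .5/1 as 5/1 (a wrong value), while the broader [+-]? pattern from the first answer keeps both.

```python
import re

# Pattern from this answer: optional minus only, digits required before any dot.
strict = re.compile(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)')

# Pattern from the first answer: optional + or -, and the integer part may be omitted.
fpr = r'([+-]?(?:[0-9]*[.])?[0-9]+)'
broad = re.compile(fr'{fpr}/{fpr}')

for s in ["+3/4", ".5/1"]:
    print(s, "strict:", strict.search(s).groups(), "broad:", broad.search(s).groups())
```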

Demo:

   id                                               text  name  rating_nominator  rating_denominator
0   1                         This is a ratio of 13.4/10   Joe              13.4                10.0
1   2                   Favorate rate of this id is 11/9  Adam              11.0                 9.0
2   3  It may not be a good looking person. But he is...  Sara              15.0                10.0
3   4                                     Ratio is 12/10  Jose              12.0                10.0
4   5                                 very popular 17/10   Bob              17.0                10.0
>>> df
   id                  text
0   1  foo 14.12/10.123 bar
1   2                 10/12
2   3             13.4/14.5
3   4          -12.24/-13.5
4   5                1/-1.2
>>>
>>> df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)').astype(float)
>>> df
   id                  text  rating_nominator  rating_denominator
0   1  foo 14.12/10.123 bar               14.12            10.123
1   2                 10/12               10.00            12.000
2   3             13.4/14.5               13.40            14.500
3   4          -12.24/-13.5              -12.24           -13.500
4   5                1/-1.2                1.00            -1.200
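One case neither demo exercises is a row that contains no a/b pair at all. For such rows str.extract puts NaN in every group column instead of raising, so it can be worth checking for them before relying on the new columns. A minimal sketch with a hypothetical non-matching row:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2],
                   "text": ["Ratio is 12/10", "no rating given"]})

pattern = r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)'
res = df['text'].str.extract(pattern).astype(float)

# A row with no match yields NaN in every extracted column rather than an error.
print(res)
print(res.isna().all(axis=1).tolist())  # [False, True]
```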