
Extracting the value before a specific character in a DataFrame column in Python


How can I extract the area value from the address column of the following DataFrame?

                                                               address     quantity   price  
0                         711-2880 Nulla St. Mankato Mississippi 96.5㎡          2       20   
1                  P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡          3       13   
2  606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616          5       23    
3                   Ap #867-859 Sit Rd. Azusa New York 39 square metre          3       32   
4             7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392          5       45   
Note that ㎡ means square metre.

The desired output is:

                                                               address   area    quantity  price  
0                         711-2880 Nulla St. Mankato Mississippi 96.5㎡   96.5          2     20  
1                  P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡  206.0          3     13  
2  606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616  115.0          5     23  
3                   Ap #867-859 Sit Rd. Azusa New York 39 square metre   39.0          3     32  
4             7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392  470.0          5     45  

Use str.extract.

Example:

import pandas as pd

df = pd.DataFrame({'address': ['711-2880 Nulla St. Mankato Mississippi 96.5㎡', 'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡', '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616', 'Ap #867-859 Sit Rd. Azusa New York 39 square metre', '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392']})
# Capture the number that is immediately followed by ㎡ or "square metre"
df['area'] = df['address'].str.extract(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)")
print(df)

Output:

                                             address  area
0       711-2880 Nulla St. Mankato Mississippi 96.5㎡  96.5
1  P.O. Box 283 8562 Fusce Rd. Frederick Nebraska...   206
2  606-3727 Ullamcorper. Street Roseville NH 115㎡...   115
3  Ap #867-859 Sit Rd. Azusa New York 39 square m...    39
4  7292 Dictum Av. San Antonio MI 470㎡ 47096 (492...   470
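
The desired output in the question lists area as floats (206.0, 115.0), whereas str.extract returns the captured text as strings (206, 115). A minimal sketch of one way to close that gap, assuming pd.to_numeric is an acceptable conversion step; the third address is a made-up row added only to show what happens when nothing matches:

```python
import pandas as pd

df = pd.DataFrame({'address': [
    '711-2880 Nulla St. Mankato Mississippi 96.5㎡',
    'Ap #867-859 Sit Rd. Azusa New York 39 square metre',
    'No area given here',  # hypothetical row, not from the question
]})

pattern = r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)"

# expand=False returns a Series of strings instead of a one-column DataFrame
area_text = df['address'].str.extract(pattern, expand=False)

# Convert the captured strings to floats; rows without a match stay NaN
df['area'] = pd.to_numeric(area_text)
print(df['area'].tolist())
```

With the column stored as floats, numeric operations such as sums or comparisons work directly on area.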

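Thank you very much. Did you use regular-expression matching?

@ahbon Yes.

I don't understand the part at ㎡. By the way, could rsplit or split be used here instead?

Sorry, I don't follow. OK... I used the regex lookahead pattern --> (?=)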
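
To illustrate the lookahead mentioned in the comments, here is a small sketch using Python's re module on one address from the question. It contrasts the lookahead pattern with a variant that consumes the unit, and adds a split-based alternative. The helper area_by_split is hypothetical and not part of the original answer; it assumes the number is the last whitespace-separated token before the unit.

```python
import re

text = '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616'

# (?=...) is a lookahead: the unit must follow the number,
# but the unit itself is not included in the match.
lookahead = re.search(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)", text)
print(lookahead.group(0))  # 115

# Without the lookahead the pattern consumes the unit, so the whole
# match includes it and the number has to be read from group(1).
consuming = re.search(r"(\d+\.?\d*)\s*(?:㎡|square metre\b)", text)
print(consuming.group(0))  # 115㎡
print(consuming.group(1))  # 115


def area_by_split(address):
    """Hypothetical split-based alternative: cut at the unit and
    keep the last whitespace-separated token before it."""
    for unit in ('㎡', 'square metre'):
        if unit in address:
            tokens = address.split(unit, 1)[0].split()
            return tokens[-1] if tokens else None
    return None  # no known unit in the address


print(area_by_split(text))  # 115
print(area_by_split('Ap #867-859 Sit Rd. Azusa New York 39 square metre'))  # 39
```

The split version does not check that the token is actually a number, so the regex is likely the safer choice when address formats vary.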