Extracting the value before a specific character in a DataFrame column in Python
How can I extract the area value from the address column of the following DataFrame?
address quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 5 45
Note that the unit is either ㎡ or square metre.
The desired output is:
address area quantity price
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5 2 20
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡ 206.0 3 13
2 606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616 115.0 5 23
3 Ap #867-859 Sit Rd. Azusa New York 39 square metre 39.0 3 32
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392 470.0 5 45
Use str.extract.
Ex:
import pandas as pd

df = pd.DataFrame({'address': ['711-2880 Nulla St. Mankato Mississippi 96.5㎡', 'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡', '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616', 'Ap #867-859 Sit Rd. Azusa New York 39 square metre', '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392']})
# (?=...) is a lookahead: the number must be followed by ㎡ or "square metre", but the unit is not captured
df['area'] = df['address'].str.extract(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)", expand=False)
print(df)
Output:
address area
0 711-2880 Nulla St. Mankato Mississippi 96.5㎡ 96.5
1 P.O. Box 283 8562 Fusce Rd. Frederick Nebraska... 206
2 606-3727 Ullamcorper. Street Roseville NH 115㎡... 115
3 Ap #867-859 Sit Rd. Azusa New York 39 square m... 39
4 7292 Dictum Av. San Antonio MI 470㎡ 47096 (492... 470
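The desired output in the question shows area as a float placed right after address, while str.extract returns strings. A minimal sketch, assuming the quantity and price values from the question's table, converts the match with astype(float) and uses DataFrame.insert to put the column in that position:

```python
import pandas as pd

# Sample data taken from the question (address, quantity, price)
df = pd.DataFrame({
    'address': [
        '711-2880 Nulla St. Mankato Mississippi 96.5㎡',
        'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡',
        '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616',
        'Ap #867-859 Sit Rd. Azusa New York 39 square metre',
        '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392',
    ],
    'quantity': [2, 3, 5, 3, 5],
    'price': [20, 13, 23, 32, 45],
})

# expand=False returns a Series (one capture group) instead of a one-column DataFrame
area = df['address'].str.extract(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)", expand=False)

# Convert the matched strings to float and place the column right after 'address'
df.insert(1, 'area', area.astype(float))
print(df)
```

Rows without a recognised unit would get NaN from str.extract, and astype(float) keeps those as NaN.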
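Thank you very much. Did you use regex matching?
@ahbon. Yes.
I don't understand the part with ㎡ or square metre. By the way, could rsplit or split be used instead?
Sorry, I don't understand. OK… I used a regex lookahead pattern --> (?=)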
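To illustrate the comments: the (?=...) lookahead checks that ㎡ or "square metre" follows the number without making the unit part of the match. The second function is a hypothetical split/rsplit alternative, not something proposed in the thread: it cuts the address at the first unit with re.split and takes the last token with str.rsplit. It assumes the number is the last whitespace-separated token directly before the unit; the function names are illustrative.

```python
import re

# Pattern from the answer: (?=...) is a lookahead; it requires ㎡ or
# "square metre" after the number but does not include the unit in the match.
AREA_RE = re.compile(r"(\d+\.?\d*)\s*(?=㎡|\bsquare metre\b)")


def area_via_lookahead(address):
    """Return the area as a float using the lookahead regex, or None."""
    m = AREA_RE.search(address)
    return float(m.group(1)) if m else None


def area_via_split(address):
    """Hypothetical split/rsplit alternative (not from the original thread)."""
    # Cut at the first unit; anything after it (zip code, phone) is ignored.
    parts = re.split(r"㎡|square metre", address, maxsplit=1)
    if len(parts) < 2:
        return None  # no unit found
    # Assumption: the number is the last whitespace-separated token before the unit.
    tokens = parts[0].rsplit(maxsplit=1)
    if not tokens:
        return None
    try:
        return float(tokens[-1])
    except ValueError:
        return None


addresses = [
    '711-2880 Nulla St. Mankato Mississippi 96.5㎡',
    'P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 206㎡',
    '606-3727 Ullamcorper. Street Roseville NH 115㎡ 11523 (786) 713-8616',
    'Ap #867-859 Sit Rd. Azusa New York 39 square metre',
    '7292 Dictum Av. San Antonio MI 470㎡ 47096 (492) 709-6392',
]

# The unit itself is not consumed by the lookahead.
print(AREA_RE.search(addresses[0]).group(0))  # 96.5
for a in addresses:
    print(area_via_lookahead(a), area_via_split(a))
```

The split version is more fragile: it breaks if the number is glued to a preceding word, whereas the regex only needs the unit to follow the number.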