Python: extracting city and state from a location column raises AttributeError

python regex pandas

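I want to clean up the location column by removing the zip code, but when I use extract with a regex I get the following error:

AttributeError: 'str' object has no attribute 'str'

Here is a sample dataframe containing only the location column: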

In [52]: df
Out[52]:
                       location
0    New Feliciamouth, WA 16422
1           Bakerfurt, CO 76376
2  Lake Elizabethview, GA 59017
3      Robertschester, TX 92366
4       Robinsonmouth, AL 99445
5        North Connor, AZ 79552
6          Morganstad, WA 73506
7         New Roberto, IA 11832
8         Collierstad, DC 22151
9          Reneemouth, NJ 93901

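This data was randomly generated to illustrate the problem.

I would like the cities to be displayed as:

New Feliciamouth, WA
Bakerfurt, CO
and so on.

I am using the following code: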

def get_city(address):
    pattern = r'(.+\,\w.+)\w.+)'
    return address.str.extract(pattern,flags=re.I)

location = df['location']        
location.apply(get_city)
location.head()

However, when I run this, I get an exception:

AttributeError                            Traceback (most recent call last)
<ipython-input-62-cdec695003fd> in <module>
      4
      5 location = df['location']
----> 6 location.apply(get_city)
      7 location.head()

.../lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4043             else:
   4044                 values = self.astype(object).values
-> 4045                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4046
   4047         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-62-cdec695003fd> in get_city(address)
      1 def get_city(address):
      2     pattern = r'(.+\,\w.+)\w.+)'
----> 3     return address.str.extract(pattern,flags=re.I)
      4
      5 location = df['location']

AttributeError: 'str' object has no attribute 'str'

Or, when I remove .str before extract, I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-bfed6f810b40> in <module>
      4
      5 location = df['location']
----> 6 location.apply(get_city)
      7 location.head()

.../lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4043             else:
   4044                 values = self.astype(object).values
-> 4045                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4046
   4047         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-63-bfed6f810b40> in get_city(address)
      1 def get_city(address):
      2     pattern = r'(.+\,\w.+)\w.+)'
----> 3     return address.extract(pattern,flags=re.I)
      4
      5 location = df['location']

AttributeError: 'str' object has no attribute 'extract'

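Series.apply passes each individual value to the get_city function. You can't use the Series.str.* functions on a single string value, because at that point you no longer have a full Series.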
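To make that distinction concrete, here is a minimal sketch of what a per-value function would have to look like if you did keep Series.apply. It assumes the corrected pattern used later in this answer and relies only on the standard re module; the sample list stands in for the values that Series.apply would hand to the function one at a time.

```python
import re

# Corrected pattern borrowed from later in this answer:
# city, comma, optional whitespace, state; the zip code stays outside the group.
PATTERN = re.compile(r'([^,]+,\s*\w+)\s*\d*')

def get_city(address):
    """Return 'City, ST' from one address string, or None when nothing matches."""
    # `address` is a plain str here, so use re directly instead of `.str`.
    match = PATTERN.search(address)
    return match.group(1) if match else None

# Stand-in for the individual values Series.apply would pass in.
sample = ["New Feliciamouth, WA 16422", "Bakerfurt, CO 76376"]
cities = [get_city(value) for value in sample]
print(cities)
```

With a function written this way, df['location'].apply(get_city) would work, although the vectorized approach is usually the simpler choice.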

Because there is a vectorized string function for this, don't use Series.apply; just call the Series.str.extract method directly on the location column:

pattern = r'([^,]+,\s*\w+)\s*\d*'
df['location'].str.extract(pattern, flags=re.I)

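Note that I also changed your regex pattern:

- There was an extra ) in there, which made the pattern invalid because its opening ( was missing.
- I replaced .+ at the start with [^,]+, so that commas are never matched.
- I inserted \s* after the comma to allow for whitespace between the comma and the state.
- I replaced \w.+ with \w+; you don't want a letter followed by anything, you only want letters. However, consider allowing only uppercase letters here and limiting the length: [A-Z]{,3} would allow 0 to 3 uppercase letters.
- The zip code is now matched by a run of digits. The pattern still matches these extra characters after the state, but they fall outside the capturing group and so are not part of the extracted value.

Demo using the full column of values actually visible in the screenshot: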

In [1]: import pandas as pd

In [2]: import re

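In [3]: df = pd.DataFrame({"location": [
   ...:     "New Feliciamouth, WA 16422",
   ...:     "Bakerfurt, CO 76376",
   ...:     "Lake Elizabethview, GA 59017",
   ...:     "Robertschester, TX 92366",
   ...:     "Robinsonmouth, AL 99445",
   ...:     "North Connor, AZ 79552",
   ...:     "Morganstad, WA 73506",
   ...:     "New Roberto, IA 11832",
   ...:     "Collierstad, DC 22151",
   ...:     "Reneemouth, NJ 93901",
   ...: ]})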

In [4]: pattern = r'([^,]+,\s*\w+)\s*\d*'

In [5]: df['location'].str.extract(pattern, flags=re.I)
Out[5]:
                        0
0    New Feliciamouth, WA
1           Bakerfurt, CO
2  Lake Elizabethview, GA
3      Robertschester, TX
4       Robinsonmouth, AL
5        North Connor, AZ
6          Morganstad, WA
7         New Roberto, IA
8         Collierstad, DC
9          Reneemouth, NJ
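The pattern changes described above can also be checked with the standard re module alone. The sketch below first confirms that the original pattern does not compile, then tries the stricter [A-Z]{,3} variant suggested above. The named groups city and state are an assumption added for illustration; they are not part of the pattern used in the demo.

```python
import re

# The pattern from the question has one closing parenthesis too many.
broken = r'(.+\,\w.+)\w.+)'
try:
    re.compile(broken)
    broken_is_valid = True
except re.error:
    # re refuses to compile a pattern with unbalanced parentheses.
    broken_is_valid = False

# Stricter variant: uppercase-only state code of at most three letters.
# The group names `city` and `state` are hypothetical, chosen for this sketch.
strict = re.compile(r'(?P<city>[^,]+),\s*(?P<state>[A-Z]{,3})\s*\d*')

match = strict.match("Lake Elizabethview, GA 59017")
parts = match.groupdict()
print(broken_is_valid, parts)
```

Passing a pattern with named groups to Series.str.extract should give one column per group, named after the group, which is one way to get city and state as separate columns.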

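Your pattern has unmatched parentheses. I replaced the image in your question with a randomly generated sample of addresses. In the future, please see the tips on how to produce a good example, so that we can help you. Note that we only need the location column; the rest of the data is not important here.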