Python: extract city and state from a location column, getting AttributeError instead


I want to clean up the location column by removing the zip codes, but when I use extract with a regex I get the following error:

AttributeError: 'str' object has no attribute 'str'
Here is a sample dataframe containing only the location column:

In [52]: df
Out[52]:
                       location
0    New Feliciamouth, WA 16422
1           Bakerfurt, CO 76376
2  Lake Elizabethview, GA 59017
3      Robertschester, TX 92366
4       Robinsonmouth, AL 99445
5        North Connor, AZ 79552
6          Morganstad, WA 73506
7         New Roberto, IA 11832
8         Collierstad, DC 22151
9          Reneemouth, NJ 93901
This is randomly generated data, used to illustrate the problem.

I want the cities to be shown as:

New Feliciamouth, WA
Bakerfurt, CO
and so on.

I am using the following code:

def get_city(address):
    pattern = r'(.+\,\w.+)\w.+)'
    return address.str.extract(pattern,flags=re.I)

location = df['location']        
location.apply(get_city)
location.head()
However, when I run this, I get an exception:

AttributeError                            Traceback (most recent call last)
<ipython-input-62-cdec695003fd> in <module>
      4
      5 location = df['location']
----> 6 location.apply(get_city)
      7 location.head()

.../lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4043             else:
   4044                 values = self.astype(object).values
-> 4045                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4046
   4047         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-62-cdec695003fd> in get_city(address)
      1 def get_city(address):
      2     pattern = r'(.+\,\w.+)\w.+)'
----> 3     return address.str.extract(pattern,flags=re.I)
      4
      5 location = df['location']

AttributeError: 'str' object has no attribute 'str'
Or, when I remove the .str before extract, I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-bfed6f810b40> in <module>
      4
      5 location = df['location']
----> 6 location.apply(get_city)
      7 location.head()

.../lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4043             else:
   4044                 values = self.astype(object).values
-> 4045                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4046
   4047         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-63-bfed6f810b40> in get_city(address)
      1 def get_city(address):
      2     pattern = r'(.+\,\w.+)\w.+)'
----> 3     return address.extract(pattern,flags=re.I)
      4
      5 location = df['location']

AttributeError: 'str' object has no attribute 'extract'
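Series.apply passes each individual value to the get_city function. You can't use the Series.str... functions on a single string value; inside the function you don't have the full series.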
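A minimal sketch to make this concrete, using two rows from the sample data: it records what type `Series.apply` actually hands to the function, and shows a per-value alternative that works on a plain string with the `re` module. The helper name `get_city_scalar` is hypothetical and only for illustration; the vectorised approach is still the better choice here.

```python
import re
from typing import Optional

import pandas as pd

df = pd.DataFrame({"location": ["New Feliciamouth, WA 16422", "Bakerfurt, CO 76376"]})

# Series.apply calls the function once per element, so the function
# receives a plain Python str, which has no .str accessor and no .extract.
seen_types = df["location"].apply(lambda value: type(value).__name__)
print(seen_types.tolist())  # ['str', 'str']


def get_city_scalar(address: str) -> Optional[str]:
    """Hypothetical per-value helper: match one string with the re module."""
    match = re.match(r"([^,]+,\s*\w+)", address)
    return match.group(1) if match else None


print(df["location"].apply(get_city_scalar).tolist())
# ['New Feliciamouth, WA', 'Bakerfurt, CO']
```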

Because there is a vectorised string method for this, don't use Series.apply; just use the Series.str.extract method directly on the location column:

df['location'].str.extract(pattern, flags=re.I)
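A sketch of the vectorised call on a few of the sample rows, assuming the same pattern as the demo further down. With a single capture group, `str.extract` returns a DataFrame with one column labelled `0` by default; passing `expand=False` returns a Series instead, which can be assigned straight back as a new column. The `city_state` column name is made up for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({"location": [
    "New Feliciamouth, WA 16422",
    "Bakerfurt, CO 76376",
    "Lake Elizabethview, GA 59017",
]})

pattern = r"([^,]+,\s*\w+)\s*\d*"

# One call on the whole column; no per-row Python function is involved.
as_frame = df["location"].str.extract(pattern, flags=re.I)  # DataFrame, column 0
as_series = df["location"].str.extract(pattern, flags=re.I, expand=False)  # Series

# Hypothetical column name, chosen for illustration.
df["city_state"] = as_series
print(df["city_state"].tolist())
# ['New Feliciamouth, WA', 'Bakerfurt, CO', 'Lake Elizabethview, GA']
```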
Note that I also changed your regex pattern:

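- There was an extra ")" in there, which made the pattern invalid because its opening "(" was missing.
- I replaced .+ at the start with [^,]+, so that part can never match past the comma.
- I inserted \s* after the comma to allow for whitespace between the comma and the state.
- I replaced \w.+ with \w+: you don't want a letter followed by anything, you want just letters. You could limit this to uppercase letters of a limited length, though; [A-Z]{,3} would allow 0 to 3 uppercase letters.
- The zip code is now matched by a series of digits. The pattern could still match other characters after that, but those won't be part of the extracted value.

Demo using the whole column of sample values from the question: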
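The pattern changes described above can be checked with the plain `re` module. This sketch shows that the question's pattern fails to compile, that the corrected pattern stops at the state code, and a stricter variant using `[A-Z]{,3}` as suggested. The stricter variant is deliberately compiled without `re.I`: with that flag, `[A-Z]` would also match lowercase letters.

```python
import re

address = "New Feliciamouth, WA 16422"

# The question's pattern has an unbalanced ")" and does not compile.
try:
    re.compile(r"(.+\,\w.+)\w.+)")
    original_compiles = True
except re.error:
    original_compiles = False
print(original_compiles)  # False

# Corrected pattern: everything up to the comma, optional whitespace,
# then the state letters; the trailing digits are outside the group.
fixed = re.compile(r"([^,]+,\s*\w+)\s*\d*", flags=re.I)
print(fixed.search(address).group(1))  # New Feliciamouth, WA

# Stricter variant: 0 to 3 uppercase letters for the state, without re.I
# so that [A-Z] really only matches uppercase.
strict = re.compile(r"([^,]+,\s*[A-Z]{,3})\s*\d*")
print(strict.search(address).group(1))  # New Feliciamouth, WA
```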

In [1]: import pandas as pd

In [2]: import re

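In [3]: df = pd.DataFrame({"location": [
   ...:     "New Feliciamouth, WA 16422", "Bakerfurt, CO 76376",
   ...:     "Lake Elizabethview, GA 59017", "Robertschester, TX 92366",
   ...:     "Robinsonmouth, AL 99445", "North Connor, AZ 79552",
   ...:     "Morganstad, WA 73506", "New Roberto, IA 11832",
   ...:     "Collierstad, DC 22151", "Reneemouth, NJ 93901",
   ...: ]})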

In [4]: pattern = r'([^,]+,\s*\w+)\s*\d*'

In [5]: df['location'].str.extract(pattern, flags=re.I)
Out[5]:
                        0
0    New Feliciamouth, WA
1           Bakerfurt, CO
2  Lake Elizabethview, GA
3      Robertschester, TX
4       Robinsonmouth, AL
5        North Connor, AZ
6          Morganstad, WA
7         New Roberto, IA
8         Collierstad, DC
9          Reneemouth, NJ
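Since the title asks for city and state, a variation on the same idea uses named groups so that `str.extract` returns separate, labelled columns. This is a sketch assuming every value looks like "City, ST 12345" with a two-letter uppercase state code; the group names `city` and `state` and the `city_state` column are chosen for illustration. Rows that don't match come back as NaN rather than raising.

```python
import pandas as pd

df = pd.DataFrame({"location": [
    "New Feliciamouth, WA 16422",
    "Bakerfurt, CO 76376",
    "Collierstad, DC 22151",
    "no state here",
]})

# Named groups become the column names of the returned DataFrame.
pattern = r"(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\b"
parts = df["location"].str.extract(pattern)
print(parts)

# Join city and state back together; str.cat leaves NaN where either part is missing.
df["city_state"] = parts["city"].str.cat(parts["state"], sep=", ")
```

Named groups keep the two pieces available separately, so filtering by state or grouping by city does not need a second split.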

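Your pattern has mismatched parentheses. I replaced the image in your question with a randomly generated sample of addresses. In future, please see … for important tips on how to make a good example, so that we can help you. Note that we only needed the location column; the rest of the data doesn't matter here.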