
Python: TypeError: string indices must be integers when creating a new column with apply in pandas

python pandas google-maps-api-3


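I have a pandas DataFrame containing a list of incomplete addresses. I send them to the Google Maps API to get as much data as possible about each address, and I store that data in a column called Components. I then parse that column with other functions to extract the area (neighborhood) name, postal code, etc.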

This is what it looks like:

df['Components'][0]:

"{'access_points': [],
 'address_components': [{'long_name': '350',
   'short_name': '350',
   'types': ['subpremise']},
  {'long_name': '1313', 'short_name': '1313', 'types': ['street_number']},
  {'long_name': 'Broadway', 'short_name': 'Broadway', 'types': ['route']},
  {'long_name': 'New Tacoma',
   'short_name': 'New Tacoma',
   'types': ['neighborhood', 'political']},
  {'long_name': 'Tacoma',
   'short_name': 'Tacoma',
   'types': ['locality', 'political']},
  {'long_name': 'Pierce County',
   'short_name': 'Pierce County',
   'types': ['administrative_area_level_2', 'political']},
  {'long_name': 'Washington',
   'short_name': 'WA',
   'types': ['administrative_area_level_1', 'political']},
  {'long_name': 'United States',
   'short_name': 'US',
   'types': ['country', 'political']},
  {'long_name': '98402', 'short_name': '98402', 'types': ['postal_code']}],
 'formatted_address': '1313 Broadway #350, Tacoma, WA 98402, USA',
 'geometry': {'location': {'lat': 47.250653, 'lng': -122.43913},
  'location_type': 'ROOFTOP',
  'viewport': {'northeast': {'lat': 47.2520019802915,
    'lng': -122.4377810197085},
   'southwest': {'lat': 47.2493040197085, 'lng': -122.4404789802915}}},
 'place_id': 'ChIJcysCMHtVkFQRRUkEIPwScyk',
 'plus_code': {'compound_code': '7H26+78 Tacoma, Washington, United States',
  'global_code': '84VV7H26+78'},
 'types': ['establishment', 'finance', 'point_of_interest']}"
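The value above is wrapped in double quotes, which suggests each cell holds a string (the Python `repr` of a dict) rather than a dict. The question does not say how the column was stored; one common way this happens, assumed here purely for illustration, is a round trip through CSV: `to_csv` writes `str(dict)` and `read_csv` reads it back as plain text. The one-row frame below is made up.

```python
import io

import pandas as pd

# Hypothetical frame holding real dicts, as the geocoder would return them.
df = pd.DataFrame({'Components': [{'formatted_address': '1313 Broadway #350, Tacoma, WA 98402, USA',
                                   'types': ['establishment', 'finance']}]})
print(type(df['Components'][0]))  # <class 'dict'>

# Assumed storage step (not from the question): write to CSV and read it back.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)

# Each cell is now the str() of the original dict, single quotes and all.
print(type(back['Components'][0]))  # <class 'str'>
print(back['Components'].map(type).value_counts())
```

Running `df['Components'].map(type).value_counts()` on the real frame is a quick way to check which case applies.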

I then use the following function to get the area name:

def get_area(address_data):
    for item in address_data['address_components']:
        typs = set(item['types'])
        if typs == set(['neighborhood', 'political']):
            return item['long_name']

    return None

df.loc[:10000, 'area'] = df['Components'][:10000].apply(get_area)

TypeError                                 Traceback (most recent call last)
<ipython-input-233-eb2932e010e3> in <module>
----> 1 dfm.loc[:10000, 'area'] = dfm['Components'][:10000].apply(get_area)
      2 dfm['area'].value_counts()

~/virt_env/virt2/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4040             else:
   4041                 values = self.astype(object).values
-> 4042                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4043 
   4044         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-232-ede4aa629b42> in get_area(address_data)
    149 
    150 def get_area(address_data):
--> 151     for item in address_data['address_components']:
    152         typs = set(item['types'])
    153         if typs == set(['neighborhood', 'political']):

TypeError: string indices must be integers
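A minimal reproduction of the error, using a trimmed, made-up stand-in for one cell: the same `get_area` works on a dict but fails on the dict's string form, because `address_data['address_components']` then indexes a `str` with a `str` key.

```python
# Trimmed stand-in for one cell: a dict, and the same dict stored as text.
parsed = {'address_components': [
    {'long_name': 'Tacoma', 'short_name': 'Tacoma', 'types': ['locality', 'political']},
    {'long_name': 'New Tacoma', 'short_name': 'New Tacoma', 'types': ['neighborhood', 'political']},
]}
raw = str(parsed)  # a str, like df['Components'][0] in the question


def get_area(address_data):
    for item in address_data['address_components']:
        typs = set(item['types'])
        if typs == set(['neighborhood', 'political']):
            return item['long_name']
    return None


print(get_area(parsed))  # New Tacoma

error_message = None
try:
    get_area(raw)  # raw['address_components'] indexes a str with a str
except TypeError as exc:
    error_message = str(exc)
# Starts with "string indices must be integers"; newer Python versions append more detail.
print(error_message)
```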


How can I fix this so that I can run this function (and others) on the Components column?

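This problem occurs because each value in

df['Components']

is a string rather than a dict, so address_data['address_components'] tries to index a string with a string key. There are several ways to fix it: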

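import json

def get_area(address_data_raw):
    # Parse the stored text into a dict first; json.loads only accepts
    # valid JSON (double-quoted keys and strings).
    address_data = json.loads(address_data_raw)
    for item in address_data['address_components']:
        ...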
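The sample value in the question uses single quotes (it is the Python `repr` of a dict, not JSON), so `json.loads` will reject it. A minimal variant of the same parse-inside-the-function approach, swapping in `ast.literal_eval` from the standard library, which evaluates Python literals without executing arbitrary code. The `sample` string is a trimmed, made-up cell.

```python
import ast


def get_area(address_data_raw):
    """Parse one Components string and return the neighborhood name, or None."""
    # ast.literal_eval instead of json.loads: the stored text uses single quotes.
    address_data = ast.literal_eval(address_data_raw)
    for item in address_data['address_components']:
        if set(item['types']) == {'neighborhood', 'political'}:
            return item['long_name']
    return None


sample = ("{'address_components': ["
          "{'long_name': 'Tacoma', 'types': ['locality', 'political']}, "
          "{'long_name': 'New Tacoma', 'types': ['neighborhood', 'political']}]}")
print(get_area(sample))  # New Tacoma
```

The call site stays the same, e.g. `df.loc[:10000, 'area'] = df['Components'][:10000].apply(get_area)`. The trade-off is that every helper (area, postal code, ...) re-parses the same text; converting the column once avoids that.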

Second way:

import json

def get_area(address_data):
    ...

# Convert each string to a dict once, overwriting the Components column,
# then run get_area (and any other parsers) on the dicts.
df['Components'] = df['Components'].apply(json.loads)
df.loc[:10000, 'area'] = df['Components'][:10000].apply(get_area)
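The same convert-once idea with `ast.literal_eval` in place of `json.loads`, sketched on a made-up one-row frame. `find_component` and `get_postal_code` are hypothetical helpers showing how the other parsers mentioned in the question (postal code, etc.) can reuse the already-converted dicts.

```python
import ast

import pandas as pd


def find_component(address_data, wanted_types):
    """Return long_name of the first component whose types equal wanted_types (hypothetical helper)."""
    for item in address_data['address_components']:
        if set(item['types']) == set(wanted_types):
            return item['long_name']
    return None


def get_area(address_data):
    return find_component(address_data, ['neighborhood', 'political'])


def get_postal_code(address_data):
    return find_component(address_data, ['postal_code'])


# One-row stand-in for the real data: each cell is a str, as in the question.
df = pd.DataFrame({'Components': [
    "{'address_components': [{'long_name': 'New Tacoma', 'types': ['neighborhood', 'political']}, "
    "{'long_name': '98402', 'types': ['postal_code']}]}"
]})

# Parse every cell once; afterwards each cell holds a dict.
df['Components'] = df['Components'].apply(ast.literal_eval)

df['area'] = df['Components'].apply(get_area)
df['postal_code'] = df['Components'].apply(get_postal_code)
print(df[['area', 'postal_code']])
```

If the frame is written out again, `to_csv` stores these dicts as text, so they would need parsing again after reloading.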

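Here are some pointers for getting it to work:

df['Components'][N]

(where 0

@dvlper I think that is the cause of the problem. Could you suggest the best way to convert it into a dictionary?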


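Something along those lines, maybe! By the way, this isn't a clean approach!

With both versions I get the following: JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1), so something is wrong with the string. I noticed there is a # in the string.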
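The `#` is not the culprit: it sits inside a quoted string, which both JSON and Python literals allow. Column 2 (char 1) is the opening quote of the very first key, `'access_points'`, and it is the single quotes that JSON rejects. A small check on a string trimmed from the question's sample:

```python
import ast
import json

# Trimmed from the question's sample; note the '#' inside the quoted address.
raw = "{'access_points': [], 'formatted_address': '1313 Broadway #350, Tacoma, WA 98402, USA'}"

json_error = None
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    json_error = str(exc)
print(json_error)  # Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

# ast.literal_eval parses the Python-literal text, '#' included.
data = ast.literal_eval(raw)
print(data['formatted_address'])

# The same content serialized as real JSON (double quotes) round-trips through json.loads.
print(json.loads(json.dumps(data)) == data)  # True
```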