Python: assigning city names from latitude/longitude values in a DataFrame
I have this DataFrame:
userId latitude longitude dateTime
0 121165 30.314368 76.384381 2018-02-01 00:01:57
1 95592 13.186810 77.643769 2018-02-01 00:02:17
2 111435 28.512889 77.088154 2018-02-01 00:04:02
3 129532 9.828420 76.310357 2018-02-01 00:06:03
4 95592 13.121986 77.610539 2018-02-01 00:08:54
I want to create a new DataFrame column, like this:
userId latitude longitude dateTime city
0 121165 30.314368 76.384381 2018-02-01 00:01:57 Bengaluru
1 95592 13.186810 77.643769 2018-02-01 00:02:17 Delhi
2 111435 28.512889 77.088154 2018-02-01 00:04:02 Mumbai
3 129532 9.828420 76.310357 2018-02-01 00:06:03 Chennai
4 95592 13.121986 77.610539 2018-02-01 00:08:54 Delhi
I found an existing answer, but it does not work for me.
This is the code given there:
from urllib2 import urlopen
import json

def getplace(lat, lon):
    url = "http://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false" % (lat, lon)
    v = urlopen(url).read()
    j = json.loads(v)
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "postal_town" in c['types']:
            town = c['long_name']
    return town, country

# zip pairs each latitude with its longitude, row by row
for i, j in zip(df['latitude'], df['longitude']):
    getplace(i, j)
I get an error at this line:
components = j['results'][0]['address_components']
IndexError: list index out of range
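I also tried latitude/longitude values in the UK and those worked, but values for Indian states do not.
Now I want to try something like this: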
if i,j in zip(range(79,80),range(83,84)):
    df['City']='Bengaluru'
elif i,j in zip(range(13,14),range(70,71)):
    df['City']='Delhi'
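And so on. So how can I assign cities from the latitude and longitude values in a more workable way?

The code snippet you are using is from 2013. The Google API has changed since then, and "postal_town" is not returned for these locations.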
You can use the following code, which relies on the requests library and adds a guard for the case where no results are returned:
In [47]: import requests

In [48]: def location(lat, long):
    ...:     url = 'http://maps.googleapis.com/maps/api/geocode/json?latlng={0},{1}&sensor=false'.format(lat, long)
    ...:     r = requests.get(url)
    ...:     r_json = r.json()
    ...:     # guard: an empty result list is what raised the IndexError
    ...:     if len(r_json['results']) < 1: return None, None
    ...:     res = r_json['results'][0]['address_components']
    ...:     country = next((c['long_name'] for c in res if 'country' in c['types']), None)
    ...:     locality = next((c['long_name'] for c in res if 'locality' in c['types']), None)
    ...:     return locality, country
    ...:

In [49]: location(28.512889, 77.088154)
Out[49]: ('Gurugram', 'India')
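Since this answer was written, the Geocoding API has come to require HTTPS and an API key. The following is a minimal Python 3 sketch under those assumptions, using only the standard library in place of urllib2. `API_KEY`, `parse_place` and `reverse_geocode` are illustrative names, not part of the original answer, and the key value is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_API_KEY"  # placeholder (assumption): supply your own key


def parse_place(payload):
    """Return (locality, country) from a decoded geocoding response.

    Returns (None, None) when the response has no results, which is the
    situation that raised IndexError in the question.
    """
    results = payload.get("results") or []
    if not results:
        return None, None
    components = results[0].get("address_components", [])
    country = next(
        (c["long_name"] for c in components if "country" in c["types"]), None
    )
    locality = next(
        (c["long_name"] for c in components if "locality" in c["types"]), None
    )
    return locality, country


def reverse_geocode(lat, lon, key=API_KEY, timeout=10):
    """Query the API for one coordinate pair (performs a network request)."""
    query = urlencode({"latlng": f"{lat},{lon}", "key": key})
    with urlopen(f"{GEOCODE_URL}?{query}", timeout=timeout) as response:
        payload = json.load(response)
    return parse_place(payload)
```

Keeping the parsing in its own function means it can be checked against a saved response without spending API quota.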
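To apply this to the DataFrame, you can use numpy's vectorize; the session is shown further down, after the comments (keep in mind that the second row does not return a locality).
Note that the city names in your expected output do not match the coordinates.
Also note that this may take some time, because the function has to query the API for every row.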
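Because every call is a network round trip, repeated or nearly identical coordinates are worth looking up only once. Here is a minimal sketch, assuming that rounding to two decimal places (roughly one kilometre) is acceptable for city-level labels. `fake_lookup` stands in for the real API call, and `cached_city` is a hypothetical helper.

```python
from functools import lru_cache

calls = []  # records every simulated API hit, for demonstration only


def fake_lookup(lat, lon):
    """Stand-in for the API-backed location(); replace with the real call."""
    calls.append((lat, lon))
    return "Bengaluru" if 12 <= lat < 14 else None


@lru_cache(maxsize=None)
def _lookup_rounded(lat, lon):
    # Cached on the rounded pair, so each grid cell is queried once.
    return fake_lookup(lat, lon)


def cached_city(lat, lon, ndigits=2):
    """Round the coordinates, then reuse any earlier answer for that cell."""
    return _lookup_rounded(round(lat, ndigits), round(lon, ndigits))


coords = [(13.186810, 77.643769), (13.186812, 77.643771), (30.314368, 76.384381)]
cities = [cached_city(lat, lon) for lat, lon in coords]
```

On the DataFrame this could be applied as `df['city'] = [cached_city(a, b) for a, b in zip(df['latitude'], df['longitude'])]`.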
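You could also write a much coarser location function based on coordinate ranges, but it will be very crude and may cover far too wide an area. You can then apply it with np.vectorize in the same way: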
In [21]: def location(lat, long):
    ...:     if 9 <= lat < 10 and 76 <= long < 77:
    ...:         return 'Chennai'
    ...:     elif 13 <= lat < 14 and 77 <= long < 78:
    ...:         return 'Delhi'
    ...:     elif 28 <= lat < 29 and 77 <= long < 78:
    ...:         return 'Mumbai'
    ...:     elif 30 <= lat < 31 and 76 <= long < 77:
    ...:         return 'Bengaluru'
    ...:

In [22]: df['city'] = np.vectorize(location)(df['latitude'], df['longitude'])

In [23]: df
Out[23]:
userId latitude longitude dateTime city
0 121165 30.314368 76.384381 2018-02-01 00:01:57 Bengaluru
1 95592 13.186810 77.643769 2018-02-01 00:02:17 Delhi
2 111435 28.512889 77.088154 2018-02-01 00:04:02 Mumbai
3 129532 9.828420 76.310357 2018-02-01 00:06:03 Chennai
4 95592 13.121986 77.610539 2018-02-01 00:08:54 Delhi
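For a large DataFrame, the same bounding-box idea can be written with boolean masks over whole columns instead of calling a Python function per row. This is a sketch only: the box limits below are illustrative assumptions and are not taken from the answer, and `CITY_BOXES` and `assign_cities` are hypothetical names. Rows outside every box stay None.

```python
import numpy as np

# Illustrative boxes (assumption): name -> (lat_min, lat_max, lon_min, lon_max)
CITY_BOXES = {
    "Bengaluru": (12.8, 13.3, 77.4, 77.8),
    "Delhi": (28.4, 28.9, 76.8, 77.4),
    "Chennai": (12.9, 13.3, 80.1, 80.4),
}


def assign_cities(lat, lon, boxes=CITY_BOXES):
    """Label each coordinate pair with the first box that contains it."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    city = np.full(lat.shape, None, dtype=object)  # None marks "no match"
    for name, (lat_min, lat_max, lon_min, lon_max) in boxes.items():
        inside = (
            (lat >= lat_min) & (lat < lat_max)
            & (lon >= lon_min) & (lon < lon_max)
        )
        city[inside] = name
    return city


latitudes = [30.314368, 13.186810, 28.512889, 9.828420, 13.121986]
longitudes = [76.384381, 77.643769, 77.088154, 76.310357, 77.610539]
cities = assign_cities(latitudes, longitudes)
```

With a DataFrame, `df['city'] = assign_cities(df['latitude'], df['longitude'])` would fill the column in one pass per city.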
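How big is your DataFrame?
About 2.2 million rows.
Is there any other option? I only want 5-6 major cities, so I would like to put the lat and long values into ranges, assign the names, and fill the rest with null.
I have updated the answer. If it helped you, please accept it and upvote. Thanks.
Yes, the second part with the if/else conditions works, but I don't know why fetching from the Google API does not, although when I print the result of calling the location function directly it works very quickly.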
Applying the API-based location function with np.vectorize:
In [71]: import numpy as np

In [72]: # location returns (locality, country); [0] keeps the locality array
    ...: df['locality'] = np.vectorize(location)(df['latitude'], df['longitude'])[0]

In [73]: df
Out[73]:
userId latitude longitude dateTime locality
0 121165 30.314368 76.384381 2018-02-01 00:01:57 Patiala
1 95592 13.186810 77.643769 2018-02-01 00:02:17 None
2 111435 28.512889 77.088154 2018-02-01 00:04:02 Gurugram
3 129532 9.828420 76.310357 2018-02-01 00:06:03 Ezhupunna
4 95592 13.121986 77.610539 2018-02-01 00:08:54 Bengaluru
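`np.vectorize` is a convenience wrapper around a Python-level loop. When the wrapped function returns a tuple, it hands back one array per element of that tuple. The stub below illustrates this with hard-coded answers instead of API calls; `stub_location` and `KNOWN` are hypothetical. According to the NumPy documentation, passing `otypes` also avoids the extra call on the first element that `np.vectorize` otherwise makes to infer the output types.

```python
import numpy as np

# Hard-coded answers (assumption): stands in for real API responses.
KNOWN = {
    (30.314368, 76.384381): ("Patiala", "India"),
    (28.512889, 77.088154): ("Gurugram", "India"),
}


def stub_location(lat, lon):
    """Stand-in for the API-backed location() function."""
    return KNOWN.get((lat, lon), (None, None))


# Two object outputs: one array for locality, one for country.
vectorized = np.vectorize(stub_location, otypes=[object, object])

lat = np.array([30.314368, 13.186810, 28.512889])
lon = np.array([76.384381, 77.643769, 77.088154])
locality, country = vectorized(lat, lon)
```

Unpacking both arrays lets each one go into its own column, for example `df['locality'], df['country'] = vectorized(df['latitude'], df['longitude'])`.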