Python: TypeError: zip argument #1 must support iteration

python pandas dataframe

Error when calling zip(*map(...)). A detailed explanation follows below.

TypeError: zip argument #1 must support iteration

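Here is what I have: a data frame containing cities and their latitude/longitude positions. Now I want to calculate the distances between the cities using the haversine formula.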
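For reference, this is the haversine formula that the function further down implements, with latitudes φ and longitudes λ converted to radians and R = 6371 km (the value used in the code):

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2 \arcsin\!\left(\sqrt{a}\right) \\
d &= R\,c, \qquad R = 6371\ \text{km}
\end{aligned}
```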

The starting point is this data frame:

import pandas as pd
import numpy as np

df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                   {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                   {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}])
df

Then I join the data frame with itself to get pairs of cities:

df['tmp'] = 1
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y]

This gives me:

    city_x  lat_x       lng_x       tmp city_y  lat_y       lng_y
1   Berlin  52.52437    13.41053    1   Potsdam 52.39886    13.06566
2   Berlin  52.52437    13.41053    1   Hamburg 53.57532    10.01534
3   Potsdam 52.39886    13.06566    1   Berlin  52.52437    13.41053
5   Potsdam 52.39886    13.06566    1   Hamburg 53.57532    10.01534
6   Hamburg 53.57532    10.01534    1   Berlin  52.52437    13.41053
7   Hamburg 53.57532    10.01534    1   Potsdam 52.39886    13.06566
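If your pandas version is 1.2 or newer, merge also accepts how='cross', which builds the same Cartesian product without the helper tmp column. A minimal sketch, assuming pandas >= 1.2 (this is an alternative to the tmp-column trick, not what the question used):

```python
import pandas as pd

df = pd.DataFrame([{'city': "Berlin", 'lat': 52.5243700, 'lng': 13.4105300},
                   {'city': "Potsdam", 'lat': 52.3988600, 'lng': 13.0656600},
                   {'city': "Hamburg", 'lat': 53.5753200, 'lng': 10.0153400}])

# how='cross' pairs every row with every row (requires pandas >= 1.2);
# overlapping column names get the default _x / _y suffixes.
pairs = pd.merge(df, df, how='cross')

# Drop the self-pairs and renumber the remaining rows 0..n-1.
pairs = pairs[pairs.city_x != pairs.city_y].reset_index(drop=True)
print(pairs[['city_x', 'city_y']])
```

The result has the same columns as df2 minus tmp, and because of reset_index the rows are numbered consecutively.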

Now for the important part. The haversine formula is put into a function:

def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes 
    based on the haversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])

    # haversine formula 
    dlng = lng2 - lng1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a)) 
    distance = c * R
    return distance
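What matters for the error is the return type: called with scalar arguments, the function yields a single float. A quick stdlib-only check (the function is restated so the snippet runs on its own; the coordinates are the Berlin and Potsdam rows from the data frame):

```python
from math import radians, cos, sin, asin, sqrt


def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """Great-circle distance in km, same formula as the function above."""
    R = 6371  # mean Earth radius in kilometers
    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * R * asin(sqrt(a))


# With plain floats in, a plain float comes out: a number, not an iterable.
d = haversine_distance(13.41053, 52.52437, 13.06566, 52.39886)  # Berlin -> Potsdam
print(type(d).__name__, round(d, 1))
```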

This function should then be called on the joined data frame:

def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = zip(*map(haversine_distance, lng1, lat1, lng2, lat2))
    return dist

# now invoke the method in order to get a new column (series) back
get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])

Problem/error: this gives me the following error:

TypeError: zip argument #1 must support iteration

Remark: What I don't understand is why I get this error, because another approach (see below) works perfectly fine, and it is basically the same:

def lat_lng_to_cartesian(lat: float, lng: float) -> tuple:
    from math import radians, cos, sin
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles

    lat_, lng_ = map(radians, [lat, lng])

    x = R * cos(lat_) * cos(lng_)
    y = R * cos(lat_) * sin(lng_)
    z = R * sin(lat_)
    return x, y, z

def get_cartesian_coordinates(lat: pd.Series, lng: pd.Series) -> (pd.Series, pd.Series, pd.Series):
    if lat is None or lng is None:
        return
    x, y, z = zip(*map(lat_lng_to_cartesian, lat, lng))
    return x, y, z

get_cartesian_coordinates(df2['lat_x'], df2['lng_x'])
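The difference between the two calls can be reproduced without pandas. A minimal sketch, where scalar_fn and tuple_fn are hypothetical stand-ins for haversine_distance and lat_lng_to_cartesian:

```python
def scalar_fn(a, b):
    # Returns one number per call, like haversine_distance.
    return a + b


def tuple_fn(a, b):
    # Returns a tuple per call, like lat_lng_to_cartesian.
    return a, b, a * b


xs = [1.0, 2.0]
ys = [10.0, 20.0]

# map(...) yields one result per pair; zip(*...) treats each result as an
# iterable and transposes them. With tuples this "unzips" the columns:
first, second, third = zip(*map(tuple_fn, xs, ys))
print(first, second, third)  # (1.0, 2.0) (10.0, 20.0) (10.0, 40.0)

# With floats, the call becomes zip(11.0, 22.0); a float is not iterable,
# so zip raises TypeError (the exact message depends on the Python version).
try:
    zip(*map(scalar_fn, xs, ys))
except TypeError as exc:
    print("TypeError:", exc)
```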

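Your haversine_distance function returns a single number, but zip needs an iterable, so it fails with an exception. lat_lng_to_cartesian works because it returns a 3-tuple, which is iterable.

You can get rid of the exception by returning a 1-tuple: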

return (distance,)

But I don't see any point in doing that here; in fact, you don't need zip at all:

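def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    # map walks the four Series in lockstep and calls haversine_distance once per row
    dist = map(haversine_distance, lng1, lat1, lng2, lat2)
    # reuse the input index so the result lines up with the rows of df2
    return pd.Series(list(dist), index=lng1.index)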
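Why map alone is enough: given several iterables, map already calls the function with one element from each. A small stdlib sketch (add is a hypothetical stand-in for haversine_distance) comparing plain map, an explicit loop, and the 1-tuple workaround:

```python
def add(a, b):
    # Hypothetical scalar function standing in for haversine_distance.
    return a + b


xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0]

# map with several iterables walks them in lockstep and passes one element
# from each as separate arguments, so no zip is needed:
via_map = list(map(add, xs, ys))

# The equivalent explicit loop:
via_loop = [add(x, y) for x, y in zip(xs, ys)]

# The 1-tuple workaround: zip(*...) transposes the 1-tuples into a single
# tuple of results, which then has to be unpacked again.
(via_one_tuples,) = zip(*map(lambda a, b: (add(a, b),), xs, ys))

print(via_map, via_loop, via_one_tuples)
```

The 1-tuple version only adds a wrap/unwrap round trip around the same values, which is why dropping zip is simpler.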

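As I mentioned in the comments, to use haversine_distance the way you have currently defined it, you need to zip the columns before mapping. Essentially, you need to change the get_haversine_distance function so that the corresponding rows are zipped into tuples, and each tuple is then unpacked into the arguments of haversine_distance. Here is an illustration using the data you provided:
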
import pandas as pd
import numpy as np

df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                   {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                   {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}])
df

#       city       lat       lng
# 0   Berlin  52.52437  13.41053
# 1  Potsdam  52.39886  13.06566
# 2  Hamburg  53.57532  10.01534

# Make sure to reset the index after you filter out the unneeded rows
df['tmp'] = 1
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y].reset_index(drop=True)

#     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
# 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
# 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
# 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
# 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
# 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
# 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566

def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = pd.Series(map(lambda x: haversine_distance(*x), zip(lng1, lat1, lng2, lat2)))
    return dist


def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes 
    based on the haversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371 # Radius of earth in kilometers. Use 3956 for miles
    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
    # haversine formula
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a))
    distance = c * R
    return distance


df2['distance'] = get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])

#     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y    distance
# 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.215704
# 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.223782
# 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.215704
# 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.464120
# 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.223782
# 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.464120

Let me know if this is what you want the output to look like.
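Why the reset_index(drop=True) step matters: pd.Series(map(...)) gets a fresh 0..n-1 index, and assigning it as a column aligns on index labels, not positions. A small pandas sketch with made-up values (not the city data):

```python
import pandas as pd

# A frame whose index has gaps, as after filtering rows out of df2.
filtered = pd.DataFrame({'v': [10.0, 20.0, 30.0]}, index=[1, 2, 5])

# pd.Series(list) gets the default index 0, 1, 2 ...
fresh = pd.Series([1.0, 2.0, 3.0])
# ... and column assignment matches labels: 1 -> 2.0, 2 -> 3.0, 5 -> NaN.
filtered['misaligned'] = fresh

# Either reset the frame's index first (as above) or reuse the frame's index:
filtered['aligned'] = pd.Series([1.0, 2.0, 3.0], index=filtered.index)
print(filtered)
```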
As Andrea pointed out, the problem is that haversine_distance returns a single number rather than an iterable. That said, you could also use apply on df2:

df2.apply(lambda row: haversine_distance(row['lng_x'], row['lat_x'], row['lng_y'], row['lat_y']), axis=1)
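Because the asker would like a utils function that takes Series arguments directly, another option (not from the original answers) is to write the formula with NumPy ufuncs, which operate on whole Series at once and keep the index. A sketch, assuming NumPy and pandas; haversine_np is a hypothetical name and the small df2 here is a hand-built excerpt of the pairs table:

```python
import numpy as np
import pandas as pd


def haversine_np(lng1, lat1, lng2, lat2):
    """Vectorized haversine distance in km.

    Works element-wise on scalars, NumPy arrays or pandas Series; with Series
    inputs the result is a Series that keeps the input index.
    """
    R = 6371.0  # mean Earth radius in kilometers
    lng1, lat1, lng2, lat2 = map(np.radians, (lng1, lat1, lng2, lat2))
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))


df2 = pd.DataFrame({
    'lng_x': [13.41053, 13.41053], 'lat_x': [52.52437, 52.52437],
    'lng_y': [13.06566, 10.01534], 'lat_y': [52.39886, 53.57532],
}, index=[1, 2])  # gaps in the index are fine here: no reset_index needed

df2['distance'] = haversine_np(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])
print(df2['distance'].round(1))
```

No Python-level loop runs per row, so this usually scales better than map or apply on large frames.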

OK, I tried it, but it didn't work. Also, the get_cartesian_coordinates function works without a list as well.

I don't think map actually works that way. Ideally, you want to feed each element of the iterable to the mapped function. What you need to do is zip the elements of the pd.Series and then use map with the haversine_distance function. Something like: dist = pd.Series(map(lambda x: haversine_distance(*x), zip(lng1, lat1, lng2, lat2))). PS: Why the downvotes? It seems someone is downvoting everything. It would be nice if they told us all what is wrong with the question and the answers.

Right, I had this code before, too. I think the performance is better as well. But if I put the haversine method into a utils file, it would be nice to use the function syntax with Series arguments, as in my original post.

Perfect. I'm not quite used to zip and map yet, let alone *map. Very nice, +1.

The latter works well. Thanks, my mistake was using zip incorrectly.
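The lambda x: haversine_distance(*x) pattern from the comment can also be written with itertools.starmap, which unpacks each zipped tuple into arguments itself. A stdlib sketch using plain lists in place of the Series columns (iterating a Series yields its values the same way):

```python
from itertools import starmap
from math import radians, cos, sin, asin, sqrt


def haversine_distance(lng1, lat1, lng2, lat2):
    # Same scalar formula as in the question, result in kilometers.
    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


lng1 = [13.41053, 13.41053]
lat1 = [52.52437, 52.52437]
lng2 = [13.06566, 10.01534]
lat2 = [52.39886, 53.57532]

# The pattern from the comment: zip columns into row tuples, then unpack with *x.
with_lambda = list(map(lambda x: haversine_distance(*x), zip(lng1, lat1, lng2, lat2)))

# starmap does the *x unpacking for you.
with_starmap = list(starmap(haversine_distance, zip(lng1, lat1, lng2, lat2)))

# Plain multi-iterable map, as in the first answer, gives the same numbers.
with_map = list(map(haversine_distance, lng1, lat1, lng2, lat2))
print(with_starmap)
```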