Pandas: how to group by trip id and find the straight-line distance travelled?

I have the following data:

Trip      Start_Lat   Start_Long    End_lat      End_Long    Starting_point    Ending_point
Trip_1    56.5624     -85.56845       58.568       45.568         A               B
Trip_1    58.568       45.568       -200.568     -290.568         B               C 
Trip_1   -200.568     -290.568       56.5624     -85.56845        C               D
Trip_2    56.5624     -85.56845     -85.56845    -200.568         A               B
Trip_2   -85.56845    -200.568      -150.568     -190.568         B               C
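To make the question reproducible, the table above can be typed in as a DataFrame. This is a sketch that assumes the coordinate columns hold plain floats; the variable name `df` matches the code below. Note that some sample latitudes (e.g. -200.568) lie outside the valid range of -90 to 90 degrees, so they are presumably placeholder values.

```python
import pandas as pd

# Sample data from the question: one row per leg of a trip
df = pd.DataFrame({
    'Trip': ['Trip_1', 'Trip_1', 'Trip_1', 'Trip_2', 'Trip_2'],
    'Start_Lat': [56.5624, 58.568, -200.568, 56.5624, -85.56845],
    'Start_Long': [-85.56845, 45.568, -290.568, -85.56845, -200.568],
    'End_lat': [58.568, -200.568, 56.5624, -85.56845, -150.568],
    'End_Long': [45.568, -290.568, -85.56845, -200.568, -190.568],
    'Starting_point': ['A', 'B', 'C', 'A', 'B'],
    'Ending_point': ['B', 'C', 'D', 'B', 'C'],
})
```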
I want to find the circuity, which is:

$$\text{Circuity} = \frac{\text{Total distance travelled }(A+B+C+D) - \text{Straight-line distance }(A \to D)}{\text{Total distance travelled }(A+B+C+D)}$$
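The circuity ratio is the same as 1 - straight/total. A minimal sketch of just the ratio, assuming both distances are already in the same unit; `circuity` is a hypothetical helper name and the distances are made-up numbers:

```python
def circuity(total_km: float, straight_km: float) -> float:
    """Return (total - straight) / total, i.e. 1 - straight / total.

    0 means the route was a straight line; values near 1 mean the trip
    ended close to where it started (e.g. a round trip).
    """
    if total_km <= 0:
        raise ValueError("total distance must be positive")
    return (total_km - straight_km) / total_km


# Hypothetical trip: 100 km driven, end point 60 km from the start
print(circuity(100.0, 60.0))  # 0.4
```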
I tried the following code:

    from geopy.distance import great_circle
    df['Distance'] = df.apply(lambda x: great_circle((x['Start_Lat'], x['Start_Long']), (x['End_lat'], x['End_Long'])).km, axis=1)  # km per leg
    df['Total_Distance'] = df.groupby('Trip')['Distance'].transform('sum')  # total km per trip

Can you help me find the straight-line distance and the circuity?

Update:

You may want to convert the values to numeric dtypes first:

import pandas as pd

cols = ['Start_Lat', 'Start_Long', 'End_lat', 'End_Long']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')  # unparseable values -> NaN
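Columns stored as text (object dtype) generally can't be passed straight to NumPy trig functions such as `np.radians`, so the conversion above has to happen first. A small sketch of what it does, on hypothetical values (the `raw` and `converted` names are illustrative only):

```python
import pandas as pd

# Hypothetical text-typed coordinates (object dtype), e.g. as read from a file
raw = pd.DataFrame({'Start_Lat': ['56.5624', '58.568', 'n/a'],
                    'Start_Long': ['-85.56845', '45.568', '-290.568']},
                   dtype=object)

cols = ['Start_Lat', 'Start_Long']
converted = raw.copy()
# errors='coerce' turns anything unparseable (here 'n/a') into NaN instead of raising
converted[cols] = converted[cols].apply(pd.to_numeric, errors='coerce')

print(converted.dtypes.tolist())            # [dtype('float64'), dtype('float64')]
print(converted['Start_Lat'].isna().sum())  # 1
```

Rows that end up with NaN after coercion are worth inspecting (or dropping) before computing distances, since NaN propagates into the sums.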
IIUC, you can do it like this:

import numpy as np

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great-circle distance between two points
    on the earth (specified in decimal degrees or in radians).

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

def f(grp):
    # straight line: start of the first leg -> end of the last leg
    # (label-based access, so it does not depend on column positions)
    straight = haversine(grp['Start_Lat'].iloc[0], grp['Start_Long'].iloc[0],
                         grp['End_lat'].iloc[-1], grp['End_Long'].iloc[-1])
    # total distance: sum of the great-circle lengths of all legs
    total = haversine(grp['Start_Lat'], grp['Start_Long'],
                      grp['End_lat'], grp['End_Long']).sum()
    return 1 - straight / total  # same as (total - straight) / total

df.groupby('Trip').apply(f)
Result:

In [120]: df.groupby('Trip').apply(f)
Out[120]:
Trip
Trip_1    1.000000
Trip_2    0.499825
dtype: float64
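The same calculation can also be sketched without `groupby.apply`: compute every leg's length once, then use named aggregation to collect each trip's first start point, last end point and total distance. `haversine_km` and `circuity_by_trip` are hypothetical helper names. The sketch assumes the rows of each trip are already in leg order and that the coordinate columns are numeric with no missing values (`'first'` and `'last'` skip NaN).

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371.0):
    """Great-circle distance in km; inputs in decimal degrees (scalars, arrays or Series)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


def circuity_by_trip(df: pd.DataFrame) -> pd.Series:
    # 1) length of every leg, computed once for the whole frame
    legs = df.assign(Leg_km=haversine_km(df['Start_Lat'], df['Start_Long'],
                                         df['End_lat'], df['End_Long']))
    # 2) per trip: where it started, where it ended, how far it went in total
    trips = legs.groupby('Trip', sort=False).agg(
        lat0=('Start_Lat', 'first'), lon0=('Start_Long', 'first'),
        lat1=('End_lat', 'last'), lon1=('End_Long', 'last'),
        total_km=('Leg_km', 'sum'),
    )
    # 3) straight line from first start to last end, then (total - straight) / total
    straight_km = haversine_km(trips['lat0'], trips['lon0'],
                               trips['lat1'], trips['lon1'])
    return (trips['total_km'] - straight_km) / trips['total_km']
```

Since it applies the same haversine formula, `circuity_by_trip(df)` on the sample data should agree with `df.groupby('Trip').apply(f)` up to floating-point rounding, returning a Series indexed by `Trip`.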


Comments:

- Thanks for your answer, but I get this error: TypeError: ufunc 'radians' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
- What dtypes does your DF have?
- I have them as object.
- Can you convert them to numeric dtypes?
- I tried pd.to_numeric and got an error again :/ ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
- When I try values.astype(float), I get an error: TypeError: 'DataFrame' object is not callable :(