Python pandas - compute a new column from relative values in other rows
The data is as follows:
data = """
Class,Location,Long,Lat
A,ABC11,139.6295542,35.61144069
A,ABC20,139.630596,35.61045559
A,ABC03,139.6300307,35.61327781
B,ABC54,139.7787818,35.68847945
B,ABC05,139.7814447,35.6816882
B,ABC06,139.7788191,35.681865
B,ABC24,139.7790396,35.67781697
"""
df = pd.read_csv(StringIO(data))
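Each row holds data for one location. For each location, I need the distance to every other location (row), computed as follows (simplified to a plain Euclidean distance on the Long/Lat values for convenience):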
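Written out, the simplified distance between locations i and j and the nearest-location rule look like this. The formula treats degrees as planar units, so it is not a geodesic distance:

$$
d(i, j) = \sqrt{(\mathrm{Long}_i - \mathrm{Long}_j)^2 + (\mathrm{Lat}_i - \mathrm{Lat}_j)^2},
\qquad
\mathrm{nearest}(i) = \arg\min_{j \neq i} d(i, j)
$$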
If I were doing this outside pandas, I would do the following:
import math

rows = df.to_dict('records')

# distance of each location w.r.t. the other locations, excluding itself
results = {}
for row in rows:
    loc = row['Location']
    results[loc] = {}
    # get a new list excluding the current row
    nrows = [r for r in rows if r['Location'] != loc]
    for nrow in nrows:
        dist = math.sqrt((row["Long"] - nrow["Long"])**2 + (row["Lat"] - nrow["Lat"])**2)
        results[loc][nrow["Location"]] = dist

# for each location, find the other location with the minimum distance
fin_results = {}
for k, v in results.items():
    fin_results[k] = {}
    minValKey = min(v, key=v.get)
    fin_results[k]["location"] = minValKey
    fin_results[k]["dist"] = v[minValKey]
This gives the output below, which lists, for each location, the nearest other location and its distance:
{'ABC11': {'location': 'ABC20', 'dist': 0.001433795400325211}, 'ABC20': {'location': 'ABC11', 'dist': 0.001433795400325211}, 'ABC03': {'location': 'ABC11', 'dist': 0.001897909941062068}, 'ABC54': {'location': 'ABC06', 'dist': 0.006614555169662396}, 'ABC05': {'location': 'ABC06', 'dist': 0.002631545857463665}, 'ABC06': {'location': 'ABC05', 'dist': 0.002631545857463665}, 'ABC24': {'location': 'ABC06', 'dist': 0.004054030973106164}}
While this works, I'd like to know what the pandas way of doing it is.
Expected output:
+----------+-------------------+----------------------------+
| location | nearest_location | nearest_location_distance |
+----------+-------------------+----------------------------+
| 'ABC11' | 'ABC20' | 0.001433795400325211 |
| 'ABC20' | 'ABC11' | 0.001433795400325211 |
| 'ABC03' | 'ABC11' | 0.001897909941062068 |
| 'ABC54' | 'ABC06' | 0.006614555169662396 |
| 'ABC05' | 'ABC06' | 0.002631545857463665 |
| 'ABC06' | 'ABC05' | 0.002631545857463665 |
| 'ABC24' | 'ABC06' | 0.004054030973106164 |
+----------+-------------------+----------------------------+
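As a side note, the dict built by the loop already has the shape of the expected table. A minimal sketch, assuming `fin_results` as printed above (values copied from that output), converts it with `DataFrame.from_dict(orient='index')` and renames the columns to match:

```python
import pandas as pd

# fin_results as printed by the loop above (values copied from that output)
fin_results = {
    'ABC11': {'location': 'ABC20', 'dist': 0.001433795400325211},
    'ABC20': {'location': 'ABC11', 'dist': 0.001433795400325211},
    'ABC03': {'location': 'ABC11', 'dist': 0.001897909941062068},
    'ABC54': {'location': 'ABC06', 'dist': 0.006614555169662396},
    'ABC05': {'location': 'ABC06', 'dist': 0.002631545857463665},
    'ABC06': {'location': 'ABC05', 'dist': 0.002631545857463665},
    'ABC24': {'location': 'ABC06', 'dist': 0.004054030973106164},
}

# orient='index': outer keys become rows, inner keys become columns
table = (pd.DataFrame.from_dict(fin_results, orient='index')
           .rename(columns={'location': 'nearest_location',
                            'dist': 'nearest_location_distance'})
           .rename_axis('location'))
print(table)
```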
You can use numpy broadcasting:
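import numpy as np

long_ = df.Long.to_numpy()
lat = df.Lat.to_numpy()
distances = np.sqrt((long_ - long_[:, None]) ** 2 + (lat - lat[:, None]) ** 2)
np.fill_diagonal(distances, np.inf)  # ignore each location's zero distance to itself
dist_df = pd.DataFrame(distances, index=df.Location, columns=df.Location)
out = pd.DataFrame({'nearest_location': dist_df.idxmin(axis=1),
                    'nearest_location_distance': dist_df.min(axis=1)})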
The output data frame looks like this:
nearest_location nearest_location_distance
Location
ABC11 ABC20 0.001434
ABC20 ABC11 0.001434
ABC03 ABC11 0.001898
ABC54 ABC06 0.006615
ABC05 ABC06 0.002632
ABC06 ABC05 0.002632
ABC24 ABC06 0.004054
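A self-contained sketch of the same broadcasting idea, carried through in NumPy with `argmin` instead of pandas. It assumes the question's `data` string; `np.hypot` is used here in place of the explicit square root and computes the same quantity:

```python
import numpy as np
import pandas as pd
from io import StringIO

data = """
Class,Location,Long,Lat
A,ABC11,139.6295542,35.61144069
A,ABC20,139.630596,35.61045559
A,ABC03,139.6300307,35.61327781
B,ABC54,139.7787818,35.68847945
B,ABC05,139.7814447,35.6816882
B,ABC06,139.7788191,35.681865
B,ABC24,139.7790396,35.67781697
"""
df = pd.read_csv(StringIO(data))

long_ = df.Long.to_numpy()
lat = df.Lat.to_numpy()

# (n,) minus (n, 1) broadcasts to (n, n): entry [i, j] = long_[j] - long_[i]
dlong = long_ - long_[:, None]
dlat = lat - lat[:, None]
distances = np.hypot(dlong, dlat)  # same as sqrt(dlong**2 + dlat**2)

# every location is at distance 0 from itself; set the diagonal to inf so argmin skips it
np.fill_diagonal(distances, np.inf)
nearest_idx = distances.argmin(axis=1)  # column index of the closest other row

nearest = pd.DataFrame({
    'nearest_location': df.Location.to_numpy()[nearest_idx],
    'nearest_location_distance': distances[np.arange(len(df)), nearest_idx],
}, index=df.Location)
print(nearest)
```

The n-by-n matrix is fine for a handful of rows, but its memory grows quadratically with the number of locations.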
This finds the distance from each row to every other row. That is how I read the question; I'm not sure it's what you're after.
I had gotten partway through this when ansev posted the same solution:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(data))
# distance from each row to the NEXT row only (the last row gets NaN)
df['result'] = (df['Lat'].diff(-1).pow(2) + df['Long'].diff(-1).pow(2)).pow(1/2)
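To see which rows `diff(-1)` pairs up, here is the same expression on a tiny hypothetical frame (the numbers are made up, chosen so the distances come out whole):

```python
import numpy as np
import pandas as pd

# hypothetical three-point frame, only to show what diff(-1) lines up
df = pd.DataFrame({'Long': [0.0, 3.0, 3.0], 'Lat': [0.0, 4.0, 5.0]})

# diff(-1) computes value[i] - value[i + 1]: each row against the following row
df['result'] = (df['Lat'].diff(-1).pow(2) + df['Long'].diff(-1).pow(2)).pow(1/2)
print(df)  # result column -> [5.0, 1.0, nan]
```

Row 0 is compared with row 1 (a 3-4-5 triangle), row 1 with row 2, and the last row has no successor, so it is NaN.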
You can use scipy's distance_matrix, which is essentially what @rafaelc coded:
import pandas as pd
from scipy.spatial import distance_matrix

dist_mat = distance_matrix(df[['Long', 'Lat']], df[['Long', 'Lat']])
# label the distance matrix with the location names
dist_mat = pd.DataFrame(dist_mat,
                        index=df.Location,
                        columns=df.Location)
# mask the zero self-distances, take each column's nearest location and distance, convert to dict
(dist_mat.where(dist_mat > 0)
         .agg(['idxmin', 'min'])
         .to_dict()
)
Output:
{'ABC11': {'idxmin': 'ABC20', 'min': 0.001433795400325211},
'ABC20': {'idxmin': 'ABC11', 'min': 0.001433795400325211},
'ABC03': {'idxmin': 'ABC11', 'min': 0.001897909941062068},
'ABC54': {'idxmin': 'ABC06', 'min': 0.006614555169662396},
'ABC05': {'idxmin': 'ABC06', 'min': 0.002631545857463665},
'ABC06': {'idxmin': 'ABC05', 'min': 0.002631545857463665},
'ABC24': {'idxmin': 'ABC06', 'min': 0.004054030973106164}}
If you only need the data frame:
(dist_mat.where(dist_mat > 0)
         .agg(['idxmin', 'min'])
         .T
)
Output:
idxmin min
ABC11 ABC20 0.0014338
ABC20 ABC11 0.0014338
ABC03 ABC11 0.00189791
ABC54 ABC06 0.00661456
ABC05 ABC06 0.00263155
ABC06 ABC05 0.00263155
ABC24 ABC06 0.00405403
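To get exactly the column names from the question's expected output, the same chain can be finished with `.T` and a `rename`. The `astype` step is an addition in this sketch: because each column of the aggregated frame mixes a string (`idxmin`) and a float (`min`), both columns come back with object dtype after transposing. The `data` string is copied from the question:

```python
import pandas as pd
from io import StringIO
from scipy.spatial import distance_matrix

data = """
Class,Location,Long,Lat
A,ABC11,139.6295542,35.61144069
A,ABC20,139.630596,35.61045559
A,ABC03,139.6300307,35.61327781
B,ABC54,139.7787818,35.68847945
B,ABC05,139.7814447,35.6816882
B,ABC06,139.7788191,35.681865
B,ABC24,139.7790396,35.67781697
"""
df = pd.read_csv(StringIO(data))

coords = df[['Long', 'Lat']]
dist_mat = pd.DataFrame(distance_matrix(coords, coords),
                        index=df.Location, columns=df.Location)

result = (dist_mat.where(dist_mat > 0)      # hide the zero self-distances as NaN
                  .agg(['idxmin', 'min'])   # rows: idxmin/min, one column per location
                  .T                        # one row per location
                  .rename(columns={'idxmin': 'nearest_location',
                                   'min': 'nearest_location_distance'})
                  .astype({'nearest_location_distance': float}))
print(result)
```

Masking with `dist_mat > 0` assumes no two locations share identical coordinates; a duplicate point would also be hidden.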
You can also use: