Measuring the structure of xy points in Python
I am trying to measure the overall structure of a set of xy points to represent a recurring particle formation. I want a pairwise approach that determines the structure from each point's position relative to its neighboring points, rather than from the average of the raw Cartesian coordinates.
To do this, I want to calculate the vectors between each point and its neighboring points at every timestamp. The average of these vectors between each pair of points should then give the overall structure.
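As a rough illustration of the pairwise idea, the sketch below builds the vector between every pair of points in one frame with NumPy broadcasting and then averages the result over two frames. The helper name `pairwise_vectors`, the coordinates, and the assumption that rows appear in the same order in every frame are illustrative choices, not part of the question.

```python
import numpy as np

def pairwise_vectors(xy):
    """Return an (n, n, 2) array where entry [i, j] is the vector from point i to point j."""
    xy = np.asarray(xy, dtype=float)
    return xy[None, :, :] - xy[:, None, :]

# Two hypothetical frames holding the same three points in the same row order
frame_1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
frame_2 = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.0]])

# Average each pairwise vector over the frames
mean_vectors = np.mean([pairwise_vectors(frame_1), pairwise_vectors(frame_2)], axis=0)
print(mean_vectors[0, 1])  # average vector from point 0 to point 1
```

Averaging by row only makes sense when each row plays the same role in every frame, which is the difficulty the question goes on to describe.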
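Note: if the vectors are hard-coded between specific points, the structure will not be identified correctly. If points swap positions, or different points are substituted while the same structure is kept, the end result will be inaccurate. I want the function to determine the overall structure from the neighboring points.
The structure should therefore be found with a pairwise approach, where the final spatial distribution is determined by 1) setting the centroid of the structure to the position of the point in the densest part of the structure, as determined by the average distance to the third nearest neighbor, and 2) identifying the relative position of that point's nearest neighbor, then the relative position of that neighbor's nearest neighbor, and so on, until the positions of all points have been determined.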
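The two steps can be sketched with NumPy alone. This is only one reading of the description: "average distance to the third nearest neighbor" is interpreted here as the mean distance to a point's three nearest neighbors, and the function names `densest_point` and `neighbour_chain` as well as the coordinates are hypothetical.

```python
import numpy as np

def densest_point(xy, k=3):
    """Index of the point with the smallest mean distance to its k nearest neighbours."""
    xy = np.asarray(xy, dtype=float)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Column 0 of each sorted row is the zero distance to the point itself, so skip it
    nearest = np.sort(dist, axis=1)[:, 1:k + 1]
    return int(np.argmin(nearest.mean(axis=1)))

def neighbour_chain(xy, start):
    """Visit every point once, always stepping to the nearest unvisited point."""
    xy = np.asarray(xy, dtype=float)
    order = [start]
    remaining = [i for i in range(len(xy)) if i != start]
    while remaining:
        current = xy[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(xy[i] - current))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical frame of five points with no tied distances
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 0.9], [3.0, 0.0], [0.9, -0.7]])
start = densest_point(points)
print(start, neighbour_chain(points, start))
```

Because the chain depends only on coordinates, relabelling the points leaves the visiting order of positions unchanged. Tied distances are broken by index order in this sketch, which real data may need to handle more carefully.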
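I will generate two examples below. Using the first example, df, frame 1 displays the vectors between the points at the first timestamp. Frame 2 does the same with new positions for some points and a position swap for others (points A and B swap positions between frames). The final frame displays every vector for all frames, while the points display the average structure.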
import pandas as pd
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt
import numpy as np

# Example 1:
df = pd.DataFrame({
    'Time' : [1,1,1,1,1,2,2,2,2,2],
    'id' : ['A','B','C','D','E','B','A','C','D','E'],
    'X' : [1.0,2.8,4.0,2.0,2.0,1.5,3.0,5.0,3.0,2.5],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
})

def calculate_distances(group):
    group_distances = pd.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )
    return group_distances

# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):
    # ---
    # "... first, we set the centroid of the structure to be the position of the point in the densest part of the structure ..."
    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])
    # Centroid is the most dense point in the structure
    centroid = group.iloc[np.argmax(log_density)]
    # Make a list of the other points to keep track of which points we've handled
    other_points = group["id"].to_list()
    # Start by making the centroid the active point
    active_point_id = centroid["id"]
    # ---
    # "... the relative position of that point’s nearest neighbor (ignoring any point already considered
    # in the process) and so on, until the positions of all points in the team have been determined."
    # Keep handling the next point until there are no points left
    while len(other_points) > 1:
        # Remove the active point from the list
        other_points = [point for point in other_points if point != active_point_id]
        # For the active point, get the nearest neighbor
        nearest_neighbor = df_distances.loc[[timestamp]][active_point_id].droplevel(0).loc[other_points].sort_values().reset_index().iloc[0]["id"]
        # ---
        # "... We then identify the relative position of his nearest neighbor ..."
        # Determine the relative position of the nearest neighbor (compared to the active point)
        active_point_coordinates = group.loc[group["id"] == active_point_id, ["X", "Y"]].iloc[0].values
        nearest_neighbor_coordinates = group.loc[group["id"] == nearest_neighbor, ["X", "Y"]].iloc[0].values
        relative_position = active_point_coordinates - nearest_neighbor_coordinates
        # Add the relative position to the list, for this timestamp
        relative_positions[timestamp].append(relative_position)
        # The neighbor becomes the active point
        active_point_id = nearest_neighbor

# ---
# "... averaging the vectors between each pair of points over a specified time interval to gain a
# clear measure of their designated relative positions ..."
# Take the average vector, across timeframes
averages = np.mean([t for t in relative_positions.values()], axis=0)

# Plot the relative positions, NOTE: The centroid is always at (0, 0), and is not plotted
plt.scatter(averages[:,0], averages[:,1])
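One detail that may be worth checking in the code above: the vectors stored per timestamp are steps from one point to the next along the chain, so placing every point relative to a centroid at (0, 0) would need those steps accumulated. A minimal sketch of that accumulation follows, using hypothetical coordinates and a hypothetical visiting order, and taking each step as next point minus current point (the code above subtracts in the other direction).

```python
import numpy as np

# Hypothetical frame and visiting order; index 1 plays the role of the centroid
xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 0.9], [3.0, 0.0], [0.9, -0.7]])
order = [1, 4, 0, 2, 3]

# Step vectors along the chain: next point minus current point
steps = np.diff(xy[order], axis=0)

# Accumulating the steps gives each point's position relative to the chain start,
# with the start itself placed at (0, 0)
relative_to_centroid = np.vstack([[0.0, 0.0], np.cumsum(steps, axis=0)])
print(relative_to_centroid)
```

Whether the plotted structure should use the raw steps or the accumulated positions depends on what the figure is meant to show, so treat this as an option rather than a correction.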
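If I manually plot the centroid at 0,0, the output is:
Point structure, frame 1:
Point structure, frame 2:
The total vectors for both frames are highlighted. Their average point structure should therefore be:
If the same point structure is generated but the points are shifted to the right in the subsequent frame, the underlying point structure should be the same.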
df2 = pd.DataFrame({
    'Time' : [1,1,1,1,1,2,2,2,2,2],
    'id' : ['A','B','C','D','E','B','A','C','D','E'],
    'X' : [1.0,3.0,4.0,2.0,2.0,3.0,5.0,6.0,4.0,4.0],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
})
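The claim that a shift leaves the structure unchanged can be checked directly on the df2 coordinates. The sketch below restates them as plain NumPy arrays in row order (an assumption made to keep the example self-contained) and compares the pairwise vectors of both frames, first by position and then after reordering frame 2 by id. The helper `pairwise_vectors` is hypothetical.

```python
import numpy as np

# Coordinates from df2 in row order (ids A, B, C, D, E at Time 1 and B, A, C, D, E at Time 2)
frame_1 = np.array([[1.0, 1.0], [3.0, 1.0], [4.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
frame_2 = np.array([[3.0, 1.0], [5.0, 1.0], [6.0, 0.0], [4.0, 0.0], [4.0, 2.0]])

def pairwise_vectors(xy):
    """Entry [i, j] is the vector from row i to row j."""
    return xy[None, :, :] - xy[:, None, :]

# A pure shift cancels out of every pairwise difference
same_by_position = bool(np.allclose(pairwise_vectors(frame_1), pairwise_vectors(frame_2)))

# Reordering frame 2 by id (A, B, C, D, E) swaps its first two rows, and the match is lost
frame_2_by_id = frame_2[[1, 0, 2, 3, 4]]
same_by_id = bool(np.allclose(pairwise_vectors(frame_1), pairwise_vectors(frame_2_by_id)))

print(same_by_position, same_by_id)
```

The rows happen to line up by position here. In general the correspondence would have to come from the data itself, for example from a nearest-neighbor ordering, rather than from the ids.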
Expected structure:
I tried to follow the paper you referenced exactly, but the description of their algorithm is quite vague. Here is my solution:
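import numpy
import pandas
import random
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform

# From the paper:
# ---------------
# Formations are measured by calculating the vectors between each player and the other players
# at successive instants during the match, averaging the vectors between each pair of
# players over a specified time interval to gain a clear measure of their designated relative positions.
# The final spatial distribution of outfield players is determined by the following algorithm:
# first, we set the centroid of the formation to be the position of the player in the densest part of the
# team, determined by the average distance to the third nearest neighbor. We then identify
# the relative position of his nearest neighbor, the relative position of that player's nearest neighbor
# (ignoring any player already considered in the process) and so on, until the positions of all players
# in the team have been determined.

# Your data, I've added some randomness to get a more realistic setting
df = pandas.DataFrame(
    {
        "Time": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "id": ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"],
        "Y": [element + random.random() * 0.25 for element in [1.0, 1.0, 0.0, 1.25, 2.0, 1.0, 1.0, 0.0, 1.25, 2.0]],
        "X": [element + random.random() * 0.25 for element in [1.0, 3.0, 2.0, 2.25, 2.0, 3.0, 5.0, 4.0, 4.25, 4.0]],
    }
)

# Plot the different timeframes (for reference)
for timestamp in df["Time"].unique():
    df.loc[df["Time"] == timestamp].plot(kind="scatter", x="X", y="Y")

def calculate_distances(group: pandas.DataFrame) -> pandas.DataFrame:
    """Calculate the distances between the players, within a specific timeframe.
    Args:
        group (pandas.DataFrame): The data from the specified timeframe
    Returns:
        pandas.DataFrame: The distances
    """
    group_distances = pandas.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )
    return group_distances

# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):
    # ---
    # "... first, we set the centroid of the formation to be the position of the player in the densest part of the team ..."
    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])
    # Centroid is the most dense point in the formation
    centroid = group.iloc[numpy.argmax(log_density)]
    # Make a list of the other players to keep track of which players we've handled
    other_players = group["id"].to_list()
    # Start by making the centroid the active player
    active_player_id = centroid["id"]
    # ---
    # "... the relative position of that player's nearest neighbor (ignoring any player already considered
    # in the process) and so on, until the positions of all players in the team have been determined."
    # Keep handling the next player until there are no players left
    while len(other_players) > 1:
        # Remove the active player from the list
        other_players = [player for player in other_players if player != active_player_id]
        # For the active player, get the nearest neighbor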
氖