Measuring the structure of xy points (Python)


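I'm trying to measure the overall structure of xy points that represent a recurring particle formation. I want to take a pairwise approach, where the structure is determined by each point's position relative to its neighboring points, rather than by averaging the raw Cartesian coordinates.

To achieve this, I want to calculate the vector between each point and its neighboring point at every timestamp. The average of these vectors between each pair of points should then give the overall structure.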

import pandas as pd
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt
import numpy as np

# Example 1:
df = pd.DataFrame({   
    'Time' : [1,1,1,1,1,2,2,2,2,2],             
    'id' : ['A','B','C','D','E','B','A','C','D','E'],                 
    'X' : [1.0,2.8,4.0,2.0,2.0,1.5,3.0,5.0,3.0,2.5],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
    })

def calculate_distances(group):
    group_distances = pd.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )

    return group_distances

# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):

    # ---
    # "... first, we set the centroid of the structure to be the position of the point in the densest part of the structure ..."

    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])

    # Centroid is the most dense point in the structure
    centroid = group.iloc[np.argmax(log_density)]

    # Make a list of the other points to keep track of which points we've handled
    other_points = group["id"].to_list()

    # Start by making the centroid the active point
    active_point_id = centroid["id"]

    # ---
    # "... the relative position of that point’s nearest neighbor (ignoring any point already considered
    # in the process) and so on, until the positions of all points in the team have been determined."

    # Keep handling the next point until there are no points left
    while len(other_points) > 1:

        # Remove the active point from the list
        other_points = [point for point in other_points if point != active_point_id]

        # For the active point, get the nearest neighbor
        nearest_neighbor = df_distances.loc[[timestamp]][active_point_id].droplevel(0).loc[other_points].sort_values().reset_index().iloc[0]["id"]

        # ---
        # "... We then identify the relative position of his nearest neighbor ..."

        # Determine the relative position of the nearest neighbor (compared to the active point)
        active_point_coordinates = group.loc[group["id"] == active_point_id, ["X", "Y"]].iloc[0].values
        nearest_neighbor_coordinates = group.loc[group["id"] == nearest_neighbor, ["X", "Y"]].iloc[0].values
        relative_position = active_point_coordinates - nearest_neighbor_coordinates

        # Add the relative position to the list, for this timestamp
        relative_positions[timestamp].append(relative_position)

        # The neighbor becomes the active point
        active_point_id = nearest_neighbor

# ---
# "... averaging the vectors between each pair of points over a specified time interval to gain a
# clear measure of their designated relative positions ..."

# Take the average vector, across timeframes
averages = np.mean([t for t in relative_positions.values()], axis=0)

# Plot the relative positions, NOTE: The centroid is always at (0, 0), and is not plotted

plt.scatter(averages[:,0], averages[:,1])
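Note: the structure will not be identified correctly if the vectors are hard-coded between specific points. If points swap positions, or different points are substituted while the same structure is retained, the end result will be inaccurate. I want the function to determine the overall structure purely from neighboring points.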

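The point structure should therefore take a pairwise approach, where the final spatial distribution 1) sets the centroid of the structure to the position of the point in the densest part of the structure, determined by the average distance to the third-nearest neighbor, and 2) identifies the relative position of that point's nearest neighbor, then the relative position of that neighbor's nearest neighbor, and so on, until the positions of all points have been determined.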
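The two steps above can be sketched with plain NumPy and SciPy. This is only a minimal sketch under stated assumptions: the function names `densest_point_index` and `chain_structure` are hypothetical, and "densest" is read here as "smallest distance to the k-th nearest neighbor" (k = 3), which is one possible interpretation of the description rather than a confirmed definition.

```python
import numpy as np
from scipy.spatial.distance import cdist


def densest_point_index(xy, k=3):
    """Index of the point whose k-th nearest neighbor is closest (assumed density measure)."""
    d = cdist(xy, xy)
    # Each sorted row starts with the zero self-distance, so column k is the k-th neighbor
    kth_distance = np.sort(d, axis=1)[:, k]
    return int(np.argmin(kth_distance))


def chain_structure(xy, k=3):
    """Walk from the densest point to the nearest unvisited neighbor, repeatedly.

    Returns the visiting order and the visited positions expressed
    relative to the starting point, which therefore sits at (0, 0).
    """
    xy = np.asarray(xy, dtype=float)
    d = cdist(xy, xy)
    current = densest_point_index(xy, k)
    order = [current]
    remaining = set(range(len(xy))) - {current}
    while remaining:
        # Nearest neighbor of the active point, ignoring points already handled
        nearest = min(remaining, key=lambda j: d[current, j])
        order.append(nearest)
        remaining.remove(nearest)
        current = nearest
    positions = xy[order] - xy[order[0]]
    return order, positions


# Usage with the first frame of Example 1 (points A to E)
frame_1 = np.array([[1.0, 1.0], [2.8, 1.0], [4.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
order, positions = chain_structure(frame_1)
print(order)
print(positions)
```

Because the result depends only on distances and coordinate differences, it does not use the point ids at all, so swapping labels between frames has no effect. Ties between equally distant neighbors are broken arbitrarily in this sketch.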

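I'll generate two examples below. Using df1 (the `df` of Example 1), frame 1 shows the vectors between the points at the first timestamp. Frame 2 does the same after some points have moved and others have swapped positions (points A and B swap positions between the frames). The final frame shows every vector from all frames, while the points show the average structure.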

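If I manually plot the centroid at (0, 0), the output is:

Point structure, frame 1:

Point structure, frame 2:

The combined vectors of both frames are highlighted. Their average point structure should therefore be:

If the same point structure is generated but the points are shifted to the right in subsequent frames, the underlying point structure should be the same.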

df2 = pd.DataFrame({   
    'Time' : [1,1,1,1,1,2,2,2,2,2],             
    'id' : ['A','B','C','D','E','B','A','C','D','E'],                 
    'X' : [1.0,3.0,4.0,2.0,2.0,3.0,5.0,6.0,4.0,4.0],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
    })
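
As a quick sanity check of that claim, the sorted pairwise distances of each frame can be compared. This is a rough sketch, not a full structure comparison: the helper name `sorted_pairwise_distances` is hypothetical, and equal sorted distances are a necessary but not sufficient condition for two frames to share a structure (they also ignore rotation and reflection).

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

df2 = pd.DataFrame({
    'Time': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'id': ['A', 'B', 'C', 'D', 'E', 'B', 'A', 'C', 'D', 'E'],
    'X': [1.0, 3.0, 4.0, 2.0, 2.0, 3.0, 5.0, 6.0, 4.0, 4.0],
    'Y': [1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0, 2.0],
})


def sorted_pairwise_distances(frame):
    """Sorted Euclidean distances between all pairs of points in one frame."""
    return np.sort(pdist(frame[["X", "Y"]].to_numpy()))


# One sorted distance vector per timestamp; ids are deliberately ignored
per_frame = {t: sorted_pairwise_distances(g) for t, g in df2.groupby("Time")}

# True when both frames have the same set of pairwise distances
same_structure = bool(np.allclose(per_frame[1], per_frame[2]))
print(same_structure)
```

Frame 2 of `df2` is frame 1 shifted by 2 along X with the labels A and B swapped, so a measure that ignores ids and absolute position should treat the two frames as identical.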
Expected structure:


I tried to follow the paper you cited exactly, but their description of the algorithm is quite vague. Here is my solution:

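import numpy
import pandas
import random
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform

# From the paper:
# ---------------
# Formations are measured by calculating the vectors between each player and the other players
# at successive instants during a match, averaging the vectors between each pair of
# players over a specified time interval to gain a clear measure of their designated relative positions.
# The final spatial distribution of the outfield players is determined by the following algorithm:
# first, we set the centroid of the formation to be the position of the player in the densest part of the
# team, determined by the average distance to the third nearest neighbor. We then identify the
# relative position of his nearest neighbor, the relative position of that player's nearest neighbor
# (ignoring any player already considered in the process) and so on, until the positions of all players
# in the team have been determined.

# Your data, I've added some randomness to get a more realistic setting
df = pandas.DataFrame(
    {
        "Time": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "id": ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"],
        "Y": [element + random.random() * 0.25 for element in [1.0, 1.0, 0.0, 1.25, 2.0, 1.0, 1.0, 0.0, 1.25, 2.0]],
        "X": [element + random.random() * 0.25 for element in [1.0, 3.0, 2.0, 2.25, 2.0, 3.0, 5.0, 4.0, 4.25, 4.0]],
    }
)

# Plot the different timeframes (for reference)
for timestamp in df["Time"].unique():
    df.loc[df["Time"] == timestamp].plot(kind="scatter", x="X", y="Y")


def calculate_distances(group: pandas.DataFrame) -> pandas.DataFrame:
    """Calculate the distances between the players, within a specific timeframe.

    Args:
        group (pandas.DataFrame): the data from a specific timeframe

    Returns:
        pandas.DataFrame: the distances
    """
    group_distances = pandas.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )

    return group_distances


# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):

    # ---
    # "... first, we set the centroid of the formation to be the position of the player in the densest part of the team ..."

    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])

    # Centroid is the most dense point in the formation
    centroid = group.iloc[numpy.argmax(log_density)]

    # Make a list of the other players to keep track of which players we've handled
    other_players = group["id"].to_list()

    # Start by making the centroid the active player
    active_player_id = centroid["id"]

    # ---
    # "... the relative position of that player's nearest neighbor (ignoring any player already considered
    # in the process) and so on, until the positions of all players in the team have been determined."

    # Keep handling the next player until there are no players left
    while len(other_players) > 1:

        # Remove the active player from the list
        other_players = [player for player in other_players if player != active_player_id]

        # For the active player, get the nearest neighbor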
氖