Measuring the structure of xy points in Python

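I am trying to measure the overall structure of a set of xy points that represents a recurring particle formation. I want to take a pairwise approach, determining the structure from each point's position relative to its neighboring points rather than from the mean of the raw Cartesian coordinates.

To do this, I want to compute the vectors between each point and its neighboring points at every timestamp. The average of these vectors between each pair of points should then give the overall structure.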

import pandas as pd
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt
import numpy as np

# Example 1:
df = pd.DataFrame({   
    'Time' : [1,1,1,1,1,2,2,2,2,2],             
    'id' : ['A','B','C','D','E','B','A','C','D','E'],                 
    'X' : [1.0,2.8,4.0,2.0,2.0,1.5,3.0,5.0,3.0,2.5],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
    })

def calculate_distances(group):
    group_distances = pd.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )

    return group_distances

# Calculate the distances between the points, per timeframe
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the timeframes
for timestamp, group in df.groupby("Time"):

    # ---
    # "... first, we set the centroid of the structure to be the position of the point in the densest part of the structure ..."

    # Determine the density of the group, within this timeframe
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])

    # Centroid is the most dense point in the structure
    centroid = group.iloc[np.argmax(log_density)]

    # Make a list of the other points to keep track of which points we've handled
    other_points = group["id"].to_list()

    # Start by making the centroid the active point
    active_point_id = centroid["id"]

    # ---
    # "... the relative position of that point’s nearest neighbor (ignoring any point already considered
    # in the process) and so on, until the positions of all points in the team have been determined."

    # Keep handling the next point until there are no points left
    while len(other_points) > 1:

        # Remove the active point from the list
        other_points = [point for point in other_points if point != active_point_id]

        # For the active point, get the nearest neighbor
        nearest_neighbor = df_distances.loc[[timestamp]][active_point_id].droplevel(0).loc[other_points].sort_values().reset_index().iloc[0]["id"]

        # ---
        # "... We then identify the relative position of his nearest neighbor ..."

        # Determine the relative position of the nearest neighbor (compared to the active point),
        # i.e. the vector pointing from the active point to its nearest neighbor
        active_point_coordinates = group.loc[group["id"] == active_point_id, ["X", "Y"]].iloc[0].values
        nearest_neighbor_coordinates = group.loc[group["id"] == nearest_neighbor, ["X", "Y"]].iloc[0].values
        relative_position = nearest_neighbor_coordinates - active_point_coordinates

        # Add the relative position to the list, for this timestamp
        relative_positions[timestamp].append(relative_position)

        # The neighbor becomes the active point
        active_point_id = nearest_neighbor

# ---
# "... averaging the vectors between each pair of points over a specified time interval to gain a
# clear measure of their designated relative positions ..."

# Take the average vector, across timeframes
averages = np.mean([t for t in relative_positions.values()], axis=0)

# Plot the relative positions, NOTE: The centroid is always at (0, 0), and is not plotted

plt.scatter(averages[:,0], averages[:,1])

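Note: if the vectors are hard-coded between specific points, the structure will not be identified correctly. If points swap positions, or different points are substituted while the same structure is retained, the final result will be inaccurate. I want the function to determine the overall structure from the neighboring points, not from the point ids.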

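The point structure should therefore be derived pairwise, with the final spatial distribution obtained by 1) setting the centroid of the structure to the position of the point in the densest part of the structure, as determined by the mean distance to the third-nearest neighbor, and 2) identifying the relative position of its nearest neighbor, then the relative position of that point's nearest neighbor, and so on, until the positions of all points have been determined.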
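Step 1 can be sketched roughly as follows. This is only one possible reading of the description: it assumes that "densest" means the point with the smallest distance to its third-nearest neighbor within a single frame (no averaging over time), and the helper name densest_point_index is hypothetical. It is a different criterion from the Gaussian kernel density estimate used in the code above, so the two may pick different points.

```python
import numpy as np
from scipy.spatial.distance import cdist


def densest_point_index(xy, k=3):
    """Index of the point with the smallest distance to its k-th nearest neighbor.

    Assumption: a smaller k-th-neighbor distance means a denser neighborhood.
    Requires more than k points in the frame.
    """
    xy = np.asarray(xy, dtype=float)
    distances = cdist(xy, xy)  # (n, n) Euclidean distance matrix
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth_distance = np.sort(distances, axis=1)[:, k]
    return int(np.argmin(kth_distance))


# Frame 1 of Example 1, in id order A, B, C, D, E
frame1 = np.array([[1.0, 1.0], [2.8, 1.0], [4.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
centroid_index = densest_point_index(frame1)
```

For this frame the third-nearest-neighbor distances are roughly 1.80, 1.56, 2.83, 2.00 and 2.00 for A to E, so the sketch selects index 1 (point B) as the starting point of the chain.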

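I will work through two examples. Using df1 (the Example 1 dataframe, df), frame 1 shows the vectors between the points at the first timestamp. Frame 2 does the same after some points have moved to new positions and others have swapped positions (points A and B swap positions between frames). The final frame shows every vector from all frames, while the points show the average structure.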

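If I manually plot the centroid at (0, 0), the output is:

Point structure, frame 1:

Point structure, frame 2:

The vectors from both frames are highlighted together. Their average point structure should therefore be:

If the same point structure is generated but the points are shifted to the right in the subsequent frame, the underlying point structure should be the same: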

df2 = pd.DataFrame({   
    'Time' : [1,1,1,1,1,2,2,2,2,2],             
    'id' : ['A','B','C','D','E','B','A','C','D','E'],                 
    'X' : [1.0,3.0,4.0,2.0,2.0,3.0,5.0,6.0,4.0,4.0],
    'Y' : [1.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0],
    })
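As a quick sanity check of that claim, the sketch below compares the pairwise displacement vectors of the two df2 frames. It assumes the rows are matched by their order within each frame rather than by id (A and B swap labels between frames), and pairwise_vectors is a hypothetical helper, not part of any library.

```python
import numpy as np


def pairwise_vectors(xy):
    """Return an (n, n, 2) array where result[i, j] is the vector from point i to point j."""
    xy = np.asarray(xy, dtype=float)
    return xy[np.newaxis, :, :] - xy[:, np.newaxis, :]


# The two frames of df2, in row order (ids are deliberately ignored)
frame1 = np.array([[1.0, 1.0], [3.0, 1.0], [4.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
frame2 = np.array([[3.0, 1.0], [5.0, 1.0], [6.0, 0.0], [4.0, 0.0], [4.0, 2.0]])

# A pure translation cancels out when two positions are subtracted,
# so the pairwise vectors of both frames should match.
same_structure = np.allclose(pairwise_vectors(frame1), pairwise_vectors(frame2))
```

Here same_structure evaluates to True, because frame 2 is frame 1 shifted by 2 along X. A method built on relative vectors should therefore return the same structure for both frames.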

Expected structure:

I tried to follow the paper you cited as closely as possible, but its description of the algorithm is quite vague. This is my solution:

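import numpy
import pandas
import random
from sklearn.neighbors import KernelDensity
from scipy.spatial.distance import pdist, squareform

# From the paper:
# ---------------
# Formations are measured by calculating the vectors between each player and the other players
# at successive instants during a match, averaging the vectors between each pair of
# players over a specified time interval to gain a clear measure of their designated relative positions.
# The final spatial distribution of outfield players is determined by the following algorithm:
# first, we set the centroid of the formation to be the position of the player in the densest part of the
# team, determined by the average distance to the third-nearest neighbor. We then identify
# the relative position of his nearest neighbor, the relative position of that player's nearest neighbor
# (ignoring any player already considered in the process) and so on, until the positions of all players
# in the team have been determined.

# Your data, I've added some randomness to get a more realistic setting
df = pandas.DataFrame(
    {
        "Time": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "id": ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"],
        "Y": [element + random.random() * 0.25 for element in [1.0, 1.0, 0.0, 1.25, 2.0, 1.0, 1.0, 0.0, 1.25, 2.0]],
        "X": [element + random.random() * 0.25 for element in [1.0, 3.0, 2.0, 2.25, 2.0, 3.0, 5.0, 4.0, 4.25, 4.0]],
    }
)

# Plot the different time frames (for reference)
for timestamp in df["Time"].unique():
    df.loc[df["Time"] == timestamp].plot(kind="scatter", x="X", y="Y")


def calculate_distances(group: pandas.DataFrame) -> pandas.DataFrame:
    """Calculate the distances between the players within a specific time frame.

    Args:
        group (pandas.DataFrame): The data from a specific time frame

    Returns:
        pandas.DataFrame: The distances
    """
    group_distances = pandas.DataFrame(
        squareform(pdist(group[["X", "Y"]].to_numpy())),  # Default is Euclidean distance
        columns=group["id"],
        index=group["id"],
    )

    return group_distances


# Calculate the distances between the points, per time frame
df_distances = df.groupby("Time").apply(calculate_distances)

# Create a placeholder to store the relative positions at every timestamp
relative_positions = {timestamp: [] for timestamp in df["Time"].values}

# Go over the time frames
for timestamp, group in df.groupby("Time"):

    # ---
    # "... first, we set the centroid of the formation to be the position of the player in the densest part of the team ..."

    # Determine the density of the group, within this time frame
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(group[["X", "Y"]])
    log_density = kde.score_samples(group[["X", "Y"]])

    # Centroid is the most dense point in the formation
    centroid = group.iloc[numpy.argmax(log_density)]

    # Make a list of the other players to keep track of which players we've handled
    other_players = group["id"].to_list()

    # Start by making the centroid the active player
    active_player_id = centroid["id"]

    # ---
    # "... the relative position of that player's nearest neighbor (ignoring any player already considered
    # in the process) and so on, until the positions of all players in the team have been determined."

    # Keep handling the next player until there are no players left
    while len(other_players) > 1:

        # Remove the active player from the list
        other_players = [player for player in other_players if player != active_player_id]

        # For the active player, get the nearest neighbor
        ne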