Python 清理杂乱阵列以进行打印的简单方法
我有两个数组上的绘图数据,这些数组以未排序的方式存储,因此绘图不连续地从一个地方跳到另一个地方: 我试过: 我可以用它来“清理”我的数据吗?(在上述代码中,Python 清理杂乱阵列以进行打印的简单方法,python,arrays,sorting,numpy,matplotlib,Python,Arrays,Sorting,Numpy,Matplotlib,我有两个数组上的绘图数据,这些数组以未排序的方式存储,因此绘图不连续地从一个地方跳到另一个地方: 我试过: 我可以用它来“清理”我的数据吗?(在上述代码中,a可以是我的数据) 我计算的示例数据如下: array([[ 2.08937872e+001, 1.99020033e+001, 2.28260611e+001, 6.27711094e+000, 3.30392288e+000, 1.30312878e+001, 8.80768833
a
可以是我的数据)
我计算的示例数据如下:
array([[ 2.08937872e+001, 1.99020033e+001, 2.28260611e+001,
6.27711094e+000, 3.30392288e+000, 1.30312878e+001,
8.80768833e+000, 1.31238275e+001, 1.57400130e+001,
5.00278061e+000, 1.70752624e+001, 1.79131456e+001,
1.50746185e+001, 2.50095731e+001, 2.15895974e+001,
1.23237801e+001, 1.14860312e+001, 1.44268222e+001,
6.37680265e+000, 7.81485403e+000],
[ -1.19702178e-001, -1.14050879e-001, -1.29711421e-001,
8.32977493e-001, 7.27437322e-001, 8.94389885e-001,
8.65931116e-001, -6.08199292e-002, -8.51922900e-002,
1.12333841e-001, -9.88131292e-324, 4.94065646e-324,
-9.88131292e-324, 4.94065646e-324, 4.94065646e-324,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
-4.94065646e-324, 0.00000000e+000]])
- 使用(乔·金顿的)径向排序线后,我收到了以下曲线图:
- 对x轴数据进行排序,例如
并将结果存储在不同的变量中,例如x_old
x_new
- 对于
中的每个元素,在x_new
数组中查找其索引x_old
- 根据从上一步获得的索引,重新排列
数组中的元素y\u轴
list.index
方法比numpy.where
方法更容易操作,因此我会使用python list而不是numpy数组
例如(并假设x_old
和y_old
分别是x轴和y轴的先前numpy变量)
然后您可以绘制
x\u new
和y\u new
这实际上是一个比您通常认为的更困难的问题
在您的实际情况中,您可能可以通过y值进行排序。从情节上很难确定
因此,对于类似这样的圆形,更好的方法是进行径向排序
例如,让我们生成一些与您的数据类似的数据:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(.2, 1.6 * np.pi)
x, y = np.cos(t), np.sin(t)
# Shuffle the points...
i = np.arange(t.size)
np.random.shuffle(i)
x, y = x[i], y[i]
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
plt.show()
好的,现在让我们尝试使用径向排序来撤消这种无序排列。我们将使用点的质心作为中心,计算到每个点的角度,然后按该角度排序:
x0, y0 = x.mean(), y.mean()
angle = np.arctan2(y - y0, x - x0)
idx = angle.argsort()
x, y = x[idx], y[idx]
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
plt.show()
好的,非常接近!如果我们使用闭合多边形,我们就完成了
然而,我们有一个问题——这填补了错误的鸿沟。我们希望角度从直线上最大间隙的位置开始
因此,我们需要计算到新线上每个相邻点的间隙,并根据新的起始角度重新进行排序:
dx = np.diff(np.append(x, x[-1]))
dy = np.diff(np.append(y, y[-1]))
max_gap = np.abs(np.hypot(dx, dy)).argmax() + 1
x = np.append(x[max_gap:], x[:max_gap])
y = np.append(y[max_gap:], y[:max_gap])
其结果是:
作为一个完整的独立示例:
import numpy as np
import matplotlib.pyplot as plt
def main():
x, y = generate_data()
plot(x, y).set(title='Original data')
x, y = radial_sort_line(x, y)
plot(x, y).set(title='Sorted data')
plt.show()
def generate_data(num=50):
t = np.linspace(.2, 1.6 * np.pi, num)
x, y = np.cos(t), np.sin(t)
# Shuffle the points...
i = np.arange(t.size)
np.random.shuffle(i)
x, y = x[i], y[i]
return x, y
def radial_sort_line(x, y):
"""Sort unordered verts of an unclosed line by angle from their center."""
# Radial sort
x0, y0 = x.mean(), y.mean()
angle = np.arctan2(y - y0, x - x0)
idx = angle.argsort()
x, y = x[idx], y[idx]
# Split at opening in line
dx = np.diff(np.append(x, x[-1]))
dy = np.diff(np.append(y, y[-1]))
max_gap = np.abs(np.hypot(dx, dy)).argmax() + 1
x = np.append(x[max_gap:], x[:max_gap])
y = np.append(y[max_gap:], y[:max_gap])
return x, y
def plot(x, y):
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
return ax
main()
在@JoeKington的解决方案中,根据它们相对于中心的角度对数据库进行排序可能会对数据的某些部分产生问题:
In [1]:
import scipy.spatial as ss
import matplotlib.pyplot as plt
import numpy as np
import re
%matplotlib inline
In [2]:
data=np.array([[ 2.08937872e+001, 1.99020033e+001, 2.28260611e+001,
6.27711094e+000, 3.30392288e+000, 1.30312878e+001,
8.80768833e+000, 1.31238275e+001, 1.57400130e+001,
5.00278061e+000, 1.70752624e+001, 1.79131456e+001,
1.50746185e+001, 2.50095731e+001, 2.15895974e+001,
1.23237801e+001, 1.14860312e+001, 1.44268222e+001,
6.37680265e+000, 7.81485403e+000],
[ -1.19702178e-001, -1.14050879e-001, -1.29711421e-001,
8.32977493e-001, 7.27437322e-001, 8.94389885e-001,
8.65931116e-001, -6.08199292e-002, -8.51922900e-002,
1.12333841e-001, -9.88131292e-324, 4.94065646e-324,
-9.88131292e-324, 4.94065646e-324, 4.94065646e-324,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
-4.94065646e-324, 0.00000000e+000]])
In [3]:
plt.plot(data[0], data[1])
plt.title('Unsorted Data')
Out[3]:
<matplotlib.text.Text at 0x10a5c0550>
看起来标准化的欧几里德和马氏度量给出了最好的结果。请注意,我们选择第6个数据(索引5)的起点,它是该最大y值的数据点(当然,使用argmax
获取索引)
如果选择最大x值的起点(索引13),则会发生这种情况。似乎马氏度量比标准化的欧几里德度量好,因为它不受我们选择的起点的影响
你能发布你的数据吗?或者你是如何获得数据的?
a
不是你所展示的图的组成部分。那么,你真的需要线图吗?你不能只绘制数据点(没有线条)吗?看起来你需要的只是按照数据的y
值对数据进行排序,但是发布一些样本数据会有所帮助。@Ohm,我认为具有适当距离度量的最近邻法可能是最好的解决方案,请参见编辑。有关问题中显示的图像,按x值排序在这里不起作用。这可能适用于y值正如tom指出的,在这种情况下,这不适用于x,但可能适用于y。无论如何,如果您有numpy数组,请不要为此使用列表。相反,请执行x,y=x[y.argsort()],x[y.argsort()]
。这非常有效,但在某些情况下,当我绘制数据时,它会填充一个方向上有多个y
值的区域。好主意,它也可以用于将数据分割成不同的曲线吗?实际数据应包含一种半圆形曲线,向右开口,以及一条水平直线。。
import numpy as np
import matplotlib.pyplot as plt
def main():
x, y = generate_data()
plot(x, y).set(title='Original data')
x, y = radial_sort_line(x, y)
plot(x, y).set(title='Sorted data')
plt.show()
def generate_data(num=50):
t = np.linspace(.2, 1.6 * np.pi, num)
x, y = np.cos(t), np.sin(t)
# Shuffle the points...
i = np.arange(t.size)
np.random.shuffle(i)
x, y = x[i], y[i]
return x, y
def radial_sort_line(x, y):
"""Sort unordered verts of an unclosed line by angle from their center."""
# Radial sort
x0, y0 = x.mean(), y.mean()
angle = np.arctan2(y - y0, x - x0)
idx = angle.argsort()
x, y = x[idx], y[idx]
# Split at opening in line
dx = np.diff(np.append(x, x[-1]))
dy = np.diff(np.append(y, y[-1]))
max_gap = np.abs(np.hypot(dx, dy)).argmax() + 1
x = np.append(x[max_gap:], x[:max_gap])
y = np.append(y[max_gap:], y[:max_gap])
return x, y
def plot(x, y):
fig, ax = plt.subplots()
ax.plot(x, y, color='lightblue')
ax.margins(0.05)
return ax
main()
In [1]:
import scipy.spatial as ss
import matplotlib.pyplot as plt
import numpy as np
import re
%matplotlib inline
In [2]:
data=np.array([[ 2.08937872e+001, 1.99020033e+001, 2.28260611e+001,
6.27711094e+000, 3.30392288e+000, 1.30312878e+001,
8.80768833e+000, 1.31238275e+001, 1.57400130e+001,
5.00278061e+000, 1.70752624e+001, 1.79131456e+001,
1.50746185e+001, 2.50095731e+001, 2.15895974e+001,
1.23237801e+001, 1.14860312e+001, 1.44268222e+001,
6.37680265e+000, 7.81485403e+000],
[ -1.19702178e-001, -1.14050879e-001, -1.29711421e-001,
8.32977493e-001, 7.27437322e-001, 8.94389885e-001,
8.65931116e-001, -6.08199292e-002, -8.51922900e-002,
1.12333841e-001, -9.88131292e-324, 4.94065646e-324,
-9.88131292e-324, 4.94065646e-324, 4.94065646e-324,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
-4.94065646e-324, 0.00000000e+000]])
In [3]:
plt.plot(data[0], data[1])
plt.title('Unsorted Data')
Out[3]:
<matplotlib.text.Text at 0x10a5c0550>
In [10]:
#Calculate the angle in degrees of [0, 360]
sort_index = np.angle(np.dot((data.T-data.mean(1)), np.array([1.0, 1.0j])))
sort_index = np.where(sort_index>0, sort_index, sort_index+360)
#sorted the data by angle and plot them
sort_index = sort_index.argsort()
plt.plot(data[0][sort_index], data[1][sort_index])
plt.title('Data Sorted by angle relatively to the centroid')
plt.plot(data[0], data[1], 'r+')
Out[10]:
[<matplotlib.lines.Line2D at 0x10b009e10>]
In [7]:
def sort_dots(metrics, ax, start):
dist_m = ss.distance.squareform(ss.distance.pdist(data.T, metrics))
total_points = data.shape[1]
points_index = set(range(total_points))
sorted_index = []
target = start
ax.plot(data[0, target], data[1, target], 'o', markersize=16)
points_index.discard(target)
while len(points_index)>0:
candidate = list(points_index)
nneigbour = candidate[dist_m[target, candidate].argmin()]
points_index.discard(nneigbour)
points_index.discard(target)
#print points_index, target, nneigbour
sorted_index.append(target)
target = nneigbour
sorted_index.append(target)
ax.plot(data[0][sorted_index], data[1][sorted_index])
ax.set_title(metrics)
In [6]:
dmetrics = re.findall('pdist\(X\,\s+\'(.*)\'', ss.distance.pdist.__doc__)
In [8]:
f, axes = plt.subplots(4, 6, figsize=(16,10), sharex=True, sharey=True)
axes = axes.ravel()
for metrics, ax in zip(dmetrics, axes):
try:
sort_dots(metrics, ax, 5)
except:
ax.set_title(metrics + '(unsuitable)')
In [9]:
f, axes = plt.subplots(4, 6, figsize=(16,10), sharex=True, sharey=True)
axes = axes.ravel()
for metrics, ax in zip(dmetrics, axes):
try:
sort_dots(metrics, ax, 13)
except:
ax.set_title(metrics + '(unsuitable)')