Python: finding the largest cross shape in an array of coordinates
Find the centroid coordinates according to the largest product of the left/right/up/down arm lengths. The code below works, but it never finishes on larger arrays. How can I optimize it?
(If numpy matters, I am going to find the centroid with region = region_coords.tolist())
For testing:
array_test = ([[0, 0], [1, 0], [1, 1], [0, 1], [2, 1], [2, 2], [3, 1], [3, 0], [2, 0], [3, 2]])
print(find_centroid(array_test))
Infinite loop explained
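If region is a numpy array, this part of your code gets you stuck in an infinite loop:
while True:
    if [x, y] in region:
        ...
This is because, when used on an array, the in operator returns True as soon as any element of the list matches the element at the same position in any row of the array.
Instead, you can use numpy's any and all methods: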
if (np.array(region)==[x,y]).all(axis=1).any(axis=0):
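all(axis=1) checks, for each sub-list, that both values are equal, in the right order.
This gives an array of booleans. If any of them is True, there is at least one match.
Casting either of the two operands to a numpy array is enough to make this test possible.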
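As a small illustration of the difference described above, here is a minimal sketch. The helper name contains_row is hypothetical (it is not part of the original code); it only wraps the all(axis=1).any() test.

```python
import numpy as np

def contains_row(region, point):
    """Return True if `point` appears as a complete row of `region`.

    `contains_row` is a hypothetical helper name used for illustration.
    """
    region = np.asarray(region)
    # Compare every row with `point`; a row matches only if all its columns match.
    return bool((region == np.asarray(point)).all(axis=1).any())

region = np.array([[1, 0], [0, 0]])
print([1, 2] in region)              # True: `in` is satisfied by the partial match on 1
print(contains_row(region, [1, 2]))  # False: no row equals [1, 2]
print(contains_row(region, [1, 0]))  # True: the first row equals [1, 0]
```

With the plain in test, a loop that walks away from the region may never see False, which is how the infinite loop arises.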
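But if...
If both operands are lists, the in operator works as expected, but then you must make sure that region and each of its sub-lists are lists, not numpy arrays. Casting only the outer container with list() won't do. Here's why: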
import numpy as np
array_test = [[0, 0], [1, 0]]
print([1,1] in array_test) # Prints False, as expected
# numpy always compares element-wise when both operands have the same length
print([1,1] == np.array([1,0])) # Prints [ True False]
print(np.array([1,1]) == np.array([1,0])) # [Line 6] Prints [ True False]
# "in" gives surprising results, or errors, when the comparison is ambiguous
print([1,1] == np.array(array_test)) # Prints [[False False] [ True False]]
print([1,1] in np.array(array_test)) # Prints True as explained, because we have at least one True
print([1,1] in list(np.array(array_test))) # Raises ValueError: the truth value of a result like the one at line 6 is ambiguous
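Another version
Here's my approach. There may be better ones; this is just my two cents.
Pre-filtering potential centroids
First, I gather all the possible intersections in the region (from now on I'll call them "centers"). To do that, I start by counting the occurrences of each x coordinate and each y coordinate. For simplicity, I'll use numpy.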
from itertools import product
import numpy as np
# Work on a numpy array so that column slicing is possible.
region = np.array(array_test)
# We count every x value and keep those that are present at least twice.
x_counts = dict(zip(*np.unique(region[:,0], return_counts=True)))
y_counts = dict(zip(*np.unique(region[:,1], return_counts=True)))
# If an x is present only once, there cannot be any center in this column.
x_inter = [coord for coord, count in x_counts.items() if count>=2]
# Same with y and rows.
y_inter = [coord for coord, count in y_counts.items() if count>=2]
# Next, we create all combinations of (x, y)
# and keep only the combinations present in our region.
centers = np.array([(x,y) for x,y in product(x_inter, y_inter)
                    if (region==np.array([x,y])).all(axis=1).any()])
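To make the pre-filter concrete, here is a minimal sketch run on the sample array from the question. The function name candidate_centers is hypothetical, and the membership test uses a set of tuples instead of the all/any comparison, which is a substitution made here for brevity.

```python
from itertools import product
import numpy as np

region = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [2, 1],
                   [2, 2], [3, 1], [3, 0], [2, 0], [3, 2]])

def candidate_centers(region):
    """Hypothetical helper: cells whose x and y each occur at least twice."""
    xs, x_counts = np.unique(region[:, 0], return_counts=True)
    ys, y_counts = np.unique(region[:, 1], return_counts=True)
    x_inter = xs[x_counts >= 2]
    y_inter = ys[y_counts >= 2]
    # A set of tuples gives a constant-time "is this cell in the region?" test.
    cells = {tuple(cell) for cell in region.tolist()}
    return [(int(x), int(y)) for x, y in product(x_inter, y_inter)
            if (int(x), int(y)) in cells]

xs, x_counts = np.unique(region[:, 0], return_counts=True)
print(dict(zip(xs.tolist(), x_counts.tolist())))  # {0: 2, 1: 2, 2: 3, 3: 3}
centers = candidate_centers(region)
print(len(centers))  # 10: the combinations (0, 2) and (1, 2) are not in the region
```

On such a small, dense region the filter removes nothing; it only pays off when many rows or columns contain a single cell.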
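Measuring arm length
To compute the power of each center, we first need a function that measures the length of an arm. Let's make it a bit more generic with a direction parameter.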
# Since we are in 2D and we have no diagonal, there are four possible directions.
directions = np.array([[0,1], [0, -1], [1, 0], [-1, 0]])
def get_arm_length(center, direction):
    position = center+direction # going one step in the direction
    # We keep track of the length in the direction.
    # It starts at 1 so that the center cell is counted, as in the complete code below.
    length = 1
    # adding 1 as long as the next step in direction is in region
    while (region==position).all(axis=1).any():
        position += direction
        length+=1
    return length
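Each step of the loop above scans the whole region. As a possible alternative, here is a minimal sketch that stores the cells in a Python set so that each membership test is a hash lookup. The names arm_length and cells are hypothetical, and the length counts the center cell, as the complete code does.

```python
def arm_length(cells, center, direction):
    """Hypothetical set-based variant of get_arm_length.

    `cells` is a set of (x, y) tuples; the returned length counts the center.
    """
    x, y = center
    dx, dy = direction
    length = 1
    # Walk away from the center while the next cell belongs to the region.
    while (x + length * dx, y + length * dy) in cells:
        length += 1
    return length

sample = [[0, 0], [1, 0], [1, 1], [0, 1], [2, 1],
          [2, 2], [3, 1], [3, 0], [2, 0], [3, 2]]
cells = {(x, y) for x, y in sample}

print(arm_length(cells, (2, 1), (-1, 0)))  # 3: cells (1, 1) and (0, 1), plus the center
print(arm_length(cells, (2, 1), (0, 1)))   # 2: cell (2, 2), plus the center
```

This was not benchmarked against the timings reported in this answer, so any speed-up is an expectation, not a measurement.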
Measuring each potential centroid
Now we can test the four directions for each potential centroid (selected earlier), keeping the best one as we go.
best_center=(0,[-1, -1]) # => (power, center_coords)
for center in centers:
    # Setting to 1, which is the identity element of the product (x * 1 == x)
    power = 1
    for direction in directions:
        # We multiply by the arm length along each of the four directions.
        power *= get_arm_length(center, direction)
    # If a more powerful one is found, we store its power and coords.
    if power > best_center[0]:
        best_center = power, center
# At this point, we have found the most powerful center, which is our centroid.
Putting it all together
Here is the complete code:
from itertools import product
import numpy as np

def find_centroid2(region):
    region = np.array(region)
    # Directions:
    directions = np.array([[0,1], [0, -1], [1, 0], [-1, 0]])
    def get_arm_length(center, direction):
        position = center+direction
        length = 1
        while (region==position).all(axis=1).any(axis=0):
            position+= direction
            length+=1
        return length
    # Intersections:
    x_counts = dict(zip(*np.unique(region[:,0], return_counts=True)))
    y_counts = dict(zip(*np.unique(region[:,1], return_counts=True)))
    x_inter = [coord for coord, count in x_counts.items() if count>=2]
    y_inter = [coord for coord, count in y_counts.items() if count>=2]
    centers = np.array([(x,y) for x,y in product(x_inter, y_inter) if (region==np.array([x,y])).all(axis=1).any()])
    # Measuring each center's "power":
    best_center=(0,[-1, -1]) # => (power, center_coords)
    for center in centers:
        power = 1
        for direction in directions:
            power *= get_arm_length(center, direction)
        if power > best_center[0]:
            best_center = power, center
    return best_center[1]
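Optimizing the optimization
Rather than testing every virtual center to keep only those that belong to our region, we can filter the region itself and keep the cells whose x or y coordinate is counted twice or more.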
import numpy as np

def find_centroid3(region):
    region = np.array(region)
    # Directions:
    directions = np.array([[0,1], [0, -1], [1, 0], [-1, 0]])
    def get_arm_length(center, direction):
        position = center+direction
        length = 1
        while (region==position).all(axis=1).any(axis=0):
            position+= direction
            length+=1
        return length
    # Intersections:
    # It's better to filter the cells instead of computing and testing all combinations
    x_counts = [x[0] for x in zip(*np.unique(region[:,0], return_counts=True)) if x[1]>=2]
    y_counts = [y[0] for y in zip(*np.unique(region[:,1], return_counts=True)) if y[1]>=2]
    centers = [[x,y] for x,y in region if x in x_counts or y in y_counts]
    # Measuring each center's "power":
    best_center=(0,[-1, -1]) # => (power, center_coords)
    for center in centers:
        power = 1
        for direction in directions:
            power *= get_arm_length(center, direction)
        if power > best_center[0]:
            best_center = power, center
    return best_center[1]
Comparison V2
Preparing a random region with many cells:
# Keeping the grid fairly big and filled
# 150*150 grid (22'500 cells) with 15'000 filled cells max.
array_test = np.random.randint(150, size=(15000, 2)) # => len = 15'000
# Getting rid of duplicates, else they will mess with the counting.
# Assuming your own grids also don't have any
new_array = [list(array_test[0])]
for elem in array_test[1:]:
    # Keep elem only if it differs from every cell kept so far.
    if (elem != np.array(new_array)).any(axis=1).all():
        new_array.append(elem)
array_test = np.array(new_array) # => len = 10'959, all are unique cells
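The loop above rebuilds an array at every iteration. A shorter way to drop duplicate cells, sketched here as an alternative and not what was used for the timings below, is np.unique with axis=0. The seed value is arbitrary, and the rows come back sorted, which should not matter since the order of the cells is not used.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
# Up to 15'000 filled cells on a 150*150 grid, duplicates included.
array_test = rng.integers(150, size=(15000, 2))

# axis=0 treats each row (cell) as one item and removes repeated rows.
unique_cells = np.unique(array_test, axis=0)
print(array_test.shape, unique_cells.shape)
```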
Results:
find_centroid(array_test) # Original version. Result = [64 127]
# 16 s ± 117 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
find_centroid2(array_test) # Proposed version 1. Result = [61 127]
# 13.1 s ± 87.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
find_centroid3(array_test) # Proposed version 2. Result = [61, 127]
# 9.49 s ± 47.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
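I tried several grid sizes, keeping them at most half filled.
Comparison V1
[Obsolete]
Your original code (corrected to handle the infinite loop):
Proposed code: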
%%timeit
find_centroid2(array_test) # Result => array([73, 16])
# 17.2 s ± 76.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
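It's not a huge optimization, but it is an optimization nonetheless.
Maybe some other comments and ideas can make it even better.
I tried several grid sizes, keeping them at most half filled.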
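For anyone who needs a better (possibly perfect) answer: erode the array with a small shape in a while loop, removing the border each time, until you find:
- one cross: the coordinates of its center
- several crosses: compare the best intersections individually

Wow. Thanks for your guidance ;) Really glad to see such a description. Let me replace my code with yours and I'll come back with an answer.

Well, about the "in" operator, I had already made a mistake. The situation I described happens when "in" is used on a numpy array, not on a python list. For example, [1,2] in array([[1,0],[0,0]]) will return True. But [1,2] in [[1,0,2]] will return False. Let me correct that.

I forgot an old variable name. Corrected, sorry! That's what happens when you code in a notebook.

As the chart shows, it is faster. It still takes too long with bigger regions. I think the problem is that the computation runs over all the "centers". What if we ran it only once, after sorting the region by identical x or y values, or after picking the most frequent values with a built-in such as most_common? Am I right?

To answer you: I found a better solution. In a heuristic approach, sorting is a good idea. But if we want to be 100% sure of finding the centroid, sorting won't help us. For example, 7*5*7*5 is smaller than 6*6*6*6. If we were asked to find the centroid with the barycenter given beforehand, that would be a huge optimization. As for Counter, it is a clever and simple idea, but it seems to be about 4 times slower than the dict(zip(*unique)) approach.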
Timing of the original code for Comparison V1 (corrected to handle the infinite loop):
%%timeit
find_centroid(array_test) # Result => array([73, 16])
# 21.4 s ± 397 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)