Algorithm: Finding clusters of mass in a matrix/bitmap
This is a continuation of an earlier question, which discussed finding the center of mass in a boolean matrix and gave an example. Suppose we now expand the matrix to the following form:
0 1 2 3 4 5 6 7 8 9
1 . X X . . . . . .
2 . X X X . . X . .
3 . . . . . X X X .
4 . . . . . . X . .
5 . X X . . . . . .
6 . X . . . . . . .
7 . X . . . . . . .
8 . . . . X X . . .
9 . . . . X X . . .
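As you can see, we now have 4 centers of mass, one for each of 4 different clusters.
We already know how to find a center of mass when only one exists. If we run that algorithm on this matrix, we will get some point in the middle of the matrix, which does not help us.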
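To make that concrete, here is a minimal sketch that averages the coordinates of every filled cell in the example. It assumes zero-based coordinates and unit mass per cell; the names `ROWS` and `cells` are illustrative only.

```python
# Centroid of every filled cell in the example grid (zero-based x, y).
ROWS = """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")

cells = [(x, y) for y, row in enumerate(ROWS)
         for x, ch in enumerate(row) if ch == "X"]

cx = sum(x for x, _ in cells) / len(cells)
cy = sum(y for _, y in cells) / len(cells)

print(len(cells), round(cx, 2), round(cy, 2))  # 18 3.44 3.44
print(ROWS[round(cy)][round(cx)])              # . (an empty cell between the clusters)
```

The single centroid lands on an empty cell that belongs to none of the four clusters.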
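Is there a good, correct, and fast algorithm for finding these clusters of mass?

I think I would check each point in the matrix and work out its mass based on its neighbours. The mass contributed by a point falls off with the square of the distance. You could then pick the top four points that keep a minimum distance from each other.

Here is some Python code I wrote to illustrate how to find the mass at each point. Some setup using your example matrix: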
matrix = [[1.0 if x == "X" else 0.0 for x in y] for y in """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")]
HEIGHT = len(matrix)
WIDTH = len(matrix[0])
# Integer division, so the radii can be passed to range() on Python 3 as well.
Y_RADIUS = HEIGHT // 2
X_RADIUS = WIDTH // 2
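To compute the mass at a given point:

def distance(x1, y1, x2, y2):
    'Manhattan distance http://en.wikipedia.org/wiki/Manhattan_distance'
    return abs(y1 - y2) + abs(x1 - x2)


def mass(m, x, y):
    # Start from the cell's own value, then add every cell in the window,
    # weighted by the inverse square of its distance.
    _mass = m[y][x]
    for _y in range(max(0, y - Y_RADIUS), min(HEIGHT, y + Y_RADIUS)):
        for _x in range(max(0, x - X_RADIUS), min(WIDTH, x + X_RADIUS)):
            # Clamp to 1 so the cell itself (distance 0) does not divide by zero;
            # it is therefore counted once more here with weight 1.
            d = max(1, distance(x, y, _x, _y))
            _mass += m[_y][_x] / (d * d)
    return _mass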
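Note: I am using Manhattan distance (also known as city block distance or taxicab geometry) here because I do not think the added accuracy of Euclidean distance is worth the cost of calling sqrt().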
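As an aside, since `mass` only ever divides by `d * d`, a Euclidean weighting could use the squared distance directly and would not need `sqrt()` either. A small sketch of the difference, with helper names that are illustrative and not part of the answer:

```python
def squared_euclidean(x1, y1, x2, y2):
    """Squared straight-line distance; no square root is needed."""
    dx, dy = x1 - x2, y1 - y2
    return dx * dx + dy * dy


def manhattan(x1, y1, x2, y2):
    """City-block distance, as used by distance() in the answer."""
    return abs(x1 - x2) + abs(y1 - y2)


# Along an axis the two squared distances agree; on a diagonal they differ.
print(manhattan(0, 0, 3, 0) ** 2, squared_euclidean(0, 0, 3, 0))  # 9 9
print(manhattan(0, 0, 1, 1) ** 2, squared_euclidean(0, 0, 1, 1))  # 4 2
```

Diagonal neighbours weigh more under the Euclidean variant, so the ranking of points could shift slightly if it were swapped in.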
Iterate over the matrix and build up a list of tuples of the form (x, y, mass(x, y)):
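A minimal sketch of that loop, assuming the list is named `point_mass` (the name the sort call uses) and the matrix is scanned row by row. The setup and a compact `mass` are repeated so the snippet runs on its own.

```python
# Setup repeated from above so the snippet is self-contained.
matrix = [[1.0 if ch == "X" else 0.0 for ch in row] for row in """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")]
HEIGHT, WIDTH = len(matrix), len(matrix[0])
Y_RADIUS, X_RADIUS = HEIGHT // 2, WIDTH // 2


def mass(m, x, y):
    """Inverse-square weighted sum over a window, as defined above."""
    total = m[y][x]
    for _y in range(max(0, y - Y_RADIUS), min(HEIGHT, y + Y_RADIUS)):
        for _x in range(max(0, x - X_RADIUS), min(WIDTH, x + X_RADIUS)):
            d = max(1, abs(x - _x) + abs(y - _y))  # Manhattan distance, at least 1
            total += m[_y][_x] / (d * d)
    return total


# One (x, y, mass) tuple per cell, scanning row by row.
point_mass = [(x, y, mass(matrix, x, y))
              for y in range(HEIGHT)
              for x in range(WIDTH)]

best = max(point_mass, key=lambda p: p[2])
print(len(point_mass))                    # 81
print(best[0], best[1], round(best[2], 4))  # 6 2 6.1581
```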
Sort the list by the mass of each point:
from operator import itemgetter
point_mass.sort(key=itemgetter(2), reverse=True)
Looking at the top 9 points in the sorted list:
(6, 2, 6.1580555555555554)
(2, 1, 5.4861111111111107)
(1, 1, 4.6736111111111107)
(1, 4, 4.5938888888888885)
(2, 0, 4.54)
(4, 7, 4.4480555555555554)
(1, 5, 4.4480555555555554)
(5, 7, 4.4059637188208614)
(4, 8, 4.3659637188208613)
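If we work from highest to lowest and filter out points that are too close to ones we have already seen, we get (I am doing this by hand since I have run out of time to do it in code...):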
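The manual filtering step could be sketched as follows. The cutoff `MIN_DISTANCE = 3` and the limit of four clusters are assumptions chosen for this example, not values given in the answer; the input is the top-nine list shown above.

```python
# Top nine (x, y, mass) tuples, copied from the sorted list above.
ranked = [
    (6, 2, 6.1580555555555554),
    (2, 1, 5.4861111111111107),
    (1, 1, 4.6736111111111107),
    (1, 4, 4.5938888888888885),
    (2, 0, 4.54),
    (4, 7, 4.4480555555555554),
    (1, 5, 4.4480555555555554),
    (5, 7, 4.4059637188208614),
    (4, 8, 4.3659637188208613),
]

MIN_DISTANCE = 3  # assumed cutoff for "too close" (Manhattan distance)
CLUSTERS = 4      # assumed: four clusters are expected


def pick_centers(points, min_distance, limit):
    """Greedily keep the heaviest points that are far enough from those already kept."""
    kept = []
    for x, y, _mass in points:
        if all(abs(x - kx) + abs(y - ky) >= min_distance for kx, ky in kept):
            kept.append((x, y))
            if len(kept) == limit:
                break
    return kept


print(pick_centers(ranked, MIN_DISTANCE, CLUSTERS))
# [(6, 2), (2, 1), (1, 4), (4, 7)]
```

Each of the four surviving points lies inside a different cluster of the example matrix.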
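This is a pretty intuitive result from just looking at the matrix (note that the coordinates are zero-based, unlike in your example).

My first thought would be to find any cell with a non-zero value. From there, run a flood-fill algorithm and compute the center of mass of the cells found. Next, zero out the found cells in the matrix and start over from the top.

This of course will not scale as well as the Google method that tuinstoel linked, but it is easier to implement for smaller matrices.

EDIT: Disjoint sets (using path compression and union by rank) may be useful here. Union and find-set have an amortized time complexity of O(α(n)), where α(n) = min{k : A_k(1) ≥ n} and A_k(n) is the Ackermann function, so for any reasonable value of n, α(n) is essentially O(1). The only problem is that disjoint sets are a one-way mapping from item to set, but that does not matter if you are going through all the items anyway.

Here is a simple Python script for demonstration: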
from collections import defaultdict


class DisjointSets(object):
    def __init__(self):
        self.item_map = defaultdict(DisjointNode)

    def add(self, item):
        """Add item to the forest."""
        # It gets initialized to a new node when
        # trying to access a non-existent item.
        return self.item_map[item]

    def __contains__(self, item):
        return item in self.item_map

    def __getitem__(self, item):
        if item not in self:
            raise KeyError(item)
        return self.item_map[item]

    def __delitem__(self, item):
        del self.item_map[item]

    def __iter__(self):
        # sort all items into real sets
        all_sets = defaultdict(set)
        for item, node in self.item_map.items():
            all_sets[node.find_set()].add(item)
        return iter(all_sets.values())


class DisjointNode(object):
    def __init__(self, parent=None, rank=0):
        if parent is None:
            self.parent = self
        else:
            self.parent = parent
        self.rank = rank

    def union(self, other):
        """Join two sets and return the root of the merged set."""
        node1 = self.find_set()
        node2 = other.find_set()
        if node1 is node2:
            # Already in the same set; nothing to merge.
            return node1
        # union by rank
        if node1.rank > node2.rank:
            node2.parent = node1
            return node1
        node1.parent = node2
        if node1.rank == node2.rank:
            node2.rank += 1
        return node2

    def find_set(self):
        """Finds the root node of this set."""
        node = self
        while node is not node.parent:
            node = node.parent
        # path compression: point every node on the path straight at the root
        root, node = node, self
        while node is not node.parent:
            # Assign the parent first, then advance, so that the current
            # node (not the next one) is the one that gets re-parented.
            node.parent, node = root, node.parent
        return root


def find_clusters(grid):
    disj = DisjointSets()
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell:
                node = disj.add((x, y))
                # Only neighbours that were already scanned: left and the row above.
                for dx, dy in ((-1, 0), (-1, -1), (0, -1), (1, -1)):
                    if (x + dx, y + dy) in disj:
                        node.union(disj[x + dx, y + dy])
    for set_ in disj:
        sum_x, sum_y, count = 0, 0, 0
        for x, y in set_:
            sum_x += x
            sum_y += y
            count += 1
        yield sum_x / count, sum_y / count


def main():
    grid = [[('.' != cell) for cell in row if not cell.isspace()] for row in (
        ". X X . . . . . .",
        ". X X X . . X . .",
        ". . . . . X X X .",
        ". . . . . . X . .",
        ". X X . . . . . .",
        ". X . . . . . . .",
        ". X . . . . . . .",
        ". . . . X X . . .",
        ". . . . X X . . .",
    )]
    coordinates = list(find_clusters(grid))
    # Round half up, which matches Python 2's round() for these positive
    # values; Python 3's round() rounds halves to the nearest even number.
    centers = {(int(x + 0.5), int(y + 0.5)): i
               for i, (x, y) in enumerate(coordinates)}
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if (x, y) in centers:
                print(centers[x, y] + 1, end=' ')
            elif cell:
                print('X', end=' ')
            else:
                print('.', end=' ')
        print()
    print()
    print('%4s | %7s %7s' % ('i', 'x', 'y'))
    print('-' * 22)
    for i, (x, y) in enumerate(coordinates):
        print('%4d | %7.4f %7.4f' % (i + 1, x, y))


if __name__ == '__main__':
    main()
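Output (from the original Python 2 run; the cluster numbering follows dictionary iteration order, so it may differ on other Python versions):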
. X X . . . . . .
. X 3 X . . X . .
. . . . . X 4 X .
. . . . . . X . .
. X X . . . . . .
. 2 . . . . . . .
. X . . . . . . .
. . . . X X . . .
. . . . X 1 . . .

   i |       x       y
----------------------
   1 |  4.5000  7.5000
   2 |  1.2500  4.7500
   3 |  1.8000  0.6000
   4 |  6.0000  2.0000
The point of this was to demonstrate disjoint sets. The actual algorithm in find_clusters() could be upgraded to something more robust.
References
- Introduction to Algorithms, 2nd ed., Cormen et al.
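
You need a clustering algorithm. This is easy here, since you only have a two-dimensional grid and the entries border each other, so a simple connected-component pass (a flood fill, for instance) is enough. Once you have each cluster, you can find its center with the single-cluster center-of-mass method.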
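A minimal sketch of the flood-fill approach, assuming 8-connectivity (diagonal neighbours count as touching) and unit mass per cell. The names `GRID` and `flood_fill_centroids` are illustrative and do not come from any of the answers.

```python
from collections import deque

GRID = [
    ".XX......",
    ".XXX..X..",
    ".....XXX.",
    "......X..",
    ".XX......",
    ".X.......",
    ".X.......",
    "....XX...",
    "....XX...",
]


def flood_fill_centroids(rows):
    """Return one (x, y) centroid per 8-connected group of 'X' cells."""
    height, width = len(rows), len(rows[0])
    seen = set()
    centroids = []
    for y in range(height):
        for x in range(width):
            if rows[y][x] != "X" or (x, y) in seen:
                continue
            # Breadth-first flood fill starting from this unvisited cell.
            seen.add((x, y))
            queue = deque([(x, y)])
            cluster = []
            while queue:
                cx, cy = queue.popleft()
                cluster.append((cx, cy))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and rows[ny][nx] == "X"
                                and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            n = len(cluster)
            centroids.append((sum(p[0] for p in cluster) / n,
                              sum(p[1] for p in cluster) / n))
    return centroids


for cx, cy in flood_fill_centroids(GRID):
    print("%.4f %.4f" % (cx, cy))
# 1.8000 0.6000
# 6.0000 2.0000
# 1.2500 4.7500
# 4.5000 7.5000
```

The centroids agree with the disjoint-set output above; only the order differs, because clusters are reported in the order the scan first reaches them.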