Algorithm 在矩阵/位图中查找质量簇_Algorithm_Language Agnostic_Geometry

Algorithm 在矩阵/位图中查找质量簇

algorithm language-agnostic geometry

Algorithm 在矩阵/位图中查找质量簇,algorithm,language-agnostic,geometry,Algorithm,Language Agnostic,Geometry,这是此处发布的问题的继续：其中谈到了在布尔矩阵中求重心，给出了一个例子假设现在我们将矩阵展开为以下形式： 0 1 2 3 4 5 6 7 8 9 1 . X X . . . . . . 2 . X X X . . X . . 3 . . . . . X X X . 4 . . . . . . X . . 5 . X X . . . . . . 6 . X . . . . . . . 7 . X . . . . . . . 8 . . . . X X . . . 9 . . . . X X .

这是此处发布的问题的继续：其中谈到了在布尔矩阵中求重心，给出了一个例子

假设现在我们将矩阵展开为以下形式：

0 1 2 3 4 5 6 7 8 9
1 . X X . . . . . .
2 . X X X . . X . .
3 . . . . . X X X .
4 . . . . . . X . .
5 . X X . . . . . .
6 . X . . . . . . .
7 . X . . . . . . .
8 . . . . X X . . .
9 . . . . X X . . .

正如你所看到的，我们现在有4个质量中心，4个不同的星团

我们已经知道如何找到一个质心，因为只有一个存在，如果我们在这个矩阵上运行这个算法，我们将在矩阵的中间得到一些点，这对我们没有帮助。

有什么好的、正确的、快速的算法可以找到这些质量簇呢？

我想我会检查矩阵中的每个点，并根据它的邻居计算出它的质量。点的质量会随着距离的平方而下降。然后，您可以选择彼此之间距离最小的前四个点

下面是我编写的一些Python代码，试图说明找出每个点的质量的方法。使用示例矩阵进行一些设置：

matrix = [[1.0 if x == "X" else 0.0 for x in y] for y in """.XX......
.XXX..X..
.....XXX.
......X..
.XX......
.X.......
.X.......
....XX...
....XX...""".split("\n")]

HEIGHT = len(matrix)
WIDTH = len(matrix[0])
Y_RADIUS = HEIGHT / 2
X_RADIUS = WIDTH / 2

要计算给定点的质量，请执行以下操作：

def distance(x1, y1, x2, y2):
  'Manhattan distance http://en.wikipedia.org/wiki/Manhattan_distance'
  return abs(y1 - y2) + abs(x1 - x2)

def mass(m, x, y):
  _mass = m[y][x]
  for _y in range(max(0, y - Y_RADIUS), min(HEIGHT, y + Y_RADIUS)):
    for _x in range(max(0, x - X_RADIUS), min(WIDTH, x + X_RADIUS)):
      d = max(1, distance(x, y, _x, _y))
      _mass += m[_y][_x] / (d * d)
  return _mass

注意：我在这里使用距离（城市街区、出租车几何体），因为我认为使用欧几里得距离增加的精度不值得调用sqrt（）

迭代我们的矩阵并建立元组列表，如（x，y，mass（x，y））：

对每个点的体量列表进行排序：

from operator import itemgetter
point_mass.sort(key=itemgetter(2), reverse=True)

查看排序列表中的前9点：

(6, 2, 6.1580555555555554)
(2, 1, 5.4861111111111107)
(1, 1, 4.6736111111111107)
(1, 4, 4.5938888888888885)
(2, 0, 4.54)
(4, 7, 4.4480555555555554)
(1, 5, 4.4480555555555554)
(5, 7, 4.4059637188208614)
(4, 8, 4.3659637188208613)

如果我们从最高点到最低点，过滤掉与我们已经看到的点太接近的点，我们将得到（我正在手动操作，因为我现在已经没有时间在代码中操作了…）：

这是一个非常直观的结果，只需查看矩阵即可（注意，与您的示例相比，坐标是以零为基础的）。

我的第一个想法是首先找到任何具有非零值的单元格。从那里做一些泛洪填充算法，并计算所发现单元的质心。接下来，从矩阵中将找到的单元格归零，然后从头开始

这当然不会像tuinstoel链接的Google方法那样具有可伸缩性，但对于较小的矩阵更容易实现

编辑：

（使用路径压缩和按秩并集）在这里可能很有用。并集和查找集的时间复杂度为O（α（n）），其中

α（n）=min{k:Ak（1）≥ n}

Ak（n）是阿克曼函数，所以对于任何合理的值，α（n）本质上都是O（1）。唯一的问题是不相交集是项到集的单向映射，但如果要遍历所有项，这并不重要

下面是一个用于演示的简单python脚本：

from collections import defaultdict

class DisjointSets(object):
    def __init__(self):
        self.item_map = defaultdict(DisjointNode)

    def add(self,item):
        """Add item to the forest."""
        # It's gets initialized to a new node when
        # trying to access a non-existant item.
        return self.item_map[item]

    def __contains__(self,item):
        return (item in self.item_map)

    def __getitem__(self,item):
        if item not in self:
            raise KeyError
        return self.item_map[item]

    def __delitem__(self,item):
        del self.item_map[item]

    def __iter__(self):
        # sort all items into real sets
        all_sets = defaultdict(set)
        for item,node in self.item_map.iteritems():
            all_sets[node.find_set()].add(item)
        return all_sets.itervalues()

class DisjointNode(object):
    def __init__(self,parent=None,rank=0):
        if parent is None:
            self.parent = self
        else:
            self.parent = parent
        self.rank = rank

    def union(self,other):
        """Join two sets."""
        node1 = self.find_set()
        node2 = other.find_set()
        # union by rank
        if node1.rank > node2.rank:
            node2.parent = node1
        else:
            node1.parent = node2
            if node1.rank == node2.rank:
                node2.rank += 1
        return node1

    def find_set(self):
        """Finds the root node of this set."""
        node = self
        while node is not node.parent:
            node = node.parent
        # path compression
        root, node = node, self
        while node is not node.parent:
            node, node.parent = node.parent, root
        return root

def find_clusters(grid):
    disj = DisjointSets()
    for y,row in enumerate(grid):
        for x,cell in enumerate(row):
            if cell:
                node = disj.add((x,y))
                for dx,dy in ((-1,0),(-1,-1),(0,-1),(1,-1)):
                    if (x+dx,y+dy) in disj:
                        node.union(disj[x+dx,y+dy])
    for index,set_ in enumerate(disj):
        sum_x, sum_y, count = 0, 0, 0
        for x,y in set_:
            sum_x += x
            sum_y += y
            count += 1
        yield 1.0 * sum_x / count, 1.0 * sum_y / count

def main():
    grid = [[('.' != cell) for cell in row if not cell.isspace()] for row in (
        ". X X . . . . . .",
        ". X X X . . X . .",
        ". . . . . X X X .",
        ". . . . . . X . .",
        ". X X . . . . . .",
        ". X . . . . . . .",
        ". X . . . . . . .",
        ". . . . X X . . .",
        ". . . . X X . . .",
    )]
    coordinates = list(find_clusters(grid))
    centers = dict(((round(x),round(y)),i) for i,(x,y) in enumerate(coordinates))
    for y,row in enumerate(grid):
        for x,cell in enumerate(row):
            if (x,y) in centers:
                print centers[x,y]+1,
            elif cell:
                print 'X',
            else:
                print '.',
        print
    print
    print '%4s | %7s %7s' % ('i','x','y')
    print '-'*22
    for i,(x,y) in enumerate(coordinates):
        print '%4d | %7.4f %7.4f' % (i+1,x,y)

if __name__ == '__main__':
    main()

输出：

. X X . . . . . .
. X 3 X . . X . .
. . . . . X 4 X .
. . . . . . X . .
. X X . . . . . .
. 2 . . . . . . .
. X . . . . . . .
. . . . X X . . .
. . . . X 1 . . .

   i |       x       y
----------------------
   1 |  4.5000  7.5000
   2 |  1.2500  4.7500
   3 |  1.8000  0.6000
   4 |  6.0000  2.0000

这一点是为了证明不相交集。

find_clusters（）

中的实际算法可以升级为更健壮的算法

参考资料

算法简介。第二版Cormen等人

from collections import defaultdict

class DisjointSets(object):
    def __init__(self):
        self.item_map = defaultdict(DisjointNode)

    def add(self,item):
        """Add item to the forest."""
        # It's gets initialized to a new node when
        # trying to access a non-existant item.
        return self.item_map[item]

    def __contains__(self,item):
        return (item in self.item_map)

    def __getitem__(self,item):
        if item not in self:
            raise KeyError
        return self.item_map[item]

    def __delitem__(self,item):
        del self.item_map[item]

    def __iter__(self):
        # sort all items into real sets
        all_sets = defaultdict(set)
        for item,node in self.item_map.iteritems():
            all_sets[node.find_set()].add(item)
        return all_sets.itervalues()

class DisjointNode(object):
    def __init__(self,parent=None,rank=0):
        if parent is None:
            self.parent = self
        else:
            self.parent = parent
        self.rank = rank

    def union(self,other):
        """Join two sets."""
        node1 = self.find_set()
        node2 = other.find_set()
        # union by rank
        if node1.rank > node2.rank:
            node2.parent = node1
        else:
            node1.parent = node2
            if node1.rank == node2.rank:
                node2.rank += 1
        return node1

    def find_set(self):
        """Finds the root node of this set."""
        node = self
        while node is not node.parent:
            node = node.parent
        # path compression
        root, node = node, self
        while node is not node.parent:
            node, node.parent = node.parent, root
        return root

def find_clusters(grid):
    disj = DisjointSets()
    for y,row in enumerate(grid):
        for x,cell in enumerate(row):
            if cell:
                node = disj.add((x,y))
                for dx,dy in ((-1,0),(-1,-1),(0,-1),(1,-1)):
                    if (x+dx,y+dy) in disj:
                        node.union(disj[x+dx,y+dy])
    for index,set_ in enumerate(disj):
        sum_x, sum_y, count = 0, 0, 0
        for x,y in set_:
            sum_x += x
            sum_y += y
            count += 1
        yield 1.0 * sum_x / count, 1.0 * sum_y / count

def main():
    grid = [[('.' != cell) for cell in row if not cell.isspace()] for row in (
        ". X X . . . . . .",
        ". X X X . . X . .",
        ". . . . . X X X .",
        ". . . . . . X . .",
        ". X X . . . . . .",
        ". X . . . . . . .",
        ". X . . . . . . .",
        ". . . . X X . . .",
        ". . . . X X . . .",
    )]
    coordinates = list(find_clusters(grid))
    centers = dict(((round(x),round(y)),i) for i,(x,y) in enumerate(coordinates))
    for y,row in enumerate(grid):
        for x,cell in enumerate(row):
            if (x,y) in centers:
                print centers[x,y]+1,
            elif cell:
                print 'X',
            else:
                print '.',
        print
    print
    print '%4s | %7s %7s' % ('i','x','y')
    print '-'*22
    for i,(x,y) in enumerate(coordinates):
        print '%4d | %7.4f %7.4f' % (i+1,x,y)

if __name__ == '__main__':
    main()

. X X . . . . . .
. X 3 X . . X . .
. . . . . X 4 X .
. . . . . . X . .
. X X . . . . . .
. 2 . . . . . . .
. X . . . . . . .
. . . . X X . . .
. . . . X 1 . . .

   i |       x       y
----------------------
   1 |  4.5000  7.5000
   2 |  1.2500  4.7500
   3 |  1.8000  0.6000
   4 |  6.0000  2.0000