How can I further optimize the following algorithm (Java) to run faster?


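Introduction

I have currently implemented this algorithm in Java. I have not yet followed this suggestion from the paper (page 3, end of section 5):

The relatively expensive cell-covering operation can be greatly accelerated by using a single bit to store the state of each cell. This allows adjacent cells to be "covered" with bitwise OR operations, using a cover mask applied at the given bit-offset position.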
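To make the quoted suggestion concrete, here is a minimal sketch of one way to read it: each cell is one bit in a `long[]`, and a horizontal run of cells is covered with one OR per 64-bit word. The class name `BitGrid` and its methods are hypothetical and are not taken from the paper or from the code in this question.

```java
// Hypothetical bit-packed grid: one bit per cell, 64 cells per long.
final class BitGrid {
    private final long[] words;
    private final int wordsPerRow;

    BitGrid(int rows, int cols) {
        this.wordsPerRow = (cols + 63) >>> 6; // round up to whole 64-bit words
        this.words = new long[rows * wordsPerRow];
    }

    boolean isSet(int row, int col) {
        // For a long, Java uses only the low 6 bits of the shift distance.
        return (words[row * wordsPerRow + (col >>> 6)] & (1L << col)) != 0;
    }

    /** Marks columns [from, to) of one row as covered, one OR per word. */
    void coverSpan(int row, int from, int to) {
        if (from >= to) {
            return;
        }
        int base = row * wordsPerRow;
        int firstWord = from >>> 6;
        int lastWord = (to - 1) >>> 6;
        long firstMask = -1L << from; // bits at and above (from % 64)
        long lastMask = -1L >>> -to;  // bits below (to % 64), or all bits if to % 64 == 0
        if (firstWord == lastWord) {
            words[base + firstWord] |= firstMask & lastMask;
        } else {
            words[base + firstWord] |= firstMask;
            for (int w = firstWord + 1; w < lastWord; w++) {
                words[base + w] = -1L; // whole word covered
            }
            words[base + lastWord] |= lastMask;
        }
    }
}
```

With a radius of 10, every row of the disk is at most 20 cells wide, so a row touches at most two words. Whether this beats `System.arraycopy` on a `boolean[]` would have to be measured with the JMH benchmark.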

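Testing, benchmarking and profiling

The algorithm is tested by creating 12,000 random points and running the algorithm once with an initial radius. The test is attached.
I have profiled continuously, looking at object memory throughput (not many objects are actually created), CPU (this is a CPU bottleneck) and GC (not much happening here). The CPU bottleneck is currently inside the bresenhamFilledCircle method (that is where all the action happens). Of the 12,000 points, about 1,500 are returned by the main algorithm, so bresenhamFilledCircle is executed about 1,500 * 6,700 = roughly 10 million times per second. That is about 0.1 microseconds (100 nanoseconds) per call. Pretty fast, but there should be room to make it go faster.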
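Spelled out, the estimate above multiplies the number of surviving points per run by the benchmark throughput and then inverts the result. It assumes bresenhamFilledCircle is called once per returned point and that the 6,700 ops/s figure is the one used:

```latex
\[
1\,500 \times 6\,700 = 10\,050\,000 \approx 1.0 \times 10^{7}\ \text{calls/s},
\qquad
\frac{1}{1.005 \times 10^{7}\ \text{s}^{-1}} \approx 9.95 \times 10^{-8}\ \text{s} \approx 100\ \text{ns per call}.
\]
```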

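What I have done so far

Started with a basic brute-force algorithm: two nested loops over rows and columns, plus a criterion to decide whether a cell is inside the circle, "drawing" the circle into a boolean[][].
```
Throughput ~3 500 ops/s
```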
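For reference, the brute-force starting point probably looked something like the sketch below. The class and method names are hypothetical, and the clamping to the grid edges is an assumption about how bounds were handled at that stage.

```java
// Hypothetical brute-force disk fill: test every cell in the bounding box.
final class BruteForceDisk {
    /** Marks every cell within `radius` of (centerRow, centerCol) as covered. */
    static void fill(boolean[][] grid, int centerRow, int centerCol, int radius) {
        int radiusSquared = radius * radius;
        int firstRow = Math.max(0, centerRow - radius);
        int lastRow = Math.min(grid.length - 1, centerRow + radius);
        for (int row = firstRow; row <= lastRow; row++) {
            int firstCol = Math.max(0, centerCol - radius);
            int lastCol = Math.min(grid[row].length - 1, centerCol + radius);
            for (int col = firstCol; col <= lastCol; col++) {
                int dr = row - centerRow;
                int dc = col - centerCol;
                // Pythagorean test: inside the disk if dr^2 + dc^2 <= r^2.
                if (dr * dr + dc * dc <= radiusSquared) {
                    grid[row][col] = true;
                }
            }
        }
    }
}
```

Every covered point costs on the order of (2r + 1)^2 distance checks here, which is what the later steps try to avoid.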
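Switched to filling with System.arraycopy instead of setting cells one by one.
```
Throughput ~5 600 ops/s
```
Optimized the array initialization (using a cache).
```
Throughput ~6 000 ops/s
```
Added a margin to rows and columns to avoid bounds checks during the algorithm.
```
Throughput ~6 500 ops/s
```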
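The margin trick can be sketched as follows: the grid is allocated with `margin` extra cells on every side, so a fill centred on any image cell stays inside the array without clamping. `PaddedGrid` is a hypothetical name, and the square fill only stands in for the disk fill to keep the example short.

```java
import java.util.Arrays;

// Hypothetical padded grid: image cell (row, col) lives at cells[row + margin][col + margin].
final class PaddedGrid {
    private final int margin;
    private final boolean[][] cells;

    PaddedGrid(int height, int width, int margin) {
        this.margin = margin;
        this.cells = new boolean[height + 2 * margin][width + 2 * margin];
    }

    boolean isSet(int row, int col) {
        return cells[row + margin][col + margin];
    }

    /** Covers the (2 * margin + 1)-wide square around image cell (row, col), with no clamping. */
    void coverSquare(int row, int col) {
        // In padded coordinates the square spans rows row..row + 2 * margin.
        for (int r = row; r <= row + 2 * margin; r++) {
            Arrays.fill(cells[r], col, col + 2 * margin + 1, true);
        }
    }
}
```

The cost is a slightly larger array; the gain is that the inner loop has no `Math.max` or `Math.min` calls.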
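Switched to Bresenham's circle algorithm (slightly modified to fill the circle) to avoid the "complex" Pythagoras check.
```
Throughput ~6 500 ops/s
```
:(
Switched from a 2D array to a 1D array.
```
Throughput ~6 700 ops/s
```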
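A minimal sketch of the flattened layout, assuming row-major order: `getIndex`, `isSet` and `ones` mirror names that appear in the code later on this page, but their bodies here are guesses, and `FlatGrid` and `fillSpan` are hypothetical.

```java
import java.util.Arrays;

// Hypothetical row-major 1D grid: cell (row, col) is stored at row * colCount + col.
final class FlatGrid {
    private final int colCount;
    private final boolean[] cells;
    private final boolean[] ones; // pre-filled source row for System.arraycopy

    FlatGrid(int rowCount, int colCount) {
        this.colCount = colCount;
        this.cells = new boolean[rowCount * colCount];
        this.ones = new boolean[colCount];
        Arrays.fill(ones, true);
    }

    int getIndex(int row, int col) {
        return row * colCount + col;
    }

    boolean isSet(int row, int col) {
        return cells[getIndex(row, col)];
    }

    /** Covers `length` consecutive cells of one row, starting at (row, col). */
    void fillSpan(int row, int col, int length) {
        System.arraycopy(ones, 0, cells, getIndex(row, col), length);
    }
}
```

A single array removes one level of indirection per access, which is consistent with the small gain reported above.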

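Now I'm out of ideas, other than converting the boolean[] to a byte[] and using bit masks for set/get as suggested, if I have understood the paper's suggestion correctly.

Anyone up for the challenge?

Here is the JMH test: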

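import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

public class KeyPointFilterBenchmark {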
    private static final int DEFAULT_RADIUS = 10;

    @Benchmark
    public List<OpenCVKeyPoint> benchmarkFilterByRadius(KeyPointFilterState state) {
        return state.filter.filterByRadius(DEFAULT_RADIUS, state.list);
    }

    @State(Scope.Thread)
    public static class KeyPointFilterState {
        private static final int NUMBER_OF_POINTS = 12_000;
        private static final int IMAGE_WIDTH = 640;
        private static final int IMAGE_HEIGHT = 480;
        private static final int RESPONSE_RANGE = 255;
        private List<OpenCVKeyPoint> list;
        private KeyPointFilter filter;

        @Setup(Level.Trial)
        public void doSetup() {
            this.list = new ArrayList<>();
            for (int i = 0; i < NUMBER_OF_POINTS; i++) {
                double x = Math.random() * IMAGE_WIDTH;
                double y = Math.random() * IMAGE_HEIGHT;
                float response = (float) (Math.random() * RESPONSE_RANGE);
                list.add(new OpenCVKeyPoint(x, y, response));
            }
            this.filter = new KeyPointFilter(IMAGE_WIDTH, IMAGE_HEIGHT);
        }
    }
}

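You could cache more of the calculations and inline the functions as far as possible.

Try replacing filterByRadius with this and see whether there is any improvement: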

public List<OpenCVKeyPoint> filterByRadius(final int radius, List<OpenCVKeyPoint> input) {
    init(radius);

    // Possibly give a hint to the arraylist on how much space to allocate from the start.
    List<OpenCVKeyPoint> filtered = new ArrayList<>();

    // calculate once
    final int d_init = (5 - radius * 4) / 4;

    // Eliminating by covering
    for (OpenCVKeyPoint point : input) {

        // FIXME do the points need to be doubles, only to be cast to int?
        int col = (int) point.getXPos();
        int row = (int) point.getYPos();

        if (!isSet(col, row)) {
            final int rowOffset = (radius + row) * matrixColCount;
            final int colOffset = radius + col;

            int d = d_init;
            int x = 0;
            int y = radius;
            do {
                final int yStart = colOffset - y;
                final int yLength = 2 * y;

                final int xByMatrixColCount = x * matrixColCount;
                final int rowOffsetPlusYStart = rowOffset + yStart;

                // Since we are filling a circle, we fill using System.arraycopy, from left to right.

                // Row a bottom
                System.arraycopy(ones, 0, matrix, (rowOffsetPlusYStart - xByMatrixColCount),
                        yLength);
                if (x != 0) {
                    // Row a top
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusYStart + xByMatrixColCount),
                            yLength);

                    // -----
                    final int xLength = 2 * x;
                    final int yByMatrixColCount = y * matrixColCount;
                    final int rowOffsetPlusXStart = rowOffset + colOffset - x;

                    // Row b bottom
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusXStart - yByMatrixColCount),
                            xLength);

                    // Row b top
                    System.arraycopy(ones, 0, matrix, (rowOffsetPlusXStart + yByMatrixColCount),
                            xLength);
                }
                if (d < 0) {
                    d += 2 * x + 1;
                } else {
                    d += 2 * (x - y) + 1;
                    y--;
                }
                x++;
            } while (x <= y);

            filtered.add(point);
        }
    }
    return filtered;
}

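So, I came up with a nice optimization.
The generic Bresenham algorithm paints the same positions several times near the top and bottom of the circle, but with a custom strategy
we can paint one specific disk, for example radius 10, with no more writes than necessary and almost no computation.
The custom strategy for a circle of radius 10 looks like this: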
System.arraycopy(ones, 0, matrix, getIndex(row, col + 7), 6);
System.arraycopy(ones, 0, matrix, getIndex(row + 1, col + 4), 12);
System.arraycopy(ones, 0, matrix, getIndex(row + 2, col + 3), 14);
System.arraycopy(ones, 0, matrix, getIndex(row + 3, col + 2), 16);
System.arraycopy(ones, 0, matrix, getIndex(row + 4, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 5, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 6, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 7, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 8, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 9, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 10, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 11, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 12, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 13, col), 20);
System.arraycopy(ones, 0, matrix, getIndex(row + 14, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 15, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 16, col + 1), 18);
System.arraycopy(ones, 0, matrix, getIndex(row + 17, col + 2), 16);
System.arraycopy(ones, 0, matrix, getIndex(row + 18, col + 3), 14);
System.arraycopy(ones, 0, matrix, getIndex(row + 19, col + 4), 12);
System.arraycopy(ones, 0, matrix, getIndex(row + 20, col + 7), 6);
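The hand-written table above can also be generated once per radius instead of being typed out. The sketch below reuses the Bresenham recurrence from filterByRadius to record, for each row of the disk, half the length of its widest fill. `DiskSpans`, `halfWidths` and `cover` are hypothetical names, and the flat grid with a margin of `radius` cells on every side is an assumption based on the offsets used elsewhere on this page.

```java
// Hypothetical precomputed row spans for a filled Bresenham disk.
final class DiskSpans {
    /** half[i] is half the fill length of the i-th row of the disk, top to bottom. */
    static int[] halfWidths(int radius) {
        int[] half = new int[2 * radius + 1];
        int d = (5 - radius * 4) / 4;
        int x = 0;
        int y = radius;
        do {
            // Rows +-x are filled with half-width y, rows +-y with half-width x.
            // Keep the maximum, since the generic loop paints some rows more than once.
            half[radius - x] = Math.max(half[radius - x], y);
            half[radius + x] = Math.max(half[radius + x], y);
            half[radius - y] = Math.max(half[radius - y], x);
            half[radius + y] = Math.max(half[radius + y], x);
            if (d < 0) {
                d += 2 * x + 1;
            } else {
                d += 2 * (x - y) + 1;
                y--;
            }
            x++;
        } while (x <= y);
        return half;
    }

    /** Covers the disk for image cell (row, col): one arraycopy per row, no repeated rows. */
    static void cover(boolean[] matrix, boolean[] ones, int colCount,
                      int[] half, int row, int col) {
        int radius = (half.length - 1) / 2;
        for (int i = 0; i < half.length; i++) {
            int start = (row + i) * colCount + col + radius - half[i];
            System.arraycopy(ones, 0, matrix, start, 2 * half[i]);
        }
    }
}
```

For radius 10 the doubled half-widths are 6, 12, 14, 16, 18, 18, 18, then seven rows of 20, then the mirror image, which matches the lengths in the listing above. The array would be computed once in init(radius), so the per-point work stays at 21 copies.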

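I ran a new benchmark and the throughput improved; it is now at 8 200 ops/s.
It might get higher if I introduced threads and processed the list in parallel, but for now this throughput is good enough.

Profile it and find out where the bottleneck is, then work from there. Without that, we are just shooting in the dark. In general, though, `new` in highly performance-sensitive code is a smell.

I know. I have tried the code, and although I also expected cached calculations to improve performance slightly (in my experience inlining by hand does not, since many small methods can be inlined ...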