C#: Offloading coordinate transformation to the GPU

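I have a legacy map viewer application that uses WinForms. It is sloooow. (The speed used to be acceptable, but then Google Maps and Google Earth came along and users got spoiled. Now I am allowed to make it faster :)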

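After making all the obvious speed improvements (caching, parallel execution, not drawing what doesn't need to be drawn, etc.), my profiler shows me that the real bottleneck is the coordinate transformation when converting points from map space to screen space. Normally the transformation code looks like this: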

```csharp
public Point MapToScreen(PointF input)
{
    // Note that North is negative!
    var result = new Point(
        (int)((input.X - this.currentView.X) * this.Scale),
        (int)((input.Y - this.currentView.Y) * this.Scale));
    return result;
}
```

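The real implementation is more complicated. Latitude and longitude are represented as integers: to avoid losing precision, they are multiplied by 2^20 (~1 million). This is how a coordinate is represented: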

```csharp
public struct Position
{
    public const int PrecisionCompensationPower = 20;
    public const int PrecisionCompensationScale = 1048576; // 2^20
    public readonly int LatitudeInt;  // North is negative!
    public readonly int LongitudeInt;
}
```
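To make the fixed-point representation concrete, here is a minimal sketch in plain C (the language OpenCL kernels are written in) that converts between degrees and the 2^20-scaled integers. The helpers `deg_to_fixed` and `fixed_to_deg` are hypothetical; the question does not show how `Position` values are created.

```c
#include <stdint.h>

#define PRECISION_POWER 20                      /* Position.PrecisionCompensationPower */
#define PRECISION_SCALE (1L << PRECISION_POWER) /* 1048576 = 2^20 */

/* Hypothetical helper: degrees -> fixed-point integer (degrees * 2^20),
   rounded half away from zero without needing libm. */
static int32_t deg_to_fixed(double degrees)
{
    double scaled = degrees * (double)PRECISION_SCALE;
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: fixed-point integer -> degrees. */
static double fixed_to_deg(int32_t fixed)
{
    return (double)fixed / (double)PRECISION_SCALE;
}
```

One step is 1/1048576 of a degree (roughly 0.1 m of longitude at the equator), and the extreme value 180 × 2^20 = 188,743,680 is well below `INT32_MAX` (2,147,483,647), so a 32-bit `int` is enough.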
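Importantly, the possible scale factors are also explicitly restricted to powers of 2. This lets us replace the multiplication with a bit shift. So the real algorithm looks like this: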

```csharp
public Point MapToScreen(Position input)
{
    Point result = new Point();
    // Scale is 2^ZoomLevel / 2^PrecisionCompensationPower, so a right shift replaces the multiplication.
    result.X = (input.LongitudeInt - this.UpperLeftPosition.LongitudeInt)
        >> (Position.PrecisionCompensationPower - this.ZoomLevel);
    result.Y = (input.LatitudeInt - this.UpperLeftPosition.LatitudeInt)
        >> (Position.PrecisionCompensationPower - this.ZoomLevel);
    return result;
}
```
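The claim that a power-of-two scale lets a shift replace the multiplication can be checked with a small C sketch (a hypothetical test harness, not the question's code). With 0 ≤ `zoom` ≤ 20, `delta >> (20 - zoom)` is `delta * 2^zoom / 2^20` rounded down, which matches the truncating floating-point version for non-negative deltas. For negative deltas the two differ (an arithmetic shift rounds toward minus infinity, a cast to `int` rounds toward zero), and in ISO C right-shifting a negative value is implementation-defined, so the check sticks to non-negative deltas.

```c
#include <stdint.h>

#define PCP 20 /* Position.PrecisionCompensationPower */

/* Shift form from the question; valid for 0 <= zoom <= 20. */
static int32_t to_screen_shift(int32_t value, int32_t upper_left, int zoom)
{
    return (value - upper_left) >> (PCP - zoom);
}

/* Multiply form: scale = 2^zoom / 2^20, truncated to int like the PointF version. */
static int32_t to_screen_scale(int32_t value, int32_t upper_left, int zoom)
{
    double scale = (double)(1L << zoom) / (double)(1L << PCP);
    return (int32_t)((double)(value - upper_left) * scale);
}

/* Returns 1 if both forms agree for every zoom level and every
   non-negative delta in [0, max_delta) sampled with the given step. */
static int shift_matches_scale(int32_t upper_left, int32_t max_delta, int32_t step)
{
    for (int zoom = 0; zoom <= PCP; ++zoom)
        for (int32_t d = 0; d < max_delta; d += step)
            if (to_screen_shift(upper_left + d, upper_left, zoom) !=
                to_screen_scale(upper_left + d, upper_left, zoom))
                return 0;
    return 1;
}
```

Because every factor involved is a power of two, the floating-point product is exact, so the only difference between the two forms is the rounding direction for negative values.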

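(`UpperLeftPosition` is the upper-left corner of the screen in map space.) I am now thinking about offloading this calculation to the GPU. Can anyone show me an example of how to do that?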

We use .NET 4.0, but the code should preferably also run on Windows XP. Also, we cannot use libraries licensed under the GPL.
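I would suggest looking into OpenCL. To do that, take a look at …, and then map the values, using two `ComputeBuffer`s (one each for the points' `LatitudeInt` and `LongitudeInt`), to 2 output `ComputeBuffer`s. I suspect the OpenCL code would look something like this: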

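```c
// OpenCL C. Scalar arguments are passed by value, so they are declared as plain
// `const int`; the `__constant` address-space qualifier only applies to pointer arguments.
__kernel void CoordTrans(__global const int *lat,
                         __global const int *lon,
                         const int ulpLat,
                         const int ulpLon,
                         const int zl,
                         __global int *outx,
                         __global int *outy)
{
    int i = get_global_id(0);  // one coordinate per work item
    const int pcp = 20;        // Position.PrecisionCompensationPower
    outx[i] = (lon[i] - ulpLon) >> (pcp - zl);
    outy[i] = (lat[i] - ulpLat) >> (pcp - zl);
}
```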
But each core should do more than one coordinate transformation. I have to run now; I'd recommend reading up on OpenCL before you start on this.
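One way to have each work item do more than one transformation is a stride loop: work item `gid` handles indices `gid`, `gid + global_size`, `gid + 2 * global_size`, and so on. Below is a minimal sketch that emulates this on the CPU in plain C so the indexing can be tested without a GPU; the function names and the `n` parameter are hypothetical. In an actual OpenCL kernel the outer loop disappears (the runtime launches the work items), and the inner loop would start at `get_global_id(0)` and step by `get_global_size(0)`.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-point body of the CoordTrans kernel, written as plain C. */
static void coord_trans_one(size_t i, const int32_t *lat, const int32_t *lon,
                            int32_t ulp_lat, int32_t ulp_lon, int zl,
                            int32_t *outx, int32_t *outy)
{
    const int pcp = 20; /* Position.PrecisionCompensationPower */
    outx[i] = (lon[i] - ulp_lon) >> (pcp - zl);
    outy[i] = (lat[i] - ulp_lat) >> (pcp - zl);
}

/* CPU emulation of launching `global_size` work items, each of which
   transforms several points by striding through the n-element arrays. */
static void coord_trans_strided(size_t n, size_t global_size,
                                const int32_t *lat, const int32_t *lon,
                                int32_t ulp_lat, int32_t ulp_lon, int zl,
                                int32_t *outx, int32_t *outy)
{
    for (size_t gid = 0; gid < global_size; ++gid)    /* the NDRange */
        for (size_t i = gid; i < n; i += global_size) /* one work item's share */
            coord_trans_one(i, lat, lon, ulp_lat, ulp_lon, zl, outx, outy);
}
```

Every index in `[0, n)` is visited exactly once regardless of how `n` relates to `global_size`, which also removes the need to pad the arrays to a multiple of the work-group size.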

Also, if the number of coords is reasonable (…

I come from a CUDA background and can only speak for NVIDIA GPUs, but here goes.
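The problem with doing this on a GPU is the ratio of computation to memory transfer.
You have 1 operation to perform per element. To get a real speedup, you need to do far more operations per element than that. The bandwidth between global memory and the threads on the GPU is around 100 GB/s, so if you have to load one 4-byte integer to do one FLOP, the theoretical maximum is 100/4 = 25 GFLOPS. That is far from the hundreds of GFLOPS advertised.

Note that this is the theoretical maximum; actual results may be worse. It gets worse still if you load more than one element per operation. In your case it looks like 2, so you might get at most 12.5 GFLOPS out of it. In practice it will almost certainly be lower.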
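The bandwidth argument above, written out with the answer's own round figures (100 GB/s of global-memory bandwidth, 4-byte integers, one operation per element):

$$
\frac{100\ \text{GB/s}}{4\ \text{B per load}} = 25 \times 10^{9}\ \text{loads/s}
\;\Rightarrow\; \text{at most } 25\ \text{GFLOPS with one load per operation}
$$

$$
\frac{100\ \text{GB/s}}{2 \times 4\ \text{B per operation}} = 12.5 \times 10^{9}\ \text{operations/s}
\;\Rightarrow\; \text{at most } 12.5\ \text{GFLOPS with two loads per operation}
$$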

If that sounds acceptable to you, go for it!
XNA can be used to do all the transformations you need and gives very good performance. It can also be displayed inside a WinForms application:
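Now, a year later, the problem came up again and we found a very mundane answer. I feel a bit stupid for not realizing it earlier. We draw the geographic elements into a bitmap using ordinary WinForms GDI, and GDI is hardware accelerated. Instead of doing the transformation ourselves, all we had to do was set the scale parameters of the `System.Drawing.Graphics` object with `Graphics.TranslateTransform(…)` and `Graphics.ScaleTransform(…)`. We didn't even need the bit-shifting trick.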

：）
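The reason a `Graphics` transform can replace the hand-written `MapToScreen` is that translate-then-scale collapses into a single affine matrix. As an illustration of the math only (not of GDI+ itself), here is a minimal C sketch of the 2×3 matrix that `ScaleTransform(s, s)` followed by `TranslateTransform(-ulx, -uly)` should produce, assuming GDI+'s default `MatrixOrder.Prepend`, under which the transform added last is applied to points first. The `Affine` struct and all function names are hypothetical.

```c
/* 2D affine transform laid out like GDI+'s Matrix elements:
   x' = m11*x + m21*y + dx
   y' = m12*x + m22*y + dy */
typedef struct { double m11, m12, m21, m22, dx, dy; } Affine;

static Affine affine_identity(void)
{
    Affine a = {1.0, 0.0, 0.0, 1.0, 0.0, 0.0};
    return a;
}

/* Prepend a scale: it is applied to points before the existing transform. */
static Affine prepend_scale(Affine m, double sx, double sy)
{
    m.m11 *= sx; m.m12 *= sx;
    m.m21 *= sy; m.m22 *= sy;
    return m;
}

/* Prepend a translation: it is applied to points before the existing transform. */
static Affine prepend_translate(Affine m, double tx, double ty)
{
    m.dx += m.m11 * tx + m.m21 * ty;
    m.dy += m.m12 * tx + m.m22 * ty;
    return m;
}

static void affine_apply(Affine m, double x, double y, double *ox, double *oy)
{
    *ox = m.m11 * x + m.m21 * y + m.dx;
    *oy = m.m12 * x + m.m22 * y + m.dy;
}

/* Models g.ScaleTransform(s, s); g.TranslateTransform(-ulx, -uly);
   a point p ends up at (p - upperLeft) * s, like MapToScreen(PointF). */
static Affine map_to_screen_matrix(double ulx, double uly, double s)
{
    Affine m = prepend_scale(affine_identity(), s, s);
    return prepend_translate(m, -ulx, -uly);
}
```

The resulting matrix is just `m11 = m22 = s`, `dx = -s*ulx`, `dy = -s*uly`, so the per-point work becomes one multiply-add per axis, done inside the drawing call instead of in application code.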
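For perspective, roughly how many FLOPS can an average dual-core CPU do? That depends on what you count as a FLOP. Say your dual-core CPU runs at 2 GHz and one FLOP takes 4 clock cycles: then you can do 2 × 2 GHz / 4 = about 1 billion FLOPS (1 GFLOPS). This is a very rough estimate.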