C#: …? Or is Vector.Dot that slow?
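…? Or is Vector.Dot that slow? The LocateFirstFoundByte method only performs an XOR, a multiplication and a shift. I suspect Vector.Dot would be faster than that. There are actually two overloads. The top one is the one that gets used, and it calls the bottom one. The top overload first reinterprets the vector as a Vector<ulong>, finds the first non-zero ulong, and passes that ulong to the bottom overload.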
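The two LocateFirstFoundByte overloads described above can be sketched roughly as follows. This is a minimal reconstruction for illustration, not the framework source. It assumes a little-endian machine and that every byte of the mask is either 0x00 or 0xFF (as Vector.Equals produces). The class name FirstByteSketch, the constant name XorPowerOfTwoToHighByte and the -1 return value for "no match" are assumptions of the sketch.

```csharp
using System;
using System.Numerics;

public static class FirstByteSketch
{
    // Assumed multiplier: bytes (high to low) are 00 01 02 03 04 05 06 08.
    // Multiplying the "flag" of the lowest matching byte by it leaves that
    // byte's index (0..7) in the top 7 bits of the product.
    const ulong XorPowerOfTwoToHighByte = 0x0001020304050607UL + 1;

    // Bottom overload: match must be non-zero, each byte 0x00 or 0xFF.
    public static int LocateFirstFoundByte(ulong match)
    {
        // XOR with (match - 1) sets every bit up to and including the lowest set bit
        ulong powerOfTwoFlag = match ^ (match - 1);
        // The multiply moves the byte index into the top bits; >> 57 extracts it
        return (int)((powerOfTwoFlag * XorPowerOfTwoToHighByte) >> 57);
    }

    // Top overload: view the byte mask as ulongs and hand the first non-zero one down
    public static int LocateFirstFoundByte(Vector<byte> match)
    {
        Vector<ulong> vector64 = Vector.AsVectorUInt64(match);
        for (int i = 0; i < Vector<ulong>.Count; i++)
        {
            ulong candidate = vector64[i];
            if (candidate != 0)
                return i * 8 + LocateFirstFoundByte(candidate); // 8 bytes per ulong
        }
        return -1; // assumption of this sketch: no byte matched
    }
}
```

The only per-element work after the vector comparison is one XOR, one subtraction, one multiplication and one shift on a single ulong, which is the cost the question compares Vector.Dot against.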
Here is the code in question:
using System;
using System.Linq;
using System.Numerics;
using System.Runtime.InteropServices;
static class SetSearch
{
    // One-time initialized vector containing { 1, 2, 3, 4, ... }
    static readonly Vector<ushort> indexes = new Vector<ushort>(
        Enumerable.Range(1, Vector<ushort>.Count).Select(index => (ushort)index).ToArray());
    static int IndexOf(ReadOnlySpan<ushort> set, ushort element)
    {
        // Interpret the input set as a sequence of vectors
        // (set is assumed to have a length that is a multiple of Vector<ushort>.Count, for brevity)
        var setVectors = MemoryMarshal.Cast<ushort, Vector<ushort>>(set);
        // Create a vector that contains the target element in each slot
        var elementVector = new Vector<ushort>(element);
        // Loop per vector rather than per element
        for (int i = 0; i < setVectors.Length; i++)
        {
            // Vector.Equals sets all bits (0xFFFF) in a matching slot, so AND with One
            // to get a mask that has a 1 in the single matching slot, or only 0s
            var mask = Vector.Equals(setVectors[i], elementVector) & Vector<ushort>.One;
            // Get the dot product of the mask and the indexes
            // This multiplies each index by 0, or by 1 if it is the matching one, and returns their sum, i.e. the matching index or 0
            // Note that the indexes are deliberately 1-based, to distinguish a match in slot 0 from "no match"
            ushort index = Vector.Dot(indexes, mask);
            // Reduce the index by 1 to get the 0-based index, offset by the vectors already scanned
            if (index != 0)
                return i * Vector<ushort>.Count + index - 1;
        }
        return -1; // no match in any vector
    }
    static void Main()
    {
        // The input set and the element to search for
        Span<ushort> set = stackalloc ushort[] { 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
        ushort element = 22;
        Console.WriteLine(IndexOf(set, element)); // 12
    }
}
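For comparison, the following is a minimal sketch that locates the matching slot without Vector.Dot, by applying the top LocateFirstFoundByte overload's idea (view the mask as Vector<ulong> and take the first non-zero ulong) to a Vector<ushort> mask. It is an illustration only, not a measured alternative. It assumes a little-endian machine, .NET Core 3.0 or later, and a set length that is a multiple of Vector<ushort>.Count. It swaps in BitOperations.TrailingZeroCount in place of the multiply-and-shift trick, and MaskSearchSketch.IndexOf is a hypothetical name.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class MaskSearchSketch
{
    // Hypothetical helper: 0-based index of the first slot equal to element, or -1.
    // Assumes little-endian layout and set.Length % Vector<ushort>.Count == 0.
    public static int IndexOf(ReadOnlySpan<ushort> set, ushort element)
    {
        var setVectors = MemoryMarshal.Cast<ushort, Vector<ushort>>(set);
        var elementVector = new Vector<ushort>(element);
        for (int i = 0; i < setVectors.Length; i++)
        {
            // Matching slots are 0xFFFF, all other slots are 0
            var mask = Vector.Equals(setVectors[i], elementVector);
            if (mask == Vector<ushort>.Zero)
                continue; // no match in this vector
            // View the mask as ulongs (4 ushort slots each) and find the first non-zero one
            var mask64 = Vector.AsVectorUInt64(mask);
            for (int j = 0; j < Vector<ulong>.Count; j++)
            {
                ulong candidate = mask64[j];
                if (candidate != 0)
                {
                    // A matching slot contributes 16 set bits, so trailing zeros / 16
                    // is the slot offset inside this ulong
                    int slot = j * 4 + BitOperations.TrailingZeroCount(candidate) / 16;
                    return i * Vector<ushort>.Count + slot;
                }
            }
        }
        return -1;
    }
}
```

With the 16-element set from the question, MaskSearchSketch.IndexOf(set, 22) should return 12, the same answer as the Vector.Dot version.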