OpenGL ES (WebGL): rendering many small objects


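I need to render a lot of small objects (2 to 100 triangles each) that sit in a deep hierarchy, and each object has its own matrix. To render them, I precompute the actual matrix for each object, put the objects in a single list, and make two calls per object: set the matrix uniform and gl.drawElements().

Obviously this is not the fastest way to go. With a few thousand objects the performance becomes unacceptable. The only solution I have considered is batching multiple objects into a single buffer. That is not easy, though, because each object has its own matrix, so to put objects into a shared buffer I need to transform their vertices by the matrix on the CPU. An even worse problem is that the user can move any object at any time, and then I have to recompute a large amount of vertex data again (because the user can move an object that has many nested children).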
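To make the cost of CPU-side batching concrete, here is a minimal sketch of what it involves. It assumes column-major 4x4 matrices (the layout WebGL uses) and tightly packed xyz positions; the function name `appendTransformed` and its argument layout are illustrative only and do not come from any library.

```javascript
// Transform one object's positions by its world matrix and append the result
// to a shared batch buffer. `matrix` is a column-major 4x4 (16 numbers).
function appendTransformed(batch, offset, positions, matrix) {
  const m = matrix;
  for (let i = 0; i < positions.length; i += 3) {
    const x = positions[i], y = positions[i + 1], z = positions[i + 2];
    batch[offset++] = m[0] * x + m[4] * y + m[8] * z + m[12];
    batch[offset++] = m[1] * x + m[5] * y + m[9] * z + m[13];
    batch[offset++] = m[2] * x + m[6] * y + m[10] * z + m[14];
  }
  return offset; // index where the next object's vertices start
}
```

Whenever an object moves, its whole vertex range (and that of every nested child) has to be rewritten this way and uploaded again, which is the cost described above.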

So I am looking for alternative approaches. Recently I found a strange vertex shader in the onshape.com project:

uniform mat4 uMVMatrix;
uniform mat3 uNMatrix;
uniform mat4 uPMatrix;
 
uniform vec3 uSpecular;
uniform float uOpacity;
uniform float uColorAmbientFactor;  //Determines how much of the vertex-specified color to use in the ambient term
uniform float uColorDiffuseFactor;  //Determines how much of the vertex-specified color to use in the diffuse term
 
uniform bool uApplyTranslucentAlphaToAll;
uniform float uTranslucentPassAlpha;
 
attribute vec3 aVertexPosition;
attribute vec3 aVertexNormal;
attribute vec2 aTextureCoordinate;
attribute vec4 aVertexColor;
 
varying vec3 vPosition;
varying lowp vec3 vNormal;
varying mediump vec2 vTextureCoordinate;
varying lowp vec3 vAmbient;
varying lowp vec3 vDiffuse;
varying lowp vec3 vSpecular;
varying lowp float vOpacity;
 
attribute vec4 aOccurrenceId;
 
float unpackOccurrenceId() {
  return aOccurrenceId.g * 65536.0 + aOccurrenceId.b * 256.0 + aOccurrenceId.a;
}
 
float unpackHashedBodyId() {
  return aOccurrenceId.r;
}
 
#define USE_OCCURRENCE_TEXTURE 1
 
#ifdef USE_OCCURRENCE_TEXTURE
 
uniform sampler2D uOccurrenceDataTexture;
uniform float uOccurrenceTexelWidth;
uniform float uOccurrenceTexelHeight;
#define ELEMENTS_PER_OCCURRENCE 2.0
 
void getOccurrenceData(out vec4 occurrenceData[2]) {
  // We will extract the occurrence data from the occurrence texture by converting the occurrence id to texture coordinates
 
  // Convert the packed occurrenceId into a single number
  float occurrenceId = unpackOccurrenceId();
 
  // We first determine the row of the texture by dividing by the overall texture width.  Each occurrence
  // has multiple rgba texture entries, so we need to account for each of those entries when determining the
  // element's offset into the buffer.
  float divided = (ELEMENTS_PER_OCCURRENCE * occurrenceId) * uOccurrenceTexelWidth;
  float row = floor(divided);
  vec2 coordinate;
  // The actual coordinate lies between 0 and 1.  We need to take care that coordinate lies on the texel
  // center by offsetting the coordinate by a half texel.
  coordinate.t = (0.5 + row) * uOccurrenceTexelHeight;
  // Figure out the width of one texel in texture space
  // Since we've already done the texture width division, we can figure out the horizontal coordinate
  // by adding a half-texel width to the remainder
  coordinate.s = (divided - row) + 0.5 * uOccurrenceTexelWidth;
  occurrenceData[0] = texture2D(uOccurrenceDataTexture, coordinate);
  // The second piece of texture data will lie in the adjacent column
  coordinate.s += uOccurrenceTexelWidth;
  occurrenceData[1] = texture2D(uOccurrenceDataTexture, coordinate);
}
 
#else
 
attribute vec4 aOccurrenceData0;
attribute vec4 aOccurrenceData1;
void getOccurrenceData(out vec4 occurrenceData[2]) {
  occurrenceData[0] = aOccurrenceData0;
  occurrenceData[1] = aOccurrenceData1;
}
 
#endif
 
/**
 * Create a model matrix from the given occurrence data.
 *
 * The method for deriving the rotation matrix from the euler angles is based on this publication:
 * http://www.soi.city.ac.uk/~sbbh653/publications/euler.pdf
 */
mat4 createModelTransformationFromOccurrenceData(vec4 occurrenceData[2]) {
  float cx = cos(occurrenceData[0].x);
  float sx = sin(occurrenceData[0].x);
  float cy = cos(occurrenceData[0].y);
  float sy = sin(occurrenceData[0].y);
  float cz = cos(occurrenceData[0].z);
  float sz = sin(occurrenceData[0].z);
 
  mat4 modelMatrix = mat4(1.0);
 
  float scale = occurrenceData[0][3];
 
  modelMatrix[0][0] = (cy * cz) * scale;
  modelMatrix[0][1] = (cy * sz) * scale;
  modelMatrix[0][2] = -sy * scale;
 
  modelMatrix[1][0] = (sx * sy * cz - cx * sz) * scale;
  modelMatrix[1][1] = (sx * sy * sz + cx * cz) * scale;
  modelMatrix[1][2] = (sx * cy) * scale;
 
  modelMatrix[2][0] = (cx * sy * cz + sx * sz) * scale;
  modelMatrix[2][1] = (cx * sy * sz - sx * cz) * scale;
  modelMatrix[2][2] = (cx * cy) * scale;
 
  modelMatrix[3].xyz = occurrenceData[1].xyz;
 
  return modelMatrix;
}
 
 
void main(void) {
  vec4 occurrenceData[2];
  getOccurrenceData(occurrenceData);
  mat4 modelMatrix = createModelTransformationFromOccurrenceData(occurrenceData);
  mat3 normalMatrix = mat3(modelMatrix);
 
  vec4 position = uMVMatrix * modelMatrix * vec4(aVertexPosition, 1.0);
  vPosition = position.xyz;
  vNormal = uNMatrix * normalMatrix * aVertexNormal;
  vTextureCoordinate = aTextureCoordinate;
 
  vAmbient = uColorAmbientFactor * aVertexColor.rgb;
  vDiffuse = uColorDiffuseFactor * aVertexColor.rgb;
  vSpecular = uSpecular;
  vOpacity = uApplyTranslucentAlphaToAll ? (min(uTranslucentPassAlpha, aVertexColor.a)) : aVertexColor.a;
 
  gl_Position = uPMatrix * position;
}

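It looks like they encode the object position and rotation angles as 2 entries in a 4-component float texture, add an attribute that stores, for each vertex, the position of its transform in that texture, and then perform the matrix computation in the vertex shader.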
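The CPU side of that encoding is not shown in the shader, so the following is only a guess at how it could look. It assumes the aOccurrenceId attribute is supplied as four unnormalized unsigned bytes, that the texture width is even (so the two texels of one occurrence never straddle a row), and that texels are stored row by row. All function names are hypothetical.

```javascript
// Pack a 24-bit occurrence id and an 8-bit body hash into 4 bytes (r, g, b, a),
// matching unpackOccurrenceId() and unpackHashedBodyId() in the shader.
function packOccurrenceId(occurrenceId, hashedBodyId) {
  return new Uint8Array([
    hashedBodyId & 0xff,          // r: hashed body id
    (occurrenceId >> 16) & 0xff,  // g: high byte
    (occurrenceId >> 8) & 0xff,   // b: middle byte
    occurrenceId & 0xff,          // a: low byte
  ]);
}

// Same arithmetic as the shader's unpackOccurrenceId().
function unpackOccurrenceId(rgba) {
  return rgba[1] * 65536 + rgba[2] * 256 + rgba[3];
}

// Write one occurrence into the float texture data: texel 0 holds the Euler
// angles and the uniform scale, texel 1 holds the translation (w is unused).
function writeOccurrence(data, occurrenceId, euler, scale, translation) {
  const base = occurrenceId * 2 * 4; // 2 texels per occurrence, 4 floats each
  data.set([euler[0], euler[1], euler[2], scale], base);
  data.set([translation[0], translation[1], translation[2], 0], base + 4);
  return base;
}
```

With this layout a moved object costs 8 floats of texture data, regardless of how many vertices it has.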

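So the question is: is this shader actually an effective solution to my problem, or would I be better off with batching or something else?

PS: Might an even better approach be to store a quaternion instead of angles and transform the vertices by it directly?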
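For reference, rotating a vertex directly by a unit quaternion needs no matrix at all. The sketch below is written in JavaScript so it can be checked on the CPU; it assumes the quaternion is stored as [x, y, z, w] and is normalized. The name `rotateByQuat` is illustrative.

```javascript
// Rotate vector v = [x, y, z] by the unit quaternion q = [x, y, z, w] using
// v' = v + 2 * cross(q.xyz, cross(q.xyz, v) + q.w * v).
function rotateByQuat(v, q) {
  const [qx, qy, qz, qw] = q;
  // t = cross(q.xyz, v) + q.w * v
  const tx = qy * v[2] - qz * v[1] + qw * v[0];
  const ty = qz * v[0] - qx * v[2] + qw * v[1];
  const tz = qx * v[1] - qy * v[0] + qw * v[2];
  // v + 2 * cross(q.xyz, t)
  return [
    v[0] + 2 * (qy * tz - qz * ty),
    v[1] + 2 * (qz * tx - qx * tz),
    v[2] + 2 * (qx * ty - qy * tx),
  ];
}
```

The same expression carries over to GLSL as `v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v)`, followed by adding the translation.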

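This might give you some ideas.

If I understand Rem's comment correctly...

The simplest solution is to store some kind of per-vertex transform data. That is what the video above does. The problem with that solution is that if a model has 100 vertices, you have to update the transform for all 100 vertices.

The solution is to indirect the transforms through a texture. For each vertex in each model, store just one extra float, which we can call "modelId".

So all vertices in the first model get an id of 0, all vertices in the second model get an id of 1, and so on.

Then store the transforms in a texture. For example, you could store a translation (x, y, z) plus a quaternion (x, y, z, w). If your target platform supports floating-point textures, that is 2 RGBA pixels per transform.

You use the modelId to compute where in the texture to fetch the transform data: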

// halfTextureWidth = textureWidth / 2., because each transform takes 2 texels
float col = mod(modelId, halfTextureWidth) * 2.;
float row = floor(modelId / halfTextureWidth);
float oneHPixel = 1. / textureWidth;
// + 0.5 addresses the center of the texel
vec2 uv = vec2((col + 0.5) / textureWidth, (row + 0.5) / textureHeight);
vec4 translation = texture2D(transforms, uv);
vec4 rotationQuat = texture2D(transforms, uv + vec2(oneHPixel, 0.));

Now you can use translation and rotationQuat to build a matrix in the vertex shader.
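The matrix construction itself is standard. Here is a sketch in JavaScript, so it can be verified outside a shader, assuming a unit quaternion stored as [x, y, z, w] and a column-major result; the name `matrixFromTranslationQuat` is made up for this example.

```javascript
// Build a column-major 4x4 matrix from a translation [x, y, z] and a unit
// quaternion [x, y, z, w].
function matrixFromTranslationQuat(t, q) {
  const [x, y, z, w] = q;
  const xx = x * x, yy = y * y, zz = z * z;
  const xy = x * y, xz = x * z, yz = y * z;
  const wx = w * x, wy = w * y, wz = w * z;
  return new Float32Array([
    1 - 2 * (yy + zz), 2 * (xy + wz), 2 * (xz - wy), 0, // column 0
    2 * (xy - wz), 1 - 2 * (xx + zz), 2 * (yz + wx), 0, // column 1
    2 * (xz + wy), 2 * (yz - wx), 1 - 2 * (xx + yy), 0, // column 2
    t[0], t[1], t[2], 1,                                // column 3
  ]);
}
```

In GLSL the same 16 values can be passed, in the same order, to the mat4 constructor, which is also column-major.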

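Why halfTextureWidth? Because each transform uses 2 pixels.

Why + 0.5? So that the coordinate lands on the center of the texel rather than on its edge.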

This means you only have to update 1 transform per model instead of 1 transform per vertex, which keeps the work to a minimum.
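On the JavaScript side, updating one model then amounts to overwriting 8 floats in the array that backs the texture. The sketch below mirrors the shader arithmetic; it assumes an even texture width, row-by-row texel storage, and hypothetical helper names.

```javascript
// Mirror of the shader arithmetic: texel column and row of a model's transform.
function transformTexel(modelId, textureWidth) {
  const halfTextureWidth = textureWidth / 2;    // transforms per row
  const col = (modelId % halfTextureWidth) * 2; // first of the two texels
  const row = Math.floor(modelId / halfTextureWidth);
  return { col, row };
}

// Overwrite the transform of a single model in the CPU-side copy of the texture.
function setTransform(data, textureWidth, modelId, translation, rotationQuat) {
  const { col, row } = transformTexel(modelId, textureWidth);
  const base = (row * textureWidth + col) * 4;  // 4 floats per RGBA texel
  data.set([translation[0], translation[1], translation[2], 0], base);
  data.set(rotationQuat, base + 4);
  return base;
}
```

After the array has been modified, the changed region (or simply the whole texture) is uploaded again with gl.texSubImage2D.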

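This is a similar idea, but because it does particles it does not need the texture indirection.

Note: the above assumes you only need translation and rotation. Nothing stops you from storing whole matrices in the texture if that is what you need, or anything else such as material properties, lighting properties, and so on.

AFAIK pretty much all current platforms support reading data from floating-point textures. You have to enable that feature with: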

var ext = gl.getExtension("OES_texture_float");
if (!ext) {
   // no floating point textures for you!
}

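Note, however, that not all platforms support filtering floating-point textures. This solution does not need filtering (which would have to be enabled separately). Be sure to set the texture filtering to gl.NEAREST.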
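Putting the extension check and the filtering setup together, a WebGL 1 setup function might look like the sketch below. The function name `createTransformTexture` is an assumption of this example; the gl calls themselves are standard WebGL 1 API. CLAMP_TO_EDGE is set so that non-power-of-two sizes also work.

```javascript
// Create an RGBA float texture holding transform data. Returns null when the
// OES_texture_float extension is not available.
function createTransformTexture(gl, width, height, data) {
  if (!gl.getExtension("OES_texture_float")) {
    return null;
  }
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // No filtering and no mipmaps: every texel is read exactly as stored.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  // `data` is a Float32Array of width * height * 4 values.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.FLOAT, data);
  return texture;
}
```

Sampling a texture from a vertex shader additionally requires gl.getParameter(gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS) to be greater than zero.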

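I was curious about this too, so I ran a couple of tests with 4 different drawing techniques.

The first is instancing via uniforms, which is what you find in most tutorials and books. For each model, set the uniforms, then draw the model.

The second is to store an additional attribute, the matrix transform, on every vertex and perform the transform on the GPU. On each frame, call gl.bufferSubData, then draw as many models as possible in each draw call.

The third approach is to upload multiple matrix transforms to the GPU as uniforms and add a matrixId to every vertex to select the right matrix on the GPU. This is similar to the first approach, except that it allows models to be drawn in batches. It is also how skeletal animation is usually implemented. At draw time, for each batch, upload the matrix of the model at batch[index] to the matrix array[index] on the GPU, then draw the batch.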
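A sketch of the batching step for the third technique follows. It is a variation rather than the tested implementation: each batch's matrices are packed into one Float32Array so that a single gl.uniformMatrix4fv call can fill the whole uniform array. The `worldMatrix` property and the function name are assumptions of this example.

```javascript
// Split models into batches of at most `batchSize` and pack each batch's
// matrices into one Float32Array (16 floats per column-major mat4).
function buildMatrixBatches(models, batchSize) {
  const batches = [];
  for (let start = 0; start < models.length; start += batchSize) {
    const group = models.slice(start, start + batchSize);
    const matrices = new Float32Array(group.length * 16);
    group.forEach((model, index) => {
      // `index` is the matrixId that this model's vertices carry in the batch.
      matrices.set(model.worldMatrix, index * 16);
    });
    batches.push({ models: group, matrices });
  }
  return batches;
}
```

The batch size is bounded by the number of uniform vectors available to the vertex shader (gl.MAX_VERTEX_UNIFORM_VECTORS), since each mat4 occupies four of them.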

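The final technique is texture lookup. I created a Float32Array of size 4096*256*4 that contains the world matrix of every model (enough for about 256k models). Each model has a modelIndex attribute that is used to read its matrix from the texture. Then, on every frame, gl.texSubImage2D uploads the entire texture, and as many models as possible are drawn in each draw call.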
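The "about 256k models" figure follows from the array size. The sketch below spells out the arithmetic, assuming the array backs a 4096 x 256 RGBA float texture and that each matrix occupies four consecutive texels in a row (one per column). The helper names are illustrative.

```javascript
const TEXELS_PER_MATRIX = 4; // one RGBA float texel per matrix column

// Number of 4x4 matrices that fit into a width x height RGBA texture.
function matrixCapacity(width, height) {
  return (width * height) / TEXELS_PER_MATRIX;
}

// Index of the first float of model `modelIndex` in the backing Float32Array.
function matrixOffset(modelIndex) {
  return modelIndex * TEXELS_PER_MATRIX * 4;
}

// Texel coordinates of the first column of the model's matrix.
function matrixTexel(modelIndex, width) {
  const texel = modelIndex * TEXELS_PER_MATRIX;
  return { col: texel % width, row: Math.floor(texel / width) };
}
```

With a width of 4096, each row holds 1024 matrices, so a matrix never straddles two rows.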

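Hardware instancing was not considered, because I assume the requirement is to draw many unique models, even though in my test I only draw cubes that get a different world matrix every frame.

Here are the results (how many models can be drawn at 60 FPS):

Different uniforms per model: ~2000

Batched uniforms with matrixId: ~20000

Transform stored per vertex: ~40000 (found a bug in the first implementation)

Texture lookup: ~160000

No drawing, just the CPU time to calculate the matrices: ~170000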
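These counts can also be read as an average time budget per model. The helper below is just that arithmetic, with a made-up name; at 60 FPS a frame lasts about 16.7 ms, so roughly 2000 models corresponds to about 8.3 microseconds per model and roughly 160000 models to about 0.1 microseconds.

```javascript
// Average time available per model, in microseconds, when `count` models
// are drawn within one frame at `fps` frames per second.
function microsecondsPerModel(count, fps = 60) {
  return 1e6 / fps / count;
}
```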

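I think it is pretty obvious that uniform instancing is not the way to go. Technique 1 fails simply because it issues too many draw calls. Batched uniforms should take care of the draw-call problem, but I found that too much CPU time went into fetching the matrix data from the right model and uploading it to the GPU. The countless gl.uniformMatrix4fv calls did not help either.

The time it takes to run gl.texSubImage2D is significantly less than the time needed to calculate new world matrices for dynamic objects. Duplicating the transform data on every vertex works better than most people would expect, but it wastes a lot of memory bandwidth. The texture lookup approach...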
