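Instanced Model Sample
This sample shows how to efficiently draw many copies of the same model, using GPU instancing techniques to reduce the cost of repeated draw calls.
Games often need to draw many copies of the same model, for example trees in a landscape or crates in a room. A single model draw call is relatively expensive, and drawing hundreds of models costs far more. This sample demonstrates several techniques that reduce the cost of drawing copies of the same model.
Note: There is no single best instancing technique. The approach used on Windows differs from the one used on Xbox 360. On Windows, the ideal technique requires shader 3.0, but a workaround is available for shader 2.0.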
Vertex Shader Model 2.0
Pixel Shader Model 1.1
Action | Keyboard control | Gamepad control |
Change technique | A | A |
Add instances | X | X |
Remove instances | Y | Y |
Exit | ESC or ALT+F4 | BACK |
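- No instancing or state batching: Works the same way as calling ModelMesh.Draw many times in a loop.
- No instancing: Uses no GPU-specific tricks, but is smarter about not setting device state repeatedly.
- Hardware instancing: The Windows shader 3.0 technique.
- Shader instancing: The Windows shader 2.0 technique.
- VFetch instancing: The Xbox 360 technique.
No Instancing or State Batching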
    foreach (Matrix instance in instances)
    {
        // All device state is set again for every instance.
        SetVertexBuffer();
        SetIndexBuffer();
        SetVertexDeclaration();
        SetWorldTransform(instance);
        effect.Begin();
        foreach (EffectPass pass in effect.CurrentTechnique.Passes)
        {
            pass.Begin();
            DrawIndexedPrimitives();
            pass.End();
        }
        effect.End();
    }
No Instancing
    // Shared device state is set once, outside the instance loop.
    SetVertexBuffer();
    SetIndexBuffer();
    SetVertexDeclaration();
    effect.Begin();
    foreach (EffectPass pass in effect.CurrentTechnique.Passes)
    {
        pass.Begin();
        foreach (Matrix instance in instances)
        {
            SetWorldTransform(instance);
            effect.CommitChanges();
            DrawIndexedPrimitives();
        }
        pass.End();
    }
    effect.End();
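The difference between the two loops above can be made concrete by counting calls. The following is a plain Python tally, not XNA code: the function names and counter keys are hypothetical, and the counts simply mirror the pseudocode (three state setters, one effect Begin/End pair, one draw per instance per pass).

```python
def count_calls_unbatched(instance_count, pass_count=1):
    """No instancing or state batching: everything is repeated per instance."""
    calls = {"state_sets": 0, "effect_begins": 0, "draws": 0}
    for _ in range(instance_count):
        calls["state_sets"] += 3      # vertex buffer, index buffer, declaration
        calls["effect_begins"] += 1   # effect.Begin() ... effect.End()
        calls["draws"] += pass_count  # one DrawIndexedPrimitives per pass
    return calls


def count_calls_batched(instance_count, pass_count=1):
    """No instancing: shared state is set once, only the transform changes."""
    calls = {"state_sets": 3, "effect_begins": 1, "draws": 0}
    for _ in range(pass_count):
        for _ in range(instance_count):
            calls["draws"] += 1       # SetWorldTransform + CommitChanges + draw
    return calls


print(count_calls_unbatched(100))
print(count_calls_batched(100))
```

Both versions still issue one draw call per instance; only the redundant state changes disappear, which is why the instancing techniques that follow go further.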
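Hardware Instancing
This technique handles instancing entirely on the GPU. No matter how many instances are drawn, the CPU load stays very low. It works only on Windows and requires a shader 3.0 graphics card.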
    VertexStreamCollection vertices = graphicsDevice.Vertices;

    // Stream 0: model geometry, repeated once per instance.
    vertices[0].SetSource(geometryVertexBuffer, 0, geometryVertexStride);
    vertices[0].SetFrequencyOfIndexData(numberOfInstances);

    // Stream 1: one transform matrix per instance.
    vertices[1].SetSource(instanceTransformVertexBuffer, 0, SizeOfMatrix);
    vertices[1].SetFrequencyOfInstanceData(1);
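Finally, the transform matrix of each instance must be passed to the vertex shader as an input parameter. The HardwareInstancingVertexShader function in InstancedModel.fx shows the code.
Data used for the first instance is shown in blue, and the second instance in green. Gray marks data shared by both instances. Note how triangles 0 and 2, and triangles 1 and 3, share the same indices and reference the same data from vertex stream 0, while each instance reads a different transform from vertex stream 1.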
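The data flow described above can be emulated on the CPU. This minimal Python sketch is only an illustration: the function name is hypothetical, and a simple translation offset stands in for the full per-instance transform matrix.

```python
def hardware_instanced_draw(vertices, indices, instance_offsets):
    """Emulate one instanced draw call.

    Stream 0 (vertices) is read through the same index buffer for every
    instance; stream 1 (instance_offsets) supplies one element per instance.
    """
    output = []
    for ox, oy, oz in instance_offsets:   # stream 1 advances once per instance
        for index in indices:             # stream 0 is shared by all instances
            x, y, z = vertices[index]
            output.append((x + ox, y + oy, z + oz))
    return output


# A quad made of two triangles, drawn twice.
quad_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
quad_indices = [0, 1, 2, 2, 1, 3]
offsets = [(0, 0, 0), (10, 0, 0)]

transformed = hardware_instanced_draw(quad_vertices, quad_indices, offsets)
print(len(transformed))
```

Neither the vertex data nor the index data is duplicated here; only the small per-instance list grows with the number of instances.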
Shader Instancing
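Hardware instancing works well if you have a shader 3.0 card. But what about shader 2.0 hardware? The shader instancing technique delivers performance close to that of hardware instancing while running on shader 2.0 hardware. Its only drawback is higher memory use, because it needs replicated copies of the vertex and index data.
The InstancedModel constructor uses the IsTechniqueSupported method to detect automatically whether it is running on a shader 3.0 card. It defaults to hardware instancing and falls back to shader instancing when necessary. The basic idea of shader instancing is to make several copies of the geometry data and store them all together in the vertex and index buffers.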
This makes it possible to draw many copies of the model in a single call, simply by specifying a larger number of triangles when you call DrawIndexedPrimitives. The hard part is this: how can the shader tell which instance it is currently drawing? To determine this, you must add an additional channel of data to your vertex buffer, which specifies the instance index. This is set to 0 for all the vertices of the first copy of the model, 1 for the second copy, and so on. The shader can use this index to choose which instance transform matrix should apply to each vertex. It looks up the instance transforms from a matrix array that is uploaded to the shader constant registers using an effect parameter.
Here is a diagram of shader instancing in action:
Data used for the first instance is shown in blue, and the second instance in green. Note how the triangle assembly, index buffer, and vertex buffer lookups function exactly as they would while drawing a single non-instanced model. The vertex buffer just happens to contain two copies of the same position, normal, and texture coordinate data, which are differentiated only by the instance index value. This value is used to select the appropriate instance transform matrix for each copy of the model.
The repeated copies of the geometry data are created on demand by the ReplicateVertexData and ReplicateIndexData methods of the InstancedModelPart class. InitializeShaderInstancing also modifies the vertex declaration to include the additional instance index data channel. Alternatively, this data could have been pregenerated inside the content processor. However, doing it that way would bloat our XNB files with repeated copies of the same data. Also, generating the copies on demand means we never create the repeated information when running on a shader 3.0 card that can use hardware instancing.
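The replication step can be illustrated with a short Python sketch. It is not the sample's C# code: the function names only echo ReplicateVertexData and ReplicateIndexData, vertices are plain tuples, and the instance index is appended as a last tuple element to stand in for the extra vertex channel.

```python
def replicate_vertex_data(vertices, copies):
    """Repeat the vertex list, tagging every vertex with its instance index."""
    return [
        vertex + (instance,)
        for instance in range(copies)
        for vertex in vertices
    ]


def replicate_index_data(indices, vertex_count, copies):
    """Repeat the index list, offsetting each copy to address its own vertices."""
    return [
        index + instance * vertex_count
        for instance in range(copies)
        for index in indices
    ]


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(replicate_vertex_data(triangle, 2))
print(replicate_index_data([0, 1, 2], len(triangle), 2))
```

A vertex shader equivalent would read the trailing instance index and use it to pick a transform out of the matrix array, as the preceding paragraphs describe.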
There is a limit on how many shader instances can be drawn in a single batch. This comes partly from the limited number of shader constant registers available to hold the instance transform matrices (see the comment and MAX_SHADER_MATRICES constant at the top of InstancedModel.fx) and partly from the limited range of 16-bit index values. If we repeated the model data too many times, our 16-bit indices would overflow. We do not want to use 32-bit indices because they are not supported on all graphics cards. The InstancedModelPart class stores the result of combining these two batch size limits in the maxInstances field. If asked to draw more copies than this limit, the DrawShaderInstancing method splits up the request, drawing as many instances as possible in each call to DrawIndexedPrimitives.
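As a rough illustration of how the two limits combine and how a large request is split, consider the Python sketch below. The function names are hypothetical, and the index limit is computed conservatively as 65535 divided by the vertex count, which is an assumption rather than a quotation of the sample's code.

```python
MAX_16BIT_INDEX = 0xFFFF  # largest value a 16-bit index can hold


def max_instances(max_shader_matrices, vertex_count):
    """Combine the constant-register limit with the 16-bit index limit."""
    index_overflow_limit = MAX_16BIT_INDEX // vertex_count
    return min(max_shader_matrices, index_overflow_limit)


def split_into_batches(instance_count, batch_limit):
    """Return (first_instance, batch_size) pairs covering all instances."""
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    batches = []
    start = 0
    while start < instance_count:
        size = min(batch_limit, instance_count - start)
        batches.append((start, size))
        start += size
    return batches


limit = max_instances(max_shader_matrices=60, vertex_count=24)
print(limit)
print(split_into_batches(150, limit))
```

With a small model the register limit dominates; with a model of a few thousand vertices the index range becomes the tighter bound.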
VFetch Instancing
Xbox 360 does not support hardware instancing. Shader instancing would technically work there, but there is no need for it, because Xbox 360 offers a better option.
Index dereferencing is normally handled automatically inside the GPU. Your vertex shader never gets direct access to the index value. Instead, it is passed the result of the GPU dereferencing whatever is stored at that index in the vertex buffer. Xbox 360 provides an alternative to this behavior. Using the INDEX HLSL semantic, you can request that the raw index value be passed directly into your vertex shader:
    VertexShaderOutput MyVertexShader(int index : INDEX)
    {
        ....
You can then use the vfetch shader instruction to manually look up whatever data lives at that index in the vertex buffer, for example:
    float4 position;
    float4 normal;
    float4 textureCoordinate;

    asm
    {
        vfetch position, index, position0
        vfetch normal, index, normal0
        vfetch textureCoordinate, index, texcoord0
    };
But here's the trick: there is no requirement that the index you pass into the vfetch instruction be the same value that was passed in to your vertex shader! You can use arbitrary math instructions to compute whatever vertex buffer index you like. You could even do one vfetch instruction to look up a value from one part of the vertex buffer, and then use that value as an index to look up in a different part of the buffer.
To render instanced data, we extend our index buffer with repeated copies of the model data, in the same way as when using the shader instancing technique on Windows. But thanks to the vfetch instruction, there is no need to also replicate multiple copies of the vertex data, or to add the extra vertex channel for holding instance indices. Instead, we perform modulus and division computations at the top of our vertex shader:
    VertexShaderOutput MyVertexShader(int index : INDEX)
    {
        int vertexIndex = (index + 0.5) % VertexCount;
        int instanceIndex = (index + 0.5) / VertexCount;
This diagram shows the resulting data flow:
Thanks to the modulus operation, both the original index value of 0 and the replicated index value of 4 end up referencing the first entry in our vertex buffer. Both copies of the model can use the same vertex buffer data. But thanks to the division operation, we are also able to determine that an index value of 0 refers to the first instance, while 4 refers to the second, so each instance can choose the appropriate transform matrix from the shader constant registers.
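The modulus and division trick is easy to check on the CPU. This Python sketch is an emulation under stated assumptions, not shader code: the function names are hypothetical, a translation offset stands in for the transform matrix, and the shader's `+ 0.5` is omitted because Python integer arithmetic is exact.

```python
def split_raw_index(raw_index, vertex_count):
    """Recover (vertex_index, instance_index) from a replicated index value."""
    return raw_index % vertex_count, raw_index // vertex_count


def vfetch_instanced_draw(vertices, replicated_indices, instance_offsets):
    """Emulate a draw where only the index buffer has been replicated."""
    vertex_count = len(vertices)
    output = []
    for raw_index in replicated_indices:
        vertex_index, instance_index = split_raw_index(raw_index, vertex_count)
        x, y, z = vertices[vertex_index]             # manual fetch, shared data
        ox, oy, oz = instance_offsets[instance_index]
        output.append((x + ox, y + oy, z + oz))
    return output


print(split_raw_index(0, 4))  # first instance, first vertex
print(split_raw_index(4, 4))  # second instance, same first vertex
```

Index values 0 and 4 resolve to the same vertex entry but to different instances, which matches the four-vertex case described above.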
Although VFetch instancing does require extra copies of the index data, the memory overhead is much smaller than for shader instancing on Windows, because it does not also require extra copies of the (much bigger) vertex data.
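To get a feel for the difference in memory overhead, the following Python sketch compares the two replication schemes. All sizes are illustrative assumptions (a 32-byte vertex, 2-byte indices, a 4-byte instance index channel), and the function names are hypothetical.

```python
def shader_instancing_bytes(vertex_count, index_count, copies,
                            vertex_stride=32, index_size=2,
                            instance_channel_size=4):
    """Both vertex and index data are replicated; vertices gain one channel."""
    per_copy_vertices = vertex_count * (vertex_stride + instance_channel_size)
    per_copy_indices = index_count * index_size
    return copies * (per_copy_vertices + per_copy_indices)


def vfetch_instancing_bytes(vertex_count, index_count, copies,
                            vertex_stride=32, index_size=2):
    """Only the index data is replicated; the vertex data is stored once."""
    return vertex_count * vertex_stride + copies * index_count * index_size


# Hypothetical model: 1000 vertices, 3000 indices, 60 instances per batch.
print(shader_instancing_bytes(1000, 3000, 60))
print(vfetch_instancing_bytes(1000, 3000, 60))
```

For this made-up model the shader instancing buffers come to 2,520,000 bytes against 392,000 bytes for VFetch instancing; the actual ratio depends on the real vertex format.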
Technique | Platform | CPU load | Memory usage | Max instances per draw call | How instance position is specified | Index data replicated? | Vertex data replicated? |
No instancing or state batching | Any | Poor | Very good | 1 | Effect parameter | No | No |
No instancing | Any | Better | Very good | 1 | Effect parameter | No | No |
Hardware instancing | Windows, shader 3.0 | Very good | Very good | Unlimited | Second vertex stream | No | No |
Shader instancing | Windows, shader 2.0 | Very good | Poor | ~60 (depends on shader) | Effect parameter array | Yes | Yes |
VFetch instancing | Xbox 360 | Very good | Slight overhead | ~60 (depends on shader) | Effect parameter array | Yes | No |
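This sample stores only a single 4×4 matrix per instance, so although each instance can be positioned differently, they all look identical. You could add an extra per-instance parameter, passed as a shader constant array when using VFetch or shader instancing, or as an additional data channel in the second vertex stream when using hardware instancing. Such a parameter could tint each instance a different color, swap certain colors, or otherwise vary the appearance of the instances.
This sample requires shader 2.0, but the shader instancing technique also works on shader 1.1 if you reduce the MAX_SHADER_MATRICES constant at the top of InstancedModel.fx and the MaxShaderMatrices constant in InstancedModelPart.cs. Because shader 1.1 supports only 96 constant registers while shader 2.0 supports 256, the shader 1.1 version cannot draw many instances per batch.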
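The register arithmetic behind this limit can be sketched in Python. It assumes that each 4×4 matrix occupies four float4 constant registers, and that some registers must be kept for other shader parameters; the reserved count of 16 is a hypothetical figure chosen only for illustration.

```python
REGISTERS_PER_MATRIX = 4  # assumption: one float4 register per matrix row


def matrices_that_fit(total_registers, reserved_registers):
    """How many instance matrices fit after reserving registers for the rest."""
    available = max(total_registers - reserved_registers, 0)
    return available // REGISTERS_PER_MATRIX


print(matrices_that_fit(256, 16))  # shader 2.0 register count from the text
print(matrices_that_fit(96, 16))   # shader 1.1 register count from the text
```

Under these assumptions shader 2.0 has room for 60 matrices, in line with the roughly 60 instances per batch quoted in the table, while shader 1.1 has room for only 20.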
Published: 2010/6/9 1:29:44 PM