[MDX] Hardware instancing, quick overview and sample projects inside

remigius (Member since: 5/4/2005, From: Oirsbeek, Netherlands)

Edited: fixed hardware instancing and updated some comments

What's this?
After reading this thread by Pikebu, I decided to look into instancing in Managed DirectX, using the C++ SDK sample (read this page!). So far I've successfully recreated both Shader Instancing (technique #2) and Constants Instancing (technique #3, using the FFP) in C#. It works for any mesh loaded from an X file, provided it uses the typical PositionNormalTextured vertex format. Pure hardware instancing (technique #1) is now working too, but you can still send me the GeForce 7800 :)

Download the projects
I've split the code into two projects for clarity, and both approaches (techniques #1 and #2) now work as they should. The "Shader Instancing" project uses techniques #2 and #3 and will run on all SM2 hardware; on ATI cards it actually seems to be faster than true hardware instancing (#1). If you want true hardware instancing and have the SM3 hardware (or a compatible ATI card) to run it, try the "Hardware Instancing" project. Its main advantage is that you can use a mesh loaded from an X file out of the box, without copying it into the batch buffers that technique #2 needs. The downside is that its performance seems to be about a third of shader instancing's, at least on an X850.

- Shader Instancing Project (293KB zip)
- Hardware Instancing Project (291KB zip)

What does it do?
The sample renders 64,000 meshes (tested with simple ones, like boxes, pyramids and cylinders) in a 40x40x40 grid. Technique #3, Constants Instancing, was implemented using the fixed function pipeline as a reference render path, to check whether the instance data was correct. Technique #2, Shader Instancing, is fully implemented and achieves the same result with a much better framerate.
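
For readers new to technique #2: the mesh is copied into a "batch buffer" several times, each copy tagged with its instance index, so a single draw call renders a whole batch and the vertex shader looks up per-instance data (such as the translation) in a constant array. Below is a minimal CPU-side sketch of building such a buffer, plus the 40x40x40 grid of translations. It assumes the instance index is appended as one extra float after each vertex and that 32-bit indices are used; the class and method names and the grid spacing are hypothetical, not taken from the sample. Working on raw bytes with an explicit stride is one way to avoid hard-coding PositionNormalTextured.

```csharp
using System;
using System.Collections.Generic;

public static class BatchBuilder
{
    // Hypothetical helper: one translation (x, y, z) per instance for an n x n x n grid.
    public static List<float[]> GridTranslations(int n, float spacing)
    {
        var result = new List<float[]>(n * n * n);
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                    result.Add(new[] { x * spacing, y * spacing, z * spacing });
        return result;
    }

    // Replicates the mesh vertices batchSize times. Each vertex is copied byte for byte
    // (any stride), followed by a 4-byte float holding its instance index within the batch.
    public static byte[] BuildBatchVertices(byte[] vertices, int stride, int batchSize)
    {
        int vertexCount = vertices.Length / stride;
        int outStride = stride + sizeof(float);
        var output = new byte[vertexCount * batchSize * outStride];
        for (int instance = 0; instance < batchSize; instance++)
        {
            byte[] indexBytes = BitConverter.GetBytes((float)instance);
            for (int v = 0; v < vertexCount; v++)
            {
                int dst = (instance * vertexCount + v) * outStride;
                Buffer.BlockCopy(vertices, v * stride, output, dst, stride);
                Buffer.BlockCopy(indexBytes, 0, output, dst + stride, indexBytes.Length);
            }
        }
        return output;
    }

    // Replicates the index list, offsetting each copy past the vertices of earlier copies.
    public static int[] BuildBatchIndices(int[] indices, int vertexCount, int batchSize)
    {
        var output = new int[indices.Length * batchSize];
        for (int instance = 0; instance < batchSize; instance++)
            for (int i = 0; i < indices.Length; i++)
                output[instance * indices.Length + i] = indices[i] + instance * vertexCount;
        return output;
    }
}
```

If 16-bit index buffers are used instead, the batch has to stay below 65,536 vertices, which caps the batch size for larger meshes.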
The framerate for technique #3 averages around 4.75 fps, while shader instancing renders the 64,000 meshes at an average of 60 fps on an ATI Radeon X850PE. The sample code lets you specify the translation and the texture for each instance, but it can easily be modified to support more elaborate transformations and additional properties. The sample also provides a simple trick that allows different instances to use different textures without batch sorting.

The Hardware Instancing project produces the same result, but uses the stream frequency setting associated with SM3. Magically, it actually works on (most?) ATI cards as well, using Muhammad's hack below. Its performance averages about 20 fps on the same ATI Radeon X850PE.

Requested feedback
Here are some things I'd like to get your feedback on. Please PM me anything you can tell me about these points, so this thread stays on the topic of instancing. Thanks in advance for any feedback you guys provide!

1. GraphicsStream IO
I can't seem to get reading and writing with the GraphicsStream right. It would allow the sample to use meshes with any vertex format, but unfortunately it doesn't work as it should. I've left the code in the "Shader Instancing" project for you to check out.

2. Setting an array parameter on an effect in MDX
The original SDK sample uses the SetVectorArray function to pass a batch-sized part of the original instance data array to the effect. The managed Effect class does not have this method, but it does have an array overload for SetValue and a SetArrayRange method, which presumably can be used to achieve the same result. Unfortunately, documentation on this seems non-existent, so I had to resort to a CPU-bound array copy for each batch. If someone can tell me how this is supposed to work in MDX, I think it would boost the framerate quite a lot.
... But I tried caching the batch-sized arrays so the CPU array copy isn't needed every time.
The improvement was marginal, about 3 fps, so I guess the copy doesn't make much of a difference after all.

[Edited by - remigius on December 28, 2005 11:07:58 AM]
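
The per-batch copy from point 2 amounts to slicing the instance array into batch-sized chunks, and the caching variant does that slicing once up front. A minimal sketch, assuming the instance data is a flat float array with four floats (one float4 constant) per instance; the class and method names are hypothetical. Handing each slice to the effect (for example through the SetValue array overload) is deliberately left out, since how that call behaves in MDX is exactly the open question.

```csharp
using System;
using System.Collections.Generic;

public static class InstanceBatches
{
    // Per-batch copy: copies one batch (four floats per instance) into a reusable scratch array.
    // The last batch may be partial; returns the number of instances copied.
    public static int CopyBatch(float[] instanceData, int batchIndex, int batchSize, float[] scratch)
    {
        int totalInstances = instanceData.Length / 4;
        int first = batchIndex * batchSize;
        int count = Math.Min(batchSize, totalInstances - first);
        Array.Copy(instanceData, first * 4, scratch, 0, count * 4);
        return count;
    }

    // Caching variant: slice once when the instance data changes, then reuse the slices every frame.
    public static List<float[]> CacheBatches(float[] instanceData, int batchSize)
    {
        int totalInstances = instanceData.Length / 4;
        int batchCount = (totalInstances + batchSize - 1) / batchSize;
        var batches = new List<float[]>(batchCount);
        for (int b = 0; b < batchCount; b++)
        {
            int count = Math.Min(batchSize, totalInstances - b * batchSize);
            var slice = new float[count * 4];
            Array.Copy(instanceData, b * batchSize * 4, slice, 0, count * 4);
            batches.Add(slice);
        }
        return batches;
    }
}
```

Caching trades the per-frame copy for a second copy of the instance data in memory, which only pays off while the instance data stays static.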

Muhammad Haggag (Moderator, Member since: 9/22/2000, From: Redmond, WA, United States)

Great work. The X850 supports hardware instancing, but not SM3.0, so Direct3D thinks it doesn't. There's a driver hack to work around this, mentioned in this post. Use the following to make the FOURCC code:

```csharp
// Packs four characters into a FOURCC code, first character in the lowest byte
// (the same layout as the MAKEFOURCC macro in the native DirectX headers).
static int MakeFourCC(int ch0, int ch1, int ch2, int ch3)
{
    return ((int)(byte)ch0) |
           ((int)(byte)ch1 << 8) |
           ((int)(byte)ch2 << 16) |
           ((int)(byte)ch3 << 24);
}
```
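
A usage sketch for the helper above. The ATI hack is commonly described as being keyed on the FOURCC 'INST'; since the linked post isn't reproduced here, treat that as an assumption. The block also computes the stream frequency values that hardware instancing (technique #1) sets on its two streams, using the D3DSTREAMSOURCE_INDEXEDDATA (1 << 30) and D3DSTREAMSOURCE_INSTANCEDATA (2 << 30) flag values from the native d3d9 headers. The class and method names are hypothetical, and the actual Device calls are omitted.

```csharp
public static class InstancingConstants
{
    // Same packing as MakeFourCC above: first character in the lowest byte.
    public static int MakeFourCC(int ch0, int ch1, int ch2, int ch3)
    {
        return ((int)(byte)ch0) | ((int)(byte)ch1 << 8) |
               ((int)(byte)ch2 << 16) | ((int)(byte)ch3 << 24);
    }

    // FOURCC commonly associated with the ATI SM2 instancing hack (assumption).
    public static readonly int Inst = MakeFourCC('I', 'N', 'S', 'T');

    // Flag values as defined in the native d3d9 headers.
    public const uint IndexedData = 1u << 30;   // D3DSTREAMSOURCE_INDEXEDDATA
    public const uint InstanceData = 2u << 30;  // D3DSTREAMSOURCE_INSTANCEDATA

    // Stream 0 (mesh geometry): draw the indexed mesh instanceCount times.
    public static uint GeometryStreamFrequency(int instanceCount)
    {
        return IndexedData | (uint)instanceCount;
    }

    // Stream 1 (per-instance data): advance one element per instance.
    public static uint InstanceStreamFrequency()
    {
        return InstanceData | 1u;
    }
}
```

With the 64,000-instance grid, stream 0 gets IndexedData | 64000 and stream 1 gets InstanceData | 1, so one indexed draw of the mesh walks through the whole instance buffer.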

circlesoft (Member since: 2/2/2003, From: Baltimore, MD, United States)

Quote: I have had problems with this, even in regular unmanaged D3D. The effect framework allows you to set an array of constants, but not a specific range of them (i.e., you can't supply a starting index; if anybody knows a way, I would love to hear it). It also allows you to set an array of matrix pointers (very nifty for skinning), but again, not a specific range of them. When all of your data is stored in different arrays (since the constants being instanced are stored in the individual nodes), this is quite annoying.

The way I do it is to set each element individually. That is, if I want to set indices 4-8 of the constant array, I retrieve the handle of the element at index 4, set it, and repeat for each following index. Since ID3DXEffect employs some state change batching, this doesn't really cause a performance hit (other than the CPU overhead of the additional ID3DXEffect::SetValue() calls).

Nice work on the sample. Instancing really isn't recognized as much as it should be, as it is crucial in realtime rendering, whether it be in hardware or in software.

Dustin Franklin ( circlesoft :: KBase :: Mystic GD :: ApolloNL )
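
circlesoft's workaround, setting a range one element at a time, boils down to a loop. In native D3DX the per-element handle would come from ID3DXEffect::GetParameterElement; in this sketch a hypothetical IEffectParameters interface and a recording fake stand in for the effect so the loop can run on its own. None of these names are the real D3DX or MDX API.

```csharp
using System.Collections.Generic;

// Hypothetical abstraction over an effect: look up the handle of one array element,
// then set a single float4 value through that handle.
public interface IEffectParameters
{
    string GetElementHandle(string arrayName, int index);
    void SetVector(string handle, float[] value);
}

// Fake effect that records every SetVector call, so the loop can be checked without a device.
public sealed class RecordingEffect : IEffectParameters
{
    public readonly List<KeyValuePair<string, float[]>> Calls = new List<KeyValuePair<string, float[]>>();

    public string GetElementHandle(string arrayName, int index)
    {
        return arrayName + "[" + index + "]";
    }

    public void SetVector(string handle, float[] value)
    {
        Calls.Add(new KeyValuePair<string, float[]>(handle, value));
    }
}

public static class EffectRange
{
    // Writes values[0..count) to arrayName[startIndex .. startIndex + count), one element at a time.
    public static void SetVectorRange(IEffectParameters effect, string arrayName,
                                      int startIndex, float[][] values, int count)
    {
        for (int i = 0; i < count; i++)
        {
            string handle = effect.GetElementHandle(arrayName, startIndex + i);
            effect.SetVector(handle, values[i]);
        }
    }
}
```

Setting indices 4-8 is then SetVectorRange(effect, name, 4, values, 5). Looking the element handles up once and reusing them keeps the CPU overhead circlesoft mentions down to the SetValue calls themselves.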

remigius (Member since: 5/4/2005, From: Oirsbeek, Netherlands)

Thanks for the replies, good to see it's of some use :) As I edited into my post, I've also fixed true hardware instancing (technique #1) using the hack posted by Muhammad, so thanks for that too! The performance was a bit disappointing on my X850, but that seems consistent with what I've been reading about ATI and instancing on some hardware review boards (mostly regarding Crytek). If anyone cares to share results on other cards (framerates and such), please go ahead and post them. The X850 is the only decent card I have to try my stuff on :)

I should probably warn you, though, that the samples don't do much checking of whether something is supported. I just assumed an SM2 card for the Shader Instancing project and an SM3 card (or an ATI card with the hack) for Hardware Instancing.

So, I've got one question left. I'm a bit puzzled about technique #4 (Stream Instancing) in the original sample. It seems a bit redundant. If they combined an instance data buffer with the batch buffers from technique #2, you could use a bigger batch size and get better performance. But if I understand it correctly, they still draw one instance at a time, just with the instance data on another stream. That doesn't make sense to me, as technique #3 seems easier to implement and performs better. So the only explanation I can come up with is that technique #4 is either a legacy technique or included for educational purposes. Any thoughts?
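
The reasoning about technique #4 largely comes down to draw-call counts. A back-of-the-envelope sketch for the 64,000-instance scene, following remigius's description: techniques #3 and #4 issue one draw per instance, technique #2 one draw per batch, and technique #1 a single draw (assuming all instances fit in one instance buffer). The batch sizes passed in are hypothetical placeholders; real values depend on how many constants each instance needs and on the shader model's constant register limit.

```csharp
using System;

public static class DrawCallEstimate
{
    // Number of draw calls needed when each call renders at most perCall instances.
    public static int DrawCalls(int instances, int perCall)
    {
        if (perCall <= 0) throw new ArgumentOutOfRangeException(nameof(perCall));
        return (instances + perCall - 1) / perCall;
    }

    // Prints the estimate for each technique; both batch sizes are hypothetical inputs.
    public static void Print(int instances, int shaderBatchSize, int streamBatchSize)
    {
        Console.WriteLine("#1 hardware instancing:  " + DrawCalls(instances, instances));
        Console.WriteLine("#2 shader instancing:    " + DrawCalls(instances, shaderBatchSize));
        Console.WriteLine("#3 constants instancing: " + DrawCalls(instances, 1));
        Console.WriteLine("#4 stream instancing:    " + DrawCalls(instances, 1));
        Console.WriteLine("#4 + batch buffers:      " + DrawCalls(instances, streamBatchSize));
    }
}
```

For example, Print(64000, 50, 200) reports 1, 1280, 64000, 64000 and 320 draw calls, which illustrates why a stream-instancing variant with batch buffers could beat the one-instance-per-draw version.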

circlesoft (Member since: 2/2/2003, From: Baltimore, MD, United States)

Quote: Yeah, when I tried the hack on my 9600XT, it only performed as well as shader instancing. I would try your sample, except that my machine is still down at school and I don't have anything with a graphics card in it right now.

Quote: Yeah, going by the documentation they've published, it seems a bit confusing and backwards. How does it compare to the others, performance-wise?

Muhammad Haggag (Moderator, Member since: 9/22/2000, From: Redmond, WA, United States)

Quote: ATI SM2.0 cards (Radeon 9500 and up).
All times are ET (US)