# [MDX-C#] Vertex transformation (from object to world space) is slow



Hello, I'm implementing a dynamic vertex buffering system for my engine. It seems that for batching purposes I will have to transform all the vertices from object to world space (they will all share the same vertex buffer). So far, so good. I keep the original positions of every scene object in one array and apply the modification to another one:

    // Copy the original values to the temp vertices array
    _verts.CopyTo(_vertsTemp, 0);

    // Transform every vertex in the array (multiply its position by the world matrix)
    for (i = 0; i < _vertsTemp.Length; i++)
    {
        _vertsTemp[i].UpdatePosition(Vector3.Transform(_vertsTemp[i].Position3, _ObjectWorldMatrix));
    }

My problem is that the Vector3.Transform(MyVertex, MyMatrix) function is really slow! Running the application with nothing else to do, my computer outputs 3000+ frames/sec. If this function is called 1000+ times, the framerate drops to 1000 or less! (Nothing else is done, only the recomputation of the vertices!) Is there a better way to do these transformations? Thank you!

##### Share on other sites
IDirect3DDevice9::ProcessVertices() might well be quicker than the D3DX based method (which should have a faster array-based overload).

But ultimately your big problem is that you're doing CPU-based transformation. Modern GPUs are absolute monsters when it comes to vector/matrix mathematics and will easily run circles around even the best CPUs. You really should try to structure your code/engine so as to take advantage of the GPU wherever possible.

Why are you having to transform everything? Give us a bit more of an explanation as to what you're trying to achieve and we might be able to suggest a more GPU-friendly method [smile]

hth
Jack

##### Share on other sites
First off, FPS is a very bad way to measure speed. Framerate vs Frametime covers that subject.

Now, 1000 FPS means 1 ms per frame. 3000 FPS is 0.333 ms per frame. So your 1000+ transforms cost roughly 0.67 ms in total, and that doesn't make Vector3.Transform slow.
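The arithmetic above can be sketched as a tiny helper. This is only an illustration; the class and method names are hypothetical and not part of MDX or the engine being discussed.

```csharp
using System;

public static class FrameTime
{
    // Convert a frames-per-second reading into milliseconds per frame.
    public static double MsPerFrame(double fps)
    {
        return 1000.0 / fps;
    }

    // Extra time per frame (in ms) implied by a drop from fpsBefore to fpsAfter.
    public static double AddedCostMs(double fpsBefore, double fpsAfter)
    {
        return MsPerFrame(fpsAfter) - MsPerFrame(fpsBefore);
    }
}
```

With the numbers from this thread, `FrameTime.AddedCostMs(3000.0, 1000.0)` comes out at about 0.67 ms per frame, spread over 1000+ calls. Frame time makes that cost visible in a way the raw FPS figures do not.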

Also, unmanaged DX has a TransformArray method, which takes an array of many vectors and transforms those, but has its own limitations. MDX probably has these somewhere, not sure where it is in MDX1.1.

Lastly, why do you want to do the transformation on the CPU? Of all solutions, this is likely to be the slowest. Unless you have many, many objects with very low poly counts, even a SetTransform + DrawPrim would likely work faster. If you really need good performance drawing many objects, look into instancing. CPU transformation seems like the worst possibility here.

Hope this helps.
[EDIT] Slowpoke.

[Edited by - sirob on August 11, 2006 11:48:43 AM]

##### Share on other sites
I can only agree with the above... But if you really want to do this on the CPU I'd suggest using

Vector3.TransformCoordinate(Vector3[] source, Matrix transform)
(see also DirectX Documentation for Managed Languages: Vector3)

This is still a hell of a lot faster than transforming vertex by vertex.

##### Share on other sites
Hello!

First, I'm not that far along with my engine.
I'm trying to make it as dynamic as possible; for the moment I'm trying to set up a card game in 3D (a card game like Magic: The Gathering, for those who know it).

The game table with the cards on it will be stored in a static buffer (cards don't change that often there).

The dynamic part will be used for the hands and the cards held by the player; a slow movement should be perceptible there, as well as cards moving, the action of putting a card on the play table, ...

Basically, this is what my engine does now:

Static buffering: works very nicely, with sorting, batching, ...

Dynamic buffering: the aim of the dynamic buffer is to batch as many cards together as possible (with a different world matrix per card object) to render them at the same time.
My question is quite simple: how would you render, let's say, a hundred cards at the same time, but with a different world matrix for each of them (and changing frequently: movement of the hand, ...)? My solution is to fill a dynamic vertex buffer, but to be able to render it with as few draw calls as possible, the vertices must be in view space (that's why I'm doing the world -> view transform before I fill the vertex buffer).

As for hardware instancing, does it work with ATI cards? I thought it was only for the latest video cards, and NVIDIA only?

##### Share on other sites
Another thing:

The array of points whose coordinates I have to refresh every frame is in fact an array of VertexFormat (which basically holds a Vector3, the texture mapping u and v coordinates, ...).

I keep an array of these, not just of Vector3.
That makes it impossible to run TransformCoordinate over all the Vector3 points at once...

##### Share on other sites
<quote>
As for hardware instancing, does it work with ATI cards? I thought it was only for the latest video cards, and NVIDIA only?
</quote>

Well, I think you'll have to look this up at ATI. But as far as I know, NVIDIA has supported instancing since the 6800 model, which is pretty old.
I don't think that ATI is that far behind on this feature.
Alas, I'm not so well acquainted with instancing in practice.

But: I'd say for your purpose it would be just fine to transform your dynamic data on the GPU, in your shader. Batching is nice, and the fact that it works quite well for your static geometry is even better, but in my opinion the overhead of locking, transforming, etc. is not worth it for your dynamic data.

##### Share on other sites
Hello!

So clearly, for dynamic processing I should go with:

    foreach DynamicObject
    {
        Device.SetWorldMatrix = CurrentObject.WorldMatrix
        Draw
    }
    => No batching possible here

Instead of:

    foreach DynamicObject
    {
        UpdateCurrentObjectVector3 with CurrentObject.WorldMatrix
    }
    draw batched objects
    => Batched draws

##### Share on other sites
I'd like to offer a couple suggestions for possible ways of doing this [smile]:

- You could always just forget about batching. Depending on how many actual cards you need to show, it's possible that just calling SetTransform + DrawPrim for each one might work out well enough to be worth avoiding the hassle of coding something more difficult. This also has the benefit of being simple, which means it has fewer places to break on specific hardware/systems.

- Assuming the "cards" are just flat rectangles, there's always ID3DXSprite with the OBJECTSPACE flag. This would work quite fast, and be very simple to do. On the other hand, there's little to be learnt from using a pre-made interface, so if you're interested in this as a learning experience, this might not be the way to go.

- Using CPU processing is also a valid possibility for this. That would actually be quite similar to the way ID3DXSprite does things. Frankly, I feel this would be pretty difficult to get working exactly right. The transformation is bound to be a bit slow, though there are a couple of ways you can speed it up*.

- Lastly, there's the option of using shaders. This would include instancing (either pure hardware, or shader constant based). If you're interested in either of these, have a look at the Instancing sample in the SDK Sample Browser, which features both methods.
Keep in mind, however, that both methods require rather newish cards (SM3 for pure hardware, ~SM1.4/2 for shader constant), and are quite complex, which means they might break under different setups, or in different cases.

What I'd recommend you do is use ID3DXSprite, if you're not interested in actually writing this yourself, or if you are, use an Array of Vector3s as a temporary buffer for the positions when you transform them.

Hope this helps.

* Namely, using D3DXVec3TransformArray would do quite a bit of good, but unfortunately, the MDX equivalent does not have a stride parameter, and thus can only use an array of Vector3s, and not any custom struct. I can't find any substitute for that, which is a bit weird.

[Edited by - sirob on August 11, 2006 9:03:04 AM]

##### Share on other sites
Well, I don't have much to add, except that I (for the sake of gaining experience - and it's still quite simple) would tend to use shader-constant based instancing.
You can pm me if you like and I'll send you a small vertex/pixel shader that uses some phong-like lighting -and some c# sample code showing you how to use it- via email.

Martin

##### Share on other sites
Thank you very much for the suggestions, they are really nice!

- First, I don't want to use the shader method, because for it to be worthwhile SM3 would have to be widespread. That's not the case at all (mostly because of ATI cards).

- Second, the sprite way could fit my needs, but I'm trying to keep the underlying engine as flexible as possible (I have other small projects planned for it afterwards that will require pure 3D dynamic processing).

- So the "bad" CPU way is the only one left. I managed to use a Vector3 array so that Vector3.TransformCoordinate can work on the whole array, and it's much faster (4 times faster with a small number of vertices to transform). The only drawback is that my objects will use more memory (to shuttle the data between the Vector3 array and the custom vertex struct). It's a pity that Vector3.TransformCoordinate doesn't accept a custom vertex format structure...
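The split layout described in the last point might look roughly like the sketch below. It is not the poster's actual code: `CardVertex` and `SplitTransform` are hypothetical names, and `System.Numerics` stands in for the MDX types so the snippet compiles without the DirectX SDK. In MDX, the array overload of `Vector3.TransformCoordinate` mentioned earlier in the thread would take the place of the per-element `Vector3.Transform` call.

```csharp
using System.Numerics;

// Hypothetical custom vertex layout: a position plus texture coordinates.
public struct CardVertex
{
    public Vector3 Position;
    public Vector2 TexCoord;
}

public static class SplitTransform
{
    // Transforms the untouched object-space positions by 'world' and writes the
    // results into the Position field of the destination vertices.
    // Texture coordinates in 'destination' are left as they are.
    public static void TransformInto(Vector3[] objectPositions, Matrix4x4 world, CardVertex[] destination)
    {
        for (int i = 0; i < objectPositions.Length; i++)
        {
            destination[i].Position = Vector3.Transform(objectPositions[i], world);
        }
    }
}
```

The extra memory mentioned above is the `objectPositions` array: each position is stored once in object space and once more inside the vertex that goes into the dynamic buffer.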

Now, the CPU way (vertex positions transformed on the CPU) vs. the "GPU" way (SetTransform on the device for each object).

Is it wrong to reason like this:

I could "capture" when an object has moved (flag it), which means that I only have to "refresh" its vertex positions when the object is flagged as "has moved".
Let's say that 100 objects are moving at 10 moves/second. That's already a lot of movement. At 60 fps there are 100 x 60 = 6000 object updates per second, but I would only have to "recompute" the vertex positions 100 x 10 = 1000 times per second; the other 5000 times need no position update. On top of this, I now have the possibility to batch.
Simple example: all the card back textures are the same, which means that I could (even if they are not at the same position in the world) draw the 100 cards in 1 draw call!
Conclusion: drawing 100 moving card backs = 1000 position refreshes per second + 1 draw call per frame.
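A minimal sketch of that "has moved" flag, assuming one small class per dynamic object. The class, its members and the `RefreshCount` counter are hypothetical, and `System.Numerics` again stands in for the MDX math types.

```csharp
using System.Numerics;

public sealed class DynamicObject
{
    private readonly Vector3[] _objectPositions; // original object-space positions
    private Matrix4x4 _world = Matrix4x4.Identity;
    private bool _hasMoved = true;               // force one transform on first use

    public Vector3[] WorldPositions { get; private set; }

    // Counts how many times the positions were actually recomputed.
    public int RefreshCount { get; private set; }

    public DynamicObject(Vector3[] objectPositions)
    {
        _objectPositions = objectPositions;
        WorldPositions = new Vector3[objectPositions.Length];
    }

    public Matrix4x4 World
    {
        get { return _world; }
        set { _world = value; _hasMoved = true; } // changing the matrix flags the object
    }

    // Call once per frame; does nothing unless the object was flagged as moved.
    public void Refresh()
    {
        if (!_hasMoved) return;
        for (int i = 0; i < _objectPositions.Length; i++)
        {
            WorldPositions[i] = Vector3.Transform(_objectPositions[i], _world);
        }
        _hasMoved = false;
        RefreshCount++;
    }
}
```

Calling `Refresh()` every frame then costs only a flag check for objects that did not move. The transformed positions would still have to be copied into the dynamic vertex buffer each time it is refilled, which this sketch leaves out.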

Now the "GPU" way: no matter what, I will have to call SetTransform on the device once per object for every frame rendered. In this case, that's 6000 SetTransform calls per second. And even if the texture is the same, I won't be able to batch them.
Conclusion: drawing 100 moving card backs = 6000 SetTransform calls + 6000 draw calls per second.

Which method is the fastest?

Thank you very much for your help!

##### Share on other sites
OK, let's say that you don't want to use shader constant based instancing, which works fine on more than just SM3.0:

<quote>(SM3 for pure hardware, ~SM1.4/2 for shader constant)</quote>.

I wouldn't look at how many calls you have per second, but at how many you have per frame.
And that's 100 SetTransform(...) calls (maybe bad) and 100 draw calls (not so bad on "newer" hardware).
And then we're back to using frame time to measure the speed of your game.
Do you use NVPerfHUD to measure the performance?
If not, then do so and see how it goes. You should also keep in mind that
the debug build and the debug DX runtime are slower than the release versions when doing so.
And, as a reminder: make sure your code works first, and optimize for speed later. As you have seen, even small changes can speed up your code by a factor of four (as in the case above).
I have to leave for now, but I'll keep an eye on this thread for a while ;)

Good luck,

Martin

##### Share on other sites
I'm interested in "shader constant based instancing"; could you point me to a link where it's discussed?

Thank you!

##### Share on other sites
<quote>
Original post by SeeMe:
I'm interested in "shader constant based instancing"; could you point me to a link where it's discussed?
</quote>

Like I said, it's in a sample in the SDK called "Instancing". This shows 3 ways of doing the same thing, two of which are instancing techniques (True instancing, and shader constant instancing). You can search the SDK documentation for the page explaining the sample, and look at the sample code using the Sample Browser. I figure that's about all you'll need to get started.

Hope this helps.