Slow drawing textured quads with Direct3D 8

11 comments, last by LEET_developer 18 years, 4 months ago
Quote: "Moving the SetTexture outside the loop is a good idea"


Could you explain this to me?
Why? (I don't see why I should move it.)
Which loop? (There are two.)
Where? (After the inner or outer loop, or before the inner or outer loop?)
setup quads -> realloc each time I add a new quad
sort quads by texture -> so I don't have to set texture many times
for each n vertices that fit in VB
    copy n vertices in sys memory
    lock VB
    copy n vertices to VB
    unlock VB
    for each n vertices that have same texture
        SetTexture
        DrawIndexedPrimitive n vertices
    end
end
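
In Direct3D 8 terms, the fill-and-draw part looks roughly like this (a simplified sketch: the QuadVertex struct and the counts are placeholders, not my actual engine code, and it assumes a dynamic vertex buffer with the stream source, indices and vertex shader already set up elsewhere):

#include <d3d8.h>
#include <string.h>

// Sketch only: QuadVertex and the counts are simplified placeholders.
struct QuadVertex { float x, y, z; DWORD color; float u, v; };

void FillAndDraw(IDirect3DDevice8* device,
                 IDirect3DVertexBuffer8* vb,
                 const QuadVertex* vertices, UINT vertexCount,
                 IDirect3DTexture8* texture, UINT quadCount)
{
    // Lock the VB, copy this batch of vertices in, unlock.
    BYTE* dst = 0;
    if (FAILED(vb->Lock(0, vertexCount * sizeof(QuadVertex), &dst, D3DLOCK_DISCARD)))
        return;
    memcpy(dst, vertices, vertexCount * sizeof(QuadVertex));
    vb->Unlock();

    // One SetTexture per group of quads sharing a texture, then one draw call.
    device->SetTexture(0, texture);
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                 0,               // MinIndex
                                 vertexCount,     // NumVertices
                                 0,               // StartIndex
                                 quadCount * 2);  // two triangles per quad
}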


Duplicate SetTexture calls? There is actually only one SetTexture call.

PS:

About optimization:
I wrote a complete DDraw engine, and after it was done, it appeared to be too slow. [grin]
while (!asleep()) {
    sheep++;
}
Pah... I'm not reading things correctly today [headshake]

Ignore that bit about moving the SetTexture call - it's wrong. Sorry!

Jack


gerbenvv:

Move the SetTexture call outside the inner loop. Maybe you were already planning on moving it outside, but your for loop says

"for each n vertices that have same texture"
and then on each of those iterations (the code inside the loop) you are setting the texture, but you already said in the for loop that they all have the same texture... no need to set it each time... just set it once outside. Unless I'm confused about what you wrote, in which case I apologize.

And sorry, associative arrays are not part of the C++ language itself; I was just showing you what I meant by hash tables (depending on where/when you learn about them, they can be called a million different things). You'd have to implement one yourself or use the STL, which I would recommend against in a game context; you're better off writing your own, faster and optimized for your uses.
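
Just to show what I mean by an associative array, here is a rough STL version of the texture-to-quads grouping (illustration only; Quad and the function name are made up, and as I said, for a real engine I'd roll my own container instead):

#include <d3d8.h>
#include <map>
#include <vector>

struct Quad;  // whatever per-quad data the engine keeps; only pointers are used here

// One bucket of quads per texture, so SetTexture is called once per bucket.
typedef std::map<IDirect3DTexture8*, std::vector<Quad*> > QuadsByTexture;

void GroupByTexture(Quad* const* quads, IDirect3DTexture8* const* textures,
                    UINT count, QuadsByTexture& out)
{
    for (UINT i = 0; i < count; ++i)
        out[textures[i]].push_back(quads[i]);
}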


jollyjeffers:

thanks, and I'll have to disagree with your statements :P...

1. Sure, a quadtree will make rendering that much faster: if a graphics card can push 1 million triangles per second and you are pushing one twentieth of the triangles each frame, you'll get roughly 20 times the frame rate (give or take, with the page flipping and whatnot being constant across the board, but you get the picture). My point was that too often people ignore constant-level optimizations in algorithms, saying "well, it was n^2 and now it's n, so all is good", when if n is always less than your constant work factor, your n algorithm is actually slower. A good example would be if your terrain never had more than 30 vertices (really, really crappy terrain): you wouldn't bother doing all the math involved in frustum culling, since your card can render more than 600 frames per second of this terrain as it is, without culling.

This also applies to your hierarchical culling example: if you are making a fighting game, or even one of those old-school beat-'em-ups that had at most 6 people somewhere in the room and maybe a weapon on the ground, you wouldn't bother hierarchically culling something like this, because it would actually be slower.

These are oversimplified examples. Usually what you want to do is design an algorithm, do case studies (best / average / worst case), and figure out how often you expect the worst case, how much slower it is than your average case, how important it is for the best case to be as fast as possible, and so on. At the application level it's very important to remember that O(n) is not ALWAYS better than O(n^2) in the real world, where we do not have infinite data sets.
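
To put made-up numbers on that crossover, the cutoff can even go straight into the code (DrawEverything and CullWithQuadtreeAndDraw are hypothetical names):

// Sketch with made-up numbers: below some object count, the per-frame cost of
// building/walking the quadtree exceeds the cost of just drawing everything.
void DrawEverything();             // hypothetical: submit every object
void CullWithQuadtreeAndDraw();    // hypothetical: frustum-cull via a quadtree first

const unsigned kCullingCutoff = 64;  // tune from profiling, not from big-O

void RenderScene(unsigned objectCount)
{
    if (objectCount < kCullingCutoff)
        DrawEverything();            // O(n), but with a tiny constant
    else
        CullWithQuadtreeAndDraw();   // better asymptotically, bigger constant
}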

Compilers do a great job of optimizing your code, but there are specific things compilers cannot do.

I'm pretty sure that if you have a dynamic loop:

for (i = 0; i < variable; i++)

where variable is not guaranteed to be any specific value (maybe it's based on user input), the compiler has a hard time optimizing array accesses inside the loop based on that i index, as it should. I think compilers in use today will still generate better code if you set a temporary reference to the element you are looking at, to avoid repeating the dereferencing all the way through. That being said, I haven't looked into that for a long time, so I could definitely be wrong :)
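
Something like this is what I mean by caching a reference (sketch only; the Particle struct is made up):

// Hoist the element lookup out of the repeated member accesses so the
// indexing/dereferencing happens once per iteration.
struct Particle { float x, y, vx, vy; };

void Integrate(Particle* particles, int count, float dt)
{
    for (int i = 0; i < count; ++i)
    {
        Particle& p = particles[i];  // one dereference, reused below
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}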

I should have specified: by assembly I mean writing optimizations in MMX/SSE. Being able to divide your data set by 4 on some operations is very useful, and there aren't many compilers that take advantage of these features... you just have to make sure you have a legacy code path for people running Pentium 133s and lower :P but I doubt they'll have a card that supports DirectX 9 anyway.
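
For example, a minimal SSE sketch using intrinsics rather than raw assembly (the function and data are made up; a real version needs a scalar tail and a fallback path for CPUs without SSE):

#include <xmmintrin.h>  // SSE intrinsics

// Scale an array of floats four at a time. Assumes count is a multiple of 4
// and the data is 16-byte aligned.
void ScaleFloats(float* data, int count, float scale)
{
    __m128 s = _mm_set1_ps(scale);
    for (int i = 0; i < count; i += 4)
    {
        __m128 v = _mm_load_ps(data + i);   // load 4 floats
        v = _mm_mul_ps(v, s);               // multiply all 4 at once
        _mm_store_ps(data + i, v);          // store them back
    }
}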

And I definitely know that compilers won't turn all your divides into bit shifts or multiplications as necessary, and if you don't care about the small loss in precision, that's almost 8 cycles each that you are saving :P heh
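
Roughly what I mean (made-up example; the precision caveat applies to the float case):

// Integer divide by a power of two replaced by a shift (exact for unsigned values).
unsigned DivideByFour(unsigned x)
{
    return x >> 2;   // same as x / 4
}

// Float divides replaced by multiplies with a precomputed reciprocal
// (slightly less precise, but the divide only happens once).
void ScaleDown(float* values, int count, float divisor)
{
    const float inv = 1.0f / divisor;   // one divide up front
    for (int i = 0; i < count; ++i)
        values[i] *= inv;               // multiplies instead of divides
}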
