Sign in to follow this  
speedie

a speed question

Recommended Posts

I'm curious as to if this will be slow?
int x = 0;
int foo[max];

for (int j=0; j<max; j++)
{
    if (foo[j].x > x) 
        x = foo[j].x;
}


would this be slow? when max could be upwards of around 3000+ a graphical question, assumeing 2D using quads(2 triangles)when using indexbuffers to batch. how much should I be batching? Should I only be batching say the top two used textures or should I batch everything? I was thinking if I batch everything, along with z ordering Alot of it really wouldn't be batched that much, or I would have alot of overlapping, and that would be a problem.

Share this post


Link to post
Share on other sites
As far as that code goes, there's nothing particularly slow about it. The compiler will optimize it into something extremely fast.

I can't really help with the other questions I'm afraid.

Share this post


Link to post
Share on other sites
Quite an appropriate name for such a question, speedie. ;)

Anyway, why don't you test the performance first? Optimizations can be done later on, optimizing too early on can make things far more complex than they need to be. Well, since you're asking for performance, sorting the list once and taking the last of first, depending on sorting order, is of course much faster, with the cost of a single sort at the start of course. But that depends on the situation: what are you using this list for, what are the requirements?

I can't say much on the batching, I've never done it in my OpenGL programs so far, but I'd test the performance with a representative level first before optimizing. That way you can spot what you need to optimize rather than doing it because you've heard it's a good thing. Someone else may give you some better pointers on this though. :)

Share this post


Link to post
Share on other sites
Re: batching.

I personally think that, while premature optimisation is normally a waste of time, you should still batch as much as you possibly can due to the vast range of different graphics hardware set ups you will want your program to run under.

What seems like a pointless and premature optimisation on your set up might in fact make a significant speed difference on a different machine with different graphics hardware.

I'm not sure the usually perfectly sensible argument about "Optimise only when you have identified a bottleneck" applies so well to the area of hardware accelerated graphics.

I'm prepared to stand corrected though if I'm talking rubbish.

Share this post


Link to post
Share on other sites
I wouldn't say it's premature, Incorparating indexbuffers, is my optimization. I just realy confused as to how I would sort everything while not having images overlap previously drawn images. I seems I would have to incorporate a z order, but I would think after sorting everything by the z axis alot of it per z layer wouldn't share the same texture.

Share this post


Link to post
Share on other sites
The runtime for your loop is O( n ) I believe.
Programs that require a lot of processing power will not worry too much about wether the loop iterates 10 times or 3000 times. The "big O" notation is still O( n ).

If you want to be considerably faster you will have to come up with faster data structures and algorithms.
As an example, binary search trees will give you O( log(n) ). (Correct me if Im wrong again)

If you are talking about optimizing the loop, I have heard that pointer arithmetics is faster than array indexing.
Example:

int foo[max];
int *px=foo, x = 0;

for (int j=0; j<max; ++j, ++px) // ++j is faster than j++ if were gona be picky
{
if (px->x > x)
x = px->x;
}




Wether this is a bottleneck in your program is another question. If not, optimizing like this is not worth it.

Share this post


Link to post
Share on other sites
You can't access an int array with px->x... Your compiler would throw an error saying that px is not a struct, class, etc... However you may have been inclined to write this because the OP used foo[j].x which is also incorrect

You would need to access like
*(px + x) to get the value at foo[x]

Anyway, why would pointer arithmetic be any faster that array index notation. I would think the compiler would generate exactly the same code for both. In addition you are also incrementing two variables. This may prove insignificant to run time however, although the compiler might find a way to optimise out the second variable.

Also why would ++j be faster than j++. It is true for iterators that the prefix notation is faster but I don't think that it would faster for plain integer types.

If you were to write a more efficient way, you could use a pointer for the loop counter.


int x=0;
int foo[max];

for (int* px=foo; px<&foo[max]; px++)
{
if (*px > x)
x=*px;
}

Share this post


Link to post
Share on other sites
About batching, it depends on the number of texture switches you are going to have. However, all else equal, does batching cost you anything? I mean, even if you draw 2 Sprites in a batch, switch states, then draw 2 more, you'll probably be slightly faster as you'll be saving yourself an extra draw primitive call.

Since you'll be sorting things by Z, your render breaks will likely be dependent on texture swaps. This will give you further incentive to use a sprite sheet rather than individual textures for each object.

Share this post


Link to post
Share on other sites
Absolutely Batch. Changing state is the most expensive graphical operation you can perform. Unless you know that a large majority (say 75% or more) of the batches will only contain a single instance, then its probably a win. This would most definately be a corner-case though, and there's no way to compute (at run time) whether this is the case for this frame without doing much of the work required to batch in the first place. BAtching should be a win in all but the most essoteric cases.

Share this post


Link to post
Share on other sites
Understand that if 'max' is very large, you may run out of stack memory. Instead, if you anticipate that you will need to allocate a large number of int's, do it dynamically.

Share this post


Link to post
Share on other sites
Perhaps you'd be happier using std::max_element:

#include <algorithm>

int x = 0;
int foo[max];

x = *std::max_element(&foo[0], &foo[max]);

The STL is about as well-optimised as you can hope for. Obviously, this problem is bounded below by O(n), and your implementation is identical to STL's, but it's a good idea to get into the habit of using STL for boilerplate code.
Furthermore, unless you need to pass this array around as a raw buffer, you'd be better-off using a std::vector.

Regards
Admiral

Share this post


Link to post
Share on other sites
Quote:
Original post by Adam Hamilton
If you were to write a more efficient way, you could use a pointer for the loop counter.


int x=0;
int foo[max];

for (int* px=foo; px<&foo[max]; px++)
{
if (*px > x)
x=*px;
}


Compilers typically convert loop-and-iterate into that form anyway.


xor eax, eax
mov ecx, max
mov esi, foo // &foo[0]

L1:
mov edx, [esi]
cmp eax, edx
cmovle eax, edx // compiler MIGHT put more compatible operations here.

add esi, 4
dec ecx
jnz L1

Share this post


Link to post
Share on other sites
Quote:
Original post by TheAdmiral
Perhaps you'd be happier using std::max_element:

#include <algorithm>

int x = 0;
int foo[max];

x = *std::max_element(&foo[0], &foo[max]);

The STL is about as well-optimised as you can hope for. Obviously, this problem is bounded below by O(n), and your implementation is identical to STL's, but it's a good idea to get into the habit of using STL for boilerplate code.
Furthermore, unless you need to pass this array around as a raw buffer, you'd be better-off using a std::vector.

Regards
Admiral


Correct answer on almost all counts, and certainly on the counts that really matter here.

(The nits: It's not "the STL" but the standard C++ library - i.e. the standard library of the language you are using; there is no reason not to use it - and the implementation is technically compiler-specific, but practically will look similar or identical on every compiler, because there really isn't another way to do it that's really any different. Although I *think* a conforming implementation can just have a pre-optimized, machine code version of it sitting around somewhere, and plug it in if you include the right header and try to use the function, with no actual C++ implementation code existing. The *header file* <algorithm> has to exist, though, and provide the *declaration* of std::max_element.)

Also, purely as a style issue, I would write it slightly differently. Not this -


int x = 0;
int foo[max];
// populate 'foo' here
x = *std::max_element(&foo[0], &foo[max]);


But instead this:


int foo[max];
// populate 'foo' here
int x = *std::max_element(foo, foo + max);


You can all go home now.

Share this post


Link to post
Share on other sites


That code snippet only finds a max. If its related to your Z sorting then
the code for that is very different (a sort) and there are more options to select depending on the situation.

A simple linear sort is 1/2 N^2 ( O(n^2) ) and other sorting methods
can be more efficient (bubble sort, trees etc...)

For batching there are some index limits (was 65k) and they used to recommend batches of something like 1k (DirectX - when you had time while the GPU was busy
to preprocess the next batch) but its probably higher now (and different for other Libs)

As others have mentioned, texture swapping is the most significant factor and unless you can get enough sequential triangles to share the same texture your rendering will be slow.

I assume you are not using Z buffering because of alpha transparency....


There are higher level optimizations like ordering render of objects (associated sets of triangles) that position and relation to other objects are understood.

Share this post


Link to post
Share on other sites
Zahlman, you are an endless fountain of knowledge [rolleyes].

Quote:
Original post by Zahlman
...I would write it slightly differently.
As would I, actually (at least on the initialisation of x). I put it down to copy-paste artifacting [looksaround].

Quote:
Original post by Zahlman
It's not "the STL" but the standard C++ library - i.e. the standard library of the language you are using...
I'm not sure I totally understand. Is STL something different?

Regards
Admiral

Share this post


Link to post
Share on other sites
Quote:
Original post by TheAdmiral
Quote:
Original post by Zahlman
It's not "the STL" but the standard C++ library - i.e. the standard library of the language you are using...
I'm not sure I totally understand. Is STL something different?
The differences are discussed in the Wikipedia entries on the C++ standard library and the STL (I'm assuming the information presented therein is accurate).

I think it could be said that the C++ standard library includes the STL (or it may be that use of the term STL is now discouraged altogether since those components have been absorbed by the standard library, but I'm not sure about this).

Share this post


Link to post
Share on other sites
Quote:
Original post by jyk
I think it could be said that the C++ standard library includes the STL (or it may be that use of the term STL is now discouraged altogether since those components have been absorbed by the standard library, but I'm not sure about this).


Almost entirely. Some STL widgets didn't make the cut for the SC++L (e.g. 'iota' or 'copy_n'), but are sometimes offered as extensions. Also, the STL only provided containers, iterators and algorithms; the full standard library includes lots of other stuff, such as all the C backwards compatibility stuff (not all of which is only there for backwards compatibility) and all the *stream headers. (Actually, I can't think of anything offhand that doesn't fit in one of those three categories...)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this