Ectara

Global variable vs passed pointer

This topic is 3031 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Why is it faster to use a pointer to an object rather than accessing a global object? If possible, prove using ASM. I'm looking into this for optimization.

It isn't, necessarily. It depends.

It's not worth worrying about until a profiler has indicated that a particular area of the code where you are doing one or the other is a bottleneck.

Furthermore, neither C nor C++ maps consistently to machine code for the underlying CPU, so looking at the C or C++ source for an issue this trivial will rarely tell you anything. You could speculate, but speculation isn't that helpful.

Cache locality will probably end up having a lot to do with it.
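
One way to check rather than speculate is to write both variants and compare the generated assembly, for example with `gcc -O2 -S`. The following is only a sketch: `Counter`, `g_counter`, `bump_global`, and `bump_pointer` are made-up names, and what the compiler emits will depend on the target, the flags, and how visible the global is to other translation units.

```c
/* Hypothetical example: the same loop written against a global object
 * and against an object reached through a pointer parameter. */
typedef struct {
    long hits;
    long total;
} Counter;

Counter g_counter; /* file-scope object, zero-initialized */

/* Accesses the global directly by name. */
long bump_global(long n)
{
    long i;

    for (i = 0; i < n; ++i) {
        g_counter.hits += 1;
        g_counter.total += i;
    }
    return g_counter.total;
}

/* Accesses whatever object the caller passes in. */
long bump_pointer(Counter *c, long n)
{
    long i;

    for (i = 0; i < n; ++i) {
        c->hits += 1;
        c->total += i;
    }
    return c->total;
}
```

Both functions compute the same result, so any difference worth caring about would have to show up in the assembly or, better, in a profiler.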

You seem to have a real predilection for these types of micro-optimizations, which is an unhealthy habit for a programmer, probably born of misinformation or outdated study resources.

This isn't 1992 -- Compilers are a lot smarter now, and we don't usually have to worry about such small issues with scalar code, at least not until we have proof that there's a problem. We don't have to multiply by powers of two using shifts. We don't have to put 'inline' on everything...
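
As a small illustration of the shift example, here are two spellings of the same multiplication. The function names are invented for this sketch. With optimization enabled, mainstream compilers will typically emit the same instructions for both, which can be checked with `gcc -O2 -S`.

```c
/* Two spellings of "multiply by eight"; names are illustrative only. */
unsigned int times_eight_mul(unsigned int x)
{
    return x * 8u;  /* says what is meant */
}

unsigned int times_eight_shift(unsigned int x)
{
    return x << 3;  /* the hand "optimization" */
}
```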

The programmer should focus on describing the solution at an appropriate level of abstraction and let the compiler figure out the fine details. Then, if you have proof that the compiler is doing a poor job, you go back and restate the solution more explicitly in order to hint the compiler in the right direction. If that fails, then it's time to think about assembly, if you *know* you can do better than the compiler, which isn't likely for scalar code, and will be even rarer once the new C++ language features are widely adopted.

I think in general, you're not going to beat the compiler's optimizer at anything it knows how to do, and the only chance a non-superstar ASM programmer has to beat the compiler is when the compiler is either clueless, or when its hands are tied because the language doesn't allow it to make necessary assumptions required for the optimization.

Without knowing when the compiler is going to do a less than stellar job, you're just chasing ghosts.

You all make valid points. I have a habit of micro-optimizing everything, from coding back in the day. I tend not to rely on the compiler to do what it should. I guess I shouldn't worry about all of these small things. I guess the fact that I test builds with optimization off to speed up compilation time is also part of the reason I optimize. After the trial and error to make sure it works, I go back with optimizations on to make sure the compiler didn't do anything silly like optimize out a vital variable.

Oh well. While I'm here, since this thread is short-lived anyway, does anyone have a method of keeping the compiler from optimizing out variables without using the volatile keyword or performing a no op instruction with the dereferenced value of the address of it? I'm pretty sure volatile isn't C89.
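
For reference, `volatile` was introduced alongside `const` in ANSI C (C89), so a sketch like the one below should compile with `gcc -std=c89 -pedantic`. The function name `sum_bytes` is made up for illustration. Marking `total` as `volatile` obliges the compiler to perform every read and write of it, which usually keeps the variable visible in a debugger, at the cost of slower code.

```c
/* Hypothetical helper: adds up n bytes. */
int sum_bytes(const unsigned char *bytes, int n)
{
    volatile int total = 0; /* each access must really happen */
    int i;

    for (i = 0; i < n; ++i) {
        total += bytes[i];
    }
    return total;
}
```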

Quote:
Original post by Ectara
I guess the fact that I test builds with optimization off to speed up compilation time is also part of the reason I optimize.

And you are also optimizing the wrong thing. It is not the optimization that makes C++ builds slow.

In order of importance:
#1 includes and macro substitutions. These will account for upwards of 80% of the time.
#2 linking time. For lots of code, linking becomes the bottleneck, especially if it needs to bring together many libs.
...
#78 Actual code optimizations

#1 and #2 can be solved with a custom-crafted build that uses ccache, precompiled headers, perhaps a PIMPL or two, and forward declarations, and that compiles on either a RAM drive or an SSD. With those it's possible to cut build times by several orders of magnitude, down to single-digit seconds, even for reasonably sized projects.
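
The forward-declaration and PIMPL ideas carry over to C as an opaque pointer: the header only declares the struct, so files that include it do not need recompiling when the layout changes. This is a sketch in a single block; in practice the first part would live in a header and the second in one `.c` file. `Widget`, the file names, and the functions are all invented for illustration.

```c
#include <stdlib.h>

/* ---- what would go in widget.h (hypothetical file) ---- */
struct Widget;                          /* forward declaration only */
struct Widget *widget_create(int size);
int widget_size(const struct Widget *w);
void widget_destroy(struct Widget *w);

/* ---- what would go in widget.c (hypothetical file) ---- */
struct Widget {                         /* full layout hidden from users */
    int size;
};

struct Widget *widget_create(int size)
{
    struct Widget *w = (struct Widget *)malloc(sizeof *w);

    if (w != NULL) {
        w->size = size;
    }
    return w;
}

int widget_size(const struct Widget *w)
{
    return w->size;
}

void widget_destroy(struct Widget *w)
{
    free(w);
}
```

Code that only includes the header can pass `struct Widget *` around but cannot touch its members, so adding a field later affects a single source file.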

Quote:
After the trial and error to make sure it works,
Or, just write unit tests.

Quote:
I go back with optimizations on to make sure the compiler didn't do anything silly like optimize out a vital variable.
If something like this happens, then you are either doing something really evil (such as addressing member variables by hard-coded offsets) or your compiler is broken (if it's gcc, MSVC, or Intel, then it's not).

Quote:
does anyone have a method of keeping the compiler from optimizing out variables without using the volatile keyword or performing a no op instruction with the dereferenced value of the address of it? I'm pretty sure volatile isn't C89.

Um... why do you care whether a variable is optimized out or not? The compiler will statically check all variable accesses and remove those that aren't needed; it will inline others and rearrange everything else.

However, a compiler is forbidden from changing class layouts, so class member variables will never be removed.

I'm having a hard time coming up with a case where compiler optimization could break code in this way.
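
A small sketch of what "optimized out" usually means in practice. The function name is hypothetical. An optimizing compiler is free to keep `cursor` only in a register or to merge it with `data`, so a debugger may report it as optimized out, but the returned value has to match an unoptimized build as long as the code contains no undefined behavior.

```c
/* Hypothetical example: walks an array through an intermediate pointer. */
int sum_ints(const int *data, int n)
{
    const int *cursor = data; /* may never exist as a separate variable */
    int total = 0;

    while (n-- > 0) {
        total += *cursor++;
    }
    return total;
}
```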

I use gcc and C89 only, and I do find that it optimizes out a variable that I'm using, causing a segfault when I try to use the variable. Such as in one of my IO functions when it swapped bytes reading and writing from a different byte order. My previous method had a pointer that was moved through the block of bytes passed by reference in a manner that flipped the bytes of variable width boundaries, swapping bytes of variables, rather than the whole block.

I tried to run it, and it optimized that intermediate pointer out and segfaulted when it tried to use it. It might have looked unused, since I was just incrementing and decrementing it, passing it to intrinsic functions, and manipulating the data elsewhere. So right now, for the functions I didn't find an alternate method for, I declared a couple of variables volatile so gcc doesn't optimize them out.

Quote:
Original post by Ectara
I use gcc and C89 only, and I do find that it optimizes out a variable that I'm using, causing a segfault when I try to use the variable. Such as in one of my IO functions when it swapped bytes reading and writing from a different byte order.


Most of the endian swap routines floating around the internet rely on undefined behavior, which means nothing is malfunctioning: a segfault is a perfectly legal outcome of undefined behavior.

Quote:
My previous method had a pointer that was moved through the block of bytes passed by reference in a manner that flipped the bytes of variable width boundaries, swapping bytes of variables, rather than the whole block.

Endian swaps of this type are known to be problematic because of hardware: they can fault if performed on unaligned addresses. It's not necessarily a compiler thing; it can be a CPU problem. If you're working on anything other than x86, it can easily fault on an unaligned access; x86 is extremely tolerant of alignment issues.

As said, GCC is very unlikely to be buggy with this type of operation, but it cannot go beyond a certain point when it comes to guessing intent.

I seem to recall that this works for general endian swap:
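#include <algorithm> // std::reverse
#include <cstring>   // std::memcpy

// Reverses the byte order of *source in place.
template <class T>
void endian_swap(T * source) {
    char buffer[sizeof(T)];
    std::memcpy(buffer, source, sizeof(T));    // copy out; no alignment assumptions
    std::reverse(buffer, buffer + sizeof(T));
    std::memcpy(source, buffer, sizeof(T));    // copy the reversed bytes back
}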

memcpy should avoid alignment issues, although I'm not sure it's guaranteed; platform documentation should describe what and how. IIRC there's something to do with unions that can make the above simpler, but I've forgotten most of the details.

The above might seem like an overkill, but it allows dealing with variables directly in a buffer.
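
For C-only code, roughly the same idea can be written without templates by treating the object as a run of `unsigned char`, which is allowed to alias any object and has no alignment requirement. `reverse_bytes` is a made-up name, and this is a sketch rather than a drop-in replacement for the template above.

```c
#include <stddef.h>

/* Reverses the size bytes starting at ptr, in place. */
void reverse_bytes(void *ptr, size_t size)
{
    unsigned char *bytes = (unsigned char *)ptr;
    size_t lo = 0;
    size_t hi = size;

    /* Swap the outermost pair and move inward until the ends meet. */
    while (lo + 1 < hi) {
        unsigned char tmp = bytes[lo];

        --hi;
        bytes[lo] = bytes[hi];
        bytes[hi] = tmp;
        ++lo;
    }
}
```

Because it works on bytes, it can be pointed at a value sitting at any offset inside a buffer.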

Contrary to popular belief, the following is undefined:
char * buffer;
reverse(*(int*)(buffer+offset));


And that's not even half the story: there are also alignment issues, etc.
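
One well-defined alternative to the cast is to assemble the value from individual bytes. This is a sketch: `read_be32` is a hypothetical name, and it assumes the bytes in the buffer are stored most significant first. Because it only ever reads `unsigned char`, it has no alignment requirement and does not depend on the host byte order.

```c
/* Reads a 32-bit big-endian value starting at p. */
unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];
}
```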

The ones I wrote are a bit more... sophisticated than that, and I don't use intrinsic or standard functions for it. The algorithm itself isn't the problem; the only way for it to fail would be to pass an invalid pointer, or to yank the RAM module out of the machine while it is running. The issue was, and I used GDB to check, the variables I defined were optimized out, and GDB reported "<variable optimized out>" when trying to read their values, yet the code still tried to access it, as it was of obvious importance.

Quote:
Original post by Ectara
The ones I wrote are a bit more... sophisticated than that
Do they rely on undefined behavior?

Write the simplest, dumbest, standard version first, using nothing but built-in routines. Only then worry about "sophistication".

Quote:
The algorithm itself isn't the problem;
Then post it.

Either here, or on GCC bug tracker.

The typedefs are simple. si32 == signed integer with 32 bits, uisys == unsigned integer of system dependent length.



#define _E_convertSwapByte32( i ) ((((i)&0xFF000000)>>24)|(((i)&0x00FF0000)>>8)|(((i)&0x0000FF00)<<8)|(((i)&0x000000FF)<<24))

#define _E_convertSwapByte16( i ) (((i)>>8)|((i)<<8))

typedef struct {
    FILE * fp;
    si32 endian;
    sisys pos;
    sisys length;
}E_io;

#define E_LSB 0
#define E_MSB 1

uisys E_ioWriteb( E_io *io, const void * ptr, ui32 size, ui32 n){
    if(!io||!ptr)return 0;
    uisys numWritten=0;
    if(size == 1)numWritten = fwrite(ptr,sizeof(si8),n,io->fp);
    else{
        si8 * p = (si8*)ptr;
        if(!io->endian){
            si8 buffer[size];
            while(n--){
                ui32 byteIndex=size;
                while(byteIndex--)buffer[byteIndex] = *(p++);
                numWritten+=fwrite(buffer,size,1,io->fp);
            }
        }else numWritten+=fwrite(p,size,n,io->fp);
    }
    /* These lines below update the file position and the length of the file, so the structure stays updated */
    io->pos = ftell(io->fp);
    if(fseek(io->fp,0,SEEK_END))return 0;
    io->length=ftell(io->fp);
    if(fseek(io->fp,io->pos,SEEK_SET))return 0;
    return numWritten;
}



Either way, other than scrutinizing my coding style, does anyone have a solution to the described issue?
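
The following is not a diagnosis of the crash, which the thread never pins down. What can be said is that the listing declares variables after statements and uses a variable-length array (`si8 buffer[size]`); both are C99 features that gcc accepts as extensions, and `gcc -std=c89 -pedantic` should warn about them. As a sketch of the reversal step that stays within C89 and avoids the variable-length array, a helper could reverse each element into a caller-supplied buffer. `reverse_each_element` is an invented name and is not part of the code above.

```c
#include <stddef.h>

/* Copies n elements of size bytes each from src to dst,
 * reversing the byte order inside every element.
 * dst must have room for size * n bytes and must not overlap src. */
void reverse_each_element(unsigned char *dst, const unsigned char *src,
                          size_t size, size_t n)
{
    size_t elem;
    size_t i;

    for (elem = 0; elem < n; ++elem) {
        for (i = 0; i < size; ++i) {
            dst[elem * size + i] = src[elem * size + (size - 1 - i)];
        }
    }
}
```

The caller could then hand `dst` to a single `fwrite` call instead of issuing one write per element.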
