Ectara

Global variable vs passed pointer


Why is it faster to use a pointer to an object rather than accessing a global object? If possible, prove it using ASM. I'm looking into this for optimization.

It isn't, necessarily. It depends.

It's not worth worrying about until a profiler has indicated that a particular area of the code where you are doing one or the other is a bottleneck.

Furthermore, neither C nor C++ maps consistently to machine code for the underlying CPU, so looking at the C or C++ source for an issue this trivial will rarely tell you anything. You could speculate, but speculation isn't that helpful.

Cache locality will probably end up having a lot to do with it.

You seem to have a real predilection for these types of micro-optimizations, which is an unhealthy habit for a programmer, probably born of misinformation or outdated study resources.

This isn't 1992 -- Compilers are a lot smarter now, and we don't usually have to worry about such small issues with scalar code, at least not until we have proof that there's a problem. We don't have to multiply by powers of two using shifts. We don't have to put 'inline' on everything...
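To make the shift example concrete, here's a toy sketch (my assumption: any reasonably modern compiler with optimization enabled); both functions will almost certainly compile to the same instruction, so the "clever" version buys nothing:

/* Toy sketch: with optimization on, a modern compiler will almost
   certainly emit identical code for both of these functions. */
unsigned times_eight_mul(unsigned x)   { return x * 8u; }  /* the plain version */
unsigned times_eight_shift(unsigned x) { return x << 3; }  /* the hand-"optimized" version */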

The programmer should focus on describing the solution at an appropriate level of abstraction and let the compiler figure out the fine details. Then, if you have proof that the compiler is doing a poor job, you go back and restate the solution more explicitly in order to hint the compiler in the right direction. If that fails, then it's time to think about assembly, if you *know* you can do better than the compiler, which isn't likely for scalar code, and will be even rarer once the new C++ language features are widely adopted.

I think in general, you're not going to beat the compiler's optimizer at anything it knows how to do, and the only chance a non-superstar ASM programmer has to beat the compiler is when the compiler is either clueless, or when its hands are tied because the language doesn't allow it to make the assumptions the optimization requires.

Without knowing when the compiler is going to do a less than stellar job, you're just chasing ghosts.

You all make valid points. I have a habit of micro-optimizing everything, left over from coding back in the day. I tend not to rely on the compiler to do what it should. I guess I shouldn't worry about all of these small things. I guess the fact that I test builds with optimization off to speed up compilation time is also part of the reason I optimize. After the trial and error to make sure it works, I go back with optimizations on to make sure the compiler didn't do anything silly like optimize out a vital variable.

Oh well. While I'm here, since this thread is short-lived anyway, does anyone have a method of keeping the compiler from optimizing out variables without using the volatile keyword or performing a no op instruction with the dereferenced value of the address of it? I'm pretty sure volatile isn't C89.

Quote:
Original post by Ectara
I guess the fact that I test builds with optimization off to speed up compilation time is also part of the reason I optimize.

And you are also optimizing the wrong thing. It is not the optimization that makes C++ builds slow.

In order of importance:
#1 includes and macro substitutions. These will account for upwards of 80% of the time
#2 linking time. For lots of code, linking becomes the bottleneck, especially if it needs to bring together many libs
...
#78 Actual code optimizations

#1 and #2 can be solved with a custom-crafted build setup that includes ccache, precompiled headers, perhaps a PIMPL or two, using forward declarations, and compiling on either a RAM drive or an SSD. With those it's possible to cut build times by orders of magnitude, down to single-digit seconds, even for reasonably sized projects.
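For example, here's a minimal forward-declaration / opaque-pointer sketch (file and function names are made up); the header pulls in nothing heavy, so clients don't need to recompile when the implementation changes:

/* widget.h -- clients see only an opaque handle; no heavy includes needed */
typedef struct Widget Widget;               /* forward declaration only */
Widget *widget_create(const char *name);
void    widget_destroy(Widget *w);

/* widget.c -- the full definition (and its includes) live here alone */
#include "widget.h"
#include <stdlib.h>
#include <string.h>

struct Widget { char name[64]; };

Widget *widget_create(const char *name){
    Widget *w = malloc(sizeof *w);
    if(w){
        strncpy(w->name, name, sizeof w->name - 1);
        w->name[sizeof w->name - 1] = 0;
    }
    return w;
}

void widget_destroy(Widget *w){ free(w); }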

Quote:
After the trial and error to make sure it works,
Or, just write unit tests.

Quote:
I go back with optimizations on to make sure the compiler didn't do anything silly like optimize out a vital variable.
If something like this happens, then you are either doing something really evil (such as addressing member variables by hard-coded offsets) or your compiler is broken (if it's gcc, MSVC, or Intel, then it's not).

Quote:
does anyone have a method of keeping the compiler from optimizing out variables without using the volatile keyword or performing a no op instruction with the dereferenced value of the address of it? I'm pretty sure volatile isn't C89.

Um... why do you care whether a variable is optimized out or not? The compiler will statically analyze all variable accesses, remove those that aren't needed, inline others, and rearrange everything else.

However, a compiler is forbidden from changing class layouts, so class member variables will never be removed.

I'm having a hard time coming up with a case where compiler optimization could break code in this way.

I use gcc and C89 only, and I do find that it optimizes out a variable that I'm using, causing a segfault when I try to use it. For example, in one of my I/O functions that swaps bytes when reading and writing data of a different byte order. My previous method moved a pointer through the block of bytes passed by reference in a manner that flipped the bytes within each variable-width boundary, swapping the bytes of individual variables rather than the whole block.

I tried to run it, and it optimized that intermediate pointer out and segfaulted when it tried to use it. It might have looked unused since I was just incrementing and decrementing it, passing it to intrinsic functions, and manipulating the data elsewhere. So right now, for the functions I didn't find an alternate method for, I declared a couple of variables volatile so gcc doesn't optimize them out.

Quote:
Original post by Ectara
I use gcc and C89 only, and I do find that it optimizes out a variable that I'm using, causing a segfault when I try to use it. For example, in one of my I/O functions that swaps bytes when reading and writing data of a different byte order.


Most of the endian swap routines floating around the internet use undefined behavior, which means your code is working fine - it segfaults, which is undefined.

Quote:
My previous method moved a pointer through the block of bytes passed by reference in a manner that flipped the bytes within each variable-width boundary, swapping the bytes of individual variables rather than the whole block.

An endian swap of this type is known to be problematic at the hardware level, since it can fail if performed on unaligned addresses. It's not necessarily a compiler thing; it can be a CPU problem. If you're working on anything other than x86, it may well fault; x86 is extremely tolerant of alignment issues.

As said, GCC is very unlikely to be buggy with this type of operation, but it cannot go beyond a certain point when it comes to guessing intent.

I seem to recall that this works for general endian swap:
#include <algorithm>  // std::reverse
#include <cstring>    // memcpy

template <class T>
void endian_swap(T * source) {
    char buffer[sizeof(T)];
    memcpy(buffer, source, sizeof(T));
    std::reverse(buffer, buffer + sizeof(T));
    memcpy(source, buffer, sizeof(T));
}

memcpy should avoid alignment issues, although I'm not sure that's guaranteed; the platform documentation should describe what and how. IIRC there's something involving unions that can make the above simpler, but I've forgotten most of the details.

The above might seem like overkill, but it allows dealing with variables directly in a buffer.
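If I'm remembering the union idea right, it's something along these lines (a minimal C sketch for a 32-bit value only; I haven't re-checked it against the standard, and the rules are murkier in C++ than in C):

#include <stdint.h>

/* Sketch: byte-swap a 32-bit value by viewing it through a union of the
   value and its bytes. Assumes C99's <stdint.h>; substitute your own typedefs. */
uint32_t swap32(uint32_t v){
    union { uint32_t u; unsigned char b[4]; } in, out;
    in.u = v;
    out.b[0] = in.b[3];
    out.b[1] = in.b[2];
    out.b[2] = in.b[1];
    out.b[3] = in.b[0];
    return out.u;
}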

Contrary to popular belief, the following is undefined:
char * buffer;
reverse(*(int*)(buffer+offset));


And that's not even half the story; then there are alignment issues, etc.

The ones I wrote are a bit more... sophisticated than that, and I don't use intrinsic or standard functions for it. The algorithm itself isn't the problem; the only way for it to fail would be to pass an invalid pointer, or to yank the RAM module out of the machine while it is running. The issue was (and I used GDB to check) that the variables I defined were optimized out, and GDB reported "<variable optimized out>" when trying to read their values, yet the code still tried to access them, as they were of obvious importance.

Quote:
Original post by Ectara
The ones I wrote are a bit more... sophisticated than that
Do they rely on undefined behavior?

Write the simplest, dumbest, standard version first, using nothing but built-in routines. Only then worry about "sophistication".

Quote:
The algorithm itself isn't the problem;
Then post it.

Either here, or on GCC bug tracker.

The typedefs are simple: si32 == signed 32-bit integer, uisys == unsigned integer of system-dependent width (and likewise si8, ui32, and sisys).



#define _E_convertSwapByte32( i ) ((((i)&0xFF000000)>>24)|(((i)&0x00FF0000)>>8)|(((i)&0x0000FF00)<<8)|(((i)&0x000000FF)<<24))

#define _E_convertSwapByte16( i ) (((i)>>8)|((i)<<8))

typedef struct {
    FILE * fp;
    si32 endian;
    sisys pos;
    sisys length;
} E_io;

#define E_LSB 0
#define E_MSB 1

uisys E_ioWriteb( E_io *io, const void * ptr, ui32 size, ui32 n){
    if(!io||!ptr)return 0;
    uisys numWritten=0;
    if(size == 1)numWritten = fwrite(ptr,sizeof(si8),n,io->fp);
    else{
        si8 * p = (si8*)ptr;
        if(!io->endian){
            si8 buffer[size];
            while(n--){
                ui32 byteIndex=size;
                while(byteIndex--)buffer[byteIndex] = *(p++);
                numWritten+=fwrite(buffer,size,1,io->fp);
            }
        }else numWritten+=fwrite(p,size,n,io->fp);
    }
    /* These lines below update the file position and the length of the file, so the structure stays updated */
    io->pos = ftell(io->fp);
    if(fseek(io->fp,0,SEEK_END))return 0;
    io->length=ftell(io->fp);
    if(fseek(io->fp,io->pos,SEEK_SET))return 0;
    return numWritten;
}



Either way, other than scrutinizing my coding style, does anyone have a solution to the described issue?

I wasn't aware that this is allowed, and definitely not in C89:
si8 buffer[size];


What happens if you malloc this buffer?

Other than that, check that the inputs are sane; the problem might be elsewhere.

Allowed, and not C89; C99, to be exact. I'm going to go back and change the dynamic stack allocations to heap allocations once everything is finished.

Malloc would work fine.

The inputs are sane; the problem is elsewhere. The problem is that GCC optimizes out the variable (I rewrote the function slightly to avoid the issue, but to give an idea of how bad it is, imagine it optimizing out the pointer `p'). The way I got around it in some of the functions, such as the ones that use a spinlock to ensure thread safety, is to declare the variable volatile, which I'm pretty sure is C99. But my question was: does anyone have a method of preventing a variable from being optimized out other than declaring it volatile? I know where the problem lies; I'm just looking for an alternate solution.

Quote:
Original post by Ectara

But my question was: does anyone have a method of preventing a variable from being optimized out other than declaring it volatile? I know where the problem lies; I'm just looking for an alternate solution.


A compiler will never optimize out a variable that is used. If it does, it's a bug in the compiler.

There is no alternate solution, since the problem doesn't exist in the first place.

With threads, one thread might only write to a variable and never read from it. The compiler would assume the variable is redundant and remove it. This cannot happen with single-threaded code.

Another reason a compiler might remove a variable is when its value is passed via registers only. So while a debugger would not see it, its value would still be valid and passed around.

So unless it's a compiler bug, there might be an issue with taking the address of a variable that the compiler has decided should live inside a register only. As long as the address-of operation is legal, the compiler will generate valid code. There is nothing to correct for here; where and how variables are stored is up to the compiler, and it will never generate invalid code. If it does, it's a bug.

Not necessarily. As seen in the function below, an infinite number of threads could be waiting to write, or to read if the function decided to. Either way, each needs to wait its turn.


void E_errorSetErrorStr(const si8 * message){
    si32 tlock = E_globalErrorLock;
    while(tlock) /* Compiler might make this always true, and infinite loop, or always false, and cause serious problems. */
        tlock = E_globalErrorLock;
    E_globalErrorLock = 1;

    while(E_globalErrorCount>=E_ERROR_MAX_ERRORS){
        E_error *terror = NULL;
        if(E_globalErrorCount>2)
            terror = E_globalError.next->next;
        if(E_globalErrorCount>=2){
            strncpy(E_globalError.error,E_globalError.next->error,E_ERROR_BUFLEN);
            E_globalError.code = E_globalError.next->code;
            free(E_globalError.next);
        }
        if(E_globalErrorCount==1)
            memset(&E_globalError,0,sizeof(E_error));
        else
            E_globalError.next = terror;

        --E_globalErrorCount;
    }

    E_error *terror = &E_globalError;
    if(E_globalErrorCount){
        while(terror->next)
            terror = terror->next;
        terror->next = (E_error*)malloc(sizeof(E_error));
        if(E_errorGetAllocError(terror->next,E_TEST_CACHE))return;
        terror = terror->next;
    }
    strncpy(terror->error,message,E_ERROR_BUFLEN);
#ifdef E_ERROR_DEBUG
    fprintf(stderr,"%s\n",message);
#endif
    terror->code = 0;
    terror->next = NULL;
    ++E_globalErrorCount;
    E_globalErrorLock = 0;
    return;
}



The solution I had to use was to declare `E_globalErrorLock' as volatile, so it works now. It need not be a bug; with only a single thread, this assumption would be correct.

Maybe GCC is similar to Visual C++ in that you need to specify whether your program is to run in a multi-threaded environment. Your compiler may be assuming your code is single-threaded and therefore that it's safe to optimise away that E_globalErrorLock check. With VC there is the -MT flag to tell the compiler to assume globals can be changed by other threads and not to make those sorts of assumptions. Perhaps GCC has a similar flag.

Also, in a multi-threaded environment, this code will cause issues:


si32 tlock = E_globalErrorLock;
while(tlock)
tlock = E_globalErrorLock;
E_globalErrorLock = 1;


The problem is that the thread could be switched away from between reading E_globalErrorLock into tlock and checking the contents of tlock. E.g., say E_globalErrorLock is zero when you enter this code. The first line runs, loading E_globalErrorLock (zero) into tlock. Your thread gets switched away to another one that also runs this code. The new thread runs through all three lines, setting E_globalErrorLock to one, and starts to run the code afterwards; then it too is switched away. The original thread runs, sees tlock is zero, sets E_globalErrorLock to one (when it is already one), and runs the same code. You now have two threads running the code you only wanted one thread to use.

You ideally should be using the platform-specific atomic test-and-set functions for things like this. They're guaranteed to operate correctly in a multi-threaded, multi-CPU environment.
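With GCC specifically, something like this sketch would do it, using the __sync builtins available since GCC 4.1 (not portable and not C89, but correct on multi-threaded, multi-CPU machines; the variable name just reuses the one from your code):

/* Sketch: a spinlock built on GCC's __sync builtins (GCC >= 4.1). */
static volatile int E_globalErrorLock = 0;

static void error_lock(void){
    /* Atomically set the flag to 1 and get its previous value;
       keep spinning while someone else already held it. */
    while(__sync_lock_test_and_set(&E_globalErrorLock, 1))
        ;
}

static void error_unlock(void){
    __sync_lock_release(&E_globalErrorLock);  /* atomically writes 0 with release semantics */
}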

Quote:
Original post by PlayerX
[...]
You ideally should be using the platform-specific atomic test-and-set functions for things like this. They're guaranteed to operate correctly in a multi-threaded, multi-CPU environment.


Which is very true, and I was trying to avoid using platform-specific code. But either way, for any of these, is there a compiler-independent way of doing this without using volatile, or should I be spending my energy on something else right now?

Not really. Your compiler is applying an optimisation on the assumption it's in a single-threaded environment. You need to tell the compiler not to do that, to optimise for a multi-threaded environment. That's always going to be compiler-specific.

'volatile' appears to be in the C89 standard.

EDIT: my mistake, I thought you were asking if there was a compiler-independent way of stopping the optimisation. There's really no platform-independent way of doing multi-threading synchronisation, unfortunately.



Change "Optimize out" to "optimize". That's what's really happening. You declare a variable on the stack, compiler does subexpression elimination, constant folding, or whatever, and realizes there is an identity based on some other expression such that it need not even allocate memory for this new variable. It doesn't mean the compiler is doing something wrong, but rather probably your program is.

Edit: Nvm, I read more. It was a threading problem. There's not a platform-independent way to invoke mutual exclusion, no.

Volatile isn't even what you want; it only guarantees memory ordering, it doesn't guarantee atomicity (except in Visual C++, but even *that* isn't enough to solve the problem). Even with memory ordering (i.e. volatile) *and* atomicity, you can run this:


si32 tlock = E_globalErrorLock;
while(tlock)
tlock = E_globalErrorLock;
E_globalErrorLock = 1;

And 2 threads could run the while (tlock) test at exactly the same time, both determine it's false, and both "acquire" the lock.

The moral of the story here is: don't try to reinvent synchronization primitives.


BTW, to answer your original question, using a global variable is often faster because you don't have to keep pushing the same argument onto the stack every time you call a function. And if you access it frequently, it will usually stay in the cache. But like everyone else said, it's a micro-optimization. Nevertheless, I'd *still* probably use a global variable, because passing around tons of extra arguments tends to get annoying pretty fast.
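If you want to see it for yourself, compile something like this with -O2 -S and compare the output (names are made up; the exact code depends entirely on the compiler, ABI, and optimization level):

int g_counter;  /* global: typically a direct (absolute or PC-relative) memory operand */

void bump_global(void)      { g_counter++; }
void bump_pointer(int *ctr) { (*ctr)++; }   /* pointer parameter: the address arrives in a
                                               register (or on the stack), then one indirection */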

I'm going to disagree that using globals frequently for this sort of thing is a good idea.

From an architectural perspective, a global variable introduces an implicit dependency everywhere it is visible. This makes debugging very hard, as there is no real way to make any assumption about the contents of the global at any given time. Global mutable state is A Bad Thing; in fact, any mutable state that exists at a higher level than necessary is a bad thing. The higher up the pyramid it exists, the more egregious it becomes. This also makes refactoring difficult if and when a single global variable no longer suffices.

It's also bad because it lets you ignore certain design problems that would be readily apparent in code that passed a pointer or (better) a reference around -- e.g. Chain of Responsibility-type issues.

Passing an extra parameter around isn't a big deal, and saying that it's a "hassle" is, to me, just an excuse made up by lazy typists who fancy themselves programmers.

That's not to say that globals are evil, just that they should only be used when the solution calls for it -- if a global really fits the natural solution to a problem, then use one; if not, don't. If you claim to adhere to this principle and find that you're still using globals frequently, then your compass probably needs adjusting. When in doubt, don't.

I can honestly say I've used maybe two globals in anything I've written in the last 4 years, most of which are smallish, but non-trivial programs between 5k and 25k lines of code.

Quote:
Original post by Ravyne
I can honestly say I've used maybe two globals in anything I've written in the last 4 years, most of which are smallish, but non-trivial programs between 5k and 25k lines of code.


Have you used any singletons? Because singletons are essentially global variables with another name.

I'm not necessarily saying you should go throwing globals all over your code, but the case of passing around some state to tons and tons of functions is really the poster example of when to use a global variable.

Quote:
but the case of passing around some state to tons and tons of functions is really the poster example of when to use a global variable.


Having functions which are forced to rely on global mutable state has the potential to cause problems. Even if you aren't running into those problems, they can always pop up in the future. Is introducing global state really worth it just to save a few keystrokes?
Same argument applies for singletons of course, which are effectively an entire class full of global variables.

Quote:
Original post by taz0010
Quote:
but the case of passing around some state to tons and tons of functions is really the poster example of when to use a global variable.


Having functions which are forced to rely on global mutable state has the potential to cause problems. Even if you aren't running into those problems, they can always pop up in the future. Is introducing global state really worth it just to save a few keystrokes?
Same argument applies for singletons of course, which are effectively an entire class full of global variables.


When was the last time you actually ran into this, though? People always talk about it, and sure, it happens sometimes, but games development is notorious for breaking good programming practices for various reasons. If you read any software design book you're going to learn how to do everything object-oriented, but then later we find out object orientation isn't all it's cracked up to be when it comes to game dev, and that a data-driven approach is better (OOP is array of structures (AoS), data-driven is structure of arrays (SoA)).

I would much rather have a global variable in my program than spam the signatures of 800 functions all over my codebase with an argument whose sole purpose is to... pass around global state. Whether or not something is global isn't defined by how you write the code, it's defined by the meaning/purpose of the data in question.

Quote:
Original post by cache_hit
Quote:
Original post by Ravyne
I can honestly say I've used maybe two globals in anything I've written in the last 4 years, most of which are smallish, but non-trivial programs between 5k and 25k lines of code.


Have you used any singletons? Because singletons are essentially global variables with another name.


I Have Not! And I resent the implication, good sir! I challenge thee to a duel!


Just kidding, but clearly you aren't familiar with my posting history [grin]. I abhor singletons -- they share all the faults of globals, plus one: their 'singleness'. Singletons are just slightly more toxic globals.

Quote:
I'm not necessarily saying you should go throw globals all of your code, but the case of passing around some state to tons and tons of functions is really the poster example of when to use a global variable.


I believe the issue boils down to ownership of said state, rather than convenience. Global state is state which is not 'owned' logically by any one entity. If the state belongs to any one system, then it should not be made global. This does not preclude sharing access, but implies that it should be funneled through the proper channels in order to help keep tabs on who has access.

I have some data that, while not global, lives quite high up the chain in my applications (as dictated by need), and it is passed through parameters as far down as needed. In practice, I typically haven't found that I pass things down more than 3 levels or so.

I think people worry that the 'passing things around' is going to get out of hand quickly, and I think that idea is largely a symptom of having used globals in the past to solve this type of 'problem' -- but that 'solution' is no solution at all, it simply says 'I don't want to think about it, so everyone can see this state.'
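As a concrete, made-up sketch of what I mean: the state is owned near the top and handed down only to the functions that need it, so the dependency is visible in every signature:

/* Hypothetical sketch: an explicit context instead of a global error list. */
typedef struct ErrorContext { int count; } ErrorContext;

static void parse_file(ErrorContext *errs, const char *path){
    if(!path) errs->count++;            /* record problems in the caller-owned context */
}

static void load_level(ErrorContext *errs, const char *name){
    parse_file(errs, name);             /* just pass it one level further down */
}

int main(void){
    ErrorContext errs = {0};            /* owned near the top, not visible to "everyone" */
    load_level(&errs, "level1.dat");
    return errs.count;
}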

Quote:
Original post by Antheus
Contrary to popular belief, the following is undefined:
char * buffer;
reverse(*(int*)(buffer+offset));



That isn't even close to compiling code; I assume you meant something more like


template <class T>
void endian_swap(T * source) {
    char* begin = reinterpret_cast<char*>(source);
    std::reverse(begin, begin + sizeof(T));
}


If that's actually broken, please explain. It seems to me that the addresses of the individual bytes that make up a T instance are required to be consecutive, so there should be no problem. Even if there is something pedantic about the pointer arithmetic, we might instead try


char* begin = reinterpret_cast<char*>(source);
char* end = reinterpret_cast<char*>(source + 1);
std::reverse(begin, end);

Quote:
Original post by Zahlman
That isn't even close to compiling code; I assume you meant something more like


template <class T>
void endian_swap(T * source) {
    char* begin = reinterpret_cast<char*>(source);
    std::reverse(begin, begin + sizeof(T));
}


If that's actually broken, please explain.

Not std::reverse; that one should be safe. I meant some generic endian reverse function, which likely accepts a value type such as int.

It's easy to get alignment problems with dereferences or any kind of direct conversion. Same with the #define macros used above.

But it doesn't appear to be an alignment issue here, so this isn't all that relevant.
