Global variable vs passed pointer

Started by
24 comments, last by Ectara 14 years ago
Why is it faster to use a pointer to an object rather than accessing a global object? If possible, prove using ASM. I'm looking to this for optimization.
It isn't, necessarily. It depends.

It's not worth worrying about until a profiler has indicated that a particular area of the code where you are doing one or the other is a bottleneck.

Furthermore, neither C nor C++ map consistently to machine code for the underlying CPU, so looking at the C or C++ code for an issue this trivial will rarely tell you anything. You could speculate, but speculation isn't that helpful.

Cache locality will probably end up having a lot to do with it.
You seem to have a real predilection for these kinds of micro-optimizations, which is an unhealthy habit for a programmer, probably born of misinformation or outdated study resources.

This isn't 1992 -- Compilers are a lot smarter now, and we don't usually have to worry about such small issues with scalar code, at least not until we have proof that there's a problem. We don't have to multiply by powers of two using shifts. We don't have to put 'inline' on everything...

The programmer should focus on describing the solution at an appropriate level of abstraction and let the compiler figure out the fine details. Then, if you have proof that the compiler is doing a poor job, you go back and restate the solution more explicitly in order to hint the compiler in the right direction. If that fails, then it's time to think about assembly, if you *know* you can do better than the compiler, which isn't likely for scalar code, and will be even more rare once the new C++ language features are widely adopted.

I think in general, you're not going to beat the compiler's optimizer at anything it knows how to do, and the only chance a non-superstar ASM programmer has to beat the compiler is when the compiler is either clueless, or when its hands are tied because the language doesn't allow it to make necessary assumptions required for the optimization.

Without knowing when the compiler is going to do a less than stellar job, you're just chasing ghosts.

throw table_exception("(╯°□°)╯︵ ┻━┻");

You all make valid points. I have a habit of micro-optimizing everything, a holdover from coding back in the day. I tend not to rely on the compiler to do what it should. I guess I shouldn't worry about all of these small things. I guess the fact that I test builds with optimization off to speed up compilation time is also part of the reason I optimize. After the trial and error to make sure it works, I go back with optimizations on to make sure the compiler didn't do anything silly like optimize out a vital variable.

Oh well. While I'm here, since this thread is short-lived anyway, does anyone have a method of keeping the compiler from optimizing out variables without using the volatile keyword or performing a no op instruction with the dereferenced value of the address of it? I'm pretty sure volatile isn't C89.
Quote:Original post by Ectara
I guess the fact that I test builds with optimization off to speed up compilation time is also part of the reason I optimize.

And you are also optimizing the wrong thing. It is not the optimization that makes C++ builds slow.

In order of importance:
#1 includes and macro substitutions. These will account for upwards of 80% of the time
#2 linking time. For lots of code, linking becomes the bottleneck, especially if it needs to bring together many libs
...
#78 Actual code optimizations

#1 and #2 can be solved with a custom crafted build which includes ccache, precompiled headers, perhaps a PIMPL or two, and using forward declarations and compiling either on a RAM drive or SSD. Using those it's possible to cut build times by several orders of magnitude, down to single-digit seconds, even for reasonably sized projects.

Quote:After the trial and error to make sure it works,
Or, just write unit tests.

Quote:I go back with optimizations on to make sure the compiler didn't do anything silly like optimize out a vital variable.
If something like this happens, then you are either doing something really evil (such as addressing member variables by hard-coded offsets) or your compiler is broken (if it's gcc, MSVC, intel, then it's not).

Quote:does anyone have a method of keeping the compiler from optimizing out variables without using the volatile keyword or performing a no op instruction with the dereferenced value of the address of it? I'm pretty sure volatile isn't C89.

Um... why do you care whether a variable is optimized out or not? The compiler will statically check all variable accesses, remove those that aren't needed, inline others, and rearrange everything else.

However, a compiler is specifically forbidden from changing class layouts, so class member variables will never be removed.

I'm having a hard time coming up with a case where compiler optimization could break code in this way.
I use gcc and C89 only, and I do find that it optimizes out a variable that I'm using, causing a segfault when the code tries to use it. For example, in one of my IO functions that swaps bytes when reading and writing across a different byte order. My previous method moved a pointer through the block of bytes passed by reference, flipping the bytes within each variable-width boundary, so it swapped the bytes of each variable rather than reversing the whole block.

Tried to run it, and the compiler optimized that intermediate pointer out, and it segfaulted when the code tried to use it. It might have looked unused since I was just incrementing and decrementing it, passing it to intrinsic functions, and manipulating the data elsewhere. So right now, for the functions I didn't find an alternate method for, I declared a couple of variables volatile so gcc doesn't optimize them out.
Quote:Original post by Ectara
I use gcc and C89 only, and I do find that it optimizes out a variable that I'm using, causing a segfault when the code tries to use it. For example, in one of my IO functions that swaps bytes when reading and writing across a different byte order.


Most of the endian swap routines floating around the internet use undefined behavior, which means your code is working fine - it segfaults, which is undefined.

Quote:My previous method moved a pointer through the block of bytes passed by reference, flipping the bytes within each variable-width boundary, so it swapped the bytes of each variable rather than reversing the whole block.

Endian swaps of this type are known to be problematic due to hardware, which can cause trouble when they are performed on unaligned addresses. It's not necessarily a compiler thing; it can be a CPU problem. If you're working on anything other than x86, it should almost certainly segfault; x86 is extremely tolerant of alignment issues.

As said, GCC is very unlikely to be buggy with this type of operation, but it can only go so far in guessing intent.

I seem to recall that this works for general endian swap:
template <class T>
void endian_swap(T *source)
{
    char buffer[sizeof(T)];
    memcpy(buffer, source, sizeof(T));
    std::reverse(buffer, buffer + sizeof(T));
    memcpy(source, buffer, sizeof(T));
}

memcpy should avoid alignment issues, although I'm not sure it's guaranteed; platform documentation should describe what and how. IIRC there's a union trick that can make the above simpler, but I've forgotten most of the details.

The above might seem like overkill, but it allows dealing with variables directly in a buffer.

Contrary to popular belief, the following is undefined:
char *buffer;
reverse(*(int*)(buffer + offset));


And that's not even half the story; then there are alignment issues, etc.
The ones I wrote are a bit more... sophisticated than that, and I don't use intrinsic or standard functions for it. The algorithm itself isn't the problem; the only way for it to fail would be to pass an invalid pointer, or to yank the RAM module out of the machine while it is running. The issue was, and I used GDB to check, the variables I defined were optimized out, and GDB reported "<variable optimized out>" when trying to read their values, yet the code still tried to access it, as it was of obvious importance.
Quote:Original post by Ectara
The ones I wrote are a bit more... sophisticated than that
Do they rely on undefined behavior?

Write the simplest, dumbest, standard version first, using nothing but built-in routines. Only then worry about "sophistication".

Quote:The algorithm itself isn't the problem;
Then post it.

Either here, or on GCC bug tracker.
The typedefs are simple. si32 == signed integer with 32 bits, uisys == unsigned integer of system dependent length.

#define _E_convertSwapByte32( i ) ((((i)&0xFF000000)>>24)|(((i)&0x00FF0000)>>8)|(((i)&0x0000FF00)<<8)|(((i)&0x000000FF)<<24))
#define _E_convertSwapByte16( i ) (((i)>>8)|((i)<<8))

typedef struct {
	FILE * fp;
	si32 endian;
	sisys pos;
	sisys length;
} E_io;

#define E_LSB 0
#define E_MSB 1

uisys E_ioWriteb( E_io *io, const void * ptr, ui32 size, ui32 n)
{
	if(!io||!ptr)return 0;
	uisys numWritten=0;
	if(size == 1)numWritten = fwrite(ptr,sizeof(si8),n,io->fp);
	else{
		si8 * p = (si8*)ptr;
		if(!io->endian){
			si8 buffer[size];
			while(n--){
				ui32 byteIndex=size;
				while(byteIndex--)buffer[byteIndex] = *(p++);
				numWritten+=fwrite(buffer,size,1,io->fp);
			}
		}else numWritten+=fwrite(p,size,n,io->fp);
	}
	/* These lines below update the file position and the length of the file, so the structure stays updated */
	io->pos = ftell(io->fp);
	if(fseek(io->fp,0,SEEK_END))return 0;
	io->length=ftell(io->fp);
	if(fseek(io->fp,io->pos,SEEK_SET))return 0;
	return numWritten;
}


Either way, other than scrutinizing my coding style, does anyone have a solution to the described issue?

