Weird behavior for a function with optimizations turned on

Started by
18 comments, last by Hodgman 6 years, 5 months ago

I have a function which returns the name of the processor for X64 based on __cpuid() intrinsic. When i run that function without optimizations its fine and works correct, but when i compile and run it with O2 for example then the compiler does weird things. The branch for detecting if "destBuffer" is null is totally changed, no comparison will be made and "destLen" will never be set.

The result is that PushMemory_Internal() gets a totally garbage size and may return nullptr. What did i wrong? The compiler definitily rearranged it a lot, so its hard to follow.

Basically the function has two modes, one with a given destBuffer and its max length and one without any dest buffer at all - so in this case we stack allocate the memory we need for worst case size.


	static char *GetProcessorName(char *destBuffer = nullptr, const uint32_t maxDestBufferLen = 0);		

	static char *GetProcessorName(char *destBuffer, const uint32_t maxDestBufferLen) {
			constexpr uint32_t cpuBrandBufferSize = 0x40;

			uint32_t destLen = maxDestBufferLen;
			if (destBuffer == nullptr) {
				destLen = cpuBrandBufferSize + 1;
				destBuffer = (char *)malloca(destLen);
			}
			assert(destBuffer != nullptr);
			assert(destLen >= (cpuBrandBufferSize + 1));

			int cpuInfo[4] = { -1 };
			char cpuBrandBuffer[cpuBrandBufferSize] = {};
			__cpuid(cpuInfo, 0x80000000);
			uint32_t extendedIds = cpuInfo[0];

			// Get the information associated with each extended ID.
			// Interpret CPU brand string.
			for (uint32_t i = 0x80000000; i <= extendedIds; ++i) {
				__cpuid(cpuInfo, i);
				if (i == 0x80000002) {
					CopyMemory(cpuInfo, cpuBrandBuffer, sizeof(cpuInfo));
				} else if (i == 0x80000003) {
					CopyMemory(cpuInfo, cpuBrandBuffer + 16, sizeof(cpuInfo));
				} else if (i == 0x80000004) {
					CopyMemory(cpuInfo, cpuBrandBuffer + 32, sizeof(cpuInfo));
				}
			}

			// Copy result back to the dest buffer
			uint32_t sourceLen = strlen(cpuBrandBuffer);
          	strncpy(destBuffer, cpuBrandBuffer, sourceLen);

			return(destBuffer);
		}

Thanks in advance,

Final

Advertisement

There's... Multiple things wrong with that code.

Let's start with this: https://msdn.microsoft.com/en-us/library/5471dc8s.aspx

And move on to this: https://en.m.wikipedia.org/wiki/Dangling_pointer

 

Hopefully you'll see how your approach is fundamentally flawed. (As to why the optimizer scrambles your code - welcome to undefined behavior.)

Design a better approach and we can get nitpicky on the rest of that snippet :-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

2 hours ago, ApochPiQ said:

There's... Multiple things wrong with that code.

Let's start with this: https://msdn.microsoft.com/en-us/library/5471dc8s.aspx

And move on to this: https://en.m.wikipedia.org/wiki/Dangling_pointer

 

Hopefully you'll see how your approach is fundamentally flawed. (As to why the optimizer scrambles your code - welcome to undefined behavior.)

Design a better approach and we can get nitpicky on the rest of that snippet :-)

Okay sometimes i should not touch any keyboard at all. Its obvious that alloca allocates memory on the CURRENT stack frame, not on any other one O_o. Changing to malloc solves that particular problem, but the removal of the condition (destBuffer == nullptr) was still there. So i rearranged the condition a bit and now it works: Also the memcpy order in this code i posted was wrong, but in my actual code its correct.

Here is a full working main.cpp for x86 and x64 with and without optimizations:


#include <intrin.h>
#include <inttypes.h>
#include <assert.h>
#include <malloc.h>
#include <stdio.h>
#include <string>

static char *GetProcessorName(char *destBuffer = nullptr, const size_t maxDestBufferLen = 0);

static char *GetProcessorName(char *destBuffer, const size_t maxDestBufferLen) {
	constexpr uint32_t cpuBrandBufferSize = 0x40;

	char *dest;
	size_t destLen;
	if ((destBuffer != nullptr) && (maxDestBufferLen > 0)) {
		dest = destBuffer;
		destLen = maxDestBufferLen;
	} else {
		destLen = cpuBrandBufferSize + 1;
		dest = (char *)malloc(destLen);
	}
	assert(dest != nullptr);
	assert(destLen >= (cpuBrandBufferSize + 1));

	int cpuInfo[4] = { -1 };
	char cpuBrandBuffer[cpuBrandBufferSize] = {};
	__cpuid(cpuInfo, 0x80000000);
	uint32_t extendedIds = cpuInfo[0];

	// Get the information associated with each extended ID.
	// Interpret CPU brand string.
	for (uint32_t i = 0x80000000; i <= extendedIds; ++i) {
		__cpuid(cpuInfo, i);
		if (i == 0x80000002) {
			memcpy(cpuBrandBuffer, cpuInfo, sizeof(cpuInfo));
		} else if (i == 0x80000003) {
			memcpy(cpuBrandBuffer + 16, cpuInfo, sizeof(cpuInfo));
		} else if (i == 0x80000004) {
			memcpy(cpuBrandBuffer + 32, cpuInfo, sizeof(cpuInfo));
		}
	}

	// Copy result back to the dest buffer
	size_t sourceLen = strlen(cpuBrandBuffer);
	strncpy_s(dest, destLen, cpuBrandBuffer, sourceLen);

	return(dest);
}

int main(int argc, char **args) {
    char *cpuName = GetProcessorName();
	printf("CPU Name: %s\n", cpuName);
    free(cpuName); // Really i want a defer free(); function...
	getchar();
	return 0;
}

 

4 hours ago, Finalspace said:

Changing to malloc solves that particular problem, but the removal of the condition (destBuffer == nullptr) was still there.

Is that a problem? The function many have been inlined.

1 minute ago, KidsLoveSatan said:

Is that a problem? The function many have been inlined.

It might be, but there are no guarantee for that - even if it would use the "inline" keyword. Maybe with a forced inline, but this would require to change the api signature...

Just now, Finalspace said:

It might be, but there are no guarantee for that - even if it would use the "inline" keyword. Maybe with a forced inline, but this would require to change the api signature...

That doesn't answer why is it a problem if the condition is optimized away? 

1 minute ago, KidsLoveSatan said:

That doesn't answer why is it a problem if the condition is optimized away? 

I have no idea why the condition is optimized away... maybe because of the default argument of nullptr and zero for the destination length?

Just now, Finalspace said:

I have no idea why the condition is optimized away... maybe because of the default argument of nullptr and zero for the destination length?

I compiled your code with /O2 and the function is being inlined, if I turn off function inlining a number of variables are optimized away. I don't see why you have a problem with it, to me it's working as it should.

In your original code, "destLen" isn't used outside of assignment and the assert though, so the compiler will totally optimize that away. It should still enter the branch and assign the destBuffer though, are you sure that didn't happen as well?

1 hour ago, Juliean said:

In your original code, "destLen" isn't used outside of assignment and the assert though, so the compiler will totally optimize that away. It should still enter the branch and assign the destBuffer though, are you sure that didn't happen as well?

I debugged it in visual studio 2015 and it fully jumps to the allocation directly. Its not inlined, i havent enabled aggresive inline rules.

I think i know what happened, the branch is not compiled out - but rather moved/rearranged, so the comparison is not on the same line anymore. Looking at the disassembly output should make that clear.

So the problem was definitly the alloca() allocating memory on the current stack frame, then leaving the function and then throwing it away. Depending on the inlining of a function is not a good idea i think. Inlining should simply help make your code faster by removing function calls - but it should never change the actual result.

I hate switching between low level and high level languages all the time, so you forget such fundamental things.

This topic is closed to new replies.

Advertisement