# Weird behavior for a function with optimizations turned on

This topic is 373 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I have a function which returns the name of the processor for X64 based on __cpuid() intrinsic. When i run that function without optimizations its fine and works correct, but when i compile and run it with O2 for example then the compiler does weird things. The branch for detecting if "destBuffer" is null is totally changed, no comparison will be made and "destLen" will never be set.

The result is that PushMemory_Internal() gets a totally garbage size and may return nullptr. What did i wrong? The compiler definitily rearranged it a lot, so its hard to follow.

Basically the function has two modes, one with a given destBuffer and its max length and one without any dest buffer at all - so in this case we stack allocate the memory we need for worst case size.

	static char *GetProcessorName(char *destBuffer = nullptr, const uint32_t maxDestBufferLen = 0);

static char *GetProcessorName(char *destBuffer, const uint32_t maxDestBufferLen) {
constexpr uint32_t cpuBrandBufferSize = 0x40;

uint32_t destLen = maxDestBufferLen;
if (destBuffer == nullptr) {
destLen = cpuBrandBufferSize + 1;
destBuffer = (char *)malloca(destLen);
}
assert(destBuffer != nullptr);
assert(destLen >= (cpuBrandBufferSize + 1));

int cpuInfo[4] = { -1 };
char cpuBrandBuffer[cpuBrandBufferSize] = {};
__cpuid(cpuInfo, 0x80000000);
uint32_t extendedIds = cpuInfo[0];

// Get the information associated with each extended ID.
// Interpret CPU brand string.
for (uint32_t i = 0x80000000; i <= extendedIds; ++i) {
__cpuid(cpuInfo, i);
if (i == 0x80000002) {
CopyMemory(cpuInfo, cpuBrandBuffer, sizeof(cpuInfo));
} else if (i == 0x80000003) {
CopyMemory(cpuInfo, cpuBrandBuffer + 16, sizeof(cpuInfo));
} else if (i == 0x80000004) {
CopyMemory(cpuInfo, cpuBrandBuffer + 32, sizeof(cpuInfo));
}
}

// Copy result back to the dest buffer
uint32_t sourceLen = strlen(cpuBrandBuffer);
strncpy(destBuffer, cpuBrandBuffer, sourceLen);

return(destBuffer);
}

Final

##### Share on other sites

There's... Multiple things wrong with that code.

And move on to this: https://en.m.wikipedia.org/wiki/Dangling_pointer

Hopefully you'll see how your approach is fundamentally flawed. (As to why the optimizer scrambles your code - welcome to undefined behavior.)

Design a better approach and we can get nitpicky on the rest of that snippet :-)

##### Share on other sites
2 hours ago, ApochPiQ said:

There's... Multiple things wrong with that code.

And move on to this: https://en.m.wikipedia.org/wiki/Dangling_pointer

Hopefully you'll see how your approach is fundamentally flawed. (As to why the optimizer scrambles your code - welcome to undefined behavior.)

Design a better approach and we can get nitpicky on the rest of that snippet :-)

Okay sometimes i should not touch any keyboard at all. Its obvious that alloca allocates memory on the CURRENT stack frame, not on any other one O_o. Changing to malloc solves that particular problem, but the removal of the condition (destBuffer == nullptr) was still there. So i rearranged the condition a bit and now it works: Also the memcpy order in this code i posted was wrong, but in my actual code its correct.

Here is a full working main.cpp for x86 and x64 with and without optimizations:

#include <intrin.h>
#include <inttypes.h>
#include <assert.h>
#include <malloc.h>
#include <stdio.h>
#include <string>

static char *GetProcessorName(char *destBuffer = nullptr, const size_t maxDestBufferLen = 0);

static char *GetProcessorName(char *destBuffer, const size_t maxDestBufferLen) {
constexpr uint32_t cpuBrandBufferSize = 0x40;

char *dest;
size_t destLen;
if ((destBuffer != nullptr) && (maxDestBufferLen > 0)) {
dest = destBuffer;
destLen = maxDestBufferLen;
} else {
destLen = cpuBrandBufferSize + 1;
dest = (char *)malloc(destLen);
}
assert(dest != nullptr);
assert(destLen >= (cpuBrandBufferSize + 1));

int cpuInfo[4] = { -1 };
char cpuBrandBuffer[cpuBrandBufferSize] = {};
__cpuid(cpuInfo, 0x80000000);
uint32_t extendedIds = cpuInfo[0];

// Get the information associated with each extended ID.
// Interpret CPU brand string.
for (uint32_t i = 0x80000000; i <= extendedIds; ++i) {
__cpuid(cpuInfo, i);
if (i == 0x80000002) {
memcpy(cpuBrandBuffer, cpuInfo, sizeof(cpuInfo));
} else if (i == 0x80000003) {
memcpy(cpuBrandBuffer + 16, cpuInfo, sizeof(cpuInfo));
} else if (i == 0x80000004) {
memcpy(cpuBrandBuffer + 32, cpuInfo, sizeof(cpuInfo));
}
}

// Copy result back to the dest buffer
size_t sourceLen = strlen(cpuBrandBuffer);
strncpy_s(dest, destLen, cpuBrandBuffer, sourceLen);

return(dest);
}

int main(int argc, char **args) {
char *cpuName = GetProcessorName();
printf("CPU Name: %s\n", cpuName);
free(cpuName); // Really i want a defer free(); function...
getchar();
return 0;
}

Edited by Finalspace

##### Share on other sites
4 hours ago, Finalspace said:

Changing to malloc solves that particular problem, but the removal of the condition (destBuffer == nullptr) was still there.

Is that a problem? The function many have been inlined.

##### Share on other sites
1 minute ago, KidsLoveSatan said:

Is that a problem? The function many have been inlined.

It might be, but there are no guarantee for that - even if it would use the "inline" keyword. Maybe with a forced inline, but this would require to change the api signature...

##### Share on other sites
Just now, Finalspace said:

It might be, but there are no guarantee for that - even if it would use the "inline" keyword. Maybe with a forced inline, but this would require to change the api signature...

That doesn't answer why is it a problem if the condition is optimized away?

##### Share on other sites
1 minute ago, KidsLoveSatan said:

That doesn't answer why is it a problem if the condition is optimized away?

I have no idea why the condition is optimized away... maybe because of the default argument of nullptr and zero for the destination length?

##### Share on other sites
Just now, Finalspace said:

I have no idea why the condition is optimized away... maybe because of the default argument of nullptr and zero for the destination length?

I compiled your code with /O2 and the function is being inlined, if I turn off function inlining a number of variables are optimized away. I don't see why you have a problem with it, to me it's working as it should.

##### Share on other sites

In your original code, "destLen" isn't used outside of assignment and the assert though, so the compiler will totally optimize that away. It should still enter the branch and assign the destBuffer though, are you sure that didn't happen as well?

##### Share on other sites
1 hour ago, Juliean said:

In your original code, "destLen" isn't used outside of assignment and the assert though, so the compiler will totally optimize that away. It should still enter the branch and assign the destBuffer though, are you sure that didn't happen as well?

I debugged it in visual studio 2015 and it fully jumps to the allocation directly. Its not inlined, i havent enabled aggresive inline rules.

I think i know what happened, the branch is not compiled out - but rather moved/rearranged, so the comparison is not on the same line anymore. Looking at the disassembly output should make that clear.

So the problem was definitly the alloca() allocating memory on the current stack frame, then leaving the function and then throwing it away. Depending on the inlining of a function is not a good idea i think. Inlining should simply help make your code faster by removing function calls - but it should never change the actual result.

I hate switching between low level and high level languages all the time, so you forget such fundamental things.

Edited by Finalspace

1. 1
Rutin
42
2. 2
3. 3
4. 4
5. 5

• 9
• 27
• 20
• 14
• 14
• ### Forum Statistics

• Total Topics
633384
• Total Posts
3011598
• ### Who's Online (See full list)

There are no registered users currently online

×