Disabling Whole Program Optimization breaks my code

Started by Evil Steve, July 2007. 4 comments, last by Evil Steve.
Hi all, I've been playing with my compiler settings (Visual Studio 2005 Professional) to create an optimised debug build (optimised code with debug symbols). I discovered that the debugger was failing miserably at getting symbol information, and found that disabling link-time code generation and whole program optimisation fixed that. However, a fair bit into my code, I log a message, do some more stuff, and then call into my memory manager to allocate a singleton (yes, I know - singletons are bad, etc etc). Inside my memory manager I call RtlCaptureContext, and the code crashes inside that call, here:

7C9038E0  pushfd           
7C9038E1  pop         dword ptr [ebx+0C0h] 
7C9038E7  mov         eax,dword ptr [ebp+4] 

EBP is 0x00000008 at this point, and the mov eax,dword ptr [ebp+4] reads the caller's return address from the stack frame, so it causes an access violation trying to read from 0x0000000C. Looking through the source code and the disassembly, it turns out that EBP is set to 8 after my log call, and then isn't changed at all until the crash (it may be changed inside functions, but if it is, it's restored). Here's the code from the log call to the singleton allocation which causes the crash:

bool PGraphics::Init()
{
	HRESULT hResult;
	D3DADAPTER_IDENTIFIER9 theInfo;
	PApp& app = PApp::Get();

	// Create D3D interface
	ELog::Get().SystemLog(L"RENDER  : Creating D3D9 interface... ");
	m_pD3D = CreateDirect3DInterface();
	if(!m_pD3D)
	{
		// CreateDirect3DInterface logs any errors
		app.SetError(app.GetError() + L"\nYou need to have DirectX 9 installed");
		return false;
	}
	ELog::Get().SystemLog(L"Done\n");

	// Init D3DX
	ELog::Get().SystemLog(L"RENDER  : Initialising D3DX... ");
	if(!InitD3DX())
	{
		// InitD3DX logs any errors
		app.SetError(app.GetError() + L"\nYou need to have the Feb 2007 version of DirectX 9 installed");
		RELEASE(m_pD3D);
		return false;
	}
	ELog::Get().SystemLog(L"Done\n");

	// Get Device Names
	UINT nAdapters = m_pD3D->GetAdapterCount();
	ELog::Get().SystemFormat(L"RENDER  : Listing %d device adapters:\n", nAdapters);
	for(UINT i=0; i<nAdapters; ++i)
	{
		hResult = m_pD3D->GetAdapterIdentifier(i, 0, &theInfo);
		if(FAILED(hResult))
		{
			std::wstringstream str;
			str << L"Failed to get D3D device name [" << i << L"]. Error: ";
			str << DXGetErrorString(hResult);
			app.SetError(str.str());
			ELog::Get().ErrorFormat(L"RENDER  : * %s\n", app.GetError().c_str());
			RELEASE(m_pD3D);
			return false;
		}
		if(theInfo.Description[0] == 0)
		{
			ELog::Get().SystemFormat(L"RENDER  : * Adapter index %d has no description (\"%S\" in %S)\n",
				i, theInfo.DeviceName, theInfo.Driver);
		}
		else
		{
			ELog::Get().SystemFormat(L"RENDER  : * Adapter index %d is \"%S\"\n", i, theInfo.Description);
		}
	}

	// Get Device Caps
	ELog::Get().SystemLog(L"RENDER  : Requesting device caps... ");
	hResult = m_pD3D->GetDeviceCaps(D3DADAPTER_DEFAULT, s_bUseRefRast ? D3DDEVTYPE_REF : D3DDEVTYPE_HAL, &m_theCaps);
	if(FAILED(hResult))
	{
		std::wstringstream str;
		str << L"Failed to get D3D device caps. Error: ";
		str << DXGetErrorString(hResult);
		app.SetError(str.str());
		ELog::Get().ErrorFormat(L"%s\n", app.GetError().c_str());
		RELEASE(m_pD3D);
		return false;
	}
	ELog::Get().SystemLog(L"Done\n");

	// Log maximum texture size
	ELog::Get().SystemFormat(L"RENDER  : Maximum texture size is %dx%d\n",
		m_theCaps.MaxTextureWidth, m_theCaps.MaxTextureHeight);

	// Get device type
	if(m_theCaps.DevCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT)
		m_dwFlags = D3DCREATE_HARDWARE_VERTEXPROCESSING;
	else
		m_dwFlags = D3DCREATE_SOFTWARE_VERTEXPROCESSING;
	if(m_theCaps.DevCaps & D3DDEVCAPS_PUREDEVICE)
		m_dwFlags |= D3DCREATE_PUREDEVICE;

	// Instance singletons
	PTextureMgr::Get();	// <-- Crashes on this line
	// (rest of the function snipped)
}


EBP is set to 8 after the log of "RENDER  : Creating D3D9 interface... ". PTextureMgr::Get() creates the singleton if needed by calling global operator new, which calls PMemory::Allocate, which calls PMemory::DetermineCaller, which calls RtlCaptureContext(). Now, if I enable whole program optimisation, EBP is set to a sensible value on entry to PMemory::DetermineCaller, and it all works out fine. Here are my compiler and linker command lines (without Whole Program Optimization, i.e. the version that causes problems):
Quote:
Compiler command line:
/O2 /Ob1 /I "F:\Perforce\TEH ARTYESS\Code\/Engine" /I "F:\Perforce\TEH ARTYESS\Code\/Game" /I "F:\Perforce\TEH ARTYESS\Code\/" /I "F:\Perforce\TEH ARTYESS\Code\/../" /D "WIN32" /D "_WINDOWS" /D "_CRT_SECURE_NO_DEPRECATE" /D "D3D_DEBUG_INFO" /D "BUILD_DEBUG" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /MT /Fo"Optimised Debug\\" /Fd"Optimised Debug\vc80.pdb" /W4 /WX /nologo /c /Wp64 /Zi /TP /wd4127 /errorReport:prompt

Linker command line:
/OUT:"../Bin/TEH ARTYESS.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Optimised Debug\TEH ARTYESS.exe.intermediate.manifest" /DELAYLOAD:"d3d9.dll" /DELAYLOAD:"d3dx9d_34.dll" /DEBUG /PDB:"f:\Perforce\TEH ARTYESS\Bin\TEH ARTYESS.pdb" /SUBSYSTEM:WINDOWS /OPT:REF /OPT:ICF /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib DelayImp.lib
EDIT: I've tried Rebuild All several times. Also, here's the generated assembly for the start of PMemory::DetermineCaller with and without whole program optimisation. With whole program optimisation (works):

void PMemory::DetermineCaller(AllocHeader* pAllocation)
{
00410F30  push        ebp  
00410F31  mov         ebp,esp                ; ebp is updated here
00410F33  and         esp,0FFFFFFF8h 
00410F36  sub         esp,528h 
00410F3C  mov         eax,dword ptr [___security_cookie (4A92A4h)] 
00410F41  xor         eax,esp 
00410F43  mov         dword ptr [esp+524h],eax 
00410F4A  push        ebx  
00410F4B  push        esi  

Without (dies):

void PMemory::DetermineCaller(AllocHeader* pAllocation)
{
00410B20  sub         esp,51Ch                ; ebp isn't touched
00410B26  mov         eax,dword ptr [___security_cookie (4BC43Ch)] 
00410B2B  xor         eax,esp 
00410B2D  mov         dword ptr [esp+518h],eax 
00410B34  push        ebx  
00410B35  push        ebp  
00410B36  push        esi  
00410B37  push        edi  
00410B38  mov         edi,dword ptr [esp+530h] 
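
For clarity, here's roughly the shape of the capture code. This is a minimal stand-in I've simplified right down, not the real function (the real PMemory::DetermineCaller takes an AllocHeader* and does a lot more), so treat the details as illustrative:

#include <windows.h>

void DetermineCaller()	// stand-in for PMemory::DetermineCaller
{
	CONTEXT ctx;
	ZeroMemory(&ctx, sizeof(ctx));

	// On x86, RtlCaptureContext reads the caller's saved EIP and EBP
	// through the frame pointer ([ebp+4] and [ebp]). If EBP is being
	// used as a general-purpose register instead, those reads go through
	// whatever junk it happens to hold - 0x00000008 in this case.
	RtlCaptureContext(&ctx);

	// ctx.Eip / ctx.Ebp / ctx.Esp would then identify the allocating
	// caller (e.g. as the starting point for a stack walk).
}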

So, does anyone have any idea what's going on here? Is this a compiler or linker bug, or a problem with my compiler and linker settings? This bug only shows up in my optimised debug build, not in my "normal" debug build. Cheers, Steve
My low-level x86 is a bit rusty, but if you have debugging turned on, EBP should be used to store the stack frame pointer, which the working version does. I can't remember the name of the option in MSVC, but it has to do with not saving the stack frame - do you have it turned on by chance?
The option is /Oy ("Omit Frame Pointer" or something like that in the IDE), which is implied by /O2. Find some way to pass /Oy- after /O2 using the IDE option or manually to force the compiler to keep track of stack frames and see if it runs.

See this page for more info.
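
For illustration, the override just needs to land after /O2 on the command line. A trimmed-down, hypothetical example (not the full command line posted above):

cl /O2 /Ob1 /Oy- /Zi /MT /EHsc MyFile.cpp

In the VS2005 IDE, the equivalent should be under C/C++ -> Optimization -> Omit Frame Pointers -> No (/Oy-), or you can paste /Oy- into C/C++ -> Command Line -> Additional Options.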
Quote:Original post by outRider
The option is /Oy ("Omit Frame Pointer" or something like that in the IDE), which is implied by /O2. Find some way to pass /Oy- after /O2 using the IDE option or manually to force the compiler to keep track of stack frames and see if it runs.

See this page for more info.
Hooray! Adding /Oy- to the compiler command line worked perfectly.

Thanks immensely [smile]
But what's the real problem? It sounds like you've just disabled an optimization which was exposing a bug in your code.
Quote:Original post by phil_t
But what's the real problem? It sounds like you've just disabled an optimization which was exposing a bug in your code.
I suspect it's a "feature" of RtlCaptureContext: to capture a context record, it obviously needs a stack frame to capture. Although there's no mention of that on MSDN, I suppose it's obvious really...
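
If anyone wants frame pointers only around the capture rather than across the whole build, something like this should also work - an untested sketch using MSVC's optimize pragma, not what I actually did:

#include <windows.h>

#pragma optimize("y", off)	// keep a frame pointer for this function only
void DetermineCaller()		// stand-in for PMemory::DetermineCaller
{
	CONTEXT ctx;
	ZeroMemory(&ctx, sizeof(ctx));
	RtlCaptureContext(&ctx);	// safe here: EBP holds a real frame pointer
}
#pragma optimize("", on)	// back to the command-line settings (/O2)

Bear in mind this only protects the capture itself; walking further up the stack still relies on the callers having frame pointers, which is why /Oy- across the whole build is the more robust fix.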

