Member Since 17 May 2007
Offline Last Active Nov 04 2013 05:53 PM

Topics I've Started

Help with dbghelp/imagehlp

26 August 2012 - 03:16 PM

I have a case where SymGetSymFromAddr() says it's succeeding, but in fact it is returning an empty string for the symbol.

Here is my test case:
#include <windows.h>
#include <imagehlp.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <iostream>

struct symbol_buffer
{
	static const std::size_t max_sym_length = 4096;

	symbol_buffer()
	{
		std::memset(buffer, 0x00, sizeof buffer);

		IMAGEHLP_SYMBOL *sym = get();

		sym->SizeOfStruct = sizeof(IMAGEHLP_SYMBOL);
		sym->MaxNameLength = max_sym_length;
	}

	IMAGEHLP_SYMBOL *get() { return reinterpret_cast<IMAGEHLP_SYMBOL *>(buffer); }

	union
	{
		DWORD for_alignment;
		unsigned char buffer[sizeof(IMAGEHLP_SYMBOL) + max_sym_length];
	};
};

void print_function_name(UINT_PTR program_counter, HANDLE process)
{
	const char *symbol_ptr = "???";

	symbol_buffer sym_buff;

	IMAGEHLP_SYMBOL *symbol = sym_buff.get();
	//symbol->Address = program_counter; // don't think this is needed, but doesn't make any difference

	if (SymGetSymFromAddr(process, program_counter, 0, symbol))
		symbol_ptr = symbol->Name;

	if (symbol_ptr)
		std::cout << "function: '" << symbol_ptr << "'\n";
}

const char *splendid() { return "splendid"; }

int main()
{
	const HANDLE process = GetCurrentProcess();

	//	SymGetOptions()

	if (SymInitialize(process, 0, TRUE) == 0)
		return 1;

	const char *(*fp)() = &splendid;
	UINT_PTR fp_val = 0;

	assert(sizeof fp_val == sizeof fp);
	std::memcpy(&fp_val, &fp, sizeof fp);

	print_function_name(fp_val, process);

	return 0;
}

I am building with the Visual C++ 2010 toolchain as follows:

P:\guff>cl /nologo /EHsc /W3 /WX /GR /Zi /GS /arch:SSE2 /MTd /Oy- /c symbols.cpp /Fdsymbols.obj.pdb /Fosymbols.obj

P:\guff>link /nologo /incremental /WX symbols.obj imagehlp.lib /debug /pdb:symbols.exe.pdb /out:symbols.exe

When running the code, the output tells me that the function name is the empty string, whereas I would have expected something like "splendid", or "splendid(void)", etc:

function: ''

When stepping through the code in the Visual C++ debugger, the watch window quite happily tells me that the 'fp' variable in main() refers to "splendid(void)" as shown in the attached screenshot, so it seems the necessary symbolic information is present.

Even more curiously, if I enable global optimization by adding /GL compiler switch and replacing the /incremental linker switch with /LTCG, I get the kind of output I'd expect:

function: 'splendid'

I have also tried SymFromAddr() instead of SymGetSymFromAddr(), and changing the symbol options before calling SymInitialize() (see the commented-out code in main()), but the same behaviour persists.

Can anybody see what I'm doing wrong?

pthread_mutex much slower than naive benaphore?

13 August 2012 - 06:10 PM

I came across the concept of a benaphore for the first time not so long ago and decided to experiment with it. I implemented it on Windows and OS X and while I'm not sure I'll use it for anything, it remains in my threads library for the time being.

On Windows, it performs as expected, i.e. very well in low contention cases and terribly under high contention. But on OS X, I noticed my high contention stress tests were completing much faster for benaphores than they were for my pthread_mutex wrappers, and I'm hoping someone might be able to shed some light on the reasons why.

I understand that pthread mutexes will be inherently slower due to additional features and capabilities, but given that we're talking over an order of magnitude's performance difference under high contention scenarios (in the opposite direction to that which I'd expect), I'd like to understand what's going on.

The forum says I'm not permitted to attach a .cpp file (really?!), so I've put the benchmark code in my DropBox Public folder for now. It's a single C++ file, essentially equivalent to one of my stress test cases. The semaphore I've used here is dispatch_semaphore_t from libdispatch. A number of threads are started, each of which increments a shared variable a large number of times, acquiring/releasing the lock before/after each increment. Here's the body of each thread:

template<typename Mutex>
void *thread_body(void *opaque_arg)
{
	CHECK( opaque_arg != 0 );

	shared_stuff<Mutex> &stuff = *static_cast<shared_stuff<Mutex> *>(opaque_arg);

	for (uint32_t i = 0; i != stuff.increments; ++i)
	{
		// acquire the lock, increment the shared variable, release the lock
	}

	return 0;
}

And here are the results on my quad core i5 iMac running OS X 10.7, built without error checking code.

Attached file: bena_vs_mutex_stats.png (18.25KB)
(Sorry about the use of an image, I fiddled with formatting for 30 minutes but the forum always insisted on mangling it to some degree).

Even in the case of 2 threads we're looking at 2.458 vs 86.479 seconds (!). I can happily accept that the libdispatch semaphore is more efficient than a pthread_mutex, but given the degree of improvement in the benaphore case, I'm inclined to believe that I've misconfigured/misunderstood something. Any ideas as to an explanation?
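
For anyone who hasn't met the term: a benaphore pairs an atomic counter with a semaphore, so the semaphore is only touched when there is contention. Below is a minimal portable sketch of that idea, not the benchmark code itself. It assumes C++11 and swaps in a small mutex/condition-variable semaphore for the dispatch_semaphore_t mentioned above; the names simple_semaphore, benaphore and hammer are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in counting semaphore (assumption: the original uses dispatch_semaphore_t).
class simple_semaphore
{
public:
	simple_semaphore() : count_(0) { }

	void post()
	{
		std::lock_guard<std::mutex> guard(mutex_);
		++count_;
		cond_.notify_one();
	}

	void wait()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		while (count_ == 0)
			cond_.wait(lock);
		--count_;
	}

private:
	std::mutex mutex_;
	std::condition_variable cond_;
	unsigned count_;
};

class benaphore
{
public:
	benaphore() : contenders_(0) { }

	void lock()
	{
		// A previous value above 0 means another thread holds the lock.
		if (contenders_.fetch_add(1) > 0)
			sem_.wait();
	}

	void unlock()
	{
		// A previous value above 1 means at least one thread is waiting.
		if (contenders_.fetch_sub(1) > 1)
			sem_.post();
	}

private:
	std::atomic<int> contenders_;
	simple_semaphore sem_;
};

// Hypothetical stress helper: every thread increments a shared counter,
// taking the lock around each increment, as in the benchmark description.
std::uint32_t hammer(unsigned thread_count, std::uint32_t increments)
{
	assert(thread_count != 0);

	benaphore lock;
	std::uint32_t counter = 0;
	std::vector<std::thread> threads;

	for (unsigned t = 0; t != thread_count; ++t)
		threads.push_back(std::thread([&lock, &counter, increments]()
		{
			for (std::uint32_t i = 0; i != increments; ++i)
			{
				lock.lock();
				++counter;
				lock.unlock();
			}
		}));

	for (std::size_t t = 0; t != threads.size(); ++t)
		threads[t].join();

	return counter;
}
```

If the lock is doing its job, hammer(n, k) returns n * k. In the uncontended case lock() and unlock() each cost one atomic operation, which is the whole appeal of the technique.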

compiler discrepancy: use of undefined type

10 August 2012 - 02:59 PM

#include <iosfwd>

struct foo { };

std::ostream &operator<< (std::ostream &out, const foo &) { return out; }

void format(std::ostream &out, const foo &x) { out << x; }

int main()
{
    return 0;
}

Comeau online and g++ 4.6 both accept this code without issue. Visual C++ 2010 complains:

P:\guff>cl /nologo /EHsc /W3 ostream_fwd.cpp /Feostream_fwd.exe
ostream_fwd.cpp(7) : error C2027: use of undefined type 'std::basic_ostream<_Elem,_Traits>'

Who's correct? If someone could try this on the latest Visual C++ compiler, I'd be interested to see what it says/does.
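
For comparison, here is a sketch of the same code with <ostream> included instead of <iosfwd>, which makes std::ostream a complete type at the point where format() uses it. This sidesteps the discrepancy rather than settling which compiler is right. Two things are my own additions for the sake of having something observable: the operator now writes the text "foo", and format_to_string is a hypothetical helper.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct foo { };

// Writes a fixed tag so the call is observable (the original returned 'out' untouched).
std::ostream &operator<< (std::ostream &out, const foo &) { return out << "foo"; }

void format(std::ostream &out, const foo &x) { out << x; }

// Hypothetical helper: runs format() against an in-memory stream.
std::string format_to_string(const foo &x)
{
	std::ostringstream out;
	format(out, x);
	return out.str();
}
```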

Windows PE TLS callbacks in DLLs

22 July 2012 - 09:19 AM

I'm attempting to use the TLS callback mechanism baked in to the PE format and Windows loader to provide proper cleanup in the Windows implementation of my thread-local storage system.

(For those unfamiliar with this mechanism, the 'background' section of this codeproject article provides a quick summary).

When I link my threads library (static library) to an executable, my TLS callback is called as expected. However if I link the threads library in to a DLL (by which I mean, create a DLL which is linked against my static library), the callbacks aren't called.

Furthermore if I go poking around in my MSVC build of the dll (with this, for example) the AddressOfCallBacks field (0x1010DD88) in the PE TLS directory points in to the string table once I've subtracted the preferred image load address (0x10000000). Without the subtraction the address is beyond the end of the image, so it appears to be bogus either way, though realignment at load time might end up shuffling things about suitably(?).
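
To make the address arithmetic concrete, here is a small sketch. The fields of the PE TLS directory hold virtual addresses rather than RVAs, so subtracting the preferred image base is the right first step. The RVA then has to be mapped through the section table to get a file offset, since the two generally differ. The section_info struct mirrors the VirtualAddress, VirtualSize and PointerToRawData fields of IMAGE_SECTION_HEADER. The function names are made up, and the sketch ignores the case where a section's virtual size exceeds its raw data size.

```cpp
#include <cstddef>
#include <cstdint>

// Subset of IMAGE_SECTION_HEADER needed for the mapping.
struct section_info
{
	std::uint32_t virtual_address;     // RVA of the section once loaded
	std::uint32_t virtual_size;        // size of the section in memory
	std::uint32_t pointer_to_raw_data; // file offset of the section's data
};

// Virtual address (as stored in the TLS directory) to RVA.
std::uint32_t va_to_rva(std::uint32_t va, std::uint32_t image_base)
{
	return va - image_base;
}

// Returns true and sets file_offset if rva lies inside one of the sections.
bool rva_to_file_offset(std::uint32_t rva,
                        const section_info *sections, std::size_t count,
                        std::uint32_t &file_offset)
{
	for (std::size_t i = 0; i != count; ++i)
	{
		const section_info &s = sections[i];

		if (rva >= s.virtual_address && rva - s.virtual_address < s.virtual_size)
		{
			file_offset = rva - s.virtual_address + s.pointer_to_raw_data;
			return true;
		}
	}

	return false;
}
```

With the values above, va_to_rva(0x1010DD88, 0x10000000) gives 0x0010DD88. Where that lands in the file depends entirely on the DLL's actual section table, which isn't shown here.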

Does anybody know if the PE TLS callback mechanism is supposed to work for DLLs in this manner?

I couldn't find anything on the web that indicated it shouldn't work, but on the other hand all the examples I found were creating executables.

I'm targeting recent versions of MS Visual C++ and MinGW GCC. Here's the code snippet compiled in to my static library that places a pointer to the callback in the appropriate PE section.

extern "C"
{

#if defined(_MSC_VER)
#   if defined (_M_IX86)
#	   pragma comment(linker, "/INCLUDE:__tls_used")
#   else
#	   pragma comment(linker, "/INCLUDE:_tls_used")
#   endif
#   pragma data_seg(".CRT$XLB")
	PIMAGE_TLS_CALLBACK mtl_tls_callback_ = mtl::tls_callback;
#   pragma data_seg()
#elif defined(__GNUC__)
	PIMAGE_TLS_CALLBACK mtl_tls_callback_ __attribute__ ((section(".CRT$XLB"))) = mtl::tls_callback;
#else
#   error "Toolchain not supported :("
#endif

} // extern "C"

The workaround is to provide an auxiliary DLL and have its DllMain do the cleanup, but I'd like to know if I can avoid having to do that.

Unicode: how to determine current/future combining character codepoint allocation?

25 March 2012 - 05:18 PM

The Unicode 6.0 standard, section 7.9, mentions a number of blocks containing combining marks. One of these blocks, 'Combining Marks for Symbols' runs from U+20D0 to U+20FF. Now, there are a number of code points in that range which are not assigned (at least not in version 6.0). The implication seems to be that those unassigned code points are reserved for additional combining marks, in future revisions. Is there any way of deducing this implication purely from the files in the Unicode database? I'd like to avoid having to trawl the Unicode PDF document for mention of potential future code point allocations.

I see that the database labels combining characters for assigned code points with General Category values of 'Mn', 'Mc', or 'Me', but since unassigned code points aren't listed in the database it doesn't look like I can infer their intended future use. But maybe I've missed a trick somewhere?

To ask the question another way, what's a reasonably efficient way of determining if a code point is a combining mark? Should/could the algorithm include as-yet-unassigned code points in its domain?
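
One reasonably efficient shape for such a test is a binary search over a sorted table of code point ranges. The sketch below assumes the table would be generated from the database by collecting every code point whose General Category is 'Mn', 'Mc' or 'Me'. The two ranges shown are only a hand-picked excerpt for illustration (the Combining Diacritical Marks block, and the assigned part of Combining Marks for Symbols as of 6.0). The names are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

struct code_point_range
{
	std::uint32_t first;
	std::uint32_t last; // inclusive
};

// Illustrative excerpt only; must stay sorted by 'first' and non-overlapping.
static const code_point_range combining_ranges[] =
{
	{ 0x0300, 0x036F },
	{ 0x20D0, 0x20F0 }
};

bool is_combining_mark(std::uint32_t cp)
{
	const code_point_range *begin = combining_ranges;
	const code_point_range *end =
		begin + sizeof combining_ranges / sizeof combining_ranges[0];

	// Find the first range that starts after cp; the candidate is the one before it.
	const code_point_range *it = std::upper_bound(begin, end, cp,
		[](std::uint32_t value, const code_point_range &r) { return value < r.first; });

	if (it == begin)
		return false;

	--it;
	return cp <= it->last;
}
```

Whether unassigned code points count then becomes a decision about the table rather than the algorithm: widening the second range to end at U+20FF would treat the rest of that block as combining.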