xelanoimis

Threads problem on Intel DualCore


Hi, I think I have a threading problem caused by my music streaming implementation, which uses a separate thread. The problem only appears on an Intel Pentium D 3.2 GHz (dual-core) processor. My thread code isn't too smart (no semaphores or mutexes), but it worked fine on other computers, including mine.

However, when I try to get some crash info from this dual-core machine, my debug call stack looks like a mess. (I keep track of the current call stack by pushing and popping function names in a buffer stack, with guard and unguard in each function.) On my AMD (single-core), when the processor runs code from the thread, I can see it included in the middle of the call stack, but then it resumes from where it was. On the dual-core, though, the entries look interleaved, they accumulate, and at some point it crashes.

I was thinking that if I have a guard at the beginning of the function that increments the stack, and an unguard at the end that decrements it, then no matter how threads enter functions, they should exit too, and the stack should stay clean instead of accumulating entries or mixing them up. I use the standard memory allocation functions (malloc, realloc, free), and the cause of the crash is a realloc that returns NULL.

If anyone can give me some advice about how to make it work on a dual-core (if that's the real problem), or has had experience with this sort of problem on dual-cores, I'd appreciate it. Thanks!

PS: It looks like those guys from Intel really do run parallel code on their dual-cores, since they managed to mess up my code :)

Quote:
Original post by xelanoimis
My thread code isn't too smart (no semaphores or mutexes), but it worked fine on other computers, including mine.


Well, there's your problem. Once you have multiple CPUs, two pieces of code can actually run simultaneously, rather than just giving the illusion of it. Some errors can only happen on a multi-CPU system.

Quote:
I was thinking that if I have a guard at the beginning of the function that increments the stack, and an unguard at the end that decrements it, then no matter how threads enter functions, they should exit too, and the stack should stay clean instead of accumulating entries or mixing them up.


Instead of using what I assume to be ad-hoc "guards", use real synchronization primitives. Do you even know if your increment/decrement operations are atomic?

While a single instruction such as XCHG or INC on a memory operand executes atomically on a single-core CPU (an interrupt cannot split it in half), this is no longer the case on multicore CPUs. There you also have to lock the bus with the LOCK prefix, so that only one CPU can access that memory during the instruction.

Your code for a multicore spinlock may look something like this:

volatile int flag = 0; // 0 - free; 1 - occupied
...
while (1)
{
    if (flag == 0) // check if there is a chance to obtain the lock
    {
        if (testandset(flag, 1) == 0) // set flag to 1 atomically; an old value of 0 means we got the lock
        {
            break;
        }
    }
    if (isSingleCore)
    {
        Sleep(0); // Win32: yield to the next thread on a single-core CPU
    }
}
...
// perform the work inside the spinlock
...
flag = 0; // release the lock (if you are paranoid, you may also lock the bus here)

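where testandset(volatile int& flag, int value) writes value into flag and returns the previous contents of flag. It may contain something like this (adapt it to your assembler):

mov eax, value
xchg [flag], eax ; note that xchg automatically performs a lock of the bus!
                 ; eax now holds the old value of flag, which is the return value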

Since XCHG automatically asserts the LOCK# signal, no LOCK prefix is needed. If you use other instructions for synchronisation (BTS, INC, ADD, XOR, ...), you have to add a LOCK prefix on anything other than a single-core CPU.
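For comparison, here is a sketch of the same test-and-test-and-set spinlock written in portable C++11, assuming a compiler with the atomic header (which did not exist when this thread was written). std::atomic<int>::exchange plays the role of testandset/XCHG. The names SpinLock and count_with_spinlock are hypothetical, and unlike the version above this sketch simply yields after every failed attempt instead of checking isSingleCore.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical SpinLock: test-and-test-and-set, with exchange() standing in for XCHG.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Plain load first, so waiting threads do not keep issuing locked exchanges.
            if (flag_.load(std::memory_order_relaxed) == 0 &&
                flag_.exchange(1, std::memory_order_acquire) == 0) {
                return;  // old value was 0: this thread now owns the lock
            }
            std::this_thread::yield();  // give up the time slice (essential on one core)
        }
    }
    void unlock() {
        flag_.store(0, std::memory_order_release);  // publish our writes, then free the lock
    }
private:
    std::atomic<int> flag_{0};  // 0 - free; 1 - occupied
};

// Hypothetical demo: several threads increment one shared counter under the lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // only one thread at a time executes this
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Because the lock serializes every increment, no updates are lost no matter how many cores run the threads.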

Well, I guess the increment and decrement are atomic (stacktop++ and stacktop--), but Push and Pop themselves... they are simple, but probably not atomic. Not to mention that inside Push, if there is no more space, I realloc the buffer. It looks ugly for a dual-core, indeed.

In fact, here is some of my code. I use a static string in each function containing its name, and a dummy object whose constructor and destructor push and pop the pointer to that static string on the stack.

#define guard(func) { static const char* __FUNC__ = #func; cDebugGuardTrackDummy __GUARDTRACKDUMMY__(__FUNC__); __TRY

#define unguard() __CATCH }

class cDebugGuardTrackDummy // dummy object used to push and pop in the guard
{
public:
    cDebugGuardTrackDummy( const char* func ) { cDebugGuard::Push(func); }
    ~cDebugGuardTrackDummy() { cDebugGuard::Pop(); }
};

void cDebugGuard::Push( const char* func )
{
    if( !g_debug_init ) return;
    if( m_stacktop==m_stacksize ) // buffer full: grow it by 32 entries
    {
        m_stacksize += 32;
        m_stack = (char**)realloc(m_stack, m_stacksize*sizeof(char*));
    }
    m_stack[m_stacktop] = (char*)func;
    m_stacktop++;
}

void cDebugGuard::Pop()
{
    if( !g_debug_init ) return;
    if( m_stacktop>0 ) m_stacktop--;
}



If any of you have suggestions on how to use this debug method with threads on a dual-core processor, please let me know.

I can't simply leave the guards out of the code that runs on the secondary thread, because the same classes may also be used in the main thread.

Is there a way to test in the Push function whether I'm running in the secondary thread, and skip tracking the stack if so?
Or do I have to give up on this system and try something else for extra runtime debugging?

What are those "real synchronization primitives" you mentioned?
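Regarding the question about the secondary thread: with C++11 you could compare std::this_thread::get_id() against a stored main-thread id, but a simpler sketch is to give every thread its own guard stack with thread_local. Push and Pop then never touch data another thread can see, and the two threads' call stacks stop interleaving. ThreadGuardStack and ScopedGuard are hypothetical names; pre-C++11 compilers offered similar extensions (__declspec(thread) in MSVC, __thread in GCC).

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-thread variant of cDebugGuard: each thread owns a private
// stack, so no locking is needed and entries from different threads never mix.
class ThreadGuardStack {
public:
    static void Push(const char* func) { stack().push_back(func); }
    static void Pop() {
        if (!stack().empty()) stack().pop_back();
    }
    static std::size_t Depth() { return stack().size(); }
    // Outermost call first, e.g. "MusicThread > DecodeChunk".
    static std::string Dump() {
        std::string out;
        for (const char* f : stack()) {
            if (!out.empty()) out += " > ";
            out += f;
        }
        return out;
    }
private:
    static std::vector<const char*>& stack() {
        thread_local std::vector<const char*> s;  // one instance per thread
        return s;
    }
};

// RAII helper playing the role of cDebugGuardTrackDummy.
struct ScopedGuard {
    explicit ScopedGuard(const char* func) { ThreadGuardStack::Push(func); }
    ~ScopedGuard() { ThreadGuardStack::Pop(); }
};
```

A crash handler that runs on the faulting thread would see only that thread's stack, which is usually what you want from a crash report.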

Quote:
Original post by xelanoimis


void cDebugGuard::Push( const char* func )
{
    if( !g_debug_init ) return;
    if( m_stacktop==m_stacksize ) // realloc
    {
        m_stacksize += 32;
        m_stack = (char**)realloc(m_stack, m_stacksize*sizeof(char**));
    }
    m_stack[m_stacktop] = (char*)func;
    m_stacktop++;
}


Wow, I'm surprised that works even on a single-core CPU. What would happen, for example, if two threads enter Push() at the same time and both get as far as calling realloc()? realloc itself is thread-safe, but your m_stack pointer is not: the first thread's realloc may move the block, freeing the old m_stack pointer and allocating a new one. The second thread would then pass the same, already freed pointer to realloc, which is undefined behaviour and can easily fail and return NULL, and your app will die a lingering and particularly gruesome death. Sound familiar?

Looks like you need to brush up on the basics of thread synchronization.
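A minimal sketch of the basic fix, assuming C++11's std::mutex (on Win32 at the time, a CRITICAL_SECTION would fill the same role): hold one lock across the capacity check, the growth and the store, so two threads can never realloc the same pointer. std::vector replaces the manual realloc; LockedGuardStack is a hypothetical name. Note that this only stops the crash: with one shared stack, entries from the two threads still interleave, so a Pop on one thread can remove the other thread's entry.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical thread-safe variant of cDebugGuard: one shared stack protected
// by a mutex, so check-grow-store happens as one indivisible step.
class LockedGuardStack {
public:
    void Push(const char* func) {
        std::lock_guard<std::mutex> lock(mutex_);
        stack_.push_back(func);  // vector grows its buffer internally, under the lock
    }
    void Pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!stack_.empty()) stack_.pop_back();
    }
    std::size_t Depth() {
        std::lock_guard<std::mutex> lock(mutex_);
        return stack_.size();
    }
private:
    std::mutex mutex_;
    std::vector<const char*> stack_;
};
```

Every call now pays for a lock, which is the frame-rate cost discussed further down the thread.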

I've tried adding a semaphore to my guard/unguard calls, but the frame rate dropped pretty low (from 1050 to 250).
I'm thinking of keeping the guards only in the debug version.
However, does anyone know if an STL list would be thread-safe? It would only require a call to push/pop in guard/unguard, but I don't know whether the work inside it is thread-safe (or whether it would be faster).

Multithreading 101:
Yes, you need to synchronize all access to any shared data. No, not just in debug mode, and yes, even if it lowers your framerate. (Afterwards, you can look at more efficient ways of ensuring synchronization.)

Furthermore, you don't "think" an operation is atomic. It either is or it isn't. And when in doubt, it isn't.

STL isn't thread-safe. The language has no built-in support for threads, and neither does the STL. No knowledge of threads makes it hard to be thread-safe. :)

And no, ++ and -- are not atomic.
Even on a single-core system, they might consist of multiple operations: read the value into a register, increment/decrement it, and write it back to memory.
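To make this concrete, the first function below spells out the three steps hidden in value++ on a plain int; another thread can run between any two of them, and its update is then overwritten. The second is a sketch, assuming C++11, of the atomic alternative. Both function names are hypothetical.

```cpp
#include <atomic>
#include <thread>

// What "value++" on a plain int amounts to. Calling this from two threads at
// once is a data race: one thread's write can overwrite the other's.
void racy_increment(int& value) {
    int tmp = value;  // 1. read memory into a register
    tmp = tmp + 1;    // 2. increment the register
    value = tmp;      // 3. write the register back to memory
}

// std::atomic performs the read-modify-write as one indivisible operation
// (on x86, a LOCK-prefixed instruction), so no increments are lost.
int count_atomically(int iterations) {
    std::atomic<int> counter{0};
    std::thread a([&] { for (int i = 0; i < iterations; ++i) counter.fetch_add(1); });
    std::thread b([&] { for (int i = 0; i < iterations; ++i) ++counter; });
    a.join();
    b.join();
    return counter.load();
}
```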

Thanks, so STL is out of the question too :)
I meant removing the whole guard/unguard debug system from release builds, not just its synchronization. It is a debug system after all, and if I can't make it safe without dropping the frame rate, it isn't worth keeping in release builds.
Thanks for the help!
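A sketch of compiling the tracking out of release builds, assuming a hypothetical GUARDS_ENABLED switch that the release build sets to 0 (for example with -DGUARDS_ENABLED=0). The macro names mirror the guard/unguard macros above, but the body is illustrative: it only counts depth per thread and leaves out the __TRY/__CATCH exception wrapping. In release, guard and unguard still open and close a block, so code written between them compiles unchanged.

```cpp
// Hypothetical compile-time switch; defaults to enabled (debug) builds.
#ifndef GUARDS_ENABLED
#define GUARDS_ENABLED 1
#endif

#if GUARDS_ENABLED
// Per-thread depth counter standing in for the real guard stack.
inline int& GuardDepth() {
    thread_local int depth = 0;
    return depth;
}

struct GuardTracker {
    explicit GuardTracker(const char* name) : name(name) { ++GuardDepth(); }
    ~GuardTracker() { --GuardDepth(); }
    const char* name;  // guarded function name, for crash reports
};

// Debug: open a block and record the function, like the original guard().
#define guard(func) { GuardTracker guard_tracker_(#func);
#else
// Release: no tracking at all, just the block.
#define guard(func) {
#endif

#define unguard() }
```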

You might want to have a look at lock-free data structures. They should speed up your code somewhat if you can manage to use them.

Synchronisation with kernel objects such as semaphores or mutexes can be pretty expensive, since every lock requires a transition to kernel mode. It's even more expensive if the thread has to block because the mutex is already held.

Out of interest, what are you using for synchronisation? Critical sections (InitializeCriticalSection(), EnterCriticalSection(), etc.)?
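As a small illustration of the lock-free idea suggested above, here is a sketch of a push-only lock-free stack (the classic Treiber stack) in C++11, where a compare-and-swap loop replaces the lock. The names Node, LockFreeStack and CountAndClear are hypothetical. A general single-element Pop is deliberately left out, because doing it safely requires handling the ABA problem and memory reclamation; detaching the whole list with one atomic exchange, as CountAndClear does, sidesteps both.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct Node {
    const char* name;
    Node* next;
};

// Hypothetical push-only Treiber stack.
class LockFreeStack {
public:
    void Push(const char* name) {
        Node* node = new Node{name, head_.load(std::memory_order_relaxed)};
        // If another thread changed head_ since we read it, compare_exchange_weak
        // copies the current head into node->next and we simply retry.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }
    // Atomically detaches the whole list, then counts and frees the nodes.
    std::size_t CountAndClear() {
        Node* p = head_.exchange(nullptr);
        std::size_t n = 0;
        while (p) {
            Node* next = p->next;
            delete p;
            p = next;
            ++n;
        }
        return n;
    }
    ~LockFreeStack() { CountAndClear(); }
private:
    std::atomic<Node*> head_{nullptr};
};
```

Under contention a thread may retry the loop a few times, but no thread ever blocks waiting for another.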
