Community Reputation

203 Neutral

About Chindril

  1. An update on this topic: our bugs were found and fixed. However, in this thread I was looking in the wrong direction. The deadlock was a side effect of the real bug, not the source of my issues. The bug came from a system that was acting erratically and sending far too many messages over the network. Our code could not keep up, so the messages stacked up. Over time, having a huge number of very small memory blocks (one per message) fragmented the memory, and then various systems would fail. The deadlock was the most common effect, but we also saw thread initialization failing and sometimes an outright memory allocation failure (a malloc of a big chunk returned NULL and that pointer was then used). Anyhow, a lot of time was wasted looking in the wrong direction, but we learned a lot about our code base, so it was not all in vain. Thanks for your help! (By the way, I'm not sure if there's a way to tag this thread as closed, solved, or something similar.)
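One way to keep a message flood from turning into heap fragmentation is to cap the backlog. Below is a minimal C++17 sketch, assuming incoming messages can be dropped (or the sender throttled) when the consumer falls behind. `BoundedQueue` and its members are hypothetical, not the code from this project.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical bounded queue for incoming network messages.
// When the consumer falls behind, new messages are rejected and counted
// instead of piling up as an ever-growing number of small heap blocks.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Producer side: returns false and counts a drop when the queue is full.
    bool try_push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) {
            ++dropped_;
            return false;
        }
        items_.push_back(std::move(item));
        return true;
    }

    // Consumer side: returns std::nullopt when there is nothing to process.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return std::nullopt;
        }
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    mutable std::mutex mutex_;
    std::deque<T> items_;
    const std::size_t capacity_;
    std::size_t dropped_ = 0;
};
```

A drop counter that keeps climbing is a cheap early warning for the kind of erratic sender described above. Independently of the queue, the result of any large malloc should be checked for NULL before the pointer is used.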
  2. @Pink Horror You are absolutely right; I was mostly trying to get ideas and brainstorm on this subject. Perhaps I overlooked something that someone could point out to me. I understand it is very complicated, but I can't release thousands of lines of code and hope someone does my job, haha. As for debugging, the problem is that the freeze does not reproduce on our local test site, only at the customer's. We work on Windows with Visual Studio 2013, but I cannot install it on the target machine (Windows XP...) to attach the debugger. So we are using ProcDump to generate a full memory dump file (using -ma); however, when debugging, the dump is missing some memory information for reasons I do not understand. I'm out of luck, since the dump doesn't contain the mutex's memory, so I can't retrieve the owning thread ID.
  3. A thread that acquires multiple locks is not a problem as long as every other thread holds at most one of these locks at any given time. Am I correct in this assumption?
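The assumption above can be illustrated with a small standard C++ sketch (not the project code; the names are hypothetical). One thread always takes B and then A, while the other only ever holds one of the two at a time. A deadlock needs a cycle in which each thread waits for a lock held by another. Because the second thread never waits while holding a lock, no such cycle can form. Running to completion only demonstrates the pattern; it does not prove the argument. The caveat is that the argument covers these mutexes only: if the single-lock thread blocks on something else (a condition variable, a join, a call that depends on the first thread) while holding its lock, a cycle can still appear.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Illustrates the lock pattern from the question (hypothetical names):
// - the worker always takes mutexB, then mutexA while still holding B;
// - the other thread only ever holds one of the two mutexes at a time.
// Returns the total number of protected increments performed.
int run_lock_pattern(int iterations) {
    std::mutex mutexA;
    std::mutex mutexB;
    int countA = 0;  // protected by mutexA
    int countB = 0;  // protected by mutexB

    std::thread worker([&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lockB(mutexB);
            ++countB;
            std::lock_guard<std::mutex> lockA(mutexA);  // nested: B then A
            ++countA;
        }
    });

    // This thread never waits for a lock while holding another one.
    for (int i = 0; i < iterations; ++i) {
        {
            std::lock_guard<std::mutex> lockA(mutexA);
            ++countA;
        }
        {
            std::lock_guard<std::mutex> lockB(mutexB);
            ++countB;
        }
    }

    worker.join();
    return countA + countB;
}
```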
  4. @ankhd: That's why a recursive mutex is used. As long as you are in the same thread, you can lock a mutex any number of times. It is reference counted, and the lock is released only when a matching number of unlock() calls have been made. @frob: I was burned in the past by the ordering of multiple mutex locks, and that's the first thing I look for when I have a multithreading problem. I suppose this code is risky and should be done in a better way. I had never thought of a class that handles multiple locks to prevent ordering issues, but it sounds like a good idea for the future. I need to review a larger scope of the code, since the cause of this deadlock isn't obvious at first sight. I had 3 programmers check the code, and all of them think a deadlock shouldn't be happening. Perhaps the problem is elsewhere and the deadlock is a symptom of a bigger problem. Anyhow, I'll keep all of your suggestions in mind and keep you posted when we find the issue :). Thanks.
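The reference-counting behaviour described for @ankhd can be checked with `std::recursive_mutex`, the standard counterpart of `boost::recursive_mutex`. This is a sketch, and the helper names are hypothetical. The mutex is locked three times by one thread. After each unlock, a second thread tries to take it, and it only succeeds once every lock has been matched by an unlock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Asks a second thread whether it can acquire `m` right now.
// try_lock() may fail spuriously, so a few attempts are made; a mutex that is
// really held by another thread keeps failing.
bool other_thread_can_lock(std::recursive_mutex& m) {
    bool acquired = false;
    std::thread probe([&] {
        for (int attempt = 0; attempt < 100 && !acquired; ++attempt) {
            if (m.try_lock()) {
                acquired = true;
                m.unlock();
            } else {
                std::this_thread::yield();
            }
        }
    });
    probe.join();
    return acquired;
}

// Locks a recursive mutex three times from the same thread, then unlocks it
// one level at a time and records whether another thread could take it.
std::vector<bool> recursive_unlock_trace() {
    std::recursive_mutex m;
    m.lock();
    m.lock();  // same thread: does not block, lock count is now 2
    m.lock();  // lock count is now 3

    std::vector<bool> trace;
    for (int i = 0; i < 3; ++i) {
        m.unlock();
        trace.push_back(other_thread_can_lock(m));
    }
    return trace;
}
```

For the multi-lock helper mentioned to @frob, C++17's `std::scoped_lock` accepts several mutexes and acquires them all with a deadlock-avoidance algorithm (the same one used by `std::lock`), so the order in which they are listed no longer matters.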
  5. @nfries88: I admit I should look back at the whole class to see if recursive mutexes are really needed, but they are useful when your code is split into multiple functions and each of those needs the lock. As for your 2nd point, the main thread does lock the mutexes through the Foo class but never accesses them directly. Technically, it cannot keep a lock on a mutex once the call stack has left the Foo class. We actually updated our code to save the owning-thread information of the recursive mutexes, so we will see if it is a memory corruption. @Hodgman: DoStuffHere isn't a function; it just indicates that there's some code in there that the mutex protects. I agree the code seems ridiculous because I removed most of the important processing. Would it make more sense if I said the lock on mutex A at the end of the thread loop really comes from processing a network event that needs to modify variables protected by that mutex? I'll try the patch of padding the mutexes with char buffers. I hadn't thought about it, but it might point me in the right direction. Thank you both.
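The padding idea can be sketched as follows, assuming the goal is to detect stray writes that land next to a mutex. `GuardedRecursiveMutex`, the 64-byte guard size and the 0xCD fill pattern are hypothetical choices, not something from the thread. A write that lands exactly on the mutex bytes would not be caught, but an overrun from a neighbouring object usually has to cross a guard first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <mutex>

// Hypothetical debugging aid: a mutex surrounded by guard bytes filled with a
// known pattern. If a stray write (buffer overrun, use-after-free, ...) hits
// the memory around the mutex, the pattern changes and intact() reports it.
struct GuardedRecursiveMutex {
    static constexpr std::size_t kGuardSize = 64;
    static constexpr unsigned char kPattern = 0xCD;

    unsigned char before[kGuardSize];
    std::recursive_mutex mutex;
    unsigned char after[kGuardSize];

    GuardedRecursiveMutex() {
        std::fill(std::begin(before), std::end(before), kPattern);
        std::fill(std::begin(after), std::end(after), kPattern);
    }

    // True while both guard areas still hold the fill pattern.
    bool intact() const {
        auto ok = [](unsigned char b) { return b == kPattern; };
        return std::all_of(std::begin(before), std::end(before), ok) &&
               std::all_of(std::begin(after), std::end(after), ok);
    }
};
```

Checking `intact()` when locking, or periodically from a watchdog, and logging the first failure helps narrow down when the corruption happens.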
  6. I have a pretty big issue right now with a deadlock in multithreaded software. I know which of my threads and mutexes cause the deadlock, but I do not understand why. I have a setup that looks like this (pseudo C++ code; AutoLock is a scoped lock class, nothing fancy):

```cpp
#include <boost/thread/recursive_mutex.hpp>

// Minimal stand-in for the project's AutoLock: locks in the constructor,
// unlocks in the destructor.
class AutoLock
{
public:
    explicit AutoLock(boost::recursive_mutex* m) : m_(m) { m_->lock(); }
    ~AutoLock() { m_->unlock(); }
private:
    boost::recursive_mutex* m_;
};

class Foo
{
public:
    void start();   // Spawns the thread that will call run()
    void doStuffA() { AutoLock lock(&mutexA); /* DoStuffHere: work protected by mutexA */ }
    void doStuffB() { AutoLock lock(&mutexB); /* DoStuffHere: work protected by mutexB */ }

private:
    void extraWork()
    {
        AutoLock lock(&mutexB);
        // Processing here
        doStuffB();
    }

    void run()   // Threaded function
    {
        while (true)
        {
            AutoLock lock(&mutexB);   // Lock the B mutex
            // Do a lot of work, networking stuff, etc.
            extraWork();              // This is fine since we're using a recursive mutex
            AutoLock lock2(&mutexA);  // Deadlock here after a couple of hours of run time
        }
    }

    boost::recursive_mutex mutexA;
    boost::recursive_mutex mutexB;
};

int main()
{
    Foo foo;
    foo.start();
    while (true)
    {
        foo.doStuffA();   // Do stuff
        foo.doStuffA();   // Do stuff
        foo.doStuffA();
        // Do some stuff
        foo.doStuffB();   // Usually hangs here a bit while foo finishes a loop
    }
}
```

So the code is obviously not exactly like this, but the logic is the same. We ran this code with no problems for a long time, and it only recently started deadlocking. We traced the code with dumps and know this setup is causing the problem. Note that the main thread cannot lock mutex A or B directly, only by calling public functions of Foo. Since we lock only through the AutoLock class (a scoped lock), the main thread should never keep a lock on the mutexes. Yet the thread sometimes hangs indefinitely when trying to lock mutex A. I know from looking at the Boost code that the hang can only happen if the owner thread ID stored inside the mutex is different from the one calling lock().
Therefore there are only 2 possible explanations for this problem: (1) the main thread somehow keeps a lock on mutex A, or (2) there's memory corruption that messes up the data of mutex A. I'm really out of ideas, and if some multithreading guru could give me tips on what to look for, it would be greatly appreciated.
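Since the dump did not contain the mutex internals, one option, in the spirit of saving the mutex thread information mentioned in post 5, is to record the owner and lock depth in plain fields of a wrapper. This is a hypothetical sketch built on `std::recursive_mutex` rather than Boost. It provides `lock()`, `try_lock()` and `unlock()`, so `std::lock_guard` works with it. The fields are meant to be read by the owning thread or inspected in a debugger or dump; reading them from another running thread would be a data race.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical diagnostic wrapper around std::recursive_mutex. It keeps the
// owning thread id and the lock depth in plain members, which are easy to
// find in a debugger or a full memory dump.
class TrackedRecursiveMutex {
public:
    void lock() {
        mutex_.lock();
        if (depth_++ == 0) {
            owner_ = std::this_thread::get_id();
        }
    }

    bool try_lock() {
        if (!mutex_.try_lock()) {
            return false;
        }
        if (depth_++ == 0) {
            owner_ = std::this_thread::get_id();
        }
        return true;
    }

    void unlock() {
        if (--depth_ == 0) {
            owner_ = std::thread::id();  // default id means "no owner"
        }
        mutex_.unlock();
    }

    // Only meaningful when called by the owning thread (or read from a dump).
    std::thread::id owner() const { return owner_; }
    int depth() const { return depth_; }

private:
    std::recursive_mutex mutex_;
    std::thread::id owner_;
    int depth_ = 0;
};
```

With this in place, a dump taken during the hang would show which thread ID the blocked lock() is waiting on and how many unmatched locks it holds, which distinguishes explanation (1) from (2).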
  7. After quite a lot of testing, we found what causes the problem. The setup that has issues is a multi-monitor setup (4 monitors) all running on a single GeForce 980 card. We use the multihead feature of DirectX to have only 1 device per monitor. When using 2 monitors, the lock function never fails and our code works as it always did. With 4 monitors and nothing else changed, it fails after a while, as described in the top post. While it doesn't give me a solution yet, it's good information. I'll dig deeper with that in mind, but if you have any ideas, I'm open :) As for DirectX debugging, when I activate the debug runtime on the client machine with problems, it doesn't fail anymore. I used DebugView to retrieve the DirectX output, but I found nothing interesting in it.
  8. OK, I misunderstood your question. I do not know if LockRect is the source of the problem, but it is the one that fails. My initial question should have been formulated as something like "What can cause LockRect to return E_OUTOFMEMORY even if I have a lot of RAM remaining?" The truth is that I'm a beginner when it comes to debugging DirectX calls. I'm using a third-party graphics engine (Ogre) that is very well made, and I rarely had issues that required debugging the API. Now I need to, but I don't have the experience, so I'm learning as I go. Following the questions in your previous post, I activated all the debugging options for the D3D9 libraries in the DirectX Control Panel, and I am currently running my application in debug mode to gather more information. I'll also try removing some sections of my code to pinpoint the cause of the issue. The type in that line was just a mistake when retyping the code; it should read "DWORD flags". I'll edit the post to remove the confusion. Thanks,
  9. Thanks for the reply. I confirmed that LockRect fails, because my code looks like this:

```cpp
D3DLOCKED_RECT lrect;
DWORD flags = D3DLOCK_DISCARD;
HRESULT hr = surface->LockRect(&lrect, NULL, flags);
if (FAILED(hr))
{
    // Log error and hr code
}
```

My log file shows that the failing HRESULT is 0x8007000E, which is E_OUTOFMEMORY. UnlockRect is called in the same function as LockRect and should be called every time. I'll do more checks right now to make sure no error is thrown that would skip the unlock. As for the surface, it is not released between the calls to lock/unlock; I reuse the same surface every frame. This problem happens only on a client machine, so it is a release build. I have trouble debugging DirectX in cases like this.
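As a worked check of that error code: an HRESULT packs a severity bit, a facility and a code. The helper below mirrors the Windows SDK macros `HRESULT_SEVERITY`, `HRESULT_FACILITY` and `HRESULT_CODE` in portable C++; the function and struct names are hypothetical. For 0x8007000E it yields severity 1 (failure), facility 7 (`FACILITY_WIN32`) and code 14 (`ERROR_OUTOFMEMORY`), that is, a Win32 out-of-memory error wrapped in an HRESULT.

```cpp
#include <cassert>
#include <cstdint>

// Portable decoding of a 32-bit HRESULT, mirroring the Windows SDK macros
// HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE.
struct DecodedHResult {
    bool failed;             // severity bit (bit 31) set
    std::uint32_t facility;  // (hr >> 16) & 0x1FFF, e.g. 7 = FACILITY_WIN32
    std::uint32_t code;      // low 16 bits, e.g. a Win32 error code
};

constexpr DecodedHResult decode_hresult(std::uint32_t hr) {
    return DecodedHResult{
        ((hr >> 31) & 0x1u) != 0,
        (hr >> 16) & 0x1FFFu,
        hr & 0xFFFFu,
    };
}

// 0x8007000E is E_OUTOFMEMORY: failure, FACILITY_WIN32 (7), ERROR_OUTOFMEMORY (14).
static_assert(decode_hresult(0x8007000Eu).failed, "severity bit set");
static_assert(decode_hresult(0x8007000Eu).facility == 7, "FACILITY_WIN32");
static_assert(decode_hresult(0x8007000Eu).code == 14, "ERROR_OUTOFMEMORY");
```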
  10. I call this function to manually write data into my texture. It is called multiple times per second and seems to work great until I get an E_OUTOFMEMORY from it. Normally that would tell me that I have run out of RAM or have a memory leak; however, my application takes approximately 600 MB and I still have 2 GB+ free. I also cannot find any information on the amount of memory allocated by this function. The other possibility is that my memory gets so fragmented that the application cannot find a contiguous block anymore, but I find that unlikely. Perhaps I'm missing something obvious, but I don't see what could be happening. Thanks,
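The fragmentation hypothesis can be illustrated with a toy model (hypothetical helper names; real heaps and virtual address spaces are far more complex). Fill a region with one-slot blocks and free every other one: half of the space is then free, yet no request larger than one slot can be satisfied. If the application is a 32-bit build (an assumption; the post does not say), the relevant limit is the process's virtual address space, 2 GB of user space by default on 32-bit Windows, not free physical RAM. So "2 GB+ free" does not by itself rule this out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of an address space split into fixed-size slots:
// true = slot in use, false = slot free.
struct FragmentationReport {
    std::size_t total_free;        // number of free slots overall
    std::size_t largest_free_run;  // longest run of adjacent free slots
};

FragmentationReport analyze(const std::vector<bool>& slots) {
    FragmentationReport report{0, 0};
    std::size_t run = 0;
    for (bool used : slots) {
        if (used) {
            run = 0;
        } else {
            ++report.total_free;
            ++run;
            if (run > report.largest_free_run) {
                report.largest_free_run = run;
            }
        }
    }
    return report;
}

// Fill the space with one-slot blocks, then free every other one:
// half the space is free, but no two free slots are adjacent.
std::vector<bool> checkerboard(std::size_t n) {
    std::vector<bool> slots(n, true);
    for (std::size_t i = 0; i < n; i += 2) {
        slots[i] = false;
    }
    return slots;
}
```

In this model, a request for two adjacent slots fails even though 500 of 1000 slots are free, which is the same shape of failure as a large allocation returning NULL while plenty of memory is reported free.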
  11. I have some more information regarding this problem. After more tracing, I realized I never get the D3DERR_DEVICELOST state from TestCooperativeLevel; it is always D3DERR_DEVICENOTRESET. I receive this state on the first frame after I set up my 3 full-screen windows, and I can never reset the device correctly. Calling IDirect3DDevice9::Reset with the correct presentation parameters returns S_OK as expected; however, when I call TestCooperativeLevel again before rendering, the state is D3DERR_DEVICENOTRESET yet again. The exact same code works just fine when I use only 2 monitors, even with multisampling activated. The multisampling values are used elsewhere in the code, for the depth buffer and textures among other things. Perhaps the problem is not with the device itself but with these extra resources? Is that possible? Thanks,
  12. I have a problem right now when using DirectX 9 in multihead mode (3 monitors on 1 graphics card, using a single D3D9 device). My code works only if I deactivate full-screen anti-aliasing by setting MultiSampleType and MultiSampleQuality to 0 in the presentation parameters. As soon as I activate anti-aliasing, the device gets lost/restored every frame and all 3 monitors stay black. I'm trying to find out whether these modes are compatible and, if so, what I am doing wrong. I could paste some code if needed, but I'm not sure what would be relevant. Thanks,
  13. Microsoft and the Xbox One. Thoughts?

      "Well, why should anyone pay GameStop (twice for the same disc)?"
      "But that is exactly what happens when a game enters the second-hand market: an extra used copy suddenly appears out of thin air, and if it's sold again, another used copy, each time being played by somebody who could have just bought the game through a channel that supports the development of that product."
      GameStop doesn't make copies out of thin air; they buy used games and resell them. It's perfectly normal and legal to be able to do that. A game that you buy off the shelf is a product, not a service. Going to the theater is a service. The game developer and publisher should not get anything for second-hand games. The only difference between the game industry and any other industry is that they have the means to try to prevent second-hand sales, which in my opinion should be illegal. Please tell me again why I can sell my movies in VHS/DVD form, but selling games on DVD should be forbidden?
  14. How can a meteorite explode?

    samoth, where did I say I was 100% convinced it was a meteorite? I'm only 99.9% sure of that. What I said was that I'm 100% convinced it is NOT a North Korean missile. Not only does every video clearly show that it is NOT a missile (seriously, don't even question it), but if it were a nuclear missile from North Korea that exploded above Russia, Russia would be at war today. And about this quote: There are very few things I take for granted or trust without questioning, and I usually stay out of conspiracy theories because of that; however, your conspiracy is about the same level as the flat-earthers'. If by some miracle you are right, I'll gladly come back and apologize publicly. EDIT: some spelling mistakes
  15. How can a meteorite explode?

    Seriously, a conspiracy? Did you even watch the videos? There's not a single missile on earth that can outshine the sun. It just does not happen. And missiles don't leave a trail of fire behind them. The fact that you believe it can even be a possibility is ludicrous. And then you say the videos would be part of the conspiracy? Laughable. The videos were uploaded from many different sources, minutes to a few hours after the event, so the Russians would have had to prepare them beforehand. Yeah, right.