
Christian Weis

Community Reputation

97 Neutral

About Christian Weis

  1. Quote: Original post by Antheus
     Edit: The above example is identical to:

         Foo* ptr;
         void threadA() { ptr = NULL; }
         void threadB() { if (ptr) ptr->bar(); }

     No, it's more like:

         Foo* ptr;
         void threadA() { ptr = NULL; }
         void threadB()
         {
             Foo* local = ptr; // LOCAL COPY!
             if (local) local->bar();
         }

     And yes, I know, pointer assignments are not atomic, and they don't have to be, assuming that the compiler does not *optimize* 'local' away, in which case a memory barrier would be required. So 'local' is either NULL or points to the object, and 'local->bar();' either gets executed or not, depending on which thread is faster. Anyway, it was just a silly example I hacked together in a few seconds, and a poor one too, I have to admit. The snippet has no meaning, no uses, and no meaningful behaviour. It's just there to show that using call-by-value for shared pointers is more secure, as it models the fact that a function also shares/uses an object for a short period of time. And yes, that's a minor issue, if at all! But the other pitfalls remain.

     Quote: Original post by Antheus
     Make sure to report the issue mentioned above to boost, C++ TR1, compiler and STL implementors and QT development teams. Since they are the most widely used smart pointers, this "bug" is present in almost every single codebase out there and will soon be part of every C++ compiler.

     I like sarcasm!

     Quote: Original post by SiS-Shadowman
     You just showed us a race condition and inferred that shared_ptrs are crap? I smell a fallacy...

     Please read all posts before you make judgments. Among other things, the main reason why shared pointers are crap is the issue with cyclic structures.

     Quote: Original post by MaulingMonkey
     Why? Because I favor freedom of choice? I don't follow.

     Oh, come on, nobody can banish anything from our world! It was a metaphor. I just wanted to say that shared pointers are bad design, IMHO. If you disagree, fine.
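A minimal sketch of the local-copy idea from the post above. It swaps in `std::atomic<Foo*>` so that the single read is well defined in standard C++; that is an assumption added here, not something the original snippet used. The `calls` counter and the `bool` return value are hypothetical and exist only to make the behaviour observable. The sketch only guarantees that the value that was tested is the value that is used; it does nothing to keep the pointed-to object alive.

```cpp
#include <atomic>
#include <cassert>

struct Foo {
    int calls = 0;
    void bar() { ++calls; }
};

// Shared slot. std::atomic is an assumption of this sketch.
std::atomic<Foo*> ptr{nullptr};

void threadA() { ptr.store(nullptr); }

// Returns true if bar() was called.
bool threadB() {
    Foo* local = ptr.load();   // read the shared pointer exactly once
    if (local) {
        local->bar();          // uses the same value that was just tested
        return true;
    }
    return false;
}
```

Whichever thread runs first decides the outcome: `threadB()` either sees the object and calls `bar()`, or sees null and skips the call.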
  2. Quote: Original post by Antheus
     It's a bug due to a race condition. Some bigger heads got this same thing *very* wrong. But the number of obscure production bugs it caused helped recognize Singleton as a bad thing, so that counts for something. But note that the final solution involves a local temporary. All individual operations on shared_ptr are atomic, hence thread-safe. But an 'if' statement is not a transaction, so it's a sequence of two atomic operations. So the 'if' statement must be done either in a lock-free manner (spin loop) or inside a lock.

     It is not. Double-checked locking only applies to raw/dumb pointers. So if there were raw/dumb pointers in the example, you would be absolutely right. But in this special case there is no race condition at all. Shared pointers are special because of their atomic nature, which you already mentioned yourself. Due to the local copy of that shared pointer, the ref-count is increased (to 2). So it doesn't matter what other threads do with the global pointer. They can reallocate, remove, reset or whatever. The ref-count stays above zero, hence the original object the local pointer refers to is not deleted. So the if-statement does not need to be synchronized or atomic. Only the method 'DoSomething' needs to be synchronized.

     And please stop teaching me about synchronization, atomicity, transactions, etc. I know all this stuff. And if you studied the example step by step, you would also realize that in this special case the code is correct.
  3. Quote: Original post by Antheus
     That is broken code, it has nothing to do with shared pointers or resource management. Calling two sequential statements in a concurrent environment without atomicity guarantees is a race condition. shared_ptr is thread-safe (boost uses InterlockedInc/Decrement or equivalent), but the if statement in threadB is not atomic. The refcount of shared_ptr will never be invalid above, the caller's assumption is flawed. In the above example, without an explicit lock around the if statement the object could also be allocated/freed while doSomething is executing.

     It is not broken when the shared pointer is passed call-by-value. It is absolutely valid code in that case because of the local copy. Have a look at it again.

     Quote:
     I could also throw in something about map() and functional programming...

     Nothing wrong with that. I did a lot of functional programming at university years ago. I just posted the code to support my poor English. English is not my native language, you know.

     Quote:
     Concurrent pattern applicable to current and past realities of almost all hardware, excluding Crays, is quite simple - threads do not claim any resources beyond those local to thread. Execution progresses in lock-step. Processing phase generates some outputs which are redistributed after individual work units are done. This completely eliminates the issue outlined above - a pointer that is null at any stage of processing will remain null till the end of this phase. Map-reduce is an example of such general approach, but optimal hashing can be used for specific problems, such as sending messages between entities.

     The Cell chip also keeps memory local and communicates through message passing. And we will have these issues as long as we rely on shared memory. Maybe STM is the bright light at the end of the tunnel. There's a lot of potential in it. I also had an idea for a transactional GC a while ago, which I'm currently working on in my spare time. That GC only reclaims memory local to a thread and therefore doesn't suffer from any contention. The future will tell.

     Quote:
       Quote:
       Garbage collection has nothing to do with garbage collection! How shall I read this?
     Like this.

     Interesting article. I'd never looked at GCs from that perspective.
  4. Quote: Original post by MaulingMonkey
     ...

     It seems that you don't like the freedom of opinion. Keeping your team in check doesn't help much if you rely on poor design choices. And thank you to the person who rated me down.
  5. Quote: Original post by Antheus
       Quote: Original post by Christian Weis
       Threads also share objects. If you don't do that in a multithreading environment, an object could be accidentally deleted although some threads still refer to it.
     Not sure I follow. The generic shared_ptr has interlocked increment so it's thread-safe. But that is an implementation detail. A conceptual issue to solve is how to ensure multiple threads determine that they are all done with a certain object. At which point the easiest solution to reason about is the scatter/gather, fork/join, map/reduce, whatever-you-wanna-call-it fixed time step simulation. On each tick, distribute work among threads. When all are done, merge the artefacts they generated into shared state, start next step. Which is basically what the previous observation of long-lived vs. single-frame transient state is about. Those are by far the two most commonly used allocation schemes.

     Well, let's clear things up a bit. First some pseudocode:

         std::shared_ptr< Object > Shared;

         void ThreadA() { Shared.reset(); }

         // call-by-reference.
         void Foo( std::shared_ptr< Object >& X )
         {
             if( X ) X->DoSomething();
         }

         void ThreadB() { Foo( Shared ); }

     Let's assume that 'Shared' points to an object with a ref-count of 1, and that 'ThreadA' and 'ThreadB' are executed on two different threads. Now guess what's wrong with this code snippet. The problem lies between 'if( X )' and 'X->DoSomething();'. What if a context switch happens here? Thread B thinks that 'X' points to a valid object, as 'if( X )' has already been executed. Now thread A kicks in and resets 'Shared' to null. The ref-count is decreased to zero and the object is deleted. Now thread B continues by executing 'X->DoSomething();'. Because 'X' is a reference to 'Shared', it is also null now, and any access through it will hopefully crash.

     One possible fix is to make a local copy in thread B:

         void ThreadB()
         {
             std::shared_ptr< Object > Local = Shared;
             Foo( Local );
         }

     The local copy increases the ref-count to 2, so it's now safe to have call-by-reference in 'Foo'. That's a frequent pitfall. And the use of atomic interlocked operations doesn't necessarily make shared pointers thread-safe if you use them incorrectly, as in my sample above. However, as a rule of thumb, you can easily get away with this by always using call-by-value semantics. This way, the compiler automatically makes local copies which adjust the ref-count properly. With some performance loss, of course. That's also why smart pointers are very inefficient in a multithreaded world.

     Quote:
       Quote:
       Yeah, but the COM principle is frighteningly popular.
     So is Java OOP mindset and its abuse in C++. But that doesn't invalidate Java, OOP, C++ or implicitly or explicitly managed shared pointers, just them being mixed/used possibly incorrectly.

     And this is exactly what frightens me.

     Quote:
       Quote:
       The only issues that mark&sweep has is with temporary short-lived objects.
     Which can be a fairly big deal, which is why per-frame, stack-based or similar pools are used. In-place or pre-allocated structures are also well understood for most problematic cases (collision detection, spatial hierarchies, various property tables, embedded scripting). So the corner cases are covered completely without need for general solution.

     Agreed. That's why I prefer stack-based memory allocators and object pools for short-lived objects. But for long-lived objects I recommend more advanced algorithms. So, if you keep short-lived objects away from mark&sweep, you don't need to bother with this issue.

     Quote:
     GC has nothing to do with garbage collection, allocations, performance or similar. The goal is to allow application to pretend it has infinite memory.

     Garbage collection has nothing to do with garbage collection! How shall I read this? Of course, ref-counting, automatic stack management and other strategies are also part of garbage collection. But whenever I say GC, I mean the more advanced algorithms, so as not to confuse people. And of course, one purpose of GCs is to reclaim memory, and they are of course a part of memory management. The main purpose is just to increase productivity (a bit). But I don't think that anybody really believes in infinite memory ;)
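The call-by-reference pitfall described in this post can be replayed deterministically in a single thread. The sketch below is only an illustration under stated assumptions: `SimulateThreadA`, `FooByRef`, `FooByValue` and the `DoSomethingCalls` counter are hypothetical names, and the "context switch" is simulated by calling `SimulateThreadA` between the check and the use. In real multithreaded code, taking the copy of `Shared` itself also has to be synchronized with the writer, which this sketch does not show.

```cpp
#include <cassert>
#include <memory>

int DoSomethingCalls = 0;   // hypothetical counter, for observation only

struct Object {
    void DoSomething() { ++DoSomethingCalls; }
};

std::shared_ptr<Object> Shared;

// Stand-in for thread A running at the worst possible moment.
void SimulateThreadA() { Shared.reset(); }

// Call-by-reference: X is an alias of Shared, not an owner of its own.
bool FooByRef(std::shared_ptr<Object>& X) {
    if (!X) return false;
    SimulateThreadA();        // "context switch" between check and use
    if (!X) return false;     // X is null now; the original code would dereference it here
    X->DoSomething();
    return true;
}

// Call-by-value: X is a copy that holds its own reference to the object.
bool FooByValue(std::shared_ptr<Object> X) {
    if (!X) return false;
    SimulateThreadA();        // Shared is reset, but X keeps the object alive
    X->DoSomething();
    return true;
}
```

With by-value passing the object outlives the reset of `Shared` until `FooByValue` returns; with by-reference passing it is destroyed in the middle of the function.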
  6. Quote: Original post by Antheus
       Quote: Original post by Christian Weis
       Reference counting smart pointers introduce more pitfalls
     Some. It's not that bad.

     That's an understatement.

     Quote:
       Quote:
       Isn't that obvious!
     No.

     It should be.

     Quote:
       Quote:
       For instance, you must never use call-by-reference semantics.
     Why not? It's even better to pass by const.

     Threads also share objects. If you don't do that in a multithreading environment, an object could be accidentally deleted although some threads still refer to it.

     Quote:
       Quote:
       You also need to use the release-function properly. If you call it too often or not enough or in wrong areas, you are doomed.
     Ah, the COM model. It's much more convenient to use the auto-allocation provided by C++.

     Yeah, but the COM principle is frighteningly popular.

     Quote:
       Quote:
       Either mark&sweep
     Concurrency issues.
       Quote:
       generational algorithm.
     Very difficult to get right in constrained memory environment. This can also mean trying to juggle large objects in 32-bit address space.

     What concurrency issues? You can use mark&sweep like any other GC in concurrent software. You just need to implement a write barrier like card marking, page traps, etc. That's it. The only issue that mark&sweep has is with temporary short-lived objects.

     Quote:
       Quote:
       You can achieve deterministic destruction of objects by doing a full GC cycle every frame!
     Deterministic but unbounded. Something can be O(1), yet take either 1 ms or 1 million ms.

     Of course it's bounded. It's bounded by the complexity of the object graph and by how many objects you removed from the tree during a frame.

     Quote:
       Quote:
       That's exactly what I do in my project. And again GC's are not necessarily slow!
     Speed wasn't even mentioned here much.

     I know, but that's what most people think at first.
  7. I don't understand you guys! Reference counting smart pointers introduce more pitfalls than they solve. Isn't that obvious!

     First of all, most implementations I saw were either completely wrong or had serious bugs, especially those with multithreading support. And even if you manage to get a good implementation, there are still many things to be aware of. For instance, you must never use call-by-reference semantics. You also need to use the release function properly. If you call it too often, not often enough or in the wrong places, you are doomed.

     Oh, and let's not forget the most important weakness: cyclic structures leak memory! Of course, you know the code you write yourself. You can manage to avoid cyclic references, as you know exactly what your object graph looks like. But what if other people want to use your library in their projects? They would need a deep understanding of the object graph. That's a huge burden! And if they accidentally model cyclic structures, they will have a very hard time tracking that bug down. Ironically, ref-counted smart pointers lead to exactly what you tried to avoid: memory leaks!

     And the list of pitfalls goes on and on. A design pattern should make my life easier and not harder. So ref-counted smart pointers are definitely object oriented bullshit, IMHO. And they should be banished from our world!

     The only alternative is a full general GC, either mark&sweep or a generational algorithm. You can achieve deterministic destruction of objects by doing a full GC cycle every frame! That's exactly what I do in my project. And again, GCs are not necessarily slow! I also use a graph pruning algorithm to quickly reject many objects which are known to be either all referenced or all unreferenced. This boosts GC performance by a factor of 4 on average. And there are still some more optimizations left to do. The overhead of my implementation is just ~0.7% of the frame time. That is, if the game runs at 100 FPS without a GC, performance will drop to about 99 FPS with a GC. And I guess it's acceptable to sacrifice 1 FPS for the increase in productivity and usability of your library, unless you really need hard realtime guarantees.

     Just my two cents!
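The cyclic-structure leak mentioned in this post is easy to reproduce with `std::shared_ptr`. The sketch below is an illustration only; `Node`, `MakeCycle` and the `DestroyedNodes` counter are hypothetical names chosen for the example. `MakeCycle` hands back a non-owning `std::weak_ptr` so that the leaked cycle can still be observed and, if wanted, broken by hand afterwards.

```cpp
#include <cassert>
#include <memory>

int DestroyedNodes = 0;   // hypothetical counter, for observation only

struct Node {
    std::shared_ptr<Node> next;   // owning link: this is what forms the cycle
    ~Node() { ++DestroyedNodes; }
};

// Builds a <-> b and drops both local owners.
// Returns a non-owning observer of node a.
std::weak_ptr<Node> MakeCycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;   // each node now keeps the other one alive
    return a;
}
```

After `MakeCycle()` returns, no code outside the two nodes owns them, yet neither destructor has run, because each ref-count is still 1. Resetting one `next` link through the observer (`observer.lock()->next.reset()`) breaks the cycle and lets both nodes be destroyed.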
  8. Quote: Original post by Antheus
     For large number of items batching is likely to be a better choice. The cost of critical section acquisition is comparable to interlocked operations, but in case of batching resource loader needs to do one acquire per item, the main thread however will only do one per frame to acquire entire batch. The probability of congestion is low, and operations within critical section are fast.

     Agreed! However, batching/multi-buffering only works well in a 1-1 relationship, that is, single producer, single consumer. The producer fills the first buffer with requests while the consumer processes the second buffer. They can act 100% simultaneously without any contention. And at the end of each frame you simply swap the buffers with a cheap InterlockedExchange. In fact it's one of the most efficient approaches. I also use this whenever possible.

     Yet it assumes a 1-1 relationship. What about n/m relations (multiple producers, multiple consumers)? Of course, with today's quad-cores you can design the engine in such a way that you have 1-1 relations exclusively. You can work around it. Or you can simply fall back to critical sections. And yes, they're fast, especially on Windows, where they are basically simple spin-locks. But spin-locks spin around in a loop as well, and you have exactly the same issues with cache coherence and cache line fighting as with lock-free structures. No win here.

     I'm currently asking myself what processors might look like in the near future. My personal guess is that we'll have processors with dozens of cores able to run dozens of hardware threads. On such platforms it's very hard to avoid n/m relations between threads. And I think that in such a massively multithreaded world lock-free structures may supersede locks/multi-buffering.

     By the way, you also missed one thing. Lock-free programming is hard and error-prone. Batching with multi-buffering is very simple and straightforward to implement and maintain. One more point for you ;) However, I like lock-freedom for some reason, and it's an interesting alternative.
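The batching scheme from the quoted post (one lock acquisition per item for the loader, one per frame for the main thread) might be sketched as follows. This is an assumption-laden illustration: `BatchQueue`, `Push` and `TakeBatch` are hypothetical names, and `std::mutex` stands in for a critical section. It is not the lock-free InterlockedExchange buffer swap mentioned in the reply.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

template <typename T>
class BatchQueue {
public:
    // Producer side: one short lock acquisition per item.
    void Push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(item));
    }

    // Consumer side: one lock acquisition per frame. The whole batch is
    // taken by swapping vectors, so the critical section stays short.
    std::vector<T> TakeBatch() {
        std::vector<T> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<T> pending_;
};
```

The consumer processes the returned batch outside the lock, so producers are only ever blocked for the duration of a vector swap.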
  9. Right! Programming in general is moving forward quickly, so a 10-year-old article is pretty much outdated today. Almost everyone uses resource files nowadays.
  10. Why not write a simple mark&sweep GC?! This algorithm is safe, reliable, and not as inefficient as many people think. I have used mark&sweep in my projects for more than 7 years now and have not had any problems with it yet. Give it a try.
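For readers who have not seen the algorithm, a toy stop-the-world mark&sweep collector can be sketched in a few dozen lines. Everything here (`GcObject`, `GcHeap`, explicit root registration, a single object type) is a hypothetical simplification for illustration and says nothing about how the poster's own collector is built.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct GcObject {
    std::vector<GcObject*> refs;   // outgoing references to other heap objects
    bool marked = false;
};

class GcHeap {
public:
    GcObject* Allocate() {
        objects_.push_back(std::make_unique<GcObject>());
        return objects_.back().get();
    }
    void AddRoot(GcObject* o) { roots_.push_back(o); }
    void RemoveRoot(GcObject* o) {
        roots_.erase(std::remove(roots_.begin(), roots_.end(), o), roots_.end());
    }
    std::size_t LiveCount() const { return objects_.size(); }

    // Runs one full collection and returns the number of objects freed.
    std::size_t Collect() {
        // Mark phase: flag everything reachable from the roots.
        std::vector<GcObject*> work(roots_);
        while (!work.empty()) {
            GcObject* o = work.back();
            work.pop_back();
            if (o->marked) continue;
            o->marked = true;
            for (GcObject* r : o->refs) work.push_back(r);
        }
        // Sweep phase: destroy every object that was not marked.
        const std::size_t before = objects_.size();
        objects_.erase(
            std::remove_if(objects_.begin(), objects_.end(),
                           [](const std::unique_ptr<GcObject>& o) { return !o->marked; }),
            objects_.end());
        for (auto& o : objects_) o->marked = false;   // reset for the next cycle
        return before - objects_.size();
    }

private:
    std::vector<std::unique_ptr<GcObject>> objects_;
    std::vector<GcObject*> roots_;
};
```

Because liveness is decided by reachability from the roots rather than by counts, two objects that reference each other are reclaimed as soon as no root reaches them.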
  11. Quote: Original post by Antheus
      Lock-free has the practical problem of not being readily available from some standard library.

      That's right. It's about time to add lock-free data structures to boost.

      Quote: Original post by Antheus
      Either way, the number of items for resource loader will be relatively small (one cannot realistically have millions of textures loaded during handful of frames, or at all), so either approach works.

      It does matter when each mipmap is loaded individually rather than the texture as a whole. And having some thousands of mipmaps per second loaded in a background thread is common. It also depends on how fast the player can move and whether he can teleport himself around.
  12. First of all, waiting for something is almost always a very bad idea. So all these condition variables shouldn't be used in a realtime application. I know many people do it that way, and that's exactly why so many games with resource streaming out there have such jerky behavior and occasional long pauses. This is because the OS puts the thread to sleep whenever there is no work for the thread to do. This can eat up millions of CPU cycles. And when a request arrives, the OS needs to wake up the thread again, which also takes millions of CPU cycles. You are wasting a lot of CPU power that way. In other words, using synchronization primitives provided by the OS is not advisable.

      A better solution is to use a lock-free queue and to spin around until a resource is about to be loaded. This way the thread is always *hot* and never put to sleep by the OS. Some code:

          while( Running )
          {
              Request* Item = LockFreeQueue.PollGet();
              if( Item )
              {
                  Item->Process();
                  LockFreeQueue.FreeForReuse( Item );
                  continue;
              }
              // Idle the thread.
              SwitchToThread();
              Atomic::MemoryBarrier();
          }

      As a rule of thumb, never wait for anything. So it's also a bad idea to wait for a resource to be loaded. Instead, just poll whether the resource is available. If so, use that resource, otherwise use a fallback path. For instance, the loading of textures could be done like this:

          Image* TextureToUse;
          bool IsLoaded = Texture->IsLoaded;
          Atomic::WeakMemoryBarrier();
          if( IsLoaded )
              TextureToUse = Texture;        // The texture has been loaded, so use it.
          else
              TextureToUse = DefaultTexture; // The texture has not been loaded yet, so fall back to some other default texture.
          TextureToUse->MakeCurrent();
          ...

      All default textures are loaded and cached at startup. The Unreal Engine 3 does it this way, for instance.
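The "poll, never wait" texture fallback from this post could look roughly like this in portable C++. The sketch replaces the post's `Atomic::WeakMemoryBarrier()` with `std::atomic<bool>` and acquire/release ordering; that substitution, along with the names `MarkLoaded` and `SelectTexture` and the `name` field, is an assumption made for the example.

```cpp
#include <atomic>
#include <cassert>

struct Image {
    const char* name;
    std::atomic<bool> isLoaded{false};
    explicit Image(const char* n) : name(n) {}
};

// Loader thread: called after the pixel data has been fully written.
// The release store publishes those writes together with the flag.
void MarkLoaded(Image& texture) {
    texture.isLoaded.store(true, std::memory_order_release);
}

// Render thread: never blocks. Uses the texture if it is ready,
// otherwise falls back to a default that was loaded at startup.
const Image& SelectTexture(const Image& texture, const Image& fallback) {
    return texture.isLoaded.load(std::memory_order_acquire) ? texture : fallback;
}
```

The render thread calls `SelectTexture` every frame; the first frame after `MarkLoaded` runs simply starts using the real texture.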
  13. I don't want to be harsh, but writing FPU-optimized assembly code is almost always a waste of time. The FPU is actually so slow that inline assembly code doesn't give you any noticeable benefit. If this function is so time-critical for you, then I recommend using SSE intrinsics, as proposed by RobTheBloke.
  14. Well, the guy who wrote that insignificant article made all the destructors virtual. And deleting items with virtual destructors is time-consuming. So he wrote a custom stack which does not actually delete popped elements. Instead, the elements are kept around for later allocations. This way no destructors get called. Well, they are only called when the whole stack is destroyed. So basically he did an absolutely unnecessary optimization with his custom allocator, which could have been avoided completely by simply using std::vector instead of std::stack, whose default underlying container is std::deque. However, that guy doesn't seem to be that smart. Just ignore that article. [Edited by - Christian Weis on August 7, 2010 9:04:14 PM]
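As a small illustration of the container choice discussed here: `std::stack` is only an adaptor, and its underlying container can be set to `std::vector` through the second template argument. The alias `VectorStack` and the helper `SumAndDrain` are hypothetical names for this sketch. Note that popping still runs element destructors either way; the vector merely keeps its allocated storage for later pushes.

```cpp
#include <cassert>
#include <deque>
#include <stack>
#include <type_traits>
#include <vector>

// The standard default: std::stack<T> adapts std::deque<T>.
static_assert(std::is_same<std::stack<int>::container_type, std::deque<int>>::value,
              "std::stack uses std::deque unless told otherwise");

// A stack backed by contiguous std::vector storage instead.
using VectorStack = std::stack<int, std::vector<int>>;

// Pops every element and returns the sum of the popped values.
int SumAndDrain(VectorStack& s) {
    int sum = 0;
    while (!s.empty()) {
        sum += s.top();
        s.pop();
    }
    return sum;
}
```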
  15. void (__stdcall Canvas::* )(PVOID,BOOLEAN) looks like a pointer to a member function to me. But Canvas::TimerProc must be static!
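A common way to satisfy a C-style callback that takes a context pointer is a static member function that forwards to the instance. The sketch below is a portable approximation: `TimerCallback` mirrors the `(PVOID, BOOLEAN)` shape from the post with plain `void*` and `bool`, omits `__stdcall`, and the `OnTimer`/`ticks` members are hypothetical.

```cpp
#include <cassert>

// Plain function pointer type, standing in for the OS callback signature.
using TimerCallback = void (*)(void* context, bool timerFired);

class Canvas {
public:
    // Static, so it has no hidden 'this' parameter and converts to a
    // plain function pointer. The instance travels in 'context'.
    static void TimerProc(void* context, bool timerFired) {
        static_cast<Canvas*>(context)->OnTimer(timerFired);
    }

    int ticks() const { return ticks_; }

private:
    void OnTimer(bool /*timerFired*/) { ++ticks_; }
    int ticks_ = 0;
};
```

When registering the callback, the `Canvas` object's address is passed as the context argument, so `TimerProc` can find its way back to the right instance.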