Around the middle of the week I decided to get my arse in gear and get working on some code related to a game idea I had.

I also decided to grab a trial copy of Intel's Parallel Studio to have a play with it, see what its auto-vectorisation is like, and also try things like Threading Building Blocks (TBB).

One of the things I need in this game are particles.
Lots of them.

Now, the obvious way to deal with it is to give it to my shiny HD5870 and let that sim things. However, I've never been one to take the obvious route, and I've got a Core i7 sitting here which hardly gets a workout; time for particles on the CPU.

Having installed the Intel stuff I had a look over the TBB code and decided that the task-based system would make a useful platform to build a multi-threaded game on; it's basically a C++ way of playing with a thread pool, with the pool hidden behind the scenes and shared out to tasks as required. You don't even have to be explicit with tasks: TBB provides parallel_for loops which can subdivide work and share it out between threads, including task stealing. All in all, pretty sweet.
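Under the hood, that parallel_for pattern amounts to splitting an index range into chunks and farming the chunks out to worker threads. A minimal sketch of the idea using plain std::thread (TBB's real implementation adds pool reuse and task stealing on top; parallel_for_chunks is my own name, not a TBB API):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Apply fn(begin, end) over [0, n), split into roughly equal chunks,
// one per hardware thread -- the basic shape of a parallel_for.
template <typename Fn>
void parallel_for_chunks(std::size_t n, Fn fn) {
    std::size_t workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;
    std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t start = 0; start < n; start += chunk) {
        std::size_t end = (start + chunk < n) ? start + chunk : n;
        pool.emplace_back([=] { fn(start, end); });
    }
    for (auto& t : pool) t.join();
}
```

TBB improves on this shape by keeping the threads alive between calls and letting idle workers steal sub-ranges from busy ones.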

After putting it off for a number of days I decided to finally knuckle down and get on with some performance testing code. I've decided to start simply for now and get a vague framework up and running; it'll require some work to make it useful, with the final goal being to throw it at some people and see what performance numbers I get back for various emitter and particle counts.

I've so far tested five different methods of updating the particle system:

The first is simple serial processing on a single thread: each emitter is processed in turn, and each particle one step at a time. The best I can hope for here is some scalar SSE floating-point processing to help out. The particles themselves are structs (floats and arrays of floats) stored in a std::vector.
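As a sketch, that serial baseline might look like this (the field names and layout are my guesses at the structure described, not the author's actual code):

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures layout: one plain struct per particle.
// Field names are illustrative.
struct Particle {
    float position[3];
    float momentum[3];
    float velocity[3];  // per-step velocity contribution
    float life;         // seconds remaining
};

// Serial update: every particle stepped one at a time on one thread.
void update_serial(std::vector<Particle>& particles, float dt) {
    for (Particle& p : particles) {
        for (int c = 0; c < 3; ++c) {
            p.momentum[c] += p.velocity[c];
            p.position[c] += p.momentum[c] * dt;
        }
        p.life -= dt;
    }
}
```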

The second is the first parallel system I went for: emitters and particles in parallel. This uses TBB's parallel_for to split the emitters over threads and then to break each emitter's particles up between the threads. Same data storage as the first test.

The third was me wondering if splitting the particles over threads was a good idea, so I removed the parallel_for from the particle code and left it in the emitter processing. Same storage setup as before.

Having had a look at my basic processing of the particles, and after a bit of a read of the Intel TBB docs, I realised that I had a bit of a dependency going on:

momentum += velocity;
position += momentum * time;

So, I decided to split it into two passes; the first works out the momentum and the second the position. Storage stayed the same as before.
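The two-pass version then reads as two independent sweeps; within each sweep no particle depends on any other, so each pass parallelises cleanly. A sketch (serial here for clarity, and the struct is illustrative):

```cpp
#include <vector>

struct Particle { float pos[3], mom[3], vel[3]; };

// Pass 1: integrate momentum for every particle.
void pass_momentum(std::vector<Particle>& ps) {
    for (auto& p : ps)
        for (int c = 0; c < 3; ++c) p.mom[c] += p.vel[c];
}

// Pass 2: integrate position from the freshly updated momentum.
void pass_position(std::vector<Particle>& ps, float dt) {
    for (auto& p : ps)
        for (int c = 0; c < 3; ++c) p.pos[c] += p.mom[c] * dt;
}
```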

At this point I had some timing data. With 100 emitters, each with 10,000 particles, and the particle system assuming an update rate of 16ms a frame and a particle lifetime of 5 seconds:

- Serial processing took 5.06 seconds, with the final frame taking 23ms
- Emitter and particle parallel took 2.44 seconds, with the final frame taking 8.6ms
- Emitter-only parallel took 2.4 seconds, with the final frame taking 8.53ms
- Two-pass parallel processing took 2.44 seconds, with the final frame taking 8ms

Clearly, single-threaded processing wasn't going to win any speed prizes; however, splitting the work over multiple threads had the desired effect, with the processing now taking ~8ms, or half a frame.

The final step: SSE.

I decided to build on the two pass parallel processing as it seemed a natural way to go.

The major change for SSE was dropping the std::vector of structures. There were two reasons for this:
- memory alignment is a big one; SSE likes things to be 16-byte aligned
- SIMD: structures of data don't lend themselves well to it, as at any given point you are only looking at one chunk of data.

SIMD was a major one, as the various formulas being applied to the components can be applied to x and y separately, so if we pack the data closely we can hold, say, four x positions and work on them at once.

So, the new storage was a bunch of chunks of memory aligned to 16 bytes. The particle counts were forced to be a multiple of 4 so that we can sanely work with four components at a time.
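With the components packed as structure-of-arrays, one SSE update for the x components might look like this (I've fused the momentum and position updates into one loop for brevity, where the real version keeps the two passes; function and parameter names are mine):

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Structure-of-arrays: each component packed contiguously and
// 16-byte aligned, so four particles are processed per instruction.
// count must be a multiple of 4 and the pointers 16-byte aligned.
void update_x_sse(float* pos_x, float* mom_x, const float* vel_x,
                  std::size_t count, float dt) {
    const __m128 vdt = _mm_set1_ps(dt);
    for (std::size_t i = 0; i < count; i += 4) {
        // momentum += velocity, four particles at a time
        __m128 m = _mm_add_ps(_mm_load_ps(mom_x + i), _mm_load_ps(vel_x + i));
        _mm_store_ps(mom_x + i, m);
        // position += momentum * dt
        __m128 p = _mm_add_ps(_mm_load_ps(pos_x + i), _mm_mul_ps(m, vdt));
        _mm_store_ps(pos_x + i, p);
    }
}
```

The aligned loads and stores (_mm_load_ps/_mm_store_ps) are why the 16-byte alignment and multiple-of-4 counts matter; the unaligned variants exist but cost more.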

The dispatch is the same as in the previous versions: we split the work across threads by emitter, then into groups of particles, and those groups are processed.

Final result:
- Two-pass SSE parallel processing took 1.09 seconds, with the final frame taking 4.9ms.

That's a much nicer number, as it still leaves me 11ms to play with to hit 60fps.

There is still some tuning to do (minimum particles per thread, for example, currently 128), and it might even be better to do my own task dispatch rather than leave it to the parallel_for loop to set up. I've also not done any profiling with regards to stalling etc., or how particle and emitter counts affect the various methods of processing. Oh, and my own timing system needs a bit of work as well; taking all frame times and processing them for min/max/average might be a good idea.
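That min/max/average reduction over recorded frame times is simple enough to sketch (names are mine):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct FrameStats { double min_ms, max_ms, avg_ms; };

// Reduce a run of per-frame timings to min/max/average.
FrameStats summarize(const std::vector<double>& frame_ms) {
    FrameStats s{0.0, 0.0, 0.0};
    if (frame_ms.empty()) return s;
    s.min_ms = *std::min_element(frame_ms.begin(), frame_ms.end());
    s.max_ms = *std::max_element(frame_ms.begin(), frame_ms.end());
    s.avg_ms = std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0)
               / static_cast<double>(frame_ms.size());
    return s;
}
```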

However, for now at least, I'm happy with today's work.

ToDo: Improve test bed.
