
## Recommended Posts

My program must remain compatible with Linux, Windows, and Mac.

I am using these libraries:
OpenGL
SDL
libcurl

I am at the point in my game where I need the map to load
while the player continues to move and update.
Right now, when the map scrolls, control shifts to a function that loads the new
part of the map, interprets it, and then creates the vertices to draw, before
returning to the player. That means when the player moves and the map
scrolls, the game stops for about a second, which is far too long. If the player
happens to be between two pieces, both the one above and the one below keep
getting reloaded.

I need a way for both map loading/interpreting and player actions
to happen simultaneously.

My game has started to lag a little since I implemented the map.

The solution would seem to be threads; I plan on assigning them as follows:
0 - Leave to be used by SDL and OpenGL (in case of no GPU)
1 - Load and interpret new map chunks
2 - Calculate basic physics for objects
3 - Player movement and actions

I am using an AMD Phenom X4 2.2GHz CPU, and with a single-threaded
program that means only one of the four 2.2GHz cores (plus the GPU) gets used.

Threads are the solution to my problem, correct?

Research I have done shows that threads are different on every operating
system.
How should I implement threads so that the code still compiles on Linux,
Mac, and Windows without modification?

Eliminating the one-second lag when the map moves is something that must be done.
Also, my apologies if some questions seem noobish (because they are), but I like
to double-check everything so that I can get it right.

##### Share on other sites
Personally I'd combine threads 2 & 3, assuming the player's movement and actions affect / are affected by the physics engine. But there are better people on here who can advise you on threading.

I just wanted to chime in and say you should keep recently loaded chunks of map in memory, so as to handle the situation where a player is bouncing between 2 (or even 4) of them. Keep 9 chunks in memory (the one the player is currently in, and one in each of the 8 directions) and you should be good (or you could keep more, but make them smaller). Maybe have a threshold for how far into a map chunk the player has to travel before removing the chunks behind him and loading the next batch ahead, so that he'd have to make a decent-sized zig-zag before you'd be loading and unloading chunks like crazy.
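The "9 chunks around the player" idea boils down to computing a keep-set each frame and diffing it against what's resident. A minimal sketch (hypothetical names, not from any of the posters' code):

```cpp
#include <set>
#include <utility>

// Chunk coordinates on the map grid.
using ChunkCoord = std::pair<int, int>;

// Return the 3x3 block of chunk coordinates centred on the player's chunk:
// the chunk he is in, plus one in each of the 8 directions.
std::set<ChunkCoord> chunksToKeep(int playerChunkX, int playerChunkY) {
    std::set<ChunkCoord> keep;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            keep.insert({playerChunkX + dx, playerChunkY + dy});
    return keep;
}

// Each frame: load any chunk in the keep-set that isn't resident yet,
// and evict resident chunks that fell out of the keep-set (optionally
// only after the player has crossed a distance threshold into his chunk,
// to avoid thrashing on the boundary).
```

Because the keep-set only changes when the player crosses a chunk boundary, the diff is empty on almost every frame.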

##### Share on other sites
Thanks for the info, krez. I think I should keep more in memory; however, the chunks still take a while to load.

##### Share on other sites

> The solution would be threads, I plan on doing it as follows.
> 0 - Leave to be used by SDL and OpenGL (in case of no GPU)
> 1 - Load and interpret new map Chunks
> 2 - Calculate basic Physics for objects
> 3 - Player Movement and Actions

I'd cut that down to just 2 threads. Thread 0/2/3 as the first one, and thread 1 as the second. Threading isn't easy, and combining stuff like that will reduce the number of sync points you need.


##### Share on other sites

> Research I have done shows that threads are different on all Operating systems.
> How should I implement threads so that it is still compilable on Linux, MAC and Windows without code modification?

SDL has a threading system built in, and it's designed for exactly the sort of thing you're doing. Task parallelism, cross-platform. The Windows implementation of SDL produces Windows threads, and the Linux implementation produces POSIX threads, so you don't have to worry about the details.

However for easier development, I'd recommend OpenMP instead. It's more modern and requires less fuss. http://en.wikipedia.org/wiki/Openmp

If you're using Visual Studio, OpenMP is installed natively with the Pro & Team editions. If you're using Visual C++ Express (2005 or later), you can still get OpenMP for free by installing the Windows 7 SDK, and enabling OpenMP in the solution properties (Configuration Properties, C/C++, Language).

Under Linux, OpenMP is supported by GCC 4.2 & up. For Mac... Dunno. Who cares.

One advantage of OpenMP is that it's mostly done through compiler directives (pragmas), so you don't actually have to change your code very much. You will have to move some things around to prevent synchronisation problems, but you won't have to write any special code for threading, which is the way it should be. Also, if you find yourself porting to a platform without OpenMP, those pragmas will just be ignored, and your project will still compile & run as old-fashioned sequential code. Same if you want a single-threaded debug build: just disable OpenMP for the debug build. No other code-changes required.

But the best thing about OpenMP is just how ridiculously easy it is to take an 'embarassingly parallel problem' and parallelise it:
```cpp
#pragma omp parallel for
for (int i = 0; i < 1000000; i++) {
    do_some_work(i);
}
```

##### Share on other sites

In the task-based approach there is no "physics thread" or "loading thread" and it's easier to write high-level parallel algorithms such as parallel_for_each, parallel_sort and so on, that will at least in principle scale to make use of whatever hardware is available.

OpenMP, as suggested by Helicobster, will get you some way towards this goal and in fact might fulfil your needs entirely. But last time I checked, it doesn't explicitly provide a substrate on which to build message-passing kernels, which can also be extremely useful.

Some libraries you might be interested in include Intel's Threading Building Blocks, and Microsoft's Parallel Patterns Library.

##### Share on other sites
I second Intel Threading Building Blocks. It is specifically designed to exploit parallelism for performance, and does so at the cost of not supporting asynchronous tasks. That is, you should not use it to run long-running or blocking tasks (as this may cause deadlock, or at least ruin the performance benefits of TBB) - for those, you should use full-blown threads (SDL's threads, boost threads, pthreads or whatever).

On the other hand, TBB is suitable (and I highly recommend it) for code that you wish to parallelize for performance. That is, you can use it to update AI, characters or physics, prepare data to be rendered, and so on, in a way that makes near-optimal use of available processor cores.

Besides tasks, TBB provides a number of other (higher-level) abstractions and algorithms to make it easy for you to parallelize your code, including parallel_for (loop over containers or arrays whose size is known ahead of time, in parallel), parallel_while/parallel_do (loop over data whose size is not known ahead of time, or until a condition is met), parallel_sort, pipeline (where data is passed from one pipeline "stage" to the next and all stages execute in parallel - so, e.g., while stage1 processes item A, stage2 may be processing item B, and so on) and more. Programming in terms of tasks is much easier (and better at achieving high performance/processor utilization) than managing threads by hand, but programming with these higher-level algorithms is easier still than programming in terms of tasks.
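Conceptually, `tbb::parallel_for` just splits an index range into sub-ranges and runs the loop body on each sub-range on a different core. A hand-rolled sketch of that idea using plain standard threads (this is an illustration of the concept, not real TBB, which additionally does work-stealing and grain-size control):

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Run body(i) for every i in [begin, end), splitting the range into one
// contiguous sub-range per hardware thread.
void parallelFor(int begin, int end,
                 const std::function<void(int)>& body,
                 unsigned numThreads = std::thread::hardware_concurrency()) {
    const int nt = static_cast<int>(std::max(1u, numThreads));
    const int total = end - begin;
    if (total <= 0) return;
    const int chunk = (total + nt - 1) / nt;  // sub-range size, rounded up

    std::vector<std::thread> workers;
    for (int lo = begin; lo < end; lo += chunk) {
        const int hi = std::min(lo + chunk, end);
        workers.emplace_back([lo, hi, &body] {
            for (int i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();  // the single sync point
}
```

This only works safely when iterations are independent (each `i` touches its own data), which is exactly the precondition for TBB's parallel_for as well.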

On top of that, TBB also contains a bunch of other useful tools: concurrent containers (concurrent_vector, concurrent_unordered_map (hash table), concurrent_queue), atomic variables (with atomic operations such as get_and_set, compare_and_swap, get_and_add etc), a variety of locks, a multithreading-safe high performance timer, scalable versions of malloc and free (and scalable allocators for use with STL containers) and a few other useful bits and pieces.

In my engine, I have three threads:
- The main thread is dedicated to gathering input and rendering (because SDL's input system and OpenGL are required to be run from the main thread)
- The first SDL thread is used to run game logic
- The second SDL thread is used to asynchronously load resources from disk

The main thread and the resource thread basically consist of sequential code - no additional parallelism besides that they have dedicated threads. The logic thread, however, uses TBB (especially parallel_for) to process game objects, events and so on in parallel (in an entity system, but it will work just as well for other kinds of game engines).

The great thing about task-based parallelism, rather than thread-based, is that your program will use enough threads to make use of however many cores future processors will have, while if you have only a fixed number of dedicated threads, you will not gain anything from having more cores than what the program was designed for.

Note: you do not have to use TBB for any of this, there are other, similar, libraries that you could choose instead, but since I use TBB, you will have to google for them yourself. I would definitely discourage you from trying to write TBB-style code manually using more traditional threading packages though, as this is a recipe for horribly slow code.

##### Share on other sites
I would avoid heterogeneous thread layouts (that is, running a lot of different threads at the same time).

What you want to do is break your game loop into "steps" and "jobs". At each "step" and "job" you want to go as wide as possible, using your threads as a constantly running thread pool.

So your game loop ends up like:
prephysics gameplay - prephysics AI - physics - post gameplay - post AI - generate render lists - async IO.

Within each "step" there may be multiple "jobs" which get run as "wide" as possible. So your physics step may have a raycast resolve, an update, and a collision resolve.
Not all jobs will be able to go wide (e.g. UI), but some jobs will be easier to multiprocess (async IO/physics).

It's pretty typical to run a separate render thread regardless, which is fed from the render-list point in your game loop.

You should also aim to keep each "step" as dependency-free as possible, and only communicate between steps using events. Within each job, use lock-free queues to write out results from worker threads.

Also use "copies" of data for worker threads, and only commit all the updates when the job has finished.

This ensures that you have consistent state during processing, and gives you consistent synchronisation points during execution.

Doing things this way, with a mostly homogeneous thread layout, wins you massive instruction-cache coherence, good predictability in terms of deadlock/livelock problems, and very little actual mutex usage.
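The "work on copies, commit when the job has finished" idea above can be sketched as a double buffer (a hypothetical illustration with made-up names; the loop is shown sequentially for clarity, but each index could be handed to a separate worker, since every iteration reads only the front buffer and writes its own slot of the back buffer):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A stand-in for whatever per-object state a step updates.
struct World {
    std::vector<float> positions;
};

struct DoubleBufferedWorld {
    World front;  // the consistent state every job reads during a step
    World back;   // results are written here during the step

    // Run one step: apply 'job' to every object, reading from 'front'
    // and writing into 'back', then commit all updates at once.
    template <typename Job>
    void step(Job job) {
        back = front;  // start from the last committed state
        for (std::size_t i = 0; i < front.positions.size(); ++i)
            back.positions[i] = job(front.positions[i]);
        commit();      // the step's single synchronisation point
    }

    void commit() { std::swap(front, back); }
};
```

Nothing observes a half-updated world: readers always see the state committed at the end of the previous step, which is what gives the consistent sync points described above.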
