Multithreaded rendering... (RFC)

Started by
26 comments, last by Harvester 24 years, 1 month ago
Usually your program is getting 99% of the CPU anyway, so multithreading gives no noticeable speedup and will probably cause a slowdown due to the additional context switching and synchronization overhead.

I tried this once: I created a multithreaded app with two CPU-intensive threads. Doing the two CPU-intensive calculations in parallel took 8.67 seconds. Doing them sequentially in a single thread took 8.63 seconds.
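For anyone who wants to repeat this kind of measurement, here is a minimal sketch in portable C++ (std::thread rather than Win32 threads). The workload `busy_work` and the helper `compare_runs` are made-up names for illustration, not the code used in the test above, and the timings will depend entirely on the machine and on how many processors it has.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical CPU-bound workload: a simple integer recurrence.
std::uint64_t busy_work(std::uint64_t iterations) {
    std::uint64_t x = 1;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return x;
}

struct Timing {
    double sequential_seconds;  // both calculations on one thread
    double parallel_seconds;    // one calculation per thread
    bool results_match;         // sanity check: same answers both ways
};

// Runs the same two calculations sequentially, then on two threads,
// and reports the wall-clock time of each approach.
Timing compare_runs(std::uint64_t iterations) {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    std::uint64_t a1 = busy_work(iterations);
    std::uint64_t b1 = busy_work(iterations);
    auto t1 = clock::now();

    std::uint64_t a2 = 0, b2 = 0;
    std::thread ta([&] { a2 = busy_work(iterations); });
    std::thread tb([&] { b2 = busy_work(iterations); });
    ta.join();
    tb.join();
    auto t2 = clock::now();

    std::chrono::duration<double> seq = t1 - t0;
    std::chrono::duration<double> par = t2 - t1;
    return {seq.count(), par.count(), a1 == a2 && b1 == b2};
}
```

Call `compare_runs` with a large iteration count and compare the two times. On a single-processor machine the two numbers should come out close to each other, as in the figures quoted above.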

If you're locking any of these surfaces, remember that Lock makes the system acquire the Win16Lock, which is held until you Unlock.

Also, please don't mess with SetPriority; you can cause major problems with your system by doing this (if you set it too high, disk buffers won't flush).

I'm not saying never use multithreading; use it in the right situation. IO-intensive tasks are great to put in another thread, since they spend most of their time blocked (waiting for a slow IO device like a disk), and if you're smart, you can get the effect of disk accesses taking zero time. User input in another thread is OK too (just please don't make it a busy-wait that constantly checks the DInput states).
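As a rough illustration of the IO case, here is a minimal sketch using std::async from standard C++ instead of Win32 threads. The function names `load_file` and `load_file_async` are hypothetical; the point is only that the blocking read happens on a worker thread while the main loop keeps going.

```cpp
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory. This call blocks on the disk,
// which is what makes it a good candidate for a worker thread.
std::vector<char> load_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Starts the load on another thread and returns immediately.
// The caller picks up the bytes later through the future.
std::future<std::vector<char>> load_file_async(const std::string& path) {
    return std::async(std::launch::async, load_file, path);
}
```

The main loop would call `load_file_async` early, keep rendering, and only call `get()` on the future when the data is actually needed. If the load has already finished by then, the disk access appears to cost nothing.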

Edited by - mhkrause on 2/22/00 8:11:59 AM
My game was taking up 99% of the CPU power; then I added 2 more threads and it only uses 76%. Maybe it's not sped up, but it can handle a lot more now without fear of slowing down. Multithreaded applications are good, but multithreaded rendering may not be so good: what is going to guarantee that the background isn't the last thing to be drawn, and so the only thing to be seen? You might be able to do it, but it would involve more than just threads and end up being more trouble than it's worth.
Hmmm, I see your point now, mhkrause.
True, I use threads for IO, especially when writing servers in Linux. In Windows I don't have much experience with threads.

Well, based on my experience with Linux threads, I can tell you that using multiple threads is the only way to serve a lot of traffic, such as 100000 requests/sec. I use multiple threads for acquiring the data through sockets, and then a lot more threads to process the data. To be honest, I never expected that the threads used for processing would be a lot slower! But then, that's another project.


Well, I think the best thing is to write a multithreaded rendering app and then measure the results.

PS: I don't need synchronization, nor do I care about the locking...

I'll try to give another explanation (summary) of what I've thought of so far...

Each thread renders a part of the map. Not a different layer, but a different area. These areas can then be pieced together like a puzzle to form the rendered scene on screen.

Since each thread will be writing to its own memory space, I won't need to use semaphores or anything. The threads in NO way interact with each other. When all threads have finished blitting, all the surfaces (one blitted by each thread) are put together.
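The scheme described above can be sketched roughly as follows, in portable C++ with std::thread. This is only an illustration under assumptions: plain memory buffers stand in for DirectDraw surfaces, the screen is split into horizontal strips, and `shade`, `render_strip` and `render_frame` are hypothetical names. The only synchronization is the join before the strips are put together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-pixel "render"; a real game would blit tiles here.
std::uint32_t shade(std::size_t x, std::size_t y) {
    return static_cast<std::uint32_t>((x * 31u) ^ (y * 17u));
}

// Renders rows [y0, y1) of the scene into this thread's private buffer.
void render_strip(std::vector<std::uint32_t>& strip, std::size_t width,
                  std::size_t y0, std::size_t y1) {
    strip.resize(width * (y1 - y0));
    for (std::size_t y = y0; y < y1; ++y)
        for (std::size_t x = 0; x < width; ++x)
            strip[(y - y0) * width + x] = shade(x, y);
}

// Splits the screen into horizontal strips, one per thread, then
// pieces the strips together into the final frame.
std::vector<std::uint32_t> render_frame(std::size_t width, std::size_t height,
                                        std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<std::vector<std::uint32_t>> strips(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_threads; ++i) {
        std::size_t y0 = height * i / num_threads;
        std::size_t y1 = height * (i + 1) / num_threads;
        workers.emplace_back(render_strip, std::ref(strips[i]), width, y0, y1);
    }
    // Wait until every thread has finished before touching its strip.
    for (auto& w : workers) w.join();

    std::vector<std::uint32_t> frame;
    frame.reserve(width * height);
    for (const auto& s : strips) frame.insert(frame.end(), s.begin(), s.end());
    return frame;
}
```

Because no thread ever writes outside its own strip, no semaphores are needed during rendering. Whether this ends up faster than a single thread is exactly the thing that has to be measured.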

PS: The IO used in rendering is really quick. It is really just the time it takes the data to go into memory over the buses.

Well, I'll give it a try.

I'll post the results of this research soon.

c'ya around.

Paris Theofanidis
... LEMMINGS ... LEMMINGS ... LEMMINGS ... LEM..... SpLaSh!...Could this be what we stand like before the mighty One?Are we LeMmIngS or WhAt!? ;)
Heh, it's funny reading the posts of people who don't know how hardware actually works and (maybe) have never hand-coded this kind of stuff in asm.

I'll say: NEVER use multithreading for rendering. You can use it for background music, input etc. (since they must be implemented asynchronously with rendering).
When it comes to rendering, the following factors will greatly decrease performance in multithreading:

- task switching (very slow)
- data/code cache thrashing (the cache gives a lot of speedup if used correctly)
- some other factors...

While doing low-level rendering (mostly in inner loops), it's better not to interrupt the process. Also, cache misses while accessing data can greatly slow the whole thing down.



FlyFire/CodeX
http://codexorg.webjump.com

FlyFire, I may not be a hardware expert, but process/thread switching speed depends a lot on the OS, and as I said in a previous post, I'm not much into Win threads.

PS: Under UNIX each thread is actually a process... I'd never use threads under Linux for rendering. Does this rule apply under Windows too?

It's not just the CPU cache that increases speed. CPUs handle instructions in parallel, which lets them execute more than one instruction per cycle. This parallel processing has been used since the 80486, if I recall correctly, and by optimizing important loops with this in mind, you can speed up your code a lot more than you can imagine. However, this type of optimization usually turns your app's code upside down (I've seen it).

c'ya

PS: How are things up there? The news hasn't reported anything for some time now...
Regards to my neighbour Russia


Edited by - Harvester on 2/27/00 4:57:47 PM
Hi, Harvester!

Yes, thread switching speed also depends on the OS (the OS can handle it fast or slow), but of course it mainly depends on the CPU. The processor spends a lot of cycles switching from one thread to another, and this causes a great speed loss in rendering.

quote:
This parallel processing has been used since the 80486, if I recall correctly,


No, only from Pentium class processors

quote:
However, this type of optimization usually turns your app's code

Yes, I've just finished my assembler DDA line drawing procedure. Without clipping, I'd say it has no conditional jumps, and the hand-optimized setup code takes about 40 cycles (all instructions pair, and the setup code calculates the dy/dx, dy/dx, abs(dx), abs(dy), sign(dx), sign(dy), round(x1), round(x2), round(y1), round(y2) values)

But shit, it's FAST!!
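For readers who haven't met the algorithm, here is a plain C++ sketch of a textbook DDA line. It is not the assembler routine described above and makes no attempt at instruction pairing or cycle counts; `dda_line` is a hypothetical name, and it only shows the same idea: do the rounding, deltas and increments once in the setup, then step along the line with nothing but additions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <utility>
#include <vector>

// Returns the pixels of a line from (x1, y1) to (x2, y2), no clipping.
std::vector<std::pair<int, int>> dda_line(float x1, float y1,
                                          float x2, float y2) {
    // Setup: round the endpoints and compute the deltas once.
    int ix1 = static_cast<int>(std::lround(x1));
    int iy1 = static_cast<int>(std::lround(y1));
    int ix2 = static_cast<int>(std::lround(x2));
    int iy2 = static_cast<int>(std::lround(y2));
    int dx = ix2 - ix1;
    int dy = iy2 - iy1;
    int steps = std::max(std::abs(dx), std::abs(dy));

    std::vector<std::pair<int, int>> pixels;
    if (steps == 0) {  // degenerate line: a single pixel
        pixels.emplace_back(ix1, iy1);
        return pixels;
    }

    // Per-step increments; the major axis always moves by exactly 1.
    float x_inc = static_cast<float>(dx) / static_cast<float>(steps);
    float y_inc = static_cast<float>(dy) / static_cast<float>(steps);

    // Inner loop: only additions and rounding, no per-pixel decisions.
    float x = static_cast<float>(ix1);
    float y = static_cast<float>(iy1);
    for (int i = 0; i <= steps; ++i) {
        pixels.emplace_back(static_cast<int>(std::lround(x)),
                            static_cast<int>(std::lround(y)));
        x += x_inc;
        y += y_inc;
    }
    return pixels;
}
```

A hand-tuned version would typically replace the floats with fixed-point integers so the inner loop is a couple of adds and a store.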

quote:
PS: How are things up there? The news hasn't reported anything for some time now...
Regards to my neighbour Russia

Good...



FlyFire/CodeX
http://codexorg.webjump.com

The Intel 80386 was multithread capable (in protected mode).
I think the 80386 was capable of parallel processing if correctly programmed.
(I think it can handle it in special cases.)
Even so, only Win32 used protected mode and parallel processing.
M$ is always very late with its software solutions.

I used threads for I/O and input, while my 'main' thread is producing images.
Maybe I'll create an 'AI' thread...

Only critical functions should be coded in ASM, at the end of the process of writing an app.

-* Sounds, music and story makes the difference between good and great games *-
-* So many things to do, so little time to spend. *-
quote:
The Intel 80386 was multithread capable (in protected mode).
I think the 80386 was capable of parallel processing if correctly programmed.

Are we talking about multithreading or pipelined processor architecture?

quote:
I used threads for I/O and input, while my 'main' thread is producing images.

This, I think, is the best solution (as was written above).

quote:
Only critical functions should be coded in ASM, at the end of the process of writing an app.

Is this an unwritten rule? Of course, I can code the whole app in asm and nothing can stop me



FlyFire/CodeX
http://codexorg.webjump.com

Technically, the 386 had native support for multiple processes in a sort of 8086 multiple virtual machine setup. Yes, 8086, the granddaddy of them all. I've never seen this taken advantage of, as the OS had to run specifically in that mode, and essentially no OS runs in that funky mode, but it's in the 386 technical specs.

However, I believe the original idea in the parallel mode point was that the 486 was pipelined, which is true. The second pipeline only handled about a third of the instruction set, and only ran the instructions it could handle about 1 in 3 times, but it was there. The 586 was simply pipelined much, much better.

But really, all the discussion of pipelined/parallel whatever is essentially pointless, because the CPU still only executes a single thread of execution at one time. It will *not* execute a different thread in each pipeline, even on the superscalar cores. In order to truly benefit from multithreading you need multiple processors. Multiple threads on a single-processor machine just simplify bookkeeping for the programmer (and thrash the cache). For something like rendering, the overlap in computation between the threads combined with the threading overhead makes multithreading impractical.

If your UNIX system is running each thread as a separate process, then you have an old and/or non-POSIX-compliant version of UNIX. POSIX provides for multiple threads of execution within a single process. And last I checked, Linux was POSIX compliant.

quote:
Is this an unwritten rule? Of course, I can code the whole app in asm and nothing can stop me

Except maybe rising aspirin costs.
Just another idea for using threads in games: would it be possible to use 2 threads in place of a triple buffer? One thread could handle the game logic and the other could handle rendering... Once the rendering thread finished, it could use a simple semaphore to tell the logic thread to start again, and then call flip(). That way you could start the next game loop while waiting for your vsync.

Is this doable? I haven't done any tests yet, but it seems like it has some potential.
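One way this handoff could look, as an untested-in-a-real-game sketch in portable C++: a mutex plus condition variable plays the role of the "simple semaphore", an int stands in for the game state, and pushing to a vector stands in for drawing and flip(). The names `FrameHandoff` and `run_frames` are made up for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Shared slot between the logic thread and the rendering thread.
struct FrameHandoff {
    std::mutex m;
    std::condition_variable cv;
    int state = 0;       // snapshot of the game state to draw
    bool ready = false;  // true while a snapshot is waiting to be drawn
};

// Runs frame_count frames and returns what the renderer "drew".
std::vector<int> run_frames(int frame_count) {
    assert(frame_count >= 0);
    FrameHandoff h;
    std::vector<int> rendered;  // touched only by the render thread until join

    std::thread renderer([&] {
        for (int i = 0; i < frame_count; ++i) {
            int snapshot = 0;
            {
                std::unique_lock<std::mutex> lock(h.m);
                h.cv.wait(lock, [&] { return h.ready; });
                snapshot = h.state;
                h.ready = false;
            }
            h.cv.notify_all();             // logic may publish the next frame
            rendered.push_back(snapshot);  // stands in for draw + flip()
        }
    });

    // Logic thread (here: the calling thread).
    for (int frame = 1; frame <= frame_count; ++frame) {
        int next = frame * 10;  // stand-in for the game logic update
        {
            std::unique_lock<std::mutex> lock(h.m);
            h.cv.wait(lock, [&] { return !h.ready; });
            h.state = next;
            h.ready = true;
        }
        h.cv.notify_all();
    }
    renderer.join();
    return rendered;
}
```

The logic update for frame N+1 is computed outside the lock, so it can overlap with the renderer drawing (and waiting on the flip for) frame N. Whether that overlap buys anything on a single-processor machine depends on how much of the render time is spent blocked in the flip rather than using the CPU.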

This topic is closed to new replies.
