
Member Since 29 Jul 2001
Offline Last Active Yesterday, 07:33 AM

#5155265 Using the ARB_multi_draw_indirect command

Posted by Promit on 22 May 2014 - 01:53 PM

There's MultiDrawElements as well. You should go over the spec for the extension carefully:



#5155252 Using the ARB_multi_draw_indirect command

Posted by Promit on 22 May 2014 - 01:01 PM

First, read the NVIDIA slides if you haven't already. multi_draw_indirect starts at slide 63.


Consider the signature of glDrawArrays:

void glDrawArrays(GLenum  mode, GLint  first, GLsizei  count);

This function takes a mode and two int parameters (basically). Instead of submitting that function, create a buffer:

{ [first | count],
[first | count],
[first | count],
[first | count],
[first | count] }

Now you can call MultiDrawArrays, which takes the firsts and the counts as two separate arrays, and a single call will submit five draws at once. Or you can call MultiDrawArraysIndirect, with a single pointer and a stride (the real indirect record also carries an instance count and a base instance alongside count and first). Here's the trick: Indirect understands a buffer binding called DRAW_INDIRECT_BUFFER, so you can upload the buffer above into GPU memory (a buffer object) and execute it from there. Why would you want to do that? For the upload alone, you wouldn't. But this is the cleverest part: you can use GPU compute to generate the buffer without any copies.

And here's another bit mentioned by the slides: there's an extension called shader_draw_parameters that adds a DrawID into the shader, telling you which of the five draws is currently executing (the index is zero-based). So you can use that value to select between, let's say, multiple modelview matrices passed into the shader.
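As a rough sketch of the buffer described above, here is how the records could be built on the CPU side in C++. The record layout follows the ARB_multi_draw_indirect spec; `VertexRange` and `buildIndirectCommands` are hypothetical names made up for this illustration, and no GL calls are made so the snippet compiles without a context.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One record as consumed by glMultiDrawArraysIndirect. The simplified
// [first | count] pair from the post maps onto the `first` and `count` fields.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // number of vertices to draw
    std::uint32_t instanceCount;  // number of instances (1 for a plain draw)
    std::uint32_t first;          // index of the first vertex
    std::uint32_t baseInstance;   // offset applied to instanced attributes
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "records are expected to be tightly packed");

// Hypothetical helper type: one sub-range of a shared vertex buffer.
struct VertexRange {
    std::uint32_t first;
    std::uint32_t count;
};

// Hypothetical helper: turn a list of ranges into indirect draw records.
std::vector<DrawArraysIndirectCommand>
buildIndirectCommands(const std::vector<VertexRange>& ranges) {
    std::vector<DrawArraysIndirectCommand> commands;
    commands.reserve(ranges.size());
    for (const VertexRange& r : ranges) {
        // Assumption of this sketch: empty draws are treated as a caller bug.
        assert(r.count > 0);
        commands.push_back({r.count, 1u, r.first, 0u});
    }
    return commands;
}
```

With a GL 4.3 context, the resulting vector would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and submitted with something like glMultiDrawArraysIndirect(mode, nullptr, drawcount, 0), where a stride of 0 means the records are tightly packed. A compute shader could write the same layout directly into that buffer instead.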


The tricky part is setting up all of your input data to leverage as much of this as possible. You need to share buffers and as many shader parameters as possible, and use DrawID cleverly.

#5155106 deferred shading question

Posted by Promit on 21 May 2014 - 12:23 PM

You may find this presentation useful: The Rendering Technology of Killzone 2

#5154949 why the alphablend is a better choice than alphatest to implement transparent...

Posted by Promit on 20 May 2014 - 07:27 PM

Alpha testing hasn't been supported in OpenGL ES since 2.0 anyway; you're required to implement it manually using the discard operation. Discard will essentially force that draw call onto the slow path, just like alpha blend. In those cases, no hidden surface removal is available. You may want to review the Smedberg recommendations about how to handle the various cases.

#5154948 What OpenGL book do you recommend for experts?

Posted by Promit on 20 May 2014 - 07:23 PM

I found Insights to be supremely useful -- you probably already know how closely Riccio follows the state of the industry with regard to OpenGL. Of course the big problem is that any book with too many concrete recommendations is likely doomed, given the quicksand shifting of drivers, implementations, and specs in current-day GL. There's too much room for stale information.

#5154658 Prevent Losing Entire Project To Malware

Posted by Promit on 19 May 2014 - 11:28 AM

"Run an up-to-date OS (Windows XP, Vista, 7 and 8.0 do not count)."
I'm sorry, but 7 absolutely DOES count (assuming it is service packed). As long as it's not an end of life product, Microsoft is continuing to issue security patches. There's nothing about 8.1 that improves security over 7 when both systems are properly maintained.


First of all: any file you don't want to lose should be able to survive the total physical destruction of any given computer you own. Ideally all of them, and your house. Personally I like using a combination of externally hosted cloud backup services, internal backup, and good old external source control. Second: you need to figure out how and why you're getting virused, because that's a problem in itself.

#5153838 Cases for multithreading OpenGL code?

Posted by Promit on 15 May 2014 - 03:20 PM

The long and short of it is that multiple contexts and context switching are so horrifically broken on the driver side, across all vendors for all platforms, that there is nothing to gain and everything to lose in going down this road. If you want to go down the multithreaded GL road in any productive way whatsoever, it'll be through persistent mapped buffers and indirect draws off queued up buffers. More info here: http://www.slideshare.net/CassEveritt/beyond-porting

#5150461 Go for video game programming?

Posted by Promit on 29 April 2014 - 07:27 PM

The problem with GC in games isn't really performance, particularly for the indie crowd who are using those types of languages. At least, not directly. GC is plenty fast. What's lacking is control. The GC based languages are horrified that the application might want to exercise any kind of control or hinting about what to GC and when. I don't want random allocations to block and trigger GC. I want to be able to dispatch a bunch of GPU calls, then tell the runtime "hey, you've got 3ms to do as much incremental GC as you can". I want to be able to control the balance of GC time and convergence, and force full GC when required. I'd love to be able to tag objects explicitly with lifetime hints.
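To make the "you've got 3ms" idea concrete, here is a minimal C++ sketch of a time-budgeted work loop. It is not any real runtime's GC API; `runWithinBudget` is a hypothetical name, and each step just stands in for one increment of collection work.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: run queued steps in order until either all of them
// have run or the time budget is spent. Returns how many steps ran.
// A step that has started is always allowed to finish, so the budget can be
// overshot by at most the duration of one step.
std::size_t runWithinBudget(const std::vector<std::function<void()>>& steps,
                            Clock::duration budget) {
    assert(budget >= Clock::duration::zero());
    const Clock::time_point deadline = Clock::now() + budget;
    std::size_t done = 0;
    while (done < steps.size() && Clock::now() < deadline) {
        steps[done]();
        ++done;
    }
    return done;
}
```

A frame loop could dispatch its GPU calls and then call runWithinBudget(pendingSteps, std::chrono::milliseconds(3)), which is the kind of control the post is asking managed runtimes to expose.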


And frankly, I need to be able to rely on a consistent implementation underneath with known characteristics. I get why the language designers don't want this, but it's a huge practical problem for games. We NEED to be able to exercise real-time guarantees.

#5148643 HLSL float4x4 vs float3x3

Posted by Promit on 21 April 2014 - 08:59 PM

"I haven't yet tried any of this because I want to get as much info as possible so I can "do it right". If I used your suggestion, how would I change the float3 into some type of rotation matrix? Would it be a float3x3 with [0,0] for x axis, [1,1] for y axis, and [2,2] for z axis? I don't actually know how to set it up..."


"And the question remains, would I be sacrificing speed for a smaller size?"

Look up quaternion <-> matrix conversions, or how to set up transformations directly as quaternions. It's easier to start with float4 and then apply the float3 optimization later.
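For reference, here is a minimal sketch of the conversion, written in C++ rather than HLSL so it can be compiled and checked on its own; the arithmetic ports directly. The names `Quat`, `Mat3`, `unpackUnitQuat` and `toMatrix` are made up for this example, and the float3 trick assumes the quaternion was stored with w >= 0 (negate all four components before storing if it is not, since q and -q describe the same rotation).

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };
using Mat3 = std::array<std::array<float, 3>, 3>;  // indexed m[row][col]

// The float3 optimization: store only xyz of a unit quaternion and rebuild w.
// Assumes w >= 0; the clamp guards against small negative rounding errors.
Quat unpackUnitQuat(float x, float y, float z) {
    const float wSquared = 1.0f - (x * x + y * y + z * z);
    return {x, y, z, std::sqrt(std::fmax(wSquared, 0.0f))};
}

// Unit quaternion to rotation matrix, column-vector convention (v' = M * v).
Mat3 toMatrix(const Quat& q) {
    // The formula is only a pure rotation for (approximately) unit length.
    const float lenSq = q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
    assert(std::fabs(lenSq - 1.0f) < 1e-3f);

    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;

    Mat3 m{};
    m[0][0] = 1.0f - 2.0f * (yy + zz);
    m[0][1] = 2.0f * (xy - wz);
    m[0][2] = 2.0f * (xz + wy);
    m[1][0] = 2.0f * (xy + wz);
    m[1][1] = 1.0f - 2.0f * (xx + zz);
    m[1][2] = 2.0f * (yz - wx);
    m[2][0] = 2.0f * (xz - wy);
    m[2][1] = 2.0f * (yz + wx);
    m[2][2] = 1.0f - 2.0f * (xx + yy);
    return m;
}
```

If your shader multiplies row vectors instead (v' = v * M), use the transpose of this matrix.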


As for speed, it depends on your bottleneck. If you've got ALU to spare but bandwidth or interpolators are a problem, you win from the conversion. This is probably the case for instancing.

#5147785 10-bit Monitors

Posted by Promit on 17 April 2014 - 08:04 PM

First, only Quadro and FirePro GPUs support 10 bit output. NOT GeForce or Radeon. Second, DisplayPort is typically required for 10 bit support, NOT DVI/HDMI. Third, D3D 9 does NOT support 10 bit. You have to use OpenGL or D3D 10+; 9Ex probably supports it. You'd probably access it by creating a D3DFMT_A2R10G10B10 or D3DFMT_A2B10G10R10 format device. Also, 10 bit will shut off Windows compositing/Aero. NVIDIA might have a workaround, not sure. Haven't tested.
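To show what a 10-bit-per-channel format actually stores, here is a small C++ sketch that packs one pixel in the D3DFMT_A2R10G10B10 bit layout (alpha in the top 2 bits, then red, green, blue at 10 bits each). The function names are hypothetical, and this only illustrates the memory layout; it does not create a device or touch any D3D API.

```cpp
#include <cassert>
#include <cstdint>

// Convert a normalized [0, 1] value to a 10-bit unsigned integer (0..1023),
// clamping out-of-range input and rounding to nearest.
std::uint32_t toUnorm10(float v) {
    const float clamped = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    return static_cast<std::uint32_t>(clamped * 1023.0f + 0.5f);
}

// Pack channels in A2R10G10B10 order:
// bits 30-31 alpha, 20-29 red, 10-19 green, 0-9 blue.
std::uint32_t packA2R10G10B10(std::uint32_t r, std::uint32_t g,
                              std::uint32_t b, std::uint32_t a) {
    assert(r <= 1023u && g <= 1023u && b <= 1023u && a <= 3u);
    return (a << 30) | (r << 20) | (g << 10) | b;
}
```

Compared with an 8-bit format, each color channel gets 1024 levels instead of 256, at the cost of leaving only 4 levels of alpha. D3DFMT_A2B10G10R10 is the same idea with the red and blue fields swapped.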


This document will give you the rundown for NVIDIA/OpenGL: http://www.nvidia.com/docs/IO/40049/TB-04701-001_v02_new.pdf

This document covers AMD/OpenGL: http://www.amd.com/Documents/10-Bit.pdf

#5146975 My sockets are as fast as pipes..

Posted by Promit on 14 April 2014 - 02:26 PM

Actually TCP/IP is sometimes faster than pipes on Windows, and you get the advantages of fairly portable code and no special-case work as well.

#5146593 HSL with smoother transitions?

Posted by Promit on 12 April 2014 - 05:27 PM

I did some research on this a while back. I'm just going to link-dump on you, hopefully you can piece it together.





Further reading on HCL in particular and perceptually uniform color spaces in general should provide some guidance.

#5146556 512x384 15 Layers 30Million Double Precision Calculation Software Blitting wi...

Posted by Promit on 12 April 2014 - 12:44 PM

So here's how this is going to go. I'm going to lock every thread you make that claims anything about NDAs or licenses from the GPU manufacturers. If it happens enough times, where "enough" is indeterminate but almost certainly less than three, I'm going to ban you. 


In the meantime you might want to face reality: your code is written like shit and that's why it runs slow. If you actually take some time and learn to write graphics code properly, you won't have this problem.

#5146124 How to get multiple keys input at once in c++

Posted by Promit on 10 April 2014 - 06:35 PM

I just want to jump in here and mention that some keyboards can NEVER report certain key combinations together. This article has more information: http://www.microsoft.com/appliedsciences/antighostingexplained.mspx

Just something to be aware of in general.

#5144865 Preparing for Mantle

Posted by Promit on 06 April 2014 - 07:30 PM

Sounds like a pain in the ass, to me.