Not dead...

On APIs.

Posted in NV, OpenGL, OpenCL, DX11, AMD on 23 March 2011

Right now 3D APIs are a little... depressing... on the desktop.

While I still think D3D11 is technically the best API we have on Windows, the fact that AMD and NV haven't yet implemented multi-threaded rendering in a manner which helps performance is annoying. I've heard that there are good technical reasons why this is a pain to do; I've also heard that AMD have basically sacked it off for now in favour of focusing on the Fusion products. NV are a bit further along, but in order to make use of it you effectively give up a core, as the driver creates a thread which does the processing.

At this point my gaze turned to OpenGL. With OpenGL 4.x the problems with the API are still there, in the form of the bind-to-edit model which is showing no signs of dying, but feature-wise it has to a large degree caught up. Right now, however, there are a few things I can't see a way of doing from GL; if anyone knows differently please let me know...

  • Thread-free resource creation. The D3D device is thread-safe in that you can call its resource creation routines from any thread. As far as I know GL still needs a context which is bound as 'current' on the calling thread to create resources.
  • Running a pixel shader at 'sample' frequency instead of pixel frequency. So, with a 4x MSAA render target the shader would run 4 times per pixel.
  • The ability to write to a structured memory buffer in the pixel shader. I admit I've not looked too closely at this, but a quick look at the latest extension for pixel/fragment shaders doesn't give any clues that this can be done.
  • Conservative depth output. In D3D a shader can be tagged in such a way that it'll never output depth greater than the fragment was already at, which preserves early-z rejection while still allowing you to write out depth info different to that of the primitive being drawn.
  • Forcing early-z to run. When combined with the UAV writing above this allows things like calculating both colour and 'other' information per-fragment and only having both written if early-z passes. Otherwise the UAV data is written even when the colour isn't.
  • Append/consume structured buffers. I've not spotted anything like this, anyway. I know we are verging into compute here, which is OpenCL's territory, but pixel shaders can use them.
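
To make the early-z point in the list above a bit more concrete, here is a rough CPU-side model of it in plain C++. This is not D3D or HLSL code: `Fragment`, `Result` and `shade_fragments` are names made up for illustration, and it assumes a single pixel with a 'less-than' depth test. It only shows why a UAV write can happen without a matching colour write unless the depth test is forced to run first.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU-side model of one pixel; none of these names are D3D API.
struct Fragment {
    float depth;    // depth of the incoming fragment
    int   payload;  // the 'other' per-fragment data the shader writes to a UAV
};

struct Result {
    std::vector<int> uav;            // stands in for a UAV written by the pixel shader
    int              colour_writes = 0;
};

// Processes fragments against a single depth value using a 'less-than' test.
// force_early_z == true : the depth test runs before the shader, so a rejected
//                         fragment never executes and has no side effects.
// force_early_z == false: the shader runs first (late-z), so its UAV write
//                         happens even if the colour is then rejected.
Result shade_fragments(const std::vector<Fragment>& frags,
                       float initial_depth, bool force_early_z) {
    Result r;
    float depth_buffer = initial_depth;
    for (const Fragment& f : frags) {
        const bool passes = f.depth < depth_buffer;
        if (force_early_z && !passes) {
            continue;  // culled before shading: no UAV write, no colour write
        }
        r.uav.push_back(f.payload);  // shader side effect
        if (passes) {
            depth_buffer = f.depth;  // depth and colour are written together
            ++r.colour_writes;
        }
    }
    return r;
}
```

Feeding in three fragments at depths 0.5, 0.8 and 0.3 against an initial depth of 1.0, the forced early-z path leaves two UAV entries and two colour writes, while the late-z path leaves three UAV entries for the same two colour writes. That extra entry is the "UAV data written when colour isn't" case.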

There are probably a few others which I've missed; however, these spring to mind and many of them I want to use.

OpenGL also still has the 'extension' burden around its neck, with GLee out of date and GLEW just not looking that friendly (I took a look at both this weekend gone). In a way I'd like to use OpenGL because it works nicely with OpenCL, and in some ways the OpenCL programming model is nicer than the DXCompute model, but with API/hardware features apparently missing this isn't really workable.

In recent weeks there has been talk of ISVs wanting the 'API to go away' because (among other things) it costs so much to make a draw call on the PC vs consoles. While I somewhat agree with the desire to free things up and get at the hardware more, one of the reasons put forward for this added 'freedom' was to stop games looking the same. However, in a world without APIs, where you are targeting a constantly moving set of goal posts, you'll see more companies either drop the PC as a platform or license an engine to do all that for them.

While people talk about 'to the metal' programming being a good idea because of how well it works on the consoles, they seem to forget that it often takes half a console life cycle for this stuff to become commonplace, and that is when targeting fixed hardware. In the PC space things change too fast for this sort of thing; AMD themselves would have invalidated a lot of work in one cycle by going from VLIW5 to VLIW4 between the HD5 and HD6 series, never mind the underlying changes to the hardware itself. Add to this the fact that 'to the metal' would likely lag hardware releases and you don't have a compelling reason to go that route, unless all the IHVs decide to go with the same TTM "API", at which point things will get... interesting (see OpenGL for an example of what happens when IHVs try to get along).

So, unless NV and AMD want to slow down hardware development so things stay stable for multiple years I don't see this as viable at all.

The thing is, SOMETHING needs to be done about the widening 'draw call gap' between consoles and PCs. Right now 5 year old console hardware can outperform a cutting edge PC when it comes to the CPU cost of draw calls. Fast forward 3 years to the next generation of console hardware, which is likely to have even more cores than now (12 minimum, I'd guess), faster RAM and DX11+ class GPUs as standard. Unless something goes VERY wrong, this hardware will likely allow trivial application of command list/multi-threaded rendering, further opening the gap between the PC and consoles.

Right now PCs are good 'halo' products, as they allow devs to push up the graphics quality settings and just soak up the fact that we are CPU limited on graphics submission, thanks to out-of-order processors, large caches and higher clock speeds. But clock speeds have hit a wall, and when the next generation of consoles drops they will match the PC on single-threaded clock speed and graphics hardware... suddenly developing on a PC, with the pain of its flexible hardware, starts to look less and less attractive.

For years people have been talking about the 'death of PC gaming', and the next generation of hardware could well cause, if not that, then the reduction of the PC to MMO, RTS, TBS and 'facebook' games while all the large AAA games move off to the consoles, where development is easier, rewards are greater and things can be pushed further.

We don't need the API to 'go away', but it needs to become thinner, both on the client AND the driver side. MS and the IHVs need to work together to make this a reality, because if not they will all start to suffer in the PC space. Of course, with the 'rise in mobile' they might not even consider this an issue...

So, all in all the state of things is depressing: too much overhead, missing features and in some ways doomed in the near future...

On HD5870 and Memory

Posted in OpenCL, DX11, AMD on 15 January 2011
DX11, OpenCL, AMD, HD5870
While gearing up to work on the parser/AST generator mentioned in my previous entry I decided to watch a couple of webinars from AMD about OpenCL (because while I'm DX11 focused I do like OpenCL as a concept), the first of which covered the HD5870 design.

One of the more interesting things to come out of it was some detail on the 'global data store' (GDS), which, while only given an overview, had an interesting nugget of information in it which would have been easy to skip over.

While not directly exposed in DXCompute or OpenCL, the GDS does come into play with DXCompute's append buffers (and whatever the OpenCL version of this construct is), as that is where the data is written to, thus allowing the GPU to accelerate the process.

What this means in real terms is that if your compute shader needs to store data which everyone in the dispatch needs to get to for some reason, then you could use these append buffers with only a small hit (25 cycles on the HD5 series), as long as the data will fit into the memory block. Granted, you would still need to place barriers into your shader/OpenCL code to ensure that everyone is done writing, but it might allow for faster data sharing in some situations.
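
As a rough picture of the pattern described above (append to a fixed-size block, then wait for everyone before reading), here is a small CPU-side sketch in plain C++. It is only an analogy under my own assumptions: `AppendBuffer` and `run_dispatch` are invented names and not DXCompute or OpenCL API, ordinary threads stand in for the GPU's work items, joining them stands in for the barrier, and nothing here models the GDS hardware or its cycle cost.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CPU-side stand-in for an append buffer; not a D3D or OpenCL type.
class AppendBuffer {
public:
    explicit AppendBuffer(std::size_t capacity) : data_(capacity), count_(0) {}

    // Reserves a unique slot by atomically bumping the counter, then writes.
    // Returns false when the value does not fit in the fixed-size block.
    bool append(int value) {
        const std::size_t slot = count_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= data_.size()) return false;
        data_[slot] = value;
        return true;
    }

    // Only meaningful once every writer has finished (see the join below).
    std::vector<int> contents() const {
        const std::size_t n = std::min(count_.load(), data_.size());
        return std::vector<int>(data_.begin(),
                                data_.begin() + static_cast<std::ptrdiff_t>(n));
    }

private:
    std::vector<int> data_;
    std::atomic<std::size_t> count_;
};

// Runs `threads` workers that each append `per_thread` distinct values.
// Joining the workers plays the role of the barrier: after it, all writes
// are complete and visible to the reader.
std::vector<int> run_dispatch(int threads, int per_thread, std::size_t capacity) {
    AppendBuffer buffer(capacity);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&buffer, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                buffer.append(t * per_thread + i);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return buffer.contents();
}
```

The order of the appended values is not deterministic, only the set of them is, which is also how append buffers behave. Values that arrive after the block is full are simply dropped in this sketch, which mirrors the "as long as the data will fit" caveat.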

I don't know if NV does anything similar, however; maybe I'll check that out later...

Right, back to the Webinars...
