
Not dead...

6 months later....

Posted 13 December 2011 · 715 views

So, I'm still not dead ;) however the last 6 months have been largely lacking in progress on my own projects.

What has gone down, however, is a slight change of work status; after OFP:RR wrapped up I got moved to another project to help it start up, which I was happy with as it was something new. However, as it was some way from starting, I ended up working on other things. Then, shortly after my last entry, I had two weeks off and upon my return found out things had changed again, heh.

So, as of July this year I've been working in the rendering team of Codemasters' Central Tech group, on the new engine that is being developed and will, in time we hope, power all of Codemasters' future games... which is, ya know, pretty cool :) It's a small team, but once I got settled in it's been good.

Between settling in with a new team, time off due to TOIL from OFP:RR, and a rash of decent games being released (or just revisiting some old ones), my weekends have been somewhat devoid of progress.

Until recently... dun dun duuuuuun!

A few weeks back I got an itch to do some coding of the graphical type; despite my knowledge I feel like I'm a bit behind the curve these days when it comes to techniques, so I want to set up my own DX11 renderer/engine to play with things.

I decided, however, that first things first: I need something to render. Cubes are all well and good, but to do any serious graphical work you need something big... something decent... I settled on Sponza.

For those who don't know, the Sponza Atrium is a very extensively used GI test scene; with its arches, ceilings and open roof it makes a good location for testing out and visualising various GI solutions. A few years back Crytek released an 'updated' version with a higher poly count and some banners in the scene to make it even more demanding.

The scene is available in 3DS Max and OBJ format, however I decided that I didn't want to parse OBJ by hand, so instead I would export the scene from 3DS Max to FBX and then use the FBX SDK with some of my own code to dump out the information in a slightly more 'engine friendly' format, i.e. a dump of the vertex buffer and index buffer along with material files to describe properties/textures.
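As a rough sketch of what I mean by 'engine friendly' (the struct and layout below are purely illustrative, not the exact format I ended up with), each object can be dumped as a small header followed by the raw buffer bytes, so the engine can read it straight into memory with no parsing:

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical per-object header written in front of the raw buffer data.
struct MeshDumpHeader
{
    uint32_t vertexCount;   // number of vertices in the vertex buffer
    uint32_t vertexStride;  // size of one vertex in bytes
    uint32_t indexCount;    // number of indices in the index buffer
    uint32_t indexSize;     // bytes per index (4 here)
};

// Writes header + vertex buffer + index buffer as one flat binary blob.
bool DumpMesh(const char* path,
              const std::vector<uint8_t>& vertexData, uint32_t stride,
              const std::vector<uint32_t>& indices)
{
    MeshDumpHeader header = {};
    header.vertexCount  = static_cast<uint32_t>(vertexData.size() / stride);
    header.vertexStride = stride;
    header.indexCount   = static_cast<uint32_t>(indices.size());
    header.indexSize    = sizeof(uint32_t);

    FILE* file = std::fopen(path, "wb");
    if (!file)
        return false;

    std::fwrite(&header, sizeof(header), 1, file);
    std::fwrite(vertexData.data(), 1, vertexData.size(), file);
    std::fwrite(indices.data(), sizeof(uint32_t), indices.size(), file);
    std::fclose(file);
    return true;
}

The material files then just need to name the textures and shader properties for each mesh alongside its vb/ib dump.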

The FBX SDK is pretty easy to work with and it didn't take too long to get something hanging together which compiled, so I figured "time to test the model"... and thus began some hell.

Firstly, the 3DS Max version uses some crazy Crytek plugin for Max; the net result being that even once you get the plugin and delete the large banner they haven't provided a texture for, you are still stuck with a bunch of broken materials. And even once you've fixed those, the FBX exporter doesn't even attempt to export the Crytek shader-based materials in any way, so while you'll get an FBX file it is devoid of textures or any other useful data.

The OBJ version suffers the missing texture problem too; however, after fixing those up as best I could and deleting the offending geometry which has no textures at all, the scene did at last export sanely. I had to do some copy and paste work from the Max version into the OBJ version to get the lights and camera positions into the FBX file, but the net result, after a couple of weekends' work, is a directory filled with per-object vb/ib/meta files and 'material' files :)

Unfortunately this weekend I'm heading back to my home town for Xmas, which means no access to a decent computer for two weeks, so any further work is going to have to wait until the new year. At that point I'm going to set about getting a basic flat-shaded version of the scene (no textures, no lighting) loaded and rendering. After that I need to adapt my exporter to also dump out camera, light and maybe positional information for the various objects, but we'll see how that goes.

My aim, by the end of Jan 2012, is to have the scene rendering, lit and textured using the cameras and lights provided, optionally with shadows, even if only basic shadow maps. It shouldn't be a big ask.

Finally, tomorrow, Dec 15th, marks my 10th year as a member of this site; 10 years ago I signed up as a 21 year old who had just failed out of uni, with a small amount of OpenGL knowledge picked up in the previous years. 10 years later I've been published both in a book and on this site, got my degree, am working for my 2nd company in the industry and am now part of a core team working on a AAA engine. Interesting how things go..

I've also drunk a lot, fallen over a lot, danced a lot, got beaten up once, broke a table in a club trying to kick-jump off it (in front of the owner, without getting thrown out :D), got accidentally drunk at lunch in college and scared a student teacher away from the profession, and generally had 10 years of amusing times, including some GD.Net London Pie and Pint meet ups.

Let's hope for another good 10 years... and if they are more interesting than the last 10, I'll be cool with that too :D


To The Metal.

Posted in low level, scripting, 12 June 2011 · 381 views

I'm trying to recall when I first got into programming; it was probably somewhere in the window of 5 to 7. We had a BBC Micro at home thanks to my dad's own love of technology, and I vaguely recall writing silly BASIC programs at that age; certainly my mum has told me stories about how, at 5, I was loading up games via the old tape drive, something she couldn't do *chuckles*. But it was probably around 11 when I really got into it, after meeting someone at high school who knew BASIC on the Z80, and from there my journey really began.

(In the next 10 years I would go on to exceed both my dad and my friend when it came to code writing ability, so much so that the control program for my friend's BSc in electronics was written by me in the space of a few hours when he couldn't do it in the space of a few months ;) )

One of the best times I had during this period was when I got an Atari STe and moved from using STOS Basic to 68K assembly so that I could write extensions for STOS to speed operations up. I still haven't had a coding moment which has given me as much joy as the time I spent 2 days crowbarring a 50kHz mod replay routine into an extension; I lack words for the joy which came when that finally sprang to life and played back the song without crashing :D You think you've got it hard these days with compile-link-run-debug cycles? Try doing one of those on an 8MHz computer with only floppy drives to load from and only being able to run one program at a time ;)

The point of all this rambling is that one of the things I miss on modern systems, with our great big optimising compilers, is the 'to the metal' development that was the norm back then; I enjoyed working at that level.

In my last entry I was kicking around the idea of making a new scripting language which would natively know about certain threading constraints (only read remote, read/write private local) and I was planning to backend it onto LLVM for speed reasons.

During the course of my research into this idea I came across a newsgroup posting by the guy behind LuaJIT talking about why LuaJIT is so fast. The long and the short of it is this;

A modern C or C++ compiler suite is a mass of competing heuristics which are tuned towards the 'common' case and, for that purpose, generally work 'well enough' for most code. However an interpreter ISN'T most code; it is very particular code which the compiler doesn't deal well with.

LuaJIT gets its speed from the main loop being hand written in assembler which allows the code to do clever things that a C or C++ compiler wouldn't be able to do (such as decide what variables are important enough to keep in registers even if the code logic says otherwise).

And that's when a little light went on in my head and I thought: hey... you know what, that sounds like fun! A low level, to the metal, style of programming with some decent reason for doing it (aka the compiler sucks at making this sort of code fast).

At some point in the plan I decided that I was going to do x64 support only. The reasons for this are twofold:

1) It makes the code easier to write. You can make assumptions about instructions and you don't have to deal with any crazy calling conventions, as the x64 calling convention is set and pretty sane, all things considered.

2) x86 is a slowly dying breed and frankly I have no desire to support it and contort what could be some nice code into a horrible mess to get around the lack of registers and crazy calling conventions.

I've spent the better part of today going over a lot of x86/x64 stuff, and I now know more about x86/x64 instruction decoding and x64 function calling/call stack setup than any sane person should... however it's been an interesting day :)

In fact x64 has some nice features which would aid the speed of development, such as callers setting up the stack for callees (so tail calls become easy to do) and passing things around in registers instead of via the stack. Granted, while inside the VM I can always do things 'my way' to keep values around as needed, but it's worth considering the x64 convention to make interop that much easier.
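To make that concrete, here's the rough shape of opcode handler I have in mind; purely a sketch, the names are made up, and I'm assuming the Microsoft x64 convention, where the first integer arguments travel in RCX, RDX and R8 and the return value comes back in RAX:

#include <cstdint>

struct VMState;  // hypothetical interpreter state

// Under the Microsoft x64 convention the three arguments below arrive in
// RCX, RDX and R8, and the returned 'next instruction' pointer goes back
// in RAX, so chaining dispatch -> handler -> dispatch keeps the hot values
// (vm, ip) in registers rather than spilling them to the stack.
typedef const uint8_t* (*OpHandler)(VMState* vm,        // RCX
                                    const uint8_t* ip,  // RDX
                                    uint64_t operand);  // R8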

The aim is to get a fully functional language out of this, one which can interop with C and C++ functions (calling non-virtual member functions might be the limit here) and has some 'safe' threading functionality as outlined in the previous entry.

Granted, having not written a single line of code yet, that is some way off to say the least :D So, for now, my first aim is to get a decoding loop running which can execute the 4 'core' operations of any language:

- move
- add
- compare
- jump

After that I'll see about adding more functionality; the key thing here is designing the ISA in such a way that extending it won't horribly mess up decode and dispatch times.
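As a starting point, here's a minimal sketch of what that decode loop could look like in plain C++ before any of it gets hand coded; the encoding (1 byte opcode, 2 register indices, a 4 byte immediate) and the opcode values are hypothetical placeholders rather than a final ISA:

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical opcode numbering; the real ISA would be chosen with decode cost in mind.
enum Op : uint8_t { OP_MOVE, OP_ADD, OP_CMP, OP_JMP };

struct VM
{
    int64_t reg[16] = {};
    bool flagEqual = false;
};

// Each instruction: opcode, dest register, src register, 32-bit immediate (7 bytes total).
void Run(VM& vm, const std::vector<uint8_t>& code)
{
    size_t ip = 0;
    while (ip + 7 <= code.size())
    {
        const uint8_t op  = code[ip + 0];
        const uint8_t dst = code[ip + 1];
        const uint8_t src = code[ip + 2];
        int32_t imm = 0;
        std::memcpy(&imm, &code[ip + 3], sizeof(imm));
        ip += 7;

        switch (op)
        {
        case OP_MOVE: vm.reg[dst] = vm.reg[src]; break;
        case OP_ADD:  vm.reg[dst] += vm.reg[src]; break;
        case OP_CMP:  vm.flagEqual = (vm.reg[dst] == vm.reg[src]); break;
        case OP_JMP:  if (vm.flagEqual) ip = static_cast<size_t>(imm); break;
        }
    }
}

A switch-and-loop like this is exactly the sort of code the compiler tends to turn into something mediocre, which is the whole motivation for eventually hand coding it.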

Oh, and for added fun, as the MSVC x64 compiler doesn't allow inline assembly, large parts are going to be fully hand coded... I like this idea ^_^


Kicking about an idea...

Posted in scripting, 26 May 2011 · 285 views

Scripting languages, such as Lua and Python, are great.

They allow you to bind to your game and quickly work on ideas without the recompile-link step you would have with something like C++ in the mix.

However, in a highly parallel world those languages start to look lacking, as they often have a 'global' state which makes it hard to write code which can execute across multiple threads in the language in question (I'm aware of Stackless Python, and I admit I've not looked at it closely), certainly when data is being updated.

This got me thinking: going forward, a likely common pattern in games for avoiding locks is to have a 'private' and 'public' state for objects, which allows loops that look like this;

[update] -> [sync] -> [render]

or even

[update] -> [render] -> [sync]

Either way, that 'sync' step can be used, in a parallel manner, to move 'private' state to being publicly visible so that during the 'update' phase other objects can query and work with it.

Of course, to do this effectively you'd have to store two variables, one for private and one for public state, and deal with moving data between them, which is something you don't really want to be doing by hand.

This got me thinking: what if you could 'tag' elements as 'syncable' in some way and have the scripting back end take care of the business of state copying and, more importantly, of the context in which those variables were active? Then, when you ran your code, the runtime would figure out, based on context, which copy of the state it had to access for data.

There would still need to be a 'sync' step called in order to have the runtime copy the private data to the public side; this would have to be user callable, as it would be hard for the runtime to know when it was 'safe' to do so. But it would remove a lot of the problem, as you would only declare your variables once and your functions once, and the back end would figure the rest out. (You could even use a system like Lua's table keys, where you can make them 'weak' by setting a meta value on them, so values could be added to structures at runtime.) The sync step could also use a copy-on-write setup so that if you don't change a value it doesn't try to sync it.
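To make the idea a bit more concrete, here's a very rough C++ sketch of the double-buffered behaviour the scripting back end would provide automatically; every name here is hypothetical, and in practice this would live inside the runtime rather than in user code:

// Hypothetical double-buffered value: the owning object writes the private
// copy during 'update', other threads only ever read the public copy, and
// Sync() publishes private -> public at the user-chosen safe point.
template <typename T>
class Syncable
{
public:
    explicit Syncable(const T& initial) : pub(initial), priv(initial) {}

    // Owning object, 'update' phase (its own thread only).
    void Set(const T& value) { priv = value; dirty = true; }

    // Any other object, 'update' phase; sees the last published state.
    const T& Read() const { return pub; }

    // 'Sync' phase; copy-on-write style, only copies when something changed.
    void Sync()
    {
        if (dirty) { pub = priv; dirty = false; }
    }

private:
    T    pub;
    T    priv;
    bool dirty = false;
};

In the scripting language itself none of this would be visible; tagging a variable as 'syncable' would be enough for the runtime to know which copy a given access should hit based on the current phase.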


It needs some work, as ideas go, to make it viable, but I thought I'd throw the rough idea out for some feedback and see if anyone has any thoughts on it.


On APIs.

Posted in NV, OpenGL, OpenCL, DX11, AMD, 23 March 2011 · 903 views

Right now 3D APIs are a little... depressing... on the desktop.

While I still think D3D11 is technically the best API we have on Windows, the fact that AMD and NV currently haven't implemented multi-threaded rendering in a manner which helps performance is annoying. I've heard that there are good technical reasons why this is a pain to do; I've also heard that right now AMD have basically sacked it off in favour of focusing on the Fusion products. NV are a bit further along, but in order to make use of it you effectively give up a core, as the driver creates a thread which does the processing.

At this point my gaze turned to OpenGL. With OpenGL 4.x, while the problems with the API are still there in the bind-to-edit model, which is showing no signs of dying, feature-wise it has to a large degree caught up. Right now however there are a few things I can't see a way of doing from GL, but if anyone knows differently please let me know...


  • Thread-free resource creation. The D3D device is thread safe in that you can call its resource creation routines from any thread (see the sketch after this list). As far as I know GL still needs a context which must be bound to the 'current' thread in order to create resources.
  • Running a pixel shader at 'sample' frequency instead of pixel frequency, so in an MSAA x4 render target it would run 4 times per pixel.
  • The ability to write to a structured memory buffer from the pixel shader. I admit I've not looked too closely at this, but a quick look at the latest extensions for pixel/fragment shaders doesn't give any clues that this can be done.
  • Conservative depth output. In D3D a shader can be tagged in such a way that it will never output depth greater than the fragment already had, which preserves early-z rejection while still letting you write out depth info different from that of the primitive being drawn.
  • Forcing early-z to run; when combined with the UAV writing above this allows things like calculating both colour and 'other' information per fragment and only having both written if early-z passes. Otherwise UAV data gets written even when colour isn't.
  • Append/consume structured buffers; I've not spotted anything like these anywhere. I know we are verging into compute territory here, which is OpenCL, but pixel shaders can use them.
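To illustrate the first point, a minimal sketch of the D3D11 side; assuming the device was created without the D3D11_CREATE_DEVICE_SINGLETHREADED flag, a call like this is legal from any loader/worker thread, with none of the 'current context' juggling GL requires:

#include <d3d11.h>

// ID3D11Device methods are thread safe (unlike the immediate context), so this
// can be called from any thread without binding anything to that thread first.
ID3D11Buffer* CreateStaticVertexBuffer(ID3D11Device* device,
                                       const void* data, UINT sizeInBytes)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = sizeInBytes;
    desc.Usage     = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    D3D11_SUBRESOURCE_DATA initData = {};
    initData.pSysMem = data;

    ID3D11Buffer* buffer = nullptr;
    if (FAILED(device->CreateBuffer(&desc, &initData, &buffer)))
        return nullptr;
    return buffer;
}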

There are probably a few others which I've missed, however these spring to mind and, many of them, I want to use.

OpenGL also still has the 'extension' burden around its neck, with GLee out of date and GLEW just not looking that friendly (I took a look at both this past weekend). In a way I'd like to use OpenGL because it works nicely with OpenCL, and in some ways the OpenCL compute programming model is nicer than the D3D11 Compute model, but with API/hardware features apparently missing this isn't really workable.

In recent weeks there has been talk of ISVs wanting the 'API to go away' because (among other things) it costs so much to make a draw call on the PC vs consoles. While I somewhat agree with the desire to free things up and get at the hardware more, one of the reasons put forward for this added 'freedom' was to stop games looking the same; however, in a world without APIs, where you are targeting a constantly moving set of goal posts, you'll see more companies either drop the PC as a platform or license an engine to do all that for them.

While people talk about 'to the metal' programming being a good idea because of how well it works on the consoles, they seem to forget it often takes half a console life cycle for this stuff to become commonplace, and that is targeting fixed hardware. In the PC space things change too fast for this sort of thing; AMD alone, in a single cycle, would have invalidated a lot of work by going from VLIW5 to VLIW4 between the HD5 and HD6 series, never mind the underlying changes to the hardware itself. Add to this the fact that 'to the metal' would likely lag hardware releases and you don't have a compelling reason to go that route, unless all the IHVs decide to go with the same TTM "API", at which point things will get... interesting (see OpenGL for an example of what happens when IHVs try to get along).

So, unless NV and AMD want to slow down hardware development so things stay stable for multiple years I don't see this as viable at all.

The thing is, SOMETHING needs to be done when it comes to the widening 'draw call gap' between consoles and PCs. Right now 5 year old hardware can outperform a cutting edge system when it comes to the CPU cost of draw calls; fast forward 3 years to the next generation of console hardware, which is likely to have even more cores than now (12 minimum, I'd guess), faster RAM and DX11+ class GPUs as standard. Unless something goes VERY wrong, this hardware will likely allow trivial application of command lists/multi-threaded rendering, further opening the gap between the PC and consoles.

Right now PCs are good 'halo' products, as they allow devs to push up the graphics quality settings and just soak up the fact that we are CPU limited on graphics submissions, thanks to out-of-order processors, large caches and higher clock speeds. But clock speeds have hit a wall, and when the next generation of consoles drops they will match PCs on single threaded clock speed and graphics hardware... suddenly the pain of developing on a PC, with its flexible hardware, starts to look less and less attractive.

For years people have been predicting the 'death of PC gaming', and the next generation of hardware could well cause, if not that, then the reduction of the PC to MMO, RTS, TBS and 'Facebook' games, while all the large AAA games move off to the consoles where development is easier, rewards are greater and things can be pushed further.

We don't need the API to 'go away', but it needs to become thinner, both on the client AND the driver side. MS and the IHVs need to work together to make this a reality because, if not, they will all start to suffer in the PC space. Of course, with the 'rise of mobile' they might not even consider this an issue..

So, all in all, the state of things is depressing... too much overhead, missing features and, in some ways, doomed in the near future...


Basic Lua tokenising is go...

Posted in Lua, 16 January 2011 · 368 views

Over a few weekends leading up to Xmas, and the couple since, I have been playing around with boost::spirit and taking a quick look at ANTLR, in order to set up some code to parse Lua and generate an AST.

Spirit looked promising; the ability to pretty much dump the Lua BNF into it was nice, right up until I ran into left recursion and ended up in stack overflow land. I then went on a hunt for an existing example, but it failed to compile, used all manner of boost::fusion magic and was generally a pain to work with.

I had a look at ANTLR last weekend and, while dumping out a C++ parser using their GUI tool was easy enough, the C++ docs are... lacking... it seems, and I couldn't make any headway when it came to actually using it.

This afternoon I decided to bite the bullet and just start doing it 'by hand'. Fortunately the Lua BNF isn't that complicated, with a low number of keywords to deal with and a syntax which shouldn't be too hard to build into a sane AST from a token stream.

I'm not doing things completely by hand; the token extraction is being handled by boost::tokenizer with a custom written skipper which dumps spaces and semi-colons, keeps the rest of the punctuation required by Lua and, importantly, is aware of floating point/double numbers so that it only spits out a dot as a token when it makes sense.

Currently it doesn't deal with (and hasn't been tested against) octal or escaped characters, and comments would probably cause things to explode; however, I'll deal with those in the skipper at some point.

Given the following Lua;

foo = 42; bar = 43.4; rage = {} rage:fu() rage.omg = "wtf?"

The following token stream is pushed out;

<foo (28)> <= (18)> <42 (30)>
<bar (28)> <= (18)> <43.4 (29)>
<rage (28)> <= (18)> <{ (20)> <} (21)>
<rage (28)> <: (26)> <fu (28)> <( (24)> <) (25)>
<rage (28)> <. (27)> <omg (28)> <= (18)> <"wtf?" (31)>

Where the number is the token id found
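For anyone curious, a stripped-down approximation of the setup which produces a stream like the one above (this uses a plain boost::char_separator rather than my custom skipper, which is exactly why the custom skipper exists: char_separator would also split '43.4' at the dot and doesn't assign token ids):

#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>

int main()
{
    const std::string src = "foo = 42; bar = 43.4; rage = {} rage:fu() rage.omg = \"wtf?\"";

    // Dropped separators: whitespace and semi-colons.
    // Kept separators: the single-character punctuation Lua cares about.
    boost::char_separator<char> sep(" \t\n;", "=<>{}():.,\"");
    boost::tokenizer<boost::char_separator<char>> tokens(src, sep);

    for (const std::string& tok : tokens)
        std::cout << "<" << tok << "> ";
    std::cout << "\n";
    return 0;
}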

There is a slight issue right now, however. For example, given this code;

foo = 42; bar <= 43.4; rage = {} rage:fu() rage.omg = "wtf?"

The token stream created is;

<foo (28)> <= (18)> <42 (30)>
<bar (28)> << (26)> <= (18)> <43.4 (29)>
<rage (28)> <= (18)> <{ (20)> <} (21)>
<rage (28)> <: (26)> <fu (28)> <( (24)> <) (25)>
<rage (28)> <. (27)> <omg (28)> <= (18)> <"wtf?" (31)>

Notice that it creates two tokens for the '<=' sequence; this will probably need to be solved in the skipper as well.
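The fix I have in mind is a simple longest-match rule in the skipper; something along these lines (a hand-rolled illustration rather than the actual skipper code):

#include <string>

// Greedy operator matching: when a punctuation character is hit, check whether
// it and the next character form one of Lua's two-character operators and, if
// so, consume both as a single token.
std::string ReadOperator(const std::string& src, size_t& pos)
{
    static const char* twoCharOps[] = { "==", "~=", "<=", ">=", ".." };

    if (pos + 1 < src.size())
    {
        const std::string pair = src.substr(pos, 2);
        for (const char* op : twoCharOps)
        {
            if (pair == op)
            {
                pos += 2;
                return pair;
            }
        }
    }
    return std::string(1, src[pos++]);  // fall back to a single-character token
}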

So, once that is solved the next step will be the AST generation.. fun times...







