
About this blog

...dreaming.


_the_phantom_

Repeated from http://imadiversion.co.uk/2016/12/08/c-17-and-memory-mapped-io/

In some ways this is Part One of this subject, largely because my IO subsystem isn't in any way finished; I have literally just put something together so that I can get things loaded into my test framework. However, the basic idea here is one I'll probably base a few things on, so it is worth writing about quickly.

Loading Data

As we know, unless you are going fully hard-coded procedural with your project, at some point you are going to need to load things. It is a pretty fundamental operation, but one with an array of solutions in the C++ world.

The two I have been toying with, trying to make up my mind between, have been Async IO (likely via IOCP on Windows) or Memory Mapped IO.

I've done some experiments in the past with the former, hooking up IOCP callbacks to a task system based around Intel's Threading Building Blocks, and it certainly works well, but I'm not sure it is the right fit; while I'm interested in being able to stream things in asynchronously, other solutions could well exist for the async part of the problem when coupled with another IO solution.

Which brings us to memory mapped IO, which in some ways is the fundamental IO system for Windows, being built upon (and a part of) the virtual memory subsystem. While it is not async, and risks stalling a thread due to page faults, it does bring the useful ability to open views into an already open file, perfect for reading directly from an archive, for example.

A Mapping we will go

Memory mapped IO on Windows is also pretty simple;

1. Open target file
2. Create a file mapping
3. Create a view into that file mapping
4. Use the returned pointer

Then, when you are done, you unmap the view and close the two handles, for the file mapping and for the file itself. (If you were doing archive-like access you might not do the latter two steps until program end, however.)

The code itself is pretty simple, certainly if we want to open a full file for mapping:


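    #include <windows.h>
    #include <string>

    char* OpenFile(const std::wstring& filename)
    {
        HANDLE fileHandle = ::CreateFileW(filename.c_str(), GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE,
                                          nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (fileHandle == INVALID_HANDLE_VALUE)
            return nullptr;

        // Empty files can't be mapped, so bail out on failure or zero size
        LARGE_INTEGER fileSize{};
        if (!::GetFileSizeEx(fileHandle, &fileSize) || fileSize.QuadPart <= 0)
        {
            ::CloseHandle(fileHandle);
            return nullptr;
        }

        // CreateFileMapping signals failure with NULL, not INVALID_HANDLE_VALUE
        HANDLE fileMappingHandle = ::CreateFileMappingW(fileHandle, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!fileMappingHandle)
        {
            ::CloseHandle(fileHandle);
            return nullptr;
        }

        char* data = static_cast<char*>(::MapViewOfFile(fileMappingHandle, FILE_MAP_READ, 0, 0,
                                                        static_cast<SIZE_T>(fileSize.QuadPart)));
        if (!data)
        {   // mapping the view failed: don't leak the handles
            ::CloseHandle(fileMappingHandle);
            ::CloseHandle(fileHandle);
        }
        return data;
    }

It gets a bit more complicated if you want offsets and the like, however for our initial purposes it will do.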
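For the offset case, one detail worth knowing is that MapViewOfFile requires the file offset to be a multiple of the system's allocation granularity (reported by GetSystemInfo as dwAllocationGranularity, commonly 64 KB). A minimal, portable sketch of the arithmetic, assuming a hypothetical alignView helper: round the requested offset down, map a little extra, then step forward from the returned pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: what to pass to MapViewOfFile and how far to
// step into the returned view to reach the byte actually requested.
struct AlignedView
{
    std::uint64_t mapOffset;  // aligned offset (split into high/low DWORDs for MapViewOfFile)
    std::uint64_t delta;      // bytes to skip from the start of the view
    std::uint64_t mapLength;  // bytes to map so that [offset, offset + length) is covered
};

// granularity is assumed to be the value GetSystemInfo reports
constexpr AlignedView alignView(std::uint64_t offset, std::uint64_t length, std::uint64_t granularity)
{
    const std::uint64_t mapOffset = offset - (offset % granularity);
    const std::uint64_t delta = offset - mapOffset;
    return AlignedView{ mapOffset, delta, delta + length };
}

// 100 bytes at offset 70000 with 64 KB granularity: map from 65536, skip 4464 bytes
static_assert(alignView(70000, 100, 65536).mapOffset == 65536);
static_assert(alignView(70000, 100, 65536).delta == 4464);
```

The usable data pointer would then be the view pointer plus delta, while UnmapViewOfFile still needs the original, unadjusted view pointer.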


Enter C++

There is, of course, an obvious problem with the above: there is no clean-up code, and all you get back is a char*, which doesn't really help. At best you can unmap the view, but the two handles are lost to the ether.

So what can we do?

One approach would be to wrap the data in an object and have it clean up for us automatically, so our function returns an instance of a class with the associated destructor and access functions.

    class DirectFileHandle
    {
    public:
        DirectFileHandle(char* data, HANDLE file, HANDLE mapping)
            : data(data), file(file), mapping(mapping) {}

        ~DirectFileHandle()
        {
            ::UnmapViewOfFile(data);
            ::CloseHandle(mapping);
            ::CloseHandle(file);
        }

        // Copying would close the handles twice, so forbid it;
        // move operations (not shown) would transfer ownership
        DirectFileHandle(const DirectFileHandle&) = delete;
        DirectFileHandle& operator=(const DirectFileHandle&) = delete;

        char* getData() { return data; }

    private:
        char* data;
        HANDLE file;
        HANDLE mapping;
    };

Not a complete class, but you get the idea I'm sure.


However that's a lot of work, plus the introduction of a type, in order to just track data and clean up.

Is there something easier we can do?

std::unique_ptr to the rescue!

As mentioned, all we are really doing is holding a pointer and, when it dies, cleaning up some state which we don't really need access to any more.

Fortunately in std::unique_ptr we have a class designed to do just that; clean up state when it goes out of scope. We can even provide it with a custom deletion function to do the clean up for us.

So what does our new type look like?

    using DirectFileHandle = std::unique_ptr<char, std::function<void(char*)>>;

As before, the primary payload is the char*, but we directly associate it with a clean-up function which will be called when the unique_ptr goes out of scope.

From there it is a simple matter of changing our function's signature to return that type and updating our final return statement with my new favourite C++ syntax:

    return { data, [=](char* handle) {
        ::UnmapViewOfFile(handle);
        ::CloseHandle(fileMappingHandle);
        ::CloseHandle(fileHandle);
    } };

As with the Message Handler code from the previous entry, we don't need to state the type here as the compiler already knows it.


The lambda's capture-by-copy default ensures we still have copies of the handles come clean-up time, and the address of the data is supplied via the callback's parameter.

But what about those error cases? In those cases we change the returns to be return{nullptr, [](char*) { return; }}; effectively returning a null pointer.
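To see those ownership rules without any Windows calls, here is a portable sketch that swaps the handles for plain ints and records "closes" in a hypothetical g_closedHandles vector instead of calling UnmapViewOfFile/CloseHandle. It shows the two behaviours the approach relies on: the [=] capture keeps copies of the handles alive after the creating function returns, and std::unique_ptr never calls its deleter for a null pointer, so the error-path no-op lambda never actually runs.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

using FileDataPointer = std::unique_ptr<char, std::function<void(char*)>>;

// Hypothetical log standing in for UnmapViewOfFile / CloseHandle calls
std::vector<int> g_closedHandles;

// Hypothetical stand-in for OpenFile: the "handles" are plain ints
FileDataPointer makeFakeMapping(char* data, bool fail)
{
    int fileHandle = 1;     // locals: gone once this function returns
    int mappingHandle = 2;
    if (fail)
        return { nullptr, [](char*) {} };  // never invoked: unique_ptr skips the deleter for null
    // [=] copies both handles into the lambda, so they are still valid at clean-up time
    return { data, [=](char*) {
        g_closedHandles.push_back(mappingHandle);  // "close" the mapping first...
        g_closedHandles.push_back(fileHandle);     // ...then the file
    } };
}
```

Capturing by reference ([&]) here would leave the lambda pointing at dead stack variables, which is exactly the shutdown crash described in the summary.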

The usage so far

A quick example of this in usage can be taken from my test program, which I'm using to test and build up functionality as I go;

    int APIENTRY wWinMain(_In_ HINSTANCE hInstance, _In_opt_ HINSTANCE hPrevInstance,
                          _In_ LPWSTR lpCmdLine, _In_ int nCmdShow)
    {
        sol::state luaState;
        luaState.open_libraries(sol::lib::base, sol::lib::package);
        luaState.create_named_table("Bonsai");   // table for all the Bonsai stuff

        Bonsai::Windowing::Initialise(luaState);

        using DirectFileHandle = Bonsai::File::DirectFile::DirectFileHandle;
        DirectFileHandle data = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
        luaState.script(data.get());

        std::function<bool()> updateFunc = luaState["Update"];
        while (updateFunc())
        {
            Sleep(0);
        }
        return 0;
    }

A few custom libraries in there, however the key element is dead centre: the file open function, and the use of the returned value on the next line to feed the Lua script wrapper.


The Problem

There is, however, a slight issue with the interface; we have no idea of the size of data being returned.

Now, in this case it isn't a problem. One of the nice features of memory mapped files on Windows (at least) is that, for security reasons, memory pages handed from the kernel to user space are zero-initialised. We can see that by catching things in a debugger and looking at the memory pointed at by data.get(), which is, as expected, the file content followed by a run of nulls filling the rest of the memory page. (This only holds when the file size is not an exact multiple of the page size; if it is, there is no trailing zero and a string read can run off the end of the view.)
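The amount of zero padding is easy to work out. A small sketch, assuming the 4 KB pages used on x86/x64 Windows (GetSystemInfo reports the real value as dwPageSize); the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Zero-filled bytes left in the last page after a file's content ends.
// pageSize is assumed to be the value GetSystemInfo reports (4096 on x86/x64).
constexpr std::uint64_t trailingZeroBytes(std::uint64_t fileSize, std::uint64_t pageSize)
{
    return (pageSize - fileSize % pageSize) % pageSize;
}

// A terminating zero after the content is only guaranteed if there is
// at least one padding byte in the last page.
constexpr bool endsInZeroPadding(std::uint64_t fileSize, std::uint64_t pageSize)
{
    return trailingZeroBytes(fileSize, pageSize) > 0;
}
```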

Given that setup we are good when we are loading in string based data but what if we need something else or simply want the size?

At this point it is tempting to throw in the towel and head back towards a class, but a simpler option does exist: our old friend std::pair, which in this case lets us pair a size_t with the pointer to the data.

The Solution

So, first of all we need to perform some type changes;

    using FileDataPointer = std::unique_ptr<char, std::function<void(char*)>>;
    using DirectFileHandle = std::pair<size_t, FileDataPointer>;

What was our DirectFileHandle before now becomes FileDataPointer, and DirectFileHandle is now a pair with the data we require. Right now I've decided to order it as size then pointer, but it could just as easily be the reverse.


After that we need to make some changes to our function;

    FILEIO_API DirectFileHandle OpenFile(const std::wstring& filename)
    {
        // as before
        if (fileHandle == INVALID_HANDLE_VALUE)
            return { 0, FileDataPointer{ nullptr, [](char*) { return; } } };

The function signature itself doesn't need to change thanks to our redefining of the alias, however the return statements in the body do.


Previously we could directly construct the std::unique_ptr from a braced list and the compiler would figure it out, however if we try that with the new type we get errors:

    return { 0, { nullptr, [](char*) { return; } } };

    // The above results in the error below from MSVC in VS17
    error C2440: 'return': cannot convert from 'initializer list' to 'std::pair<::size_t,Bonsai::File::DirectFile::FileDataPointer>'
    note: No constructor could take the source type, or constructor overload resolution was ambiguous

The compiler sees a nested initialiser list and tries to find a pair constructor to convert it with, but none is usable: the forwarding constructor pair(U1&&, U2&&) can't deduce a type from the inner braces, and the pair(const T1&, const T2&) constructor is excluded because std::unique_ptr can't be copied.
(I believe this is a legitimate problem and not a case of 'early compiler syndrome')


So we have to name the std::unique_ptr type explicitly in order to sort the types out. This change is repeated all the way down the function at the various return points, including the final one, where the main difference is that we return the real file size rather than the 0 placeholder.
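A portable sketch of the same situation, with hypothetical failedOpen/fakeOpen functions and a no-op deleter standing in for the Windows clean-up: the static_assert confirms the pointer type is move-only, which is what rules out pair's copying constructor, and naming FileDataPointer in the return statement gives the forwarding constructor a concrete type to deduce.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

using FileDataPointer = std::unique_ptr<char, std::function<void(char*)>>;
using DirectFileHandle = std::pair<std::size_t, FileDataPointer>;

// Move-only, so pair's (const T1&, const T2&) constructor is not available
static_assert(!std::is_copy_constructible_v<FileDataPointer>);

// Hypothetical stand-in for OpenFile's early-out path
DirectFileHandle failedOpen()
{
    // return { 0, { nullptr, [](char*) {} } };  // rejected in C++17: nothing to deduce from the inner braces
    return { 0, FileDataPointer{ nullptr, [](char*) {} } };
}

// Hypothetical success path: wraps a caller-owned buffer with a no-op deleter
DirectFileHandle fakeOpen(char* data, std::size_t size)
{
    return { size, FileDataPointer{ data, [](char*) {} } };
}
```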

After that we need to make a change to the usage site as now we have a pair being returned and not a wrapped pointer to use;

    auto data = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
    luaState.script(data.second.get());

In this case not much changes, but we now have the size around if we need it.


But we aren't quite done...

Now, if we had a full C++17 compiler to use we could make use of one final thing: structured bindings.

Structured bindings give us syntax to unpack return values into separate variables; we can do something like this already with std::tie, but structured bindings let us declare and assign at the same time.

    // C++14 way
    using FileDataPointer = Bonsai::File::DirectFile::FileDataPointer;
    // Declare the variables up front
    FileDataPointer ptr;
    size_t size;
    // Now unpack the return value
    std::tie(size, ptr) = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
    luaState.script(ptr.get());

    // C++17 way
    // Declare and define at the same time
    auto [size, ptr] = Bonsai::File::DirectFile::OpenFile(L"startup.lua");
    luaState.script(ptr.get());

That, however, is for a future compiler update; for now we can stick to the std::pair method, which at least allows us to switch to the C++17 syntax as and when the compilers can handle it.
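Both unpacking styles can be tried against a hypothetical fakeOpen stand-in that returns the same pair type without touching the file system; this is a sketch for comparison, not the Bonsai code:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <tuple>
#include <utility>

using FileDataPointer = std::unique_ptr<char, std::function<void(char*)>>;
using DirectFileHandle = std::pair<std::size_t, FileDataPointer>;

// Hypothetical stand-in for OpenFile: hands back a static string with a no-op deleter
DirectFileHandle fakeOpen()
{
    static char text[] = "return 42";
    return { sizeof(text) - 1, FileDataPointer{ text, [](char*) {} } };
}

std::size_t sizeViaTie()
{
    FileDataPointer ptr;               // C++14: declare up front (starts out null)
    std::size_t size = 0;
    std::tie(size, ptr) = fakeOpen();  // move-assigns each member of the returned pair
    return ptr ? size : 0;
}

std::size_t sizeViaBinding()
{
    auto [size, ptr] = fakeOpen();     // C++17: declare and initialise in one step
    return ptr ? size : 0;
}
```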


Summing up

In a real setup you would check for nulls before using however this code demonstrates the principle nicely I feel.
(I also know it works as I made a slight error on the first run where my lambda captured by reference, meaning at callback time I got a nice crash during shutdown as the handle it was trying to reference was no longer valid.)

So there we have it, a simple C++17 based Memory Mapped File IO solution - I'll be building on this over time in order to build something a bit more complex, but as a proof of concept it works well.

---

Lua binding library: Sol2
std::tie: http://en.cppreference.com/w/cpp/utility/tuple/tie

_the_phantom_

I'm basically really bad at working on my own projects, but with the recent release of Visual Studio 2017 RC and its improved C++17 support I figured it was time to crack on again...

To that end I've spent a bit of time today updating my own basic windowing library to use C++17 features. Some of the things have been simple transforms such as converting 'typedef' to 'using', others have been more OCD satisfying;

    // This
    namespace winprops
    {
        enum winprops_enum
        {
            fullscreen = 0,
            windowed
        };
    }
    typedef winprops::winprops_enum WindowProperties;

    // becomes this
    enum class WindowProperties
    {
        fullscreen = 0,
        windowed
    };

The biggest change, however, and the one which makes me pretty happy, was in the core message handler, which hasn't really been updated since I wrote it back in 2003 or so.


The old loop looked like this;

    LRESULT CALLBACK WindowMessageRouter::MsgRouter(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam)
    {
        // attempt to retrieve internal Window handle
        WinHnd wnd = ::GetWindowLongPtr(hwnd, GWLP_USERDATA);
        WindowMap::iterator it = s_WindowMap->find(wnd);
        if (it != s_WindowMap->end())
        {
            // First see if we have a user message handler for this message
            UserMessageHandler userhandler;
            WindowMessageData msgdata;
            bool hasHandler = false;
            switch (message)
            {
            case WM_CLOSE:
                hasHandler = it->second->GetUserMessageHandler(winmsgs::closemsg, userhandler);
                msgdata.msg = winmsgs::closemsg;
                break;
            case WM_DESTROY:
                hasHandler = it->second->GetUserMessageHandler(winmsgs::destorymsg, userhandler);
                msgdata.msg = winmsgs::destorymsg;
                break;
            case WM_SIZE:
                hasHandler = it->second->GetUserMessageHandler(winmsgs::sizemsg, userhandler);
                msgdata.msg = winmsgs::sizemsg;
                msgdata.param1 = LOWORD(lparam); // width
                msgdata.param2 = HIWORD(lparam); // height
                break;
            case WM_ACTIVATE:
                hasHandler = it->second->GetUserMessageHandler(winmsgs::activemsg, userhandler);
                msgdata.msg = winmsgs::activemsg;
                msgdata.param1 = !HIWORD(wparam) ? true : false;
                break;
            case WM_MOVE:
                hasHandler = it->second->GetUserMessageHandler(winmsgs::movemsg, userhandler);
                msgdata.msg = winmsgs::movemsg;
                msgdata.param1 = LOWORD(lparam);
                msgdata.param2 = HIWORD(lparam);
                break;
            default:
                break;
            }
            if (hasHandler)
            {
                if (userhandler(wnd, msgdata))
                {
                    return TRUE;
                }
            }
            MessageHandler handler;
            hasHandler = it->second->GetMessageHandler(message, handler);
            if (hasHandler)
            {
                return handler(*(it->second), wparam, lparam);
            }
        }
        else if (message == WM_NCCREATE)
        {
            // attempt to store internal Window handle
            wnd = (WinHnd)((LPCREATESTRUCT)lparam)->lpCreateParams;
            ::SetWindowLongPtr(hwnd, GWLP_USERDATA, wnd);
            return TRUE;
        }
        return DefWindowProc(hwnd, message, wparam, lparam);
    }

The code is pretty simple:
- See if we know about the window we've got a message for (set up previously)
- If so, look for a user handler and translate the message data across
- If we have a user handler then execute it
- If we didn't have a user handler (or it didn't handle the message) then try a system one


The final 'else if' section deals with newly created windows and setting up the map.

So this works, and works well; the pattern is pretty common in C++ code from the early 2000s, but it is a bit... repeaty.

The problem comes from C++ support and general 'good practice' back in the day, but life moves on, so let's make some changes.

The first problem is the query setup, the function for the 'do you have a handler' looked like this;

    bool Window::GetMessageHandler(oswinmsg message, MessageHandler& handler)
    {
        MessageIterator it = messagemap.find(message);
        bool found = it != messagemap.end();
        if (found)
        {
            handler = it->second;
        }
        return found;
    }

Not hard:
- We check to see if we have a message handler
- If we do then we store it in the supplied reference
- Then we return if we found it or not


Not bad, but it takes us 5 lines of code (7 if you include the braces), and if you think about it we should be able to test for the existence of the handler by querying the handler object itself, rather than tracking what is going on in the calling function. Also, the handler gets default constructed on the calling side, which might be a waste too.

So what can C++17 do to help us?
Enter std::optional.

std::optional lets us return an object which is either 'null' or contains an instance of a given type; later we can check whether it holds a value (via its explicit operator bool) before trying to use it. Doesn't that sound somewhat like what was described just now?

So, with a quick refactor the message handler lookup function becomes;

    std::optional<MessageHandler> Window::GetMessageHandler(oswinmsg message)
    {
        MessageIterator it = messagemap.find(message);
        return it != messagemap.end() ? it->second : std::optional<MessageHandler>{};
    }

Isn't that much better?
Instead of having to pass in a thing and then effectively return two things (via the reference and the bool return), we now return one thing, which either contains the handler object or is empty.
(Had I written this as an 'if...else' statement, the 'else' path could simply have been return {}; the ternary operator doesn't allow that, because a braced initialiser list isn't an expression and so can't be one of its operands.)
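A self-contained sketch of the lookup, with a hypothetical std::map of std::function handlers standing in for the window's message map, showing both the ternary form (where the empty optional has to be spelled out) and the if/else form (where return {}; is enough):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>

// Hypothetical stand-ins for the article's handler type and message map
using MessageHandler = std::function<int(int)>;
std::map<unsigned, MessageHandler> messagemap;

// Ternary form: {} can't be an operand, so the empty optional is named in full
std::optional<MessageHandler> GetMessageHandler(unsigned message)
{
    auto it = messagemap.find(message);
    return it != messagemap.end() ? it->second : std::optional<MessageHandler>{};
}

// if/else form: a plain {} is enough to return an empty optional
std::optional<MessageHandler> GetMessageHandlerIfElse(unsigned message)
{
    auto it = messagemap.find(message);
    if (it != messagemap.end())
        return it->second;
    return {};
}
```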


So, with that transform in place our handling code can now change a bit too; the simple transform at this point would be to replace that 'bool' with a direct assign to the handler object;

    std::optional<UserMessageHandler> userhandler;
    WindowMessageData msgdata;
    switch (message)
    {
    case WM_CLOSE:
        userhandler = it->second->GetUserMessageHandler(winmsgs::closemsg);
        msgdata.msg = winmsgs::closemsg;
        break;
    // ... blah blah ...

But we still have a default constructed object kicking about, not to mention the second data structure for the message data (OK, so it is basically 3 ints, but still...), so can we change this?


The answer is yes, changes can be made with the introduction of a lambda and a pair :)

The pair is the easy one to explain; when you look at the message handling code what you get is an implied coupling between the message handler and the data which goes with it, a transformed version of the original message handler data. So, instead of having the two separate we can couple them properly;

    // so this...
    UserMessageHandler userhandler;
    WindowMessageData msgdata;

    // becomes this...
    using UserMessageHandlerData = std::pair<std::optional<UserMessageHandler>, WindowMessageData>;

OK, so how does that help us?
Well, on its own it doesn't really; however, this is where the lambda enters the equation. One of the things you can do with a lambda is declare it and execute it at the same time, effectively making it an anonymous initialisation function at local scope. It is something which, I admit, didn't occur to me until I watched Jason Turner's talk Practical Performance Practices from CppCon 2016.
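The immediately-invoked lambda idiom on its own, using a hypothetical Msg enum and describe function rather than the real window types: the result can be declared const and is initialised exactly once, with no default-constructed placeholder.

```cpp
#include <cassert>
#include <string>

enum class Msg { Close, Size, Other };   // hypothetical message ids

std::string describe(Msg message, int width, int height)
{
    // Declared and invoked in one expression: `label` is const and initialised once
    const std::string label = [&]() {
        switch (message)
        {
        case Msg::Close:
            return std::string("close");
        case Msg::Size:
            return "size " + std::to_string(width) + "x" + std::to_string(height);
        default:
            break;
        }
        return std::string("unknown");
    }();   // the trailing () calls the lambda immediately
    return label;
}
```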


So, with that in mind how do we make the change?
Well, the (near) final code looks like this;

    auto userMessageData = [window = it->second, message, wparam, lparam]()
    {
        WindowMessageData msgdata;
        switch (message)
        {
        case WM_CLOSE:
            msgdata.msg = winmsg::closemsg;
            return std::make_pair(window->GetUserMessageHandler(winmsg::closemsg), msgdata);
            break;
        case WM_DESTROY:
            msgdata.msg = winmsg::destroymsg;
            return std::make_pair(window->GetUserMessageHandler(winmsg::destroymsg), msgdata);
            break;
        case WM_SIZE:
            msgdata.msg = winmsg::sizemsg;
            msgdata.param1 = LOWORD(lparam); // width
            msgdata.param2 = HIWORD(lparam); // height
            return std::make_pair(window->GetUserMessageHandler(winmsg::sizemsg), msgdata);
            break;
        // a couple of cases missing...
        default:
            break;
        }
        return std::make_pair(std::optional<UserMessageHandler>{}, msgdata);
    }();

    if (userMessageData.first)
    {
        if (userMessageData.first.value()(wnd, userMessageData.second))
        {
            return TRUE;
        }
    }

So, a bit of a change, and the overall function this lives in is now a bit shorter too.


Basically we define a lambda which returns a pair as defined before, using std::make_pair to construct it; if we don't understand the message then we simply construct a pair of an empty optional and a default-constructed WindowMessageData and return that instead.
Note the end of the lambda where, after the closing brace, you'll find a pair of parentheses which invokes the lambda there and then, assigning the result to 'userMessageData'.

After that we simply check the 'first' item in the pair and dispatch if needs be.
So we are done right?

Well, as noted, this is only 'nearly' the final solution; it suffers from a few problems:
1) Lots and lots of repetition - we have make_pair all over the place and we have to specify the types in the default return statement
2) We are still default constructing that WindowMessageData type and assigning values after trivial transforms
3) That ugly call syntax... ugh...

So let's fix that!

The first has a pretty easy fix; tell the lambda what it will return so the compiler can just sort that shit out for you;

    auto userMessageData = [window = it->second, message, wparam, lparam]()
        -> std::pair<std::optional<UserMessageHandler>, WindowMessageData>
    {
        switch (message)
        {
        case WM_CLOSE:
            return { window->GetUserMessageHandler(winmsg::closemsg), { winmsg::closemsg, 0, 0 } };
            break;
        case WM_DESTROY:
            return { window->GetUserMessageHandler(winmsg::destroymsg), { winmsg::destroymsg, 0, 0 } };
            break;
        case WM_SIZE:
            return { window->GetUserMessageHandler(winmsg::sizemsg), { winmsg::sizemsg, LOWORD(lparam), HIWORD(lparam) } };
            break;
        case WM_ACTIVATE:
            return { window->GetUserMessageHandler(winmsg::activemsg), { winmsg::activemsg, !HIWORD(wparam) ? true : false } };
            break;
        case WM_MOVE:
            return { window->GetUserMessageHandler(winmsg::movemsg), { winmsg::movemsg, LOWORD(lparam), HIWORD(lparam) } };
            break;
        default:
            break;
        }
        return { {}, {} };
    }();

How much shorter is that?


So, as noted the first change happens at the top; we now tell the lambda what it will be returning - the compiler can now use that information to reason about the rest of the code.

Now, because we know the type, we can kiss goodbye to std::make_pair; instead we use brace initialisation to directly create the pair, and the data for the second object, at the return point. Because the compiler knows what to return it knows what to construct, and that goes directly into our userMessageData variable, which has the correct type.

One of the fun side effects of this is that last line of the lambda; return { {}, {} }
Once again, because the compiler knows the type we can just tell it 'construct me a pair of two default constructed objects - you know the types, don't bother me with the details'.

And just like that all our duplication goes away and we get a nice compact message handler.
Points 1 and 2 handled :)
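A cut-down, portable sketch of the same pattern, with hypothetical WindowMessageData, UserMessageHandler and message ids in place of the Windows ones. Because the lambda's return type is spelled out, every branch can return a braced list, including the final return { {}, {} };

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-ins for the article's types and message ids
struct WindowMessageData { int msg; int param1; int param2; };
using UserMessageHandler = std::function<bool(const WindowMessageData&)>;
using UserMessageHandlerData = std::pair<std::optional<UserMessageHandler>, WindowMessageData>;
constexpr int kCloseMsg = 1;
constexpr int kSizeMsg = 2;

UserMessageHandlerData translate(int message, int width, int height, const UserMessageHandler& sizeHandler)
{
    // The explicit return type lets each branch return a plain braced list
    return [&]() -> UserMessageHandlerData {
        switch (message)
        {
        case kCloseMsg:
            return { std::nullopt, { kCloseMsg, 0, 0 } };   // no handler registered
        case kSizeMsg:
            return { sizeHandler, { kSizeMsg, width, height } };
        default:
            break;
        }
        return { {}, {} };  // empty optional plus value-initialised data
    }();
}
```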

So what about point 3?

In this case we can take advantage of variadic templates, std::invoke and parameter packs to create an invoking function to wrap things away:


    template <typename T, typename... Args>
    decltype(auto) invokeOptional(const T& callable, Args&&... args)
    {
        // decltype(auto) passes through whatever the held callable returns
        return std::invoke(callable.value(), std::forward<Args>(args)...);
    }

This simple wrapper just takes the optional (it could probably do with some form of protection to make sure it is an optional which can be invoked), extracts the value and passes it down to std::invoke to do the calling.
The variadic template and parameter pack allow us to pass any combination of parameters down and, as long as the type held by the optional can be called with them, invoke the function as we need; this means one function for both the user and system callbacks:

    if (userMessageData.first)
    {
        if (invokeOptional(userMessageData.first, wnd, userMessageData.second))
        {
            return TRUE;
        }
    }
    auto handler = it->second->GetMessageHandler(message);
    if (handler)
    {
        return invokeOptional(handler, *(it->second), wparam, lparam);
    }

And there we have it: much refactoring later, something more C++17 than C++03 :)
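One possible tightening of invokeOptional, offered as a sketch rather than the author's version: taking a const std::optional<T>& restricts it to optionals, a static_assert using std::is_invocable_v (C++17) checks the held type can be called with the given arguments, std::forward preserves value categories, and decltype(auto) passes the callable's own return type through. As with the original, calling it on an empty optional throws std::bad_optional_access from value().

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <type_traits>
#include <utility>

// Only accepts optionals; the held callable must be invocable with Args
template <typename T, typename... Args>
decltype(auto) invokeOptional(const std::optional<T>& callable, Args&&... args)
{
    static_assert(std::is_invocable_v<const T&, Args...>,
                  "optional must hold something callable with these arguments");
    // value() throws std::bad_optional_access if the optional is empty
    return std::invoke(callable.value(), std::forward<Args>(args)...);
}
```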


Hope this little process has been helpful; feedback via the comments if you've any ideas on how to improve things, or questions :)

Message router code in its final(?) form;

    namespace Bonsai::Windowing // an underrated new feature...
    {
        template <typename T, typename... Args>
        decltype(auto) invokeOptional(const T& callable, Args&&... args)
        {
            // make sure we were handed an optional
            static_assert(std::is_convertible<T, std::optional<typename T::value_type>>::value);
            return std::invoke(callable.value(), std::forward<Args>(args)...);
        }

        WindowMap* WindowMessageRouter::s_WindowMap;

        WindowMessageRouter::WindowMessageRouter(WindowMap& windowmap)
        {
            s_WindowMap = &windowmap;
        }

        WindowMessageRouter::~WindowMessageRouter()
        {
        }

        bool WindowMessageRouter::Dispatch(void)
        {
            static MSG msg;
            if (::PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
            {
                ::TranslateMessage(&msg);
                ::DispatchMessage(&msg);
            }
            if (msg.message == WM_QUIT)
                return false;
            return true;
        }

        LRESULT CALLBACK WindowMessageRouter::MsgRouter(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam)
        {
            // attempt to retrieve internal Window handle
            WinHnd wnd = ::GetWindowLongPtr(hwnd, GWLP_USERDATA);
            WindowMap::iterator it = s_WindowMap->find(wnd);
            if (it != s_WindowMap->end())
            {
                // First see if we have a user message handler for this message
                auto userMessageData = [window = it->second, message, wparam, lparam]()
                    -> std::pair<std::optional<UserMessageHandler>, WindowMessageData>
                {
                    switch (message)
                    {
                    case WM_CLOSE:
                        return { window->GetUserMessageHandler(winmsg::closemsg), { winmsg::closemsg, 0, 0 } };
                        break;
                    case WM_DESTROY:
                        return { window->GetUserMessageHandler(winmsg::destroymsg), { winmsg::destroymsg, 0, 0 } };
                        break;
                    case WM_SIZE:
                        return { window->GetUserMessageHandler(winmsg::sizemsg), { winmsg::sizemsg, LOWORD(lparam), HIWORD(lparam) } };
                        break;
                    case WM_ACTIVATE:
                        return { window->GetUserMessageHandler(winmsg::activemsg), { winmsg::activemsg, !HIWORD(wparam) ? true : false } };
                        break;
                    case WM_MOVE:
                        return { window->GetUserMessageHandler(winmsg::movemsg), { winmsg::movemsg, LOWORD(lparam), HIWORD(lparam) } };
                        break;
                    default:
                        break;
                    }
                    return { {}, {} };
                }();

                if (userMessageData.first)
                {
                    if (invokeOptional(userMessageData.first, wnd, userMessageData.second))
                    {
                        return TRUE;
                    }
                }

                auto handler = it->second->GetMessageHandler(message);
                if (handler)
                {
                    return invokeOptional(handler, *(it->second), wparam, lparam);
                }
            }
            else if (message == WM_NCCREATE)
            {
                // attempt to store internal Window handle
                wnd = (WinHnd)((LPCREATESTRUCT)lparam)->lpCreateParams;
                ::SetWindowLongPtr(hwnd, GWLP_USERDATA, wnd);
                return TRUE;
            }
            return DefWindowProc(hwnd, message, wparam, lparam);
        }
    }
(Edit: small edit.. forgot to remove the 'WindowMessageData' type from the lambda function return statements... so now it is even shorter...)

_the_phantom_

Trends.

With the launch of SimCity I have noticed an interesting trend developing; the acceptance that the launch of any game will be a week of frustration and disconnections while the publisher sorts out the servers.

Note I said 'any game'.
Not a multi-player game.
Not an MMO.
Any game.

One fan of SimCity, when faced with the question 'why can't I play what is largely a single player game just because I can't connect to the servers?', responded by comparing it to an MMO launch and saying that 'this should be accepted'.

In fact I'm getting a sense of deja vu as I sit here and write this, as I'm sure I've called this subject out before.

SimCity might well have multi-player aspects which require a connection, but the game has the ability to mark a region as 'private', which implies you can play on your own. That raises two questions: why do I need to be online to do this, and why can't you play the new game at release, instead of suffering a week of 'server issues' while the publisher waits for demand to drop off rather than dealing with it directly?

I find this acceptance worrying because it is a slide towards a world where you install your shiny new single player game but, instead of being able to play it, you are forced to log in to a server which doesn't have the capacity to deal with launch day demand because the publisher didn't want to spend the cash to provide it.

Note this is not an argument against 'online DRM'; my acceptance of Steam gives me very little to stand on there. This is against the requirement to be connected to experience a product when the people you bought it from clearly never planned to let everyone experience it on day one.

In this instance given the overly inflated prices of games on Origin this is pretty unacceptable.
(I'll refrain from a longer anti-Origin rant at this point however.)

But I guess while people will pay money for a game which may or may not work on release (and, more importantly, KEEP paying), this is a trend unlikely to reverse.

The funny thing is I dare say a cross section of this crowd have also complained about the idea of the next consoles requiring an online connection...
_the_phantom_

"Valve Box"

Valve: "Windows 8 App store is bad!... btw you can now buy apps from our store!"

Valve: "Closed systems are bad! btw, here is our new closed system"

Gamers : "OMG YOU ARE SO RIGHT AND THIS ISNT AT ALL A CONTRADICTION HERE HAVE MY MONEY!"

*facepalm*
_the_phantom_

Conclusion 2.

The person who posted the first of the 'tutorials' we see online these days has a lot to answer for.

While it is creating more "programmers" (and I use the term loosely), this reliance on tutorials with snippets of code, and even video tutorials showing you everything, is, imo, having a bad effect on the problem-solving ability of those who follow them.

Instead of learning to read docs, read books and figure out samples, they require a step-by-step guide for the most trivial of things and then complain when such resources don't exist.

On the plus side as this army of vague competence marches forward at least there will always be better paid work for those of us who can think our way out of a paper bag instead of sitting at the bottom of it and crying because no one has made a video showing us how to get out.
_the_phantom_

Conclusion.

In all my years of programming, both professionally and as a hobby, there is one truth I've learnt which stands above all others.

Most people can't design software for shit.

This thought depresses me.
_the_phantom_
I'm slowly... oh so slowly... starting to crack.

MS have some blame to take here because they are apparently not communicating well enough, but at the same time the latest Windows release is starting to bring out the Silly Season in a manner not seen since Windows Vista. In fact it's worse, because it would seem people are not using their brains; it's got to the point where I'm facepalming as I read Twitter and blogs and... well... I'm writing this at half past midnight on a Sunday morning.

The first 'gem' which started to push me over the edge was a recent tweet claiming that 'windows 8 was a closed system'.

So, yes, there is this Windows Store and yes, it will be the only way for end users to get at Metro apps, but Metro apps are not the only apps. Maybe it's just me, but if I still control where the vast majority of the apps I'm going to install come from, and they don't have to be signed and delivered from a single source, I'm pretty sure that's not a closed system.

So, my Windows 8 machine could still run/install unsigned apps just like my Windows 7 machine currently does.
No change there.

(Minor side note: the latest OSX release turned on a setting so that, by default, only App Store or signed apps will run. Fortunately you can turn this off in System Preferences, but switching it on silently, by default, for all apps is pretty sneaky imo. And Vista users thought UAC was bad.)

The thing which really got to me, however, is the continued wailing about XNA, and a blog post which tipped me over the edge.

Now, to be fair I think part of this can be put down to an MS employee not understanding a question correctly and thus giving a poor answer but the basics of it boil down to a developer asking 'will XNA work on Windows 8?' and being told 'no, never.'

Now, while I've not tried it personally, I've heard that XNA based games are indeed working just fine on the RC version of Windows 8. That isn't really surprising, considering XNA is a .Net library which wraps DX9; Windows 8 supports .Net and thus the XNA runtime, and if Windows 8 didn't support DX9 it would die a death anyway, as no one would buy it because they couldn't play Half-Life 2 (and let's face it, it would give Gabe more reasons to cry about Win8).

What I think happened was that the MS employee heard 'XNA' and 'Windows 8' and assumed the asker was asking 'Will XNA work via Metro/WinRT?' which, of course, the answer is 'no' (which isn't really unexpected).

The net result; yet another blog post of uninformed opinions with no real basis in fact (and I'd like to say well done to a few commenters for trying to correct the amount of 'wrong' in that post) but, more importantly, the developer in question has swapped to using Unity for their game. Now, unless Unity at some point has a WinRT wrapper (and I believe they are trying to sort something out in that regard?) then Unity is working at the same level as XNA would have with regards to the OS, APIs etc.

*face-fucking-palm*

Of course this was an interview where the developer had no real idea what XNA was, even referring to it as a language multiple times, so, ya know, I'm not assuming great technical competence, but it just seems like the kind of thing you could figure out with a bit of logic, ya know?

Which is where this wandering post is going: an increasingly sad state where people jump on bandwagons and panic without bothering to research things themselves.

I've got no inside line at MS; I don't know anyone personally and I work for a living so I can't dedicate all my time to following tech; yet somehow I can figure out all this stuff but others can't?

A few months back Gabe of Valve fame declared Windows 8 a 'disaster' for gaming in what can only be described as the 'scare tactic of the fucking decade' by anyone who takes a few seconds to look at the claim. Does Windows 8 control what software you can install? No. Does Windows 8 'hide' non-MS software? No. Hell, do you think they would turn down a Metro-based version of the Steam UI if Valve wanted to provide one? No.

It's not like Steam is a weak name either; practically every PC gamer is going to know about Steam, even my mum has an idea of what it is thanks to my dad using it for games - I'd even go as far as to say 'Steam' as a brand is stronger than 'Windows' when it comes to gaming and the core audience they supply software to.

(Of course this all fell into place when a day or two later Valve announced they would be selling non-game apps via Steam - at which point the light bulb in my head got so bright it burnt out.)

Of course the argument could be made that the MS store in Windows gives them an unfair advantage, but my problem with that is - sure, but they have to get software to sell first; if developers don't put it on there then what advantage? And if they do, it's because they like the terms or are getting better terms, so.. ya know.. compete?

It seems that the software industry is slowly, or not so slowly I guess, becoming a mire of conjecture, lies, sensationalism and downright misinformation. From PR people I could at least understand it, but some of this stuff is coming from people who should be looking at the facts and not going around throwing out terms without any checking.

In a way it's starting to become like mainstream politics; facts are out of the window and it's down to making your opponent look bad rather than making yourself look good and having answers.

It depresses me and makes me think about just saying 'fuck it..', packing up and going to live in a cave somewhere.

(Oh, and I don't vote either...)
_the_phantom_

Week off.

I've had the last week off.

I had plans involving OpenCL.

Then Max Payne 3 happened.

Such a well-executed game; the story was really good, the characters were realistic and engaging, and there is nothing more satisfying than doing a bullet-time dive into a room and popping 3 guys in the head with 3 shots ;) The soundtrack rocks; it fits so well with the setting it's crazy, and the graphics on the PC version were pretty awesome.

The whole story mode is like playing a film in a way, the combination of game play, voice over, cut scenes and subtle interactive hints (such as Max saying "I knew I couldn't stay here long..." when the game thinks you should move forward) was really well done.

There are a few minor issues, such as cut scenes jumping slightly from game play or the tendency for Max to sometimes forget the gun he was holding... that said, it doesn't happen all the time, so when he is done with a cut scene he'll pick up the submachine gun he put down on a shelf before walking off...

So far, game of the year for me, no contest.
(Sorry ME3)
_the_phantom_
There are two things I dislike about games and to an extent software development in general.

One of them is the gamers themselves; I could rant about this for ages but I'll refrain.

The other is the massive weight of resistance to anything new or not understood which seems intent on holding the whole state of the art back to What Is Known.

It annoys me because instead of trying to look for the good in new things people seem intent on trying to tear them down or pick holes in them without offering any constructive ways to improve; "waaaah it is different therefore it is bad and we like what we have!"

It's nothing new I admit; this refrain can be heard down the years probably echoing back to long before I was born a good 30 years ago.

Even professionally there is resistance; for a chunk of the last two months, while I've enjoyed the freedom of working with C#, TPL and LINQ, it has been against a backdrop of resistance to things not understood: "what are these task things?" - "is this the best way of doing it?" - "isn't this going to cause deadlocks?" and being told that by using LINQ I'm doing 'fancy stuff' when it's a facility built into the language!

I appreciate that not everyone has time to learn everything, that sometimes you have to look at a bit of technology and let it pass you by (I did myself with pretty much everything after .Net 2.0 because I didn't have the time), but this doesn't account for the resistance - sometimes you just have to trust people who have had the time to look into something.

Maybe I'm just one of the few and the brave always willing to look forward to try the next thing to see if it will make life easier or not..

I'm not saying 'embrace everything as the answer to all our problems' but the constant resistance is nothing but a sad statement about what should be an industry where things are pushed forward - not sat in a corner playing with what is safe.

So I say this; pick a weekend, any weekend, but soon and try something new - a language. An IDE. A design method. An API. Anything...

Not every shiny new thing will be a diamond - but without searching we don't stand a chance of finding anything at all.
_the_phantom_
Not a great deal to report from the front; having spent a few hours on trains over the weekend I've managed to chew through a fair chunk of the DLR book I have. It is slow going but progress is being made in my head, which is nice.

Most of my time at work has been spent working on our new data build pipeline (I am a rendering coder, honest!) which has let me do something I've been wanting to do for some time and really stretch my wings a bit with C# and .Net4.

Over the last few weeks I've come to love the whole Task system in .Net 4, so much so that, after some initial resistance from those who didn't know about it, I've managed to convince the other guys working on the pipeline that using Tasks as the basis of the build system is the way forward. It isn't a strict task system; asset processing rules themselves get launched as 'long running' tasks, which means they get their own thread, as do tasks which kick off an exe to do some processing. However, the ability to just throw work around and know that it'll get done and come back to you does make coding quite relaxing.

It isn't complete yet, however it is already leaps and bounds better than our old Python build system (a pox on the GIL!) and has afforded me a nice chance to learn some things.

In fact today, while converting some build rules from Python to C# I took the time to dive into LINQ too, which I've always liked the look of but never had a chance to try. It has made parsing XML files and pulling out data MUCH saner so yeah, pretty much in love with that too... so much so I want to revisit some already converted rules armed with this new knowledge :)

So, if you haven't got around to it already I would urge you to have a play with the Task system in .Net and the LINQ stuff; an afternoon of learning and playing around adds a couple of extra useful tools to your skill set.
_the_phantom_

Learnings.

So, my own rendering project at home has been stalled of late: between Mass Effect 3, F1 races, going back to my home town and being ill, I've not had the time to work on it much.

The weekend before last I wrote some code to set up a simple Win32 window so that I could crack on, but beyond that things haven't progressed too far.

While I do intend to carry on with the project over time (because there are some things I want to try out) I've also shifted my attention to properly learning about the Dynamic Language Runtime (DLR) in .Net4.

My motivation behind this comes from a growing frustration at work with our material data system and the amount of work it takes to deal with it.

Broadly speaking our materials are broken up into 3 parts:

  1. templates
  2. types
  3. materials

Templates are the lowest level; they define all the input data and configuration options that can be applied to a shader.
Types are a set of configurations for a shader and are used to control shader compiling.
Finally the 'materials' themselves are an instance of a type/template which holds the parameter settings and what textures should be applied to them.
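As a rough illustration of how those three layers relate, here is a minimal C++ sketch. The struct names, their fields and the `compileDefines` helper are all hypothetical, not the real tool's data layout; the point is only that a type selects from options its template declares, and a material fills in values for a type.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Lowest level: everything a shader could be configured with or fed.
struct MaterialTemplate {
    std::string shader;
    std::set<std::string> configOptions;    // e.g. "USE_NORMAL_MAP"
    std::set<std::string> parameterInputs;  // e.g. "roughness"
    std::set<std::string> textureInputs;    // e.g. "diffuse"
};

// A chosen set of config options for one template; drives shader compilation.
struct MaterialType {
    const MaterialTemplate* tmpl;
    std::set<std::string> enabledOptions;
};

// An instance of a type: actual parameter values and texture assignments.
struct Material {
    const MaterialType* type;
    std::map<std::string, float> parameters;
    std::map<std::string, std::string> textures;  // input name -> texture path
};

// Turns a type's enabled options into compiler defines, ignoring any option
// the template does not declare.
std::vector<std::string> compileDefines(const MaterialType& type) {
    std::vector<std::string> defines;
    for (const auto& opt : type.enabledOptions)
        if (type.tmpl->configOptions.count(opt))
            defines.push_back(opt + "=1");
    return defines;
}
```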

As a system it works very well; however, there are still some issues with it, mostly to do with redeclaration of information.

You see, there is a 4th, slightly hidden, component to the above: a Python script.
This Python script is used to help process the information in a material into a 'game ready' state.

The build system uses it on a per-material basis to work out what operations should be done to textures to make them 'game ready' as well as how parameters should be packed together so that the game team can control what entries go where in each float4 block.

This flexibility is nice; however, it brought with it some problems, mostly with textures but also with parameters.

With textures there is no longer a 1:1 mapping between what the artist attaches and what the shader reads. The pipeline could, for example, take two textures and combine them to produce a 3rd which is used in the game. It could split a texture into two. The most extreme example we could think of was that a texture could be processed, its average luminance worked out and this fed back into the material as a parameter.

The first problem, however, was the lack of a 1:1 mapping, and this was 'solved' by having the script remap the inputs to the correct outputs when it was a pass-through operation. This was done, everyone was happy and on we went.

The parameters came next and they had the same problem; as we could now pack parameters together the 1:1 mapping was once again removed, so once again the Python script performed the remap and pack operation and all was good.
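To make the packing step concrete, here is a minimal C++ sketch assuming a hypothetical layout table that says which float4 register and component each artist parameter lands in. `PackSlot` and `packParameters` are illustrative names, not the pipeline's API (which, as described above, lives in Python).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One float4 constant register, as the shader sees it.
using Float4 = std::array<float, 4>;

// Hypothetical layout entry: where an artist-facing parameter ends up.
struct PackSlot {
    std::size_t reg;        // index of the float4 register
    std::size_t component;  // 0..3 -> x, y, z, w
};

// Packs named scalar parameters into float4 registers according to `layout`.
// Parameters missing from `values` are left at 0.0f.
std::vector<Float4> packParameters(const std::map<std::string, PackSlot>& layout,
                                   const std::map<std::string, float>& values) {
    std::size_t regCount = 0;
    for (const auto& entry : layout)
        regCount = std::max(regCount, entry.second.reg + 1);

    std::vector<Float4> out(regCount, Float4{0.0f, 0.0f, 0.0f, 0.0f});
    for (const auto& [name, slot] : layout) {
        auto it = values.find(name);
        if (it != values.end())
            out[slot.reg][slot.component] = it->second;
    }
    return out;
}
```

The layout table is exactly the information that stops being 1:1 with the artist inputs, which is why every other consumer (editor, live link, Max) ends up needing a copy of it.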

Then another problem appeared; live link.
Via a connection to the game it is possible to change material parameters on the fly; however, with the broken 1:1 mapping we now needed a way to tell the editor how to pack data together to send it over the wire and, more importantly, which artist inputs trigger updates on what parameters in the shader.

And thus was born a 'live link' map which could be used to figure out what inputs affected what outputs, and which would let you run a Python script (via IronPython) to condition the data if required, and all was good once more.
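A sketch of the core of such a live link map, assuming the packing rules can report which artist inputs each output is built from; the names here are hypothetical. Inverting output->inputs into input->outputs answers the question the editor actually asks when an artist moves a slider: what do I need to rebuild and send?

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical: for each shader-side output, the artist inputs it is built from.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Inverts output->inputs into input->outputs so that, when one input changes
// over the live link, the editor knows exactly which outputs to rebuild.
std::map<std::string, std::set<std::string>> buildLiveLinkMap(const DependencyMap& outputs) {
    std::map<std::string, std::set<std::string>> byInput;
    for (const auto& [output, inputs] : outputs)
        for (const auto& input : inputs)
            byInput[input].insert(output);
    return byInput;
}
```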

Until the Max shaders needed to be dealt with, at which point the lack of a 1:1 mapping for the textures appeared once again. As the Max shader could be heavier than the final in-game shaders there was no requirement to build the data for visualisation; however, it still needed to know what inputs mapped to what shader inputs, and thus the 'max map' was born.

So now a template is defining:

  • config options
  • artist inputs
  • material outputs
  • 'live link' mappings for parameters
  • Max mappings for parameters and textures

But at the same time the Python script was also defining the mapping between parameters and textures for inputs and outputs, so the information is already duplicated and easy to get out of step.

Finally, the system isn't as easy to use as I had hoped. Yes, I did the Python work, and in my head I saw the idea of having centralised 'packing' functions for the scripts; however, this hasn't happened, and instead the game team have been duplicating functions and copying and pasting things around.

Between the duplication, the 'logic' in the XML, the scripting issues and the general slowing of the build pipeline, I've finally snapped.

Which brings us to the DLR.

What I would like to produce, using the DLR, is a domain-specific language which can replace all of the above.
Our build system, tools and Max plugin are all based on .Net 4.0 now, and the DLR is part of that, so being able to load and run a 'script' which defines a template seems like a good fit.

My hope is that, given some time, a language can be designed which allows you to declare the various inputs, outputs & transformations and that, by using a pluggable backend, the same script can be used by the tools and the build system to different ends.

For example if you were to write:

declare parameter "foo" float

then the tools would know to create an input in the UI for a single float with the label/name foo, while the build system would know to look for an input named 'foo' in the material file it was processing.
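A minimal sketch of the 'one script, pluggable backends' idea, written in C++ rather than on the DLR and assuming the hypothetical syntax above with parameter names that contain no spaces. `ParameterDecl`, `Backend`, `ToolsBackend` and `BuildBackend` are illustrative names only; real backends would create widgets or read material files instead of recording strings.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A parsed 'declare parameter "<name>" <type>' statement.
struct ParameterDecl {
    std::string name;
    std::string type;
};

// Parses one line of the hypothetical syntax: declare parameter "foo" float
ParameterDecl parseDeclaration(const std::string& line) {
    std::istringstream in(line);
    std::string declare, kind, quotedName, type;
    in >> declare >> kind >> quotedName >> type;
    if (declare != "declare" || kind != "parameter" || quotedName.size() < 2 ||
        quotedName.front() != '"' || quotedName.back() != '"' || type.empty())
        throw std::invalid_argument("unrecognised declaration: " + line);
    return {quotedName.substr(1, quotedName.size() - 2), type};
}

// Each consumer of the script plugs in its own backend.
struct Backend {
    virtual ~Backend() = default;
    virtual void onParameter(const ParameterDecl& decl) = 0;
};

// Tools backend: would create a UI widget; here it just records a description.
struct ToolsBackend : Backend {
    std::vector<std::string> widgets;
    void onParameter(const ParameterDecl& d) override {
        widgets.push_back(d.type + " widget labelled '" + d.name + "'");
    }
};

// Build backend: would look the input up in the material being processed.
struct BuildBackend : Backend {
    std::vector<std::string> lookups;
    void onParameter(const ParameterDecl& d) override {
        lookups.push_back("material input '" + d.name + "'");
    }
};

// Runs the same declaration through whichever backend is plugged in.
void runScriptLine(const std::string& line, Backend& backend) {
    backend.onParameter(parseDeclaration(line));
}
```

The script text is parsed once into a neutral declaration, and only the backend decides what that declaration means, which is what keeps the tools and the build from drifting apart.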

Now, that syntax is by no means final, but I feel a more declarative syntax would be a better fit, and with some decent library support for data packing/transforms built in (and expandable via .Net DLLs) it could make a viable replacement.

Now, I only had the idea around 48 hours ago and I've only just started learning the DLR; however, as projects go, I think this would be a worthwhile one.
_the_phantom_

6 months later....

So, I'm still not dead ;) however the last 6 months have been largely lacking in progress on my own projects.

What has gone down, however, is a slight change of work status. After OFP:RR wrapped up I got moved to another project to help it start up, which I was happy with as it was something new; however, as it was some way from starting, I ended up working on other things. Then, shortly after my last entry, I had two weeks off and upon my return found out things had changed again, heh.

So, as of July this year I've been working in the rendering team of Codemasters' Central Tech team on the new engine that is being developed and will, in time we hope, power all of Codemasters' future games... which is, ya know, pretty cool :) It's a small team but once I got settled in it's been good.

Between settling in with a new team, time off due to TOIL from OFP:RR and a rash of decent games being released (or just revisiting some old ones) my weekends have been somewhat devoid of progress.

Until recently... dun dun duuuuuun!

A few weeks back I got an itch to do some coding of the graphical type; despite my knowledge I feel like I'm a bit behind the curve these days with techniques, so I want to set up my own DX11 renderer/engine to play with things.

I decided, however, that first things first I need something to render. Cubes are all well and good but to do any serious graphical work you need something big... something decent... I settled on Sponza.

For those who don't know, the Sponza Atrium is a very extensively used GI test scene; with its arches, ceilings and open roof it makes a good location for testing out and visualising various GI solutions. A few years back Crytek released an 'updated' version with a higher poly count and some banners in the scene to make it even more testing.

The scene is available in 3DS Max and Obj format; however, I didn't want to parse Obj by hand, so I decided to export the scene from 3DS Max to FBX and then use the FBX SDK with some of my own code to dump out the information in a slightly more 'engine friendly' format, aka a dump of the vertex buffer and index buffer along with material files to describe properties/textures.
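For illustration only, here is a minimal sketch of what such a per-object vertex/index dump could look like. The header layout, the 'MESH' magic value and the use of 32-bit indices are assumptions, not the format actually used. It copies raw bytes in native byte order, so it assumes the exporter and the engine run on the same (little-endian) platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical on-disk header for a per-object vertex/index buffer dump.
struct MeshHeader {
    std::uint32_t magic;         // identifies the file type
    std::uint32_t vertexStride;  // bytes per vertex
    std::uint32_t vertexCount;
    std::uint32_t indexCount;    // 32-bit indices follow the vertex data
};

constexpr std::uint32_t kMeshMagic = 0x4853454D;  // bytes "MESH" on little-endian

struct MeshData {
    std::uint32_t vertexStride = 0;
    std::vector<unsigned char> vertices;  // raw interleaved vertex bytes
    std::vector<std::uint32_t> indices;
};

// Header, then raw vertex bytes, then indices, in one buffer ready to write out.
std::vector<unsigned char> serialiseMesh(const MeshData& mesh) {
    if (mesh.vertexStride == 0 || mesh.vertices.size() % mesh.vertexStride != 0)
        throw std::invalid_argument("vertex data is not a whole number of vertices");
    const MeshHeader h{kMeshMagic, mesh.vertexStride,
                       static_cast<std::uint32_t>(mesh.vertices.size() / mesh.vertexStride),
                       static_cast<std::uint32_t>(mesh.indices.size())};
    const std::size_t indexBytes = mesh.indices.size() * sizeof(std::uint32_t);
    std::vector<unsigned char> out(sizeof h + mesh.vertices.size() + indexBytes);
    unsigned char* p = out.data();
    std::memcpy(p, &h, sizeof h);
    p += sizeof h;
    if (!mesh.vertices.empty()) std::memcpy(p, mesh.vertices.data(), mesh.vertices.size());
    p += mesh.vertices.size();
    if (indexBytes != 0) std::memcpy(p, mesh.indices.data(), indexBytes);
    return out;
}

// Validates the header and sizes before copying anything out.
MeshData deserialiseMesh(const std::vector<unsigned char>& bytes) {
    MeshHeader h{};
    if (bytes.size() < sizeof h) throw std::runtime_error("truncated mesh dump");
    std::memcpy(&h, bytes.data(), sizeof h);
    if (h.magic != kMeshMagic) throw std::runtime_error("not a mesh dump");
    const std::size_t vertexBytes = std::size_t(h.vertexStride) * h.vertexCount;
    const std::size_t indexBytes = std::size_t(h.indexCount) * sizeof(std::uint32_t);
    if (bytes.size() != sizeof h + vertexBytes + indexBytes)
        throw std::runtime_error("size mismatch in mesh dump");
    MeshData mesh;
    mesh.vertexStride = h.vertexStride;
    const unsigned char* vStart = bytes.data() + sizeof h;
    mesh.vertices.assign(vStart, vStart + vertexBytes);
    mesh.indices.resize(h.indexCount);
    if (indexBytes != 0) std::memcpy(mesh.indices.data(), vStart + vertexBytes, indexBytes);
    return mesh;
}
```

Keeping the vertex data as opaque bytes plus a stride means the loader can hand it straight to a vertex buffer, with the material/meta files describing what the bytes contain.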

The FBX SDK is pretty easy to work with and it didn't take too long to get something hanging together which compiled so I figured "time to test the model...".... and thus began some hell.

Firstly, the 3DS Max model uses some crazy Crytek plugin for Max. The net result is that even once you get the plugin and delete the large banner they haven't provided a texture for, you are still stuck with a bunch of broken materials and, even once you've fixed them, the FBX exporter doesn't even attempt to export the Crytek shader-based materials in any way, so while you'll get an FBX file it is devoid of textures or any other useful data.

The Obj version suffers from the missing texture problem too; however, after fixing those up as best I could and deleting the offending geometry which has no textures at all, the scene did sanely export at last. I had to do some copy and paste work from the Max version into the Obj version to get the lights and camera positions into the FBX file, however the net result, after a couple of weekends' work, is a directory filled with per-object vb/ib/meta files and 'material' files :)

Unfortunately this weekend I'm heading back to my home town for Xmas, which means no access to a decent computer for two weeks, so any further work is going to have to wait until the new year at which point I'm going to set about getting a basic flat shaded, no textures and no lighting version of the scene loaded and rendering. After that I need to adapt my exporter to also dump out camera, lights and maybe positional information for the various objects but we'll see how that goes.

My aim, by the end of Jan 2012, is to have the scene rendering, lit (optionally with shadows, even if only basic shadow maps) and textured, using the cameras and lights provided. It shouldn't be a big ask.

Finally, tomorrow, Dec 15th, marks my 10th year as a member of this site; 10 years ago I signed up as a 21 year old who had just failed out of uni, with a small amount of OpenGL knowledge picked up in the previous years. 10 years later I've been published both in a book and on this site, got my degree, am working for my 2nd company in the industry and am now part of a core team working on a AAA engine. Interesting how things go..

I've also drunk a lot, fallen over a lot, danced a lot, got beaten up once, broke a table in a club trying to kick-jump off it (in front of the owner, without getting thrown out :D), got accidentally drunk at lunch in college and scared a student teacher away from the profession, and generally had 10 years of amusing times including some GD.Net London Pie and Pint meet ups.

Let's hope for another good 10 years... and if they are more interesting than the last 10 I'll be cool with that too :D
_the_phantom_

To The Metal.

I'm trying to recall when I first got into programming. It probably would have been somewhere between the ages of 5 and 7; we had a BBC Micro at home thanks to my dad's own love of technology and I vaguely recall writing silly BASIC programs at that age. Certainly my mum has told me stories about how at 5 I was loading up games via the old tape drive, something she couldn't do *chuckles*, but it was probably around 11 when I really got into it after meeting someone at high school who knew BASIC on the Z80, and from there my journey really began.

(In the next 10 years I would go on to exceed both my dad and my friend when it came to code writing ability, so much so that the control program for my friend's BSc in electronics was written by me in the space of a few hours when he couldn't do it in the space of a few months ;) )

One of the best times I had during this period, however, was when I got an Atari STe and moved from using STOS Basic to 68K Assembly so that I could write extensions for STOS to speed operations up. I still haven't had a coding moment which has given me as much joy as the time I spent 2 days crowbarring a 50kHz mod replay routine into an extension; I lack words for the joy which came when that finally came to life and played back the song without crashing :D You think you've got it hard these days with compile-link-run debug cycles? Try doing one of them on an 8MHz computer with only floppy drives to load from and only being able to run one program at a time ;)

The point to all this rambling is that one of the things I miss on modern systems, with our great big optimising compilers, is the 'to the metal' development which was the norm back then; I enjoyed working at that level.

In my last entry I was kicking around the idea of making a new scripting language which would natively know about certain threading constraints (only read remote, read/write private local) and I was planning to backend it onto LLVM for speed reasons.

During the course of my research into this idea I came across a newsgroup posting by the guy behind LuaJIT talking about why LuaJIT is so fast. The long and the short of it is this;

A modern C or C++ compiler suite is a mass of competing heuristics which are tuned towards the 'common' case, and for that purpose they generally work 'well enough' for most code. However, an interpreter ISN'T most code; it has very particular code which the compiler doesn't deal with well.

LuaJIT gets its speed from the main loop being hand written in assembler which allows the code to do clever things that a C or C++ compiler wouldn't be able to do (such as decide what variables are important enough to keep in registers even if the code logic says otherwise).

And that's when a little light went on in my head and I thought: hey.. you know what, that sounds like fun! A low level, to the metal, style of programming with some decent reason for doing it (aka the compiler sucks at making this sort of code fast).

At some point in the plan I decided that I was going to do x64 support only. The reasons for this are two fold;

1) It makes the code easier to do. You can make assumptions about instructions and you don't have to deal with any crazy calling conventions as the x64 calling convention is set and pretty sane all things considered.

2) x86 is a slowly dying breed and frankly I have no desire to support it and contort what could be some nice code into some horrible mess to get around the lack of registers and crazy calling conventions.

I've spent the better part of today going over a lot of x86/x64 stuff and I now know more about x86/x64 instruction decoding and x64 function calling/call stack setup than any sane person should... however it's been an interesting day :)

In fact x64 has some nice features which would aid the speed of the development, such as callers setting up the stack for callees (so tail calls become easy to do) and passing things around in registers instead of via the stack. Granted, while inside the VM I can always do things 'my way' to keep values around as needed, but it's worth considering the x64 convention to make interop that much easier.

The aim is to get a fully functional language out of this, one which can interop with C and C++ functions (calling non-virtual member functions might be the limit here) and has some 'safe' threading functionality as outlined in the previous entry.

Granted, having not written a single line of code yet that is some way off to say the least :D So, for now, my first aim is to get a decoding loop running which can execute the 4 'core' functions of any language;

- move
- add
- compare
- jump

After that I'll see about adding more functionality; the key thing here is designing the ISA in such a way that extending it won't horribly mess up decode and dispatch times.
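As a plain C++ reference model (the plan above is hand-written assembly, so this only sketches the semantics), here is a tiny decode-and-dispatch loop covering move, add, compare and jump. The encoding is an assumption: fixed 32-bit instructions with the opcode in the low byte and three 8-bit operands, which keeps decode down to a few shifts and masks however many opcodes get added later.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: [ c:8 | b:8 | a:8 | op:8 ], opcode in the low byte.
enum class Op : std::uint8_t {
    Movi,  // r[a] = b            (move a small immediate)
    Add,   // r[a] = r[b] + r[c]
    Lt,    // flag = r[a] < r[b]  (compare)
    Jt,    // if (flag) pc = a    (conditional jump to instruction index a)
    Jmp,   // pc = a              (unconditional jump)
    Halt   // stop and return the registers
};

constexpr std::uint32_t encode(Op op, std::uint8_t a = 0, std::uint8_t b = 0,
                               std::uint8_t c = 0) {
    return std::uint32_t(op) | (std::uint32_t(a) << 8) |
           (std::uint32_t(b) << 16) | (std::uint32_t(c) << 24);
}

using Registers = std::array<std::int64_t, 256>;

// Fetch, decode and dispatch until Halt. There is no bounds check on pc,
// so a program must end in Halt.
Registers run(const std::vector<std::uint32_t>& code) {
    Registers r{};
    bool flag = false;
    std::size_t pc = 0;
    for (;;) {
        const std::uint32_t ins = code[pc++];
        const std::uint8_t a = (ins >> 8) & 0xFF;
        const std::uint8_t b = (ins >> 16) & 0xFF;
        const std::uint8_t c = (ins >> 24) & 0xFF;
        switch (static_cast<Op>(ins & 0xFF)) {
            case Op::Movi: r[a] = b; break;
            case Op::Add:  r[a] = r[b] + r[c]; break;
            case Op::Lt:   flag = r[a] < r[b]; break;
            case Op::Jt:   if (flag) pc = a; break;
            case Op::Jmp:  pc = a; break;
            case Op::Halt:
            default:       return r;  // Halt, or an opcode this sketch doesn't know
        }
    }
}
```

A loop summing 1 to 10 needs nothing more than these: two Add instructions in the body, a Lt to test the counter against the limit and a Jt back to the top. A hand-written version would mainly differ in keeping pc, the flag and hot registers pinned in machine registers, which is exactly what the compiler won't reliably do for the switch above.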

Oh, and for added fun, as the MSVC x64 compiler doesn't allow inline assembly, large parts are going to be fully hand coded... I like this idea ^_^
_the_phantom_
Scripting languages, such as Lua and Python, are great.

They allow you to bind with your game and quickly work on ideas without the recompile-link step you would have with something like C++ in the mix.

However in a highly parallel world those languages start to look lacking, as they often have a 'global' state which makes it hard to write code which can execute across multiple threads in the language in question (I'm aware of Stackless Python, and I admit I've not closely looked at it), certainly when data is being updated.

This got me thinking: going forward, a likely common pattern in games to avoid locks is to have a 'private' and a 'public' state for objects, which allows loops which look like this;

[update] -> [sync] -> [render]

or even

[update] -> [render] -> [sync]

Either way, that 'sync' step can be used, in a parallel manner, to move 'private' state to be publicly visible so that during the 'update' phase other objects can query and work with it.

Of course to do this effectively you'd have to store two variables, one for private and one for public state, and deal with moving it around which is something you don't really want to be doing.

This got me thinking about what would happen if you could 'tag' elements as 'syncable' in some way and have the scripting back end take care of the business of state copying and, more importantly, of which copy is active in which context. Then, when you ran your code, the runtime would figure out, based on context, which copy of the state it had to access for data.

There would still need to be a 'sync' step called in order to have the runtime copy the private data to the public side, which would have to be user callable as it would be hard for the runtime to know when it was 'safe' to do so, but it would remove a lot of the problem as you would only declare your variables once and your functions once and the back end would figure it out. (You could even use a system like Lua's table keys, where you can make them 'weak' by setting a meta value on them, so values could be added to structures at runtime.) The sync step could also use a copy-on-write setup so that if you don't change a value then it doesn't try to sync it.
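A minimal C++ sketch of the 'declare once, two copies behind the scenes' idea, using a dirty flag as a cheap stand-in for copy-on-write; `Syncable` is a hypothetical name. In the proposed language the runtime would pick the copy from context; here the caller picks explicitly via `readPublic`/`readPrivate`, and `sync()` is the user-called step.

```cpp
#include <utility>

// One declared variable backed by two copies. During 'update' the owner
// writes its private copy while other objects read the public one;
// sync() publishes private -> public.
template <typename T>
class Syncable {
public:
    explicit Syncable(T initial = T{}) : publicValue_(initial), privateValue_(initial) {}

    const T& readPublic() const { return publicValue_; }    // what other objects see
    const T& readPrivate() const { return privateValue_; }  // what the owner sees
    void write(T value) {
        privateValue_ = std::move(value);
        dirty_ = true;
    }

    // Copies only if the private value was written since the last sync.
    // Returns whether anything was published.
    bool sync() {
        if (!dirty_) return false;
        publicValue_ = privateValue_;
        dirty_ = false;
        return true;
    }

private:
    T publicValue_;
    T privateValue_;
    bool dirty_ = false;
};
```

Because each Syncable only touches its own two copies, the sync step over many objects can itself be run in parallel, which fits the [update] -> [sync] -> [render] loop above.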


It needs some work, as ideas go, to make it viable but I thought I'd throw the rough idea out for some feedback, see if anyone has any thoughts on it all.
_the_phantom_

On APIs.

Right now 3D APIs are a little... depressing... on the desktop.

While I still think D3D11 is technically the best API we have on Windows, the fact that AMD and NV currently haven't implemented multi-threaded rendering in a manner which helps performance is annoying. I've heard that there are good technical reasons why this is a pain to do; I've also heard that right now AMD have basically sacked it off in favour of focusing on the Fusion products. NV are a bit further along, but in order to make use of it you effectively give up a core, as the driver creates a thread which does the processing.

At this point my gaze turned to OpenGL, and with OpenGL 4.x, while the problems with the API are still there in the bind-to-edit model (which is showing no signs of dying), feature-wise it has to a large degree caught up. Right now, however, there are a few things I can't see a way of doing from GL, but if anyone knows differently please let me know...


  • Thread-free resource creation. The D3D device is thread safe in that you can call its resource creation routines from any thread. As far as I know GL still needs to use a context which must be bound to the 'current' thread to create resources.
  • Running a pixel shader at 'sample' frequency instead of pixel frequency, so that with an MSAA x4 render target it would run 4 times per pixel.
  • The ability to write to a structured memory buffer in the pixel shader. I admit I've not looked too closely at this but a quick look at the latest extension for pixel/fragment shaders doesn't give any clues this can be done.
  • Conservative depth output. In D3D a shader can be tagged in such a way that it'll never output depth greater than the fragment was already at, which preserves early-z rejection and allows you to write out depth info different to that of the primitive being drawn.
  • Forcing early-z to run; when combined with the UAV writing above this allows things like calculating both colour and 'other' information per-fragment and only have both written if early-z passes. Otherwise UAV data is written when colour isn't.
  • Append/consume structured buffers; I've not spotted anything like this anywhere. I know we are verging into compute here, which is OpenCL's territory, but pixel shaders can use them.

There are probably a few others which I've missed; however, these spring to mind and I want to use many of them.

OpenGL also still has the 'extension' burden around its neck, with GLee out of date and GLEW just not looking that friendly (I took a look at both this weekend gone). In a way I'd like to use OpenGL because it works nicely with OpenCL, and in some ways the OpenCL compute programming model is nicer than D3D11's Compute model, but with API/hardware features apparently missing this isn't really workable.

In recent weeks there has been talk of ISVs wanting the 'API to go away' because (among other things) it costs so much more to make a draw call on the PC than on consoles. While I somewhat agree with the desire to free things up and get at the hardware more, one of the reasons put forward for this added 'freedom' was to stop games looking the same; however, in a world without APIs, where you are targeting a constantly moving set of goal posts, you'll see more companies either drop the PC as a platform or license an engine to do all that for them.

While people talk about 'to the metal' programming being a good idea because of how well it works on the consoles, they seem to forget it often takes half a console life cycle for this stuff to become used/commonplace, and that is targeting fixed hardware. In the PC space things change too fast for this sort of thing; AMD themselves would have invalidated a lot of work in one cycle by going from VLIW5 to VLIW4 between the HD5 and HD6 series, never mind the underlying changes to the hardware itself. Add to this the fact that 'to the metal' would likely lag hardware releases and you don't have a compelling reason to go that route, unless all the IHVs decide to go with the same TTM "API", at which point things will get.. interesting (see OpenGL for an example of what happens when IHVs try to get along).

So, unless NV and AMD want to slow down hardware development so things stay stable for multiple years, I don't see this as viable at all.

The thing is, SOMETHING needs to be done about the widening 'draw call gap' between consoles and PCs. Right now 5 year old hardware can outperform a cutting-edge system when it comes to the CPU cost of draw calls. Fast forward 3 years to the next generation of console hardware, which is likely to have even more cores than now (12 min. I'd guess), faster RAM and DX11+ class GPUs as standard. Unless something goes VERY wrong, this hardware will likely allow trivial application of command list/multi-threaded rendering, further opening the gap between the PC and consoles.

Right now PCs are good 'halo' products as they allow devs to push up the graphics quality settings and just soak up the fact we are CPU limited on graphics submission, thanks to out-of-order processors, large caches and higher clock speeds. But clock speeds have hit a wall, and when the next generation of consoles drops they will match PCs on single-threaded clock speed and graphics hardware... suddenly the pain of developing on a PC, with its flexible hardware, starts to look less and less attractive.

For years people have been talking about the 'death of PC gaming', and the next generation of hardware could well cause, if not that, then the reduction of the PC to MMO, RTS, TBS and 'facebook' games while all the large AAA games move off to the consoles, where development is easier, rewards are greater and things can be pushed further.

We don't need the API to 'go away' but it needs to become thinner, both on the client AND the driver side. MS and the IHVs need to work together to make this a reality because if not they will all start to suffer in the PC space. Of course, with the 'rise in mobile' they might not even consider this an issue..

So, all in all the state is depressing.. too much overhead, missing features and in some ways doomed in the near future...
_the_phantom_
Over a few weekends leading up until Xmas and the last couple since then I have been playing around with boost::spirit and taking a quick look at ANTLR in order to setup some code to parse Lua and generate an AST.

Spirit looked promising, the ability to pretty much dump the Lua BNF into it was nice right up until I ran into Left Recursion and ended up in stack overflow land. I then went on a hunt for an existing example but that failed to compile, used all manner of boost::fusion magic and was generally a pain to work with.
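For reference, the usual fix for that stack overflow is to rewrite the left-recursive rule as iteration: exp ::= exp '-' term | term becomes exp ::= term { '-' term }. A minimal hand-written C++ sketch over a toy grammar (integers with '+' and '-', not Lua's full expression grammar) shows the idea; it evaluates as it goes to stay short, where a real parser would build AST nodes in the same loop.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Parsing exp ::= exp '-' term directly would make parseExp() call itself
// before consuming any input, recursing until the stack overflows.
// The loop below parses the equivalent exp ::= term { ('+' | '-') term }
// and folds to the left, so "10 - 3 - 2" means (10 - 3) - 2.
class ExpParser {
public:
    explicit ExpParser(std::string src) : s_(std::move(src)) {}

    long parse() {
        const long v = parseExp();
        skipSpace();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    long parseExp() {
        long value = parseTerm();
        for (;;) {
            skipSpace();
            if (pos_ >= s_.size() || (s_[pos_] != '+' && s_[pos_] != '-'))
                return value;
            const char op = s_[pos_++];
            const long rhs = parseTerm();
            value = (op == '+') ? value + rhs : value - rhs;
        }
    }

    long parseTerm() {  // term ::= integer literal
        skipSpace();
        const std::size_t start = pos_;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (start == pos_) throw std::runtime_error("expected number");
        return std::stol(s_.substr(start, pos_ - start));
    }

    void skipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```

With operator precedence added (one loop per precedence level, or precedence climbing) the same shape covers Lua's binary operators.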

I had a look at ANTLR last weekend, and while dumping out a C++ parser using their GUI tool was easy enough, the C++ docs seem... lacking, and I couldn't make any headway when it came to using it.

This afternoon I decided to bite the bullet and just start doing it 'by hand'. Fortunately the Lua BNF isn't that complicated, with a small number of keywords to deal with and a syntax which shouldn't be too hard to build into a sane AST from a token stream.

I'm not doing things completely by hand; token extraction is being handled by boost::tokenizer with a custom-written skipper which drops whitespace and semicolons, keeps the rest of the punctuation required by Lua and, importantly, is aware of floating point/double numbers so that it can correctly emit a dot as a token when it makes sense.

Currently it doesn't handle (and hasn't been tested with) octal or escaped characters, and comments would probably cause things to explode; however, I'll deal with them in the skipper at some point.
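The skipper behaviour described above (drop whitespace and semicolons, keep other punctuation, and keep a dot inside a number when it follows digits) can be sketched as a plain function. This is a standalone illustration without Boost, not the boost::tokenizer-based code itself; `tokenizeLua` is a hypothetical name, and strings, comments and multi-character operators are deliberately not handled:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical standalone tokenizer: skips whitespace and ';', groups
// identifiers and numbers, and emits any other character as a
// single-character punctuation token. A '.' becomes part of a number only
// when the number started with digits, so "rage.omg" still yields a '.' token.
std::vector<std::string> tokenizeLua(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c) || c == ';') {
            ++i;  // dropped entirely
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            bool seenDot = false;
            while (i < src.size()) {
                const unsigned char d = static_cast<unsigned char>(src[i]);
                if (std::isdigit(d)) {
                    ++i;
                } else if (d == '.' && !seenDot) {
                    seenDot = true;  // first dot belongs to the number
                    ++i;
                } else {
                    break;
                }
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));  // punctuation
            ++i;
        }
    }
    return tokens;
}
```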

Given the following Lua;

foo = 42; bar = 43.4; rage = {} rage:fu() rage.omg = "wtf?"

The following token stream is pushed out;

<= (18)> <42 (30)>
<= (18)> <43.4 (29)>
<= (18)> <{ (20)> <} (21)>
<: (26)> <( (24)> <) (25)>
<. (27)> <= (18)> <"wtf?" (31)>

(Where the number in brackets is the id of the token found.)

There is a slight issue right now, though, which shows up when given this code;

foo = 42; bar <= 43.4; rage = {} rage:fu() rage.omg = "wtf?"

The token stream created is;

<= (18)> <42 (30)>
<< (26)> <= (18)> <43.4 (29)>
<= (18)> <{ (20)> <} (21)>
<: (26)> <( (24)> <) (25)>
<. (27)> <= (18)> <"wtf?" (31)>

Notice that it creates two tokens for the '<=' sequence; this will probably need to be solved in the skipper as well.
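A common fix is 'maximal munch': at each punctuation character, try the longest operator first and fall back to shorter ones. A minimal sketch, assuming the multi-character operators of Lua 5.1 (`...`, `..`, `==`, `~=`, `<=`, `>=`); `matchOperator` is a hypothetical helper a skipper could call before emitting single-character punctuation:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length of the longest operator starting at src[pos]:
// 3 or 2 for a multi-character Lua 5.1 operator, otherwise 1.
// Longer candidates are listed first so "..." wins over "..".
std::size_t matchOperator(const std::string& src, std::size_t pos) {
    assert(pos < src.size());
    static const std::array<const char*, 6> multi = {"...", "..", "==", "~=", "<=", ">="};
    for (const char* op : multi) {
        const std::string candidate(op);
        // compare() clamps the substring at the end of src, so a short tail simply fails to match
        if (src.compare(pos, candidate.size(), candidate) == 0)
            return candidate.size();
    }
    return 1;  // single-character punctuation
}
```

The skipper would then emit `src.substr(pos, matchOperator(src, pos))` as one token, so `bar <= 43.4` produces a single `<=`.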

So, once that is solved, the next step will be the AST generation... fun times...

_the_phantom_
While gearing up to work on the parser/AST generator mentioned in my previous entry, I decided to watch a couple of webinars from AMD about OpenCL (because while I'm DX11 focused I do like OpenCL as a concept), the first of which was about the HD5870 design.

One of the more interesting things to come out of it was the 'global data store' (GDS), which, although only covered in overview, had a nugget of information in it that would have been easy to skip over.

While not directly exposed in DXCompute or OpenCL, the GDS does come into play with DXCompute's 'append buffers' (and whatever the OpenCL version of this construct is), as that is where the data is written to, allowing the GPU to accelerate the process.

What this means in real terms is that if your compute shader needs to store data which every thread in the dispatch needs to get at, you could use these append buffers with only a small hit (25 cycles on the HD5 series), as long as the data fits into the memory block. Granted, you would still need to place barriers in your shader/OpenCL code to ensure that everyone is done writing, but it might allow for faster data sharing in some situations.
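The append pattern itself is easy to picture on the CPU: every writer atomically bumps one shared counter to reserve a unique slot, then fills that slot. The sketch below is only an analogy using `std::atomic`; it says nothing about how the GDS or D3D11 append buffers are actually implemented in hardware, and `AppendBuffer` is a hypothetical name:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU analogy of an append buffer: writers share one atomic counter and
// fetch_add hands each of them a unique slot index.
template <typename T>
class AppendBuffer {
public:
    explicit AppendBuffer(std::size_t capacity) : data_(capacity), count_(0) {}

    // Returns false if the reserved slot lies past the end (buffer full).
    bool append(const T& value) {
        const std::size_t slot = count_.fetch_add(1);
        if (slot >= data_.size())
            return false;
        data_[slot] = value;
        return true;
    }

    // Only meaningful after all writers have finished (joined or past a
    // barrier), mirroring the barrier requirement in the shader case.
    std::size_t size() const {
        const std::size_t n = count_.load();
        return n < data_.size() ? n : data_.size();
    }

    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
    std::atomic<std::size_t> count_;
};
```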

I don't know if NV does anything similar; maybe I'll check that out later...

Right, back to the webinars...
_the_phantom_
Some time ago now I had an idea for a game which spawned from reading about a particular graphics technique in GPU Gems 3.
I decided early on that I wanted to target DX11 hardware and multi-core CPUs only; mostly because I was under no illusion about a fast release time, and partly because I wanted things to look nice, which made a DX11 card pretty much a must.

This then led to my deciding to build the game, more or less from the ground up, by myself. This was never going to be an 'engine' thing, more the set of code required to make the game I could see in my head happen. After some playing around I decided that, in order to scale across cores, I would use Intel's TBB as the 'core' around which the game would be built. This is easy enough to do; it's just a matter of getting into the 'task based' mindset and being mindful of your data flow in order to get the most from the CPU.

However, while large chunks of the game would be done in C++ (because as much as I like C#, sometimes I just love to hack on C++), various pieces of logic could be 'soft coded' in script form with no problem. Well, apart from one: concurrency.

My scripting language of choice has been, for some time now, Lua; I like the lightness, the speed and the syntax of the language. While I can work with Python and others, something about Lua drags me back. The problem is that Lua has some thread safety issues even when executing different contexts on different threads.

The first problem I came across was the function recursion count; basically Lua only lets you go so deep, but as the increment/decrement of this counter wasn't protected... well... bang!

I "fixed" that by shifting the count out of the global state and into the thread state; all seemed good... right up until the first GC cycle, at which point... bang!
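The first failure is a classic data race: a depth counter shared by every thread is incremented and decremented without synchronisation, so concurrent calls corrupt it. Giving each thread its own counter removes the race for that counter, though not for anything else living in shared global state, such as the GC. A minimal sketch of the per-thread idea, using hypothetical `ThreadState` and `DepthGuard` types rather than Lua's real internals, and an assumed limit of 200:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical per-thread interpreter state: each thread running scripts
// owns one, so the call-depth counter is never shared between threads.
struct ThreadState {
    int callDepth = 0;
    static constexpr int kMaxDepth = 200;  // assumed limit
};

// RAII guard: increments on entry and decrements on every exit path,
// including exceptions.
class DepthGuard {
public:
    explicit DepthGuard(ThreadState& state) : state_(state) {
        if (state_.callDepth >= ThreadState::kMaxDepth)
            throw std::runtime_error("call depth exceeded");
        ++state_.callDepth;
    }
    ~DepthGuard() { --state_.callDepth; }
    DepthGuard(const DepthGuard&) = delete;
    DepthGuard& operator=(const DepthGuard&) = delete;

private:
    ThreadState& state_;
};

// Example recursive call that only ever touches its own thread's state;
// returns the depth reached at the innermost call.
int recurse(ThreadState& state, int n) {
    DepthGuard guard(state);
    return n == 0 ? state.callDepth : recurse(state, n - 1);
}
```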

At this point I admitted defeat on the Lua front and took a look at some other languages, however I couldn't find one I liked. GameMonkey Script came close, but no dice.

About then I went mad and decided 'sod this, I'll do it myself...', and so I've decided to build my own LLVM-backed system using Lua-like syntax and features.

The aim is, once up and running, to extend it with a few game-specific features (and probably a few stolen from other languages such as GMScript and Squirrel) and generally make it awesome.

No timescale is planned on this front, mostly because work is a bit mad right now, but this is where I've ended up after coming up with a game idea...
_the_phantom_
So, I'm really not dead... which at least doesn't make this journal title a lie [grin]

It's been quite a busy few months for me: getting used to work, getting used to living in more than one room and getting used to being 100% single again (which I shan't bang on about).

Work-wise things are going well; I'm nicely entrenched in the rendering team at Codies and enjoying it, even with the hard work. In fact, compared to the last place I was at, I've been more likely to stay late here, which shows just how much I like it.

I've also managed to be the most unlucky guy on the team, as my first commit of work took 3 weeks to get from my branch to main because it kept failing on the publish system which tests the builds. The real kicker was that it never failed in anything to do with my changes, which were mostly shader and data related!

After a lot of trying, searching and being unable to reproduce it on my branch, the problem was finally tracked down to an assert handler which wasn't thread safe; my changes, which added extra data, were just enough to throw out the thread timing and cause it to hit the problem.

Talk about unlucky [sad]
Still, got it fixed, got it all published and got my work in [grin]

We are now in the final push to alpha, which was highlighted by my 54h week last week (a normal week is just shy of 38h), which works out at 2 days of overtime. This week is looking to be much easier for me as I (currently) have no alpha-critical work, so I think I'm going to dial back the overtime, if only because it's left me too burnt out to work on my own stuff at home.

I do have plans to do my own stuff at home; I want to do a space shooter right now (although I have an RTS idea knocking about in my head as well), which will be purely DX11 and has come about thanks to listening to "Club Foot" by Kasabian too many times and an article in GPU Gems 3 about fluids on the GPU and using them for particle simulation.

So far I've only done some reading and found a free test model; I'm hoping I won't be too burnt out this weekend to do some work on it. If nothing else I need to kick a deferred renderer into life.

But yeah, not dead and still dreaming...
[grin]
_the_phantom_
So, the move last weekend went off without a hitch; well, aside from the window getting a crack in it about 30 mins after leaving Brighton, but that at least gave me something to watch as it made slow progress down the screen while the satnav seemed to do its best to make us avoid the M25.

Last weekend was mostly spent watching TV, watching films and watching Babylon 5 DVDs... in fact, the B5 DVDs would be a continuing theme during the week due to the lack of internet and the lack of a desk to set up my PC properly [grin]

On Monday I dragged myself out of bed at 8am, got on a bus some 40 mins later and arrived at Codemasters pretty close to 9am for my first day.

The opening section of said day was taken up with the normal admin-type tasks before the 5 of us were taken to our respective studios/teams.

As previously mentioned I was joining the Action Studio; however, at that point I had no idea what I would be doing, although given my current background I assumed it would be some form of general coding.

So, I was slightly surprised and delighted when I discovered that I'd be working in the rendering team for the next Operation Flashpoint game [grin]

The rest of the week (apart from Thursday, when I sat at home and waited for a desk) was spent installing things, getting code from Perforce and then working on a project to learn how their rendering system works.

By the end of the week I'd got something 'new' rendering on screen and understood how I'd got there, so I was happy with the progress so far [smile]

The team themselves are a good bunch of people: helpful, friendly and a laugh, which is always good, so I think I'll enjoy my time working with them.

The week was finished off by me joining various people down the pub for a few(!) drinks before staggering to a bus stop and home.

All in all a good week and a good move [grin]

I'll just finish with a belated Thank You to everyone who wished me luck/well during the last few entries; much appreciated, guys [smile]
_the_phantom_
So, about 4 weeks ago it was about 4 weeks until my move... now, in less than 48h, I should be moved.

So, the catch-up;

On Monday I start work at Codemasters, just outside Southam/Leamington Spa, in their Action Studio. While it's a shame to be leaving Brighton, I'm looking forward to the change and working at a different studio.

The biggest hassle has been finding somewhere to live; however, after 3 trips up and various false starts I finally found a place, and yesterday (Wednesday) I got confirmation that I could have it, ready for a move-in on Saturday!

So, right now, what little hasn't been unpacked in the last 2 and a half years is being packed up (mostly new books and games it seems) ready to be put in a van and transported from Brighton to Southam on Saturday.

Interesting side note: Southam will be the furthest I've lived from the sea in my life. Ipswich wasn't too far away and Brighton is a seaside town, but Southam is pretty much central England, so that'll be new.

So, that's the story so far...
_the_phantom_

Idleness.

So, despite being on garden leave I've not really had much traction on my projects.

I blame IRC.
And Company of Heroes.
And the World Cup.
[grin]

So, what have I done?
Well, to start with, I've broken a few things in the particle system. After I got the fading working I decided that I was processing too much data; after all, each block of 4 particles used the same lifetime information, so I was storing 4 times as much life and max-life data as I needed.
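Since all four particles in a block share one lifetime, the life and max-life values only need storing once per block. A minimal sketch of that layout, with plain `float[4]` arrays standing in for the SSE `__m128` values and hypothetical names (`ParticleBlock4`, `updateBlocks`); it is not the actual particle system code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Four particles stored structure-of-arrays style. The lifetime fields are
// per block rather than per particle, cutting that data by a factor of 4.
struct ParticleBlock4 {
    float x[4], y[4], z[4];
    float alpha[4];
    float life;     // time remaining, shared by all four particles
    float maxLife;  // initial lifetime, shared by all four particles
};

// Ages each block once and fades all four particles from a single ratio.
void updateBlocks(std::vector<ParticleBlock4>& blocks, float dt) {
    for (ParticleBlock4& b : blocks) {
        b.life = (b.life > dt) ? b.life - dt : 0.0f;
        const float fade = (b.maxLife > 0.0f) ? b.life / b.maxLife : 0.0f;
        for (std::size_t i = 0; i < 4; ++i)
            b.alpha[i] = fade;
    }
}
```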

In trying to reduce this I've made a bit of a mess of the code and somehow broken the fading code, and in fact the lifetime code, so I need to look into that.

I also decided that the method I had for configuring effectors for the particle systems was... poor. Each group of 4 particles was taking at least one indirection per effector, and when this list doesn't change (or at least changes rarely) that isn't a good solution.

I do have the start of a solution; however, I'm going to keep quiet about it until I've had a chance to a) fix the particle system and b) implement the idea.

I think it's a good idea, it's an idea I've not seen done yet and, above all else, it's an idea I'd like to get working so that I can write an article about it, either to be published in a book or on gd.net; we'll see. (I feel it's been too long since I've had my name in a book [grin]).

The other thing going on of late has been The Job Search.
I had an agency on the case, a very good agency in fact, and the guy I was talking to there managed to line up 8 interviews (phone and face-to-face) in a very short period of time; that was beyond what I expected, as I thought I'd get 3, maybe 4, interview offers at most.

Most of those have resulted in "no go" for various good reasons.
I have, however, been offered a position. I won't say who just yet, but this does mean that I have a job in the near future.
Part of the reason for not saying is that I'm currently waiting on the results of a face-to-face interview I had on Friday morning, and waiting to see if a 3rd (and final) company is going to want a face-to-face, before I fully agree to anything.

What this does mean, however, is that at some point I'm going to have to quickly find somewhere to live, as it looks like on the 31st of July I'm moving house to some other part of the country. As to where... well, we'll see how things go this week...

Still, 4 weeks until I move...
Madness.

_the_phantom_
So, as my last post mentioned, I had integrated some random number generators into my particle system and the results were very pleasing; once I had the code in place I could fade particles out with random 'group of four' lifetimes and all was good.

So, I decided to test for speed the other night and hit a slight problem: trying to spawn 1,000,000 particles introduced a 10-second pause in the test app o.O

So, I fired up the beta of Intel's Thread Profiler and set it to work to find out where I was hitting a hotspot. It turned out all this time was being spent in either the forces generator or the ages generator.

As the code was pretty simple, apart from the random number generator, I played a hunch and removed those calls. BAM! Suddenly 1 million particles took practically no time to spawn.

The solution to this problem; a good old lookup table.
Currently I'm generating two tables of 1,000,000 values each, which is good enough for the testing process at least, but I need a better solution if I plan to use this in other cases.

Chances are 1,000,000 numbers will do me, I just need to think about how to cycle through them when being queried from multiple threads.
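One possible way to share a precomputed table between threads is to hand out indices from a single atomic counter and wrap around modulo the table size. A minimal sketch, assuming the table is filled once up front with `std::mt19937` and a uniform distribution; `RandomTable` and its constructor parameters are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Precomputed random values shared by all worker threads. Each call to
// next() claims a unique index with one atomic add; indices wrap, so the
// sequence repeats after size() draws.
class RandomTable {
public:
    RandomTable(std::size_t size, float lo, float hi, unsigned seed)
        : values_(size), cursor_(0) {
        std::mt19937 engine(seed);
        std::uniform_real_distribution<float> dist(lo, hi);
        for (float& v : values_)
            v = dist(engine);
    }

    float next() {
        const std::size_t i = cursor_.fetch_add(1, std::memory_order_relaxed);
        return values_[i % values_.size()];
    }

    std::size_t size() const { return values_.size(); }

private:
    std::vector<float> values_;        // read-only after construction
    std::atomic<std::size_t> cursor_;  // shared draw counter
};
```

With many threads hammering `next()` the shared counter itself can become a contention point; giving each worker its own starting offset into the table avoids that, at the cost of threads seeing overlapping runs of values.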

Either way, right now 1,000,000 particles take ~1 second to spawn and thrash the frame rate; I suspect the latter is down to me trying to send 22MB of data PER FRAME across the PCIe bus to the card to render [grin]

I'm going to ponder a better solution to this problem while also pondering possible D3DCompute solutions. To be honest, what I really want to run this on is one of the new Fusion processors, but those aren't due out for another year [sad]; then I could probably throw 1,000,000 around with no problem [grin]

I also need to put proper timing in so I can see how long each segment takes under different loads, and fix an update problem where particles spawned after the first trigger seem to die quicker, but only until all previous particles are dead [oh]

(As I'm now on gardening leave I've got plenty of time to work on this, yay!)
_the_phantom_
The particle system is moving forward; I've fixed various update, clean up and spawning errors in the last few weeks.

I've also integrated the C++0x/TR1 random number generators, and after a bit of a play I've decided I probably need to make those 'pluggable' as well, as you can get some interesting effects just by varying the system used to return the random numbers.
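Making the generator pluggable can be as simple as storing the source of random numbers as a callable, so an emitter can be handed a uniform, normal or any other distribution without its update code changing. A minimal sketch using the standard `<random>` engines and distributions (the standardised successors of the TR1 ones); `RandomSource` and `makeSource` are hypothetical names:

```cpp
#include <cassert>
#include <functional>
#include <random>

// Any callable returning a float can drive particle randomness.
using RandomSource = std::function<float()>;

// Binds an engine and a distribution into a RandomSource. Both are captured
// by value, so each source owns its own independent state.
template <typename Distribution>
RandomSource makeSource(Distribution dist, unsigned seed) {
    std::mt19937 engine(seed);
    return [engine, dist]() mutable { return static_cast<float>(dist(engine)); };
}
```

For example, `makeSource(std::normal_distribution<float>(0.5f, 0.1f), 1u)` clusters values around 0.5, while `makeSource(std::uniform_real_distribution<float>(0.0f, 1.0f), 1u)` spreads them evenly, and the consuming code only ever calls `source()`.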

But, all in all, it's starting to look more like a proper particle system now, which is nice, although there are a few odd things (flickering dead particles?) and a crash in the thread dispatch code which I've seen happen twice now but just haven't got around to debugging yet.

It also occurred to me the other day that I should really be setting up my threads to work on batches of particles that are multiples of the cache line size on the system they're executing on, to prevent cache line invalidation across cores (false sharing).
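With structure-of-arrays particle data, one way to do this is to round the batch size up to a whole number of cache lines' worth of elements, so no two threads write into the same line (assuming each array starts on a cache-line boundary). A minimal sketch, assuming a 64-byte line, which is typical of current x86 parts but really ought to be queried at runtime; `makeBatches` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kCacheLineBytes = 64;  // assumed, not queried

// Splits [0, count) into [begin, end) ranges whose sizes are multiples of
// the elements-per-cache-line count (except possibly the last), so batch
// boundaries fall on line boundaries for line-aligned arrays.
std::vector<std::pair<std::size_t, std::size_t>>
makeBatches(std::size_t count, std::size_t elementBytes, std::size_t minBatch) {
    assert(elementBytes > 0 && kCacheLineBytes % elementBytes == 0);
    const std::size_t perLine = kCacheLineBytes / elementBytes;  // e.g. 16 floats
    const std::size_t batch =
        (minBatch == 0) ? perLine : ((minBatch + perLine - 1) / perLine) * perLine;
    std::vector<std::pair<std::size_t, std::size_t>> batches;
    for (std::size_t begin = 0; begin < count; begin += batch) {
        const std::size_t end = (begin + batch < count) ? begin + batch : count;
        batches.emplace_back(begin, end);
    }
    return batches;
}
```

For 1,000 floats with a requested minimum of 500, this yields a 512-element batch followed by the 488-element remainder.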

Anyway, more later when I've got these issues sorted out and can supply screenshots or maybe even a demo.




The other thing to happen of late is that on Tuesday I was given my 6-week notice at work, so my final day there will be July 6th, and, the lack of cash in the future aside, I'm OK with it... in fact I'm kinda happy.
(I'd also been expecting this for about four weeks so it was no shock)

You see, it occurred to me that for the last two and a half years I've not been doing the 'right' job. Sure, I can do general programming, but my talents really lie in software design/architecture at both the high and low levels, graphics, and a large body of knowledge about multiple threads and task-based systems (which I admit is still developing, but is an area where I'm still better than many, many people).

What I've been doing is general coding and pish like 'setting up build machines' and running around fixing bugs mostly caused by other people.

In fact, it's recently dawned on me that I've simply not been happy here for at least the last 6 months; don't get me wrong, the people are great, but the work and the company itself leave a lot to be desired.

Also, looking at some of the people they decided to keep over me... well, I'm kinda glad I'm heading out the door; on the same day I was told I was being let go, one of them was having trouble applying an operator to sort a vector of pointers.

So, yeah, looking around at those who are left, while they can do the job I'm pretty sure I'm far more skilled than many of them in plenty of areas (such as system design, if the project I was working on before my recent two is anything to go by... those in #gamedev probably remember my swearing about that design...).

While many people would take a knock to the confidence and the ego when being let go I've gone a different way by looking around and going "riiiiiiiight...".

I think I'm better off out of there, even if it does mean that in a few months' time I'm having myself declared bankrupt as I've got no income *chuckles*

Roll on July 6th...
_the_phantom_

More Particles

I spent a bit of the weekend, between Civ4 sessions and starting a replay of COD4:MW, working on getting a renderer hooked up to the particle system.

The practical upshot of this was that I had to finish off said particle system and the logic to make it run... OK, so it still doesn't render (I'll be PIXing the hell out of it on Wednesday evening to figure out why), however it is now hooked up.

So, the setup is pretty simple;

Shard::colourModifierCollection colourMods;
Shard::positionModifierCollection positionMods;
Shard::rotationModifierCollection rotationMods;
Shard::DefaultColour defaultColour = { 1.0f, 1.0f, 1.0f, 1.0f};
// life time in milliseconds therefore 16.6 * 60 = 1 second due to 16ms time steps in hardcoded use
Shard::EmitterDetails emitterDetails(5000,50,(16.6f*60.0f), 0.05f, defaultColour, 1.0f, positionMods, colourMods, rotationMods);
Shard::ParticleEmitter emitter(emitterDetails);
Scheduler particleScheduler;



An emitter can have modifiers for colour, position and rotation; these are basically std::vectors of function objects which take a few parameters, most of which are structures wrapping references to __m128 variables due to the SSE underpinnings of the system.
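Conceptually, each modifier collection is a list of callables applied to a block's data in turn. A simplified sketch, with a plain `Float4` struct standing in for the `__m128` wrappers and hypothetical names (`Float4`, `ColourModifier`, `applyModifiers`); the real function objects take more parameters than this:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in for a reference-wrapped __m128: four lanes, one per particle.
struct Float4 { float v[4]; };

// A colour modifier receives one colour channel for a block of four
// particles plus the elapsed time, and updates it in place.
using ColourModifier = std::function<void(Float4& channel, float deltaTime)>;
using ColourModifierCollection = std::vector<ColourModifier>;

// Runs every modifier, in order, over one block's channel.
void applyModifiers(const ColourModifierCollection& mods, Float4& channel, float deltaTime) {
    for (const ColourModifier& mod : mods)
        mod(channel, deltaTime);
}
```

Because each entry is a `std::function`, every call goes through an indirection; that is cheap to write but is exactly the per-block, per-effector cost worth watching in a hot update loop.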

EmitterDetails is a structure which holds information describing an emitter; the idea behind this was that you could construct such a thing in a script and then throw it at the particle system to build your emitter.

Finally the emitter itself is constructed using these emitterDetails; it makes a local copy so one details block can be used to setup multiple emitters.

The last object created in that list is a Scheduler object which is used to allow the queuing of functions to run.

As you might recall from the last update, I was debating how to handle block updates of the particle system in order to spread the load over threads.

In the end I settled on supplying an IScheduler interface with a simple function to queue tasks.

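struct IScheduler
{
    // virtual destructor so implementations can be destroyed through the interface
    virtual ~IScheduler() {}
    // runs every queued task with the given time step and waits for them
    virtual void ProcessTaskQueue(float) = 0;
    // adds a task to the queue; may be called from multiple threads
    virtual void QueueTask(const UpdateFuncType &updateFunc) = 0;
};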



In this instance it's a very simple class indeed;

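#include <algorithm>
#include <functional>
#include <ppl.h>
#include <concurrent_vector.h>

struct Scheduler : public IScheduler
{
    Concurrency::task_group group;
    // queued tasks; concurrent_vector lets multiple threads call QueueTask safely
    Concurrency::concurrent_vector<UpdateFuncType> taskVector;

    void ProcessTaskQueue(float time)
    {
        if(taskVector.empty())
            return;

        // hand every queued task to the task group, binding in the time step
        std::for_each(taskVector.begin(), taskVector.end(), [this,time](UpdateFuncType &task)
        {
            this->group.run(std::bind(task, time));
        });
        // clear() is not concurrency safe; nothing may be queuing at this point
        taskVector.clear();
        group.wait();
    }

    void QueueTask(const UpdateFuncType &updateFunc)
    {
        taskVector.push_back(updateFunc);
    }
};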



Each task is queued into a vector; then, at a later point, a single thread calls the process function which, in this instance, tells a Concurrency Runtime task group to execute the tasks. We then empty the task queue and wait for the tasks to finish before moving on.
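The Concurrency Runtime's task_group is Microsoft-specific, but the same queue-then-dispatch pattern can be sketched portably with `std::async`. This assumes `UpdateFuncType` is a `std::function<void(float)>`, which fits how it is bound above but is not shown in the code; `PortableScheduler` is a hypothetical name, and because `std::launch::async` starts a thread per task it is only a stand-in for a real task scheduler:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <vector>

using UpdateFuncType = std::function<void(float)>;  // assumed signature

struct IScheduler {
    virtual ~IScheduler() = default;
    virtual void ProcessTaskQueue(float) = 0;
    virtual void QueueTask(const UpdateFuncType& updateFunc) = 0;
};

// Portable stand-in for the task_group-based Scheduler: tasks are queued
// under a mutex, then launched together and waited on.
struct PortableScheduler : public IScheduler {
    void QueueTask(const UpdateFuncType& updateFunc) override {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(updateFunc);
    }

    void ProcessTaskQueue(float time) override {
        std::vector<UpdateFuncType> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(tasks_);  // takes the tasks and empties the queue
        }
        std::vector<std::future<void>> running;
        running.reserve(pending.size());
        for (const UpdateFuncType& task : pending)
            running.push_back(std::async(std::launch::async, task, time));
        for (std::future<void>& f : running)
            f.get();  // waits for completion and rethrows any task exception
    }

private:
    std::mutex mutex_;
    std::vector<UpdateFuncType> tasks_;
};
```

Swapping the queue out under the lock means tasks queued during processing land in the next batch instead of racing with `clear()`.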

All of which has changed Emitter's PreUpdate function to

void ParticleEmitter::PreUpdate(IScheduler &taskQueue, float deltaTime)
{
    using std::placeholders::_1;

    // Services any required trigger calls
    while(!queuedTriggers.empty())
    {
        const EmitterPosition & position = queuedTriggers.front();
        if(details.maxParticleCount > usedParticles)
        {
            usedParticles += EmitParticles(position);
        }
        queuedTriggers.pop();
    }

    // then work out block sizes and schedule those blocks for updates
    //TODO: figure out what kind of task interface this is going to use
    if(usedParticles > 0 && usedParticles <= 500)
    {
        taskQueue.QueueTask(std::bind(&ParticleEmitter::Update, this, _1, 0, usedParticles));
    }
    else
    {
        // split into blocks of at most 500 particles, one task per block
        int total = usedParticles;
        int start = 0;
        while(total > 0)
        {
            int size = (start + 500 > usedParticles) ? usedParticles - start : 500;
            taskQueue.QueueTask(std::bind(&ParticleEmitter::Update, this, _1, start, start + size));
            total -= size;
            start += size;
        }
    }
}



The numbers used for dividing up the work are pretty arbitrary right now; I'm considering basing them on cache line size instead, to try to improve the work distribution. This would have a natural knock-on effect on the update functions as well, which could probably be changed to do some prefetching, certainly in the case of the larger update segments.

Finally, usage of the emitter is pretty simple. First we grab a trigger point (technically optional, but it's a good test);

if(PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
{
    if(msg.message == WM_LBUTTONUP)
    {
        int xPos = GET_X_LPARAM(msg.lParam);
        int yPos = GET_Y_LPARAM(msg.lParam);
        emitter.Trigger(xPos, yPos);
    }
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}



Then, in the update loop (which is currently fixed time step) we do the following;


emitter.PreUpdate(particleScheduler,16.0f);
particleScheduler.ProcessTaskQueue(16.0f);
emitter.PostUpdate();



And finally rendering is currently done via the deferred context system (which I need to change in order to PIX it later...);

// bind the emitter by pointer; passing 'emitter' by value makes std::bind store a copy of it
RendererCommand particlePreRendercmd = DeferredWrapper(std::bind(&Shard::ParticleEmitter::PreRender, &emitter, std::placeholders::_1), g_pDeferredContext, width, height);
RendererCommand particleRendercmd = DeferredWrapper(std::bind(&Shard::ParticleEmitter::Render, &emitter, std::placeholders::_1), g_pDeferredContext, width, height);

Concurrency::send(commandList, particlePreRendercmd);
Concurrency::send(commandList, particleRendercmd);



And there you have it, a hooked up, if not yet rendering, particle system.

The next job will be, as I mentioned, to get it rendering so hopefully next update will have some (unimpressive at first I dare say) images to go with it.

After that I'll try to get some decent effects going and figure out how I'm going to video it in action.
And, as I discovered, I'm also going to need a 'material' and 'renderable' system at some point as my Emitter has an annoying amount of D3D11 stuff directly in it.

Well, until next time...