
#5216014 How do you stop your game from being stolen/torrented?

Posted by on 12 March 2015 - 03:39 AM

Steam gives you some DRM solutions built in. If you only sell via Steam, you can just rely on those.

For an online game, you can authenticate all the users when they talk to your servers. This way, pirates will be locked out of the online portions of the game.
Quite a few devs are making their single-player games actually reliant on online servers, just to make piracy harder. e.g. one dev I work with doesn't ship most of their code with the game; instead, an online server streams scripts to clients on demand...
That kind of thing really annoys players though...

#5216000 Saving old gamestates

Posted by on 12 March 2015 - 02:16 AM

Games like Quake, Half-Life, and Counter-Strike don't just save the previous game state; they keep more like the previous 20+ of them!

Planetary Annihilation saves every single game state in the entire match, so that you can rewind to any point in time.

So yes, it's feasible.
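
For a fixed window of history (the Quake/Half-Life style), a ring buffer of snapshots is enough. A minimal sketch -- the actual GameState contents are obviously game-specific:

#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical snapshot of everything needed to restore one tick.
struct GameState
{
    uint32_t tick;
    // ... positions, velocities, inputs, RNG seeds, etc.
};

// Fixed-size ring buffer holding the last N states.
template<size_t N>
class StateHistory
{
public:
    void Push(const GameState& s) { m_states[m_head++ % N] = s; }

    // State from 'framesAgo' ticks back (0 = most recent).
    // Assumes framesAgo < N and framesAgo < number of states pushed so far.
    const GameState& Peek(size_t framesAgo) const
    {
        return m_states[(m_head - 1 - framesAgo) % N];
    }
private:
    std::array<GameState, N> m_states{};
    size_t m_head = 0;
};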

#5215811 What IDEs are recommended for Lua and/or Python?

Posted by on 11 March 2015 - 04:34 AM

For Lua: I use VS with the Lua language pack for writing it (since I already have VS open for my C++ code), and also Tilde for writing/debugging it.

#5215785 Map Buffer Range Super Slow?

Posted by on 11 March 2015 - 12:36 AM

Also, regarding orphaning (glBufferData with NULL) and the GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_BUFFER_BIT flag combo -- are these treated as doing the same thing? In the past I've had some issues with graphical artifacts when only using the GL_MAP_INVALIDATE_BUFFER_BIT flag, but cleared them up using orphaning, so I'm unsure if I was doing it correctly.

No, they're completely different things.

Orphaning allocates an entirely new buffer for you to write new data into, and any new draw commands will reference this new buffer.

Any existing draw commands which have already been submitted (but haven't yet been consumed by the GPU) will still use the old allocation, with the old data, so there's no chance of graphical corruption. After all those commands are executed by the GPU, the driver will garbage-collect this orphaned allocation.
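
A minimal orphan-then-write sketch (bufferSize/newVertexData are placeholders; you could equally map the fresh allocation and write into it):

// Orphan: ask GL for a fresh allocation of the same size. Draws already
// submitted keep referencing the old allocation until the GPU has consumed them.
glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_STREAM_DRAW);

// Write the new data into the fresh allocation -- no synchronization needed.
glBufferSubData(GL_ARRAY_BUFFER, 0, bufferSize, newVertexData);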


Unsynchronized mapping just gives you a pointer to the existing allocation for that buffer, with zero synchronization or safety. You're making a sacred promise to the driver that you will not overwrite any part of the data that could potentially be used by existing draw commands. You need to implement your own ring-buffer or similar allocation strategy, and use GL fences/events to tell when it's safe to overwrite different parts of the buffer.
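
A rough sketch of that ring-buffer approach, with three regions and one fence per region (variable names are illustrative):

// Persisted across frames:
const int  kRegions = 3;                        // e.g. triple-buffer the updates
GLsync     fences[kRegions] = {};
GLsizeiptr regionSize = bufferSize / kRegions;
int        region = 0;

// Each frame, before reusing a region, wait on the fence from its last use:
if (fences[region]) {
    glClientWaitSync(fences[region], GL_SYNC_FLUSH_COMMANDS_BIT, UINT64_MAX); // effectively "wait forever"
    glDeleteSync(fences[region]);
    fences[region] = 0;
}

void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, region * regionSize, regionSize,
                             GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT |
                             GL_MAP_INVALIDATE_RANGE_BIT);
memcpy(ptr, newVertexData, regionSize);
glUnmapBuffer(GL_ARRAY_BUFFER);

// ... issue the draws that source from this region ...

// Fence after those draws, so we know when this region is safe to overwrite again:
fences[region] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
region = (region + 1) % kRegions;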


From your timing data, we can guess that the extra GPU memory allocation management involved in orphaning is costing you about 10μs per buffer per frame... which is pretty good!

#5215624 How do I know if I'm at an intermediate programming level?

Posted by on 10 March 2015 - 04:31 AM

[Mod note: hid the last three posts, which were the passive-aggressive beginning of a flame war]

#5215580 Questions about GPGPU

Posted by on 09 March 2015 - 10:52 PM

I know compute shaders use HLSL, Nvidia has CUDA, and so on. My question is: can C++ be made to run on the GPU?

Microsoft has a language extension called C++ AMP, which allows you to write C++ code where some parts run on the CPU and some parts run on the GPU.
It's designed for people writing C++ programs that crunch a large amount of data, who want an easy way to take advantage of GPU power available in their PCs.

It's not designed for use in games.
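
Roughly, C++ AMP code looks like this (a minimal sketch, not production code):

#include <amp.h>
#include <vector>
using namespace concurrency;

void square_on_gpu(std::vector<float>& data)
{
    array_view<float, 1> av((int)data.size(), data); // wraps the CPU data for GPU access
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp)
    {
        av[i] = av[i] * av[i];                       // this lambda body runs on the GPU
    });
    av.synchronize();                                // copy results back to the CPU vector
}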

If yes, will it be faster than HLSL and CUDA?


#5215576 Vulkan is Next-Gen OpenGL

Posted by on 09 March 2015 - 10:30 PM

The events thing sounds pretty awesome. Command buffers can fire off completion events! You can query an event to see if it has been completed yet, or you can join/wait for an event to finish if you need to. We can finally do GPU->CPU readback without stalling the driver until it catches up! That's a whole lot more reassuring to me than "it will probably be done in ~3 frames!"

You can do that on current APIs already -- either stall the CPU until the GPU fires off an event, or have the CPU periodically poll to see if the event has been triggered yet (non-blocking).

Many games actually have the GPU fire off an event every frame, and force the CPU to stall on the previous frame's event, to ensure that there is only one frame's worth of latency between CPU and GPU. In that situation, you can safely assume that readbacks older than one frame are ready to consume.
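
e.g. with GL sync objects, a non-blocking poll looks something like this:

// After submitting the commands that produce the data you want to read back:
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Later (e.g. once per frame), poll with a timeout of zero -- this never blocks:
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED)
{
    // Safe to read the results now (e.g. map the PBO / glGetBufferSubData).
    glDeleteSync(fence);
}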

The big new feature is that you can now have the GPU wait for the CPU to trigger an event, which means you can submit commands to the GPU that rely on CPU-generated data before the CPU has even generated that data!
e.g. say you're using CPU-skinning -- when submitting a draw-call, you know the GPU isn't going to actually execute it until many milliseconds later. So, you can submit all the draw-calls first (including a "wait for CPU event" command before them), and then actually start running your CPU-skinning code after submitting those draws. After the CPU-skinning jobs have completed, then trigger the event so the GPU is allowed to consume those draw-calls.
This kind of thing is very common on consoles, to allow graphical compute jobs that map well to CPU cores to be performed there, with very tight synchronization and low-latency with the GPU.
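
In pseudocode, the submission order ends up looking something like this (the names here are invented purely for illustration):

// Pseudocode -- these are not real API names.
cmd.WaitForEvent(skinningDone);      // GPU will pause here until the CPU signals the event
cmd.Draw(skinnedMesh);               // draws that consume the CPU-skinned vertex data
queue.Submit(cmd);                   // submitted *before* the skinning work has been done!

RunCpuSkinningJobs(skinnedVerts);    // now actually do the CPU work
skinningDone.Signal();               // let the GPU proceed past the wait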

#5215347 Map Buffer Range Super Slow?

Posted by on 08 March 2015 - 09:38 PM

Turns out lines like these: pointer[i + 9] = pointer[i] + 32.0f;
Count as reading the buffer!
Yep! I didn't even look at that code before... but this is really bad for performance.


The best case for a map call is that the pointer returned by glMap* is an actual pointer into GPU-RAM, and that the OS will have marked these pages of addresses as being uncached and write-combined.


When you write to those addresses, the CPU doesn't bother also writing the data to its L1/L2/L3 caches, because it knows you're not going to be reading it again. It also doesn't write to RAM immediately -- it buffers up each small write into a write-combining buffer, and when enough data has been written, it flushes that whole buffer through to GPU-RAM in a single bulk transfer.


All this is wonderful... until you try to read from the buffer!

At that point, the CPU has to stall, prematurely flush out the write-combine buffer, wait for that transfer to actually reach GPU-RAM, then issue a read request to copy data from GPU-RAM back to the CPU. Furthermore, normally when reading data from RAM, you'll transfer a bulk amount (e.g. 64 bytes) and store it in the cache, as you'll likely want to read some nearby data soon too, and then move the requested amount (e.g. 4 bytes) from the cache to a CPU register -- in the best case, this means you do one RAM transaction when reading 16 floats! In this case though, because we're using uncached memory, every single read request results in a RAM transaction. Trying to read 16 floats == waiting on 16 "cache misses".
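
The fix is to never use the mapped pointer as a data source -- keep the source data in a normal, cacheable CPU-side array (cpuSideVerts below is a placeholder for wherever that data actually lives) and only ever write through the mapping:

// Bad: every read of 'pointer[i]' is a read from uncached, write-combined memory.
pointer[i + 9] = pointer[i] + 32.0f;

// Better: source the value from normal, cacheable CPU memory, and only *write* through the mapping.
pointer[i + 9] = cpuSideVerts[i] + 32.0f;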

#5215338 Getting out of the industry?

Posted by on 08 March 2015 - 08:21 PM

@OP, I hope you don't mind, but I cyberstalked a bit to get a good guess of who your employer is. From the looks of it, they do a lot of work-for-hire games, pitching low budgets to publishers to secure work, then splitting the company into 2 or 3 or more teams to pump out several super cheap ~8 month projects simultaneously? And they've been doing this kind of work for a while?
If my guess is right, then they're probably pretty stuck in their ways. I imagine any complaints about the conditions will be met with arrogance and scorn for your concerns. The suggestion that there could be a better mode of operation would probably be seen as an insult.
This wouldn't be some "have to do overtime to save the company" situation, it sounds like a company that's been stuck in that rut for so long that they've lost the perspective required to see the harm in it.

FWIW though, I worked at a dev in the same sub-sector - hundreds of employees, almost entirely a never-ending series of different work-for-hire games for different publishers, pitching stupidly low budgets at publishers just to secure work, short deadlines, etc... And I only did one day of overtime, because I was the lead and had to get a build out...

In my humble arrogant opinion, endemic crunch occurs because not enough staff are willing to say no to abusive working conditions, unless their colleagues are already doing so. Without a union movement to present a unified stand, or other role models to lead the way, it's hard to be *the guy* who takes a stand.

Unfortunately, the options are often "meet the publishers' schedule that we don't have the time or budget for, or shut down the studio and lay everyone off." Sure, it's quite possible that this is because of management's bad scheduling and negotiation, but the end result is the same: deal with it or be out of a job.

I've been in that position, and didn't do free work to save someone else's fortune. I went unpaid for months, sure, effectively giving the owner an interest-free loan of my salary. But when we shipped the project, we all got our owed wages paid back.

Doing extra work for free to save a company and then not being paid for it... That's just management taking their staff for granted. It doesn't matter if you're on a salary instead of wages -- extra work hours should at the very least be repaid with extra leave balance. If everyone does 150% of their normal hours for two months to get a game out the door, give everyone a month of paid holiday after the project to pay them back.

If you want a job where you get a high salary that accounts for the fact that you're "on call" to do overtime with short notice, go into sysops.
If an employer wants to tell me that I, as a software engineer, fall into that same category, then it's clear to see that they don't respect me. Fair enough if they want to call me to put out a fire -- they're about to show the game to a publisher and my code is crashing -- I'm ok with doing genuine emergency work... But when a regular day is a permanent emergency... Just say no.

The worst thing is when many of the workers in these companies are fatigued and spend half their day on Reddit, but are still applauded for having a great work ethic because they spend 16 hours in the office. Meanwhile, someone who actually does double the amount of work in a 7-hour shift is seen as a slacker. Unfortunately, I don't know of any cure for a company culture that's that far gone.

... is this Melbourne? I'm now really regretting not taking advantage of any Melbourne-based job offers... ;)

Australia has legally enforced the 8/8/8 day (AKA the 40-hour week) since the mid 19th century, and the 38-hour week since the '80s.
It's slowly been degraded, with companies given the power to override the law with employment contracts... But a few years ago we again made 38 hours legally binding, even if your contract says otherwise.
Coincidentally, today is Labour Day here, also known as "Eight Hours Day".
AFAIK, lots of European countries have the same protections for workers.

#5215333 Map Buffer Range Super Slow?

Posted by on 08 March 2015 - 06:57 PM

You might also want the MAP_INVALIDATE_RANGE_BIT flag set, indicating you'll be overwriting the whole range, which hints to GL not to copy the old data before returning from the map function.
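
i.e. roughly (offset/size being the sub-range you're about to completely overwrite):

void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, offset, size,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);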

#5215256 What are your opinions on DX12/Vulkan/Mantle?

Posted by on 08 March 2015 - 05:40 AM

If all hardware were bindless, this set/pool wouldn't be needed because you could change one texture anywhere with minimal GPU overhead like you do in OpenGL4 with bindless texture extensions.
Nonetheless this descriptor pool/set is also useful for non-texture stuff (e.g. anything that requires binding, like constant buffers). It is quite generic.

They're actually designed specifically to exploit the strengths of modern bindless GPUs, especially AMD GCN, as they're basically copied and pasted from the Mantle specs (which were designed to be cross-vendor, but obviously somewhat biased by having AMD GCN as the min-spec).

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works; it sounds a lot like an array of bindless texture handles to me.

A descriptor is a texture-view, buffer-view, sampler, or a pointer.
A descriptor set is an array/table/struct of descriptors.
A descriptor pool is basically a large block of memory that acts as a memory allocator for descriptor sets.

So yes, it's very much like bindless handles, but instead of them being handles, they're the actual guts of a texture-view, or an actual sampler structure, etc...
Say you've got an HLSL shader with:
Texture2D texture0 : register(t0);
SamplerState samLinear : register(s0);
 In D3D11, you'd bind resources to this shader using something like:
ID3D11SamplerState* mySampler = ...;
ID3D11ShaderResourceView* myTexture = ...;
ctx.PSSetSamplers( 0, 1, &mySampler );
ctx.VSSetSamplers( 0, 1, &mySampler );
ctx.PSSetShaderResources( 0, 1, &myTexture );
ctx.VSSetShaderResources( 0, 1, &myTexture );
ctx.Draw(...);//draw something using the bound resources
Let's say that these new APIs give us a nice new bindless way to describe the inputs to the shader. Instead of assigning resources to slots/registers, we'll just put them all into a struct -- that struct is the descriptor set.
Our hypothetical (because I don't know the new/final syntax yet) HLSL shader code might look like:
struct DescriptorSet : register(d0)
{
  Texture2D texture0;
  SamplerState samLinear;
};
In our C/C++ code, we can now "bind resources" to the shader with something like this:
I'm inventing the API here -- Vulkan doesn't look like this, it's just a guess of what it might look like:
struct MyDescriptorSet // this matches our shader's structure, using corresponding Vulkan C types instead of the "HLSL" types above.
{
  VK_IMAGE_VIEW texture0;    //n.b. these types are the actual structures that the GPU is hard-wired to interpret, which means
  VK_SAMPLER_STATE samLinear;//      they'll change from driver-to-driver, so there must be some abstraction here over my example
};                           //      such as using handles or pointers to the actual structures?

descriptorHandle = vkCreateDescriptorSet( sizeof(MyDescriptorSet), descriptorPool );//allocate an instance of the structure in GPU memory

//copy the resource views that you want to 'bind' into the descriptor set.
MyDescriptorSet* descriptorSet = (MyDescriptorSet*)vkMapDescriptorSet(descriptorHandle);
descriptorSet->texture0 = *myTexture; // CPU is writing into GPU memory here, via write-combined uncached pages!
descriptorSet->samLinear = *mySampler;

//later when drawing something 
vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, descriptorHandle, 0);
vkCmdDraw(cmdBuffer, ...);//draw something using the bound resources
You can see now, when drawing an object, there's only a single API call required to bind all of its resources.
Also, earlier we had to double up our API calls if the pixel-shader and the vertex-shader both needed the same resources, but now the descriptor-set is shared among all stages.
If an object always uses the same resources every frame, then you can prepare its descriptor set once, ahead of time, and then do pretty much nothing every frame! All you need to do is call vkCmdBindDescriptorSet and vkCmdDraw.
Even better, those two functions record their commands into a command buffer... so it's possible to record a command buffer for each object ahead of time, and then every frame you only need to call vkQueueSubmit per object to submit its pre-prepared command buffer.

If we want to modify which resources that draw-call uses, we can simply write new descriptors into that descriptor set. The easiest way is by mapping/unmapping the tables and writing with the CPU as above, but in theory you could also use GPU copy or compute jobs to modify them. GPU modification of descriptor sets would only be possible on truly bindless GPUs, so I'm not sure if this feature will actually be exposed by Vulkan/D3D12 -- maybe in an extension later... This would mean that when you want to change which material a draw-item uses, you could use a compute job to update that draw-item's descriptor set! Along with multi-draw-indirect, you could move even more CPU-side work over to the GPU.

Also, it's possible to put pointers to descriptor sets inside descriptor sets!
This is useful where you've got a lot of resource bindings that are shared across a series of draw-calls, so you don't want the CPU to have to re-copy all those bindings for each draw-call.

e.g. set up a shader with a per-object descriptor set, which points to a per-camera descriptor set:
cbuffer CameraData
{
  Matrix4x4 viewProj;
};

struct SharedDescriptorSet
{
  SamplerState samLinear;
  CameraData camera;
};
struct MainDescriptorSet : register(d0)
{
  Texture2D texture0;
  SharedDescriptorSet* shared;
};
The C side would then make an instance of each, and make one link to the other. When drawing, you just have to bind the per-object one:
sharedDescriptorHandle = vkCreateDescriptorSet( sizeof(SharedDescriptorSet), descriptorPool );
obj0DescriptorHandle = vkCreateDescriptorSet( sizeof(MainDescriptorSet ), descriptorPool );

SharedDescriptorSet* sharedSet = (SharedDescriptorSet*)vkMapDescriptorSet(sharedDescriptorHandle);
sharedSet->camera = *myCbufferView;
sharedSet->samLinear = *mySampler;

MainDescriptorSet* obj0Set = (MainDescriptorSet*)vkMapDescriptorSet(obj0DescriptorHandle);
obj0Set->texture0 = *myTexture;
obj0Set->shared = sharedDescriptorHandle;

//bind obj0Descriptor, which is a MainDescriptorSet, which points to sharedDescriptor, which is a SharedDescriptorSet
vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, obj0DescriptorHandle, 0); 
vkCmdDraw(cmdBuffer, ...);//draw something using the bound resources

#5215254 Getting out of the industry?

Posted by on 08 March 2015 - 04:56 AM

Send resumes to Google, etc... Plenty of large corporates actually have good culture too, plus they'll probably double your salary compared to a games studio.

What would happen though if you simply say 'no' to the "voluntary" overtime?

In my humble arrogant opinion, endemic crunch occurs because not enough staff are willing to say no to abusive working conditions, unless their colleagues are already doing so. Without a union movement to present a unified stand, or other role models to lead the way, it's hard to be *the guy* who takes a stand.
If you're willing to quit anyway, and even willing to leave the industry, then you don't have much to lose by being the guy who stands up for his right to an 8/8/8 day.

FWIW, plenty of people agree that being forced to work 40+ hours per week is abuse. Where I'm from, overtime has to be voluntary, it has to be paid at double rates, and you have to be given an equal amount of time off in the future to recover. Failure to follow these guidelines results in a $30k fine per instance, for running an abusive workplace.

Needing to crunch in the first place is a management failure, and there's endless evidence as to why long-term overtime is actually counter-productive (another management failure!), so you should feel no guilt in refusing to be punished for their failures.

If you get a job offer from another company, you can always take it to your current management for a counter-offer. Tell them you'll stay if the overtime ends. If they care about you, you can keep your games job. If they admit that they don't give a shit about you, accept the new offer and don't look back!

#5215242 Vulkan is Next-Gen OpenGL

Posted by on 08 March 2015 - 12:44 AM

AFAIK GL on Apple is similar to D3D on Windows -- there's a middle layer between the application and the driver that does most of the work, with the drivers then implementing a much simpler back-end API (on Windows, you might call that the WDDM).

On other platforms, the driver implements the complete GL API itself, and there's no standard/OS middle layer between the driver and the application.

#5215127 What are your opinions on DX12/Vulkan/Mantle?

Posted by on 07 March 2015 - 07:39 AM

That sounds extremely interesting. Could you give a concrete example of what the descriptions in a DrawItem look like? What is the granularity of a DrawItem? Is it a per-Mesh kind of thing, or more like a "one draw item for every material type" kind of thing, and then you draw every mesh that uses that material with a single DrawItem?

My DrawItem corresponds to one glDraw* / Draw* call, plus all the state that needs to be set immediately prior the draw.
One model will usually have one DrawItem per sub-mesh (where a sub-mesh is a portion of that model that uses a particular material), per pass (where a pass is e.g. drawing to the gbuffer, drawing to a shadow-map, forward rendering, etc). When drawing a model, it will find all the DrawItems for the current pass, and push them into a render list, which can then be sorted.

A DrawItem which contains the full pipeline state, the resource bindings, and the draw-call parameters could look like this in a naive D3D11 implementation:

struct DrawItem
{
  //pipeline state:
  ID3D11PixelShader* ps;
  ID3D11VertexShader* vs;
  ID3D11BlendState* blend;
  ID3D11DepthStencilState* depth;
  ID3D11RasterizerState* raster;
  D3D11_RECT* scissor;
  //input assembler state
  ID3D11InputLayout* inputLayout;
  ID3D11Buffer* indexBuffer;
  vector<tuple<int/*slot*/,ID3D11Buffer*,uint/*stride*/,uint/*offset*/>> vertexBuffers;
  //resource bindings:
  vector<pair<int/*slot*/, ID3D11Buffer*>> cbuffers;
  vector<pair<int/*slot*/, ID3D11SamplerState*>> samplers;
  vector<pair<int/*slot*/, ID3D11ShaderResourceView*>> textures;
  //draw call parameters:
  int numVerts, numInstances, indexBufferOffset, vertexBufferOffset;
};

That structure is extremely unoptimized though. It's a base size of ~116 bytes, plus the memory used by the vectors, which could be ~1KiB!

I'd aim to compress them down to 28-100 bytes in a single contiguous allocation, e.g. by using IDs instead of pointers, by grouping objects together (e.g. referencing a PS+VS program pair instead of referencing each individually), and by using variable-length arrays built into that structure instead of vectors.
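
As a sketch of what that compressed layout could look like (purely illustrative, not my actual structure):

struct DrawItemCompressed
{
  uint16_t programId;                  // PS+VS pair, looked up in a table by ID
  uint8_t  blendId, depthId, rasterId; // small IDs instead of pointers
  uint8_t  numVertexBuffers, numCbuffers, numTextures;
  uint32_t drawParams[4];              // numVerts, numInstances, offsets...
  uint16_t resourceIds[1];             // variable-length in practice: the actual VB/cbuffer/SRV
};                                     //  IDs are allocated inline after the fixed-size header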

When porting to Mantle/Vulkan/D3D12, that "pipeline state" section all gets replaced with a single "pipeline state object" and the "input assembler" / "resource bindings" sections get replaced by a "descriptor set". Alternatively, these new APIs also allow for a DrawItem to be completely replaced by a very small native command buffer!


There's a million ways to structure a renderer, but this is the design I ended up with, which I personally find very simple to implement on / port to every platform.

#5215105 What are your opinions on DX12/Vulkan/Mantle?

Posted by on 07 March 2015 - 02:53 AM

Apparently the mantle spec documents will be made public very soon, which will serve as a draft/preview of the Vulkan docs that will come later.

I'm extremely happy with what we've heard about Vulkan so far. Supporting it in my engine is going to be extremely easy.

However, supporting it in other engines may be a royal pain.
e.g. if you've got an engine that's based around the D3D9 API, then your D3D11 port is going to be very complex.
However, if your engine is based around the D3D11 API, then your D3D9 port is going to be very simple.

Likewise for this new generation of APIs -- if you're focusing too heavily on current generation thinking, then forward-porting will be painful.

In general, implementing new philosophies using old APIs is easy, but implementing old philosophies on new APIs is hard.


In my engine, I'm already largely using the Vulkan/D3D12 philosophy, so porting to them will be easy.
I also support D3D9-11 / GL2-4 - and the code to implement these "new" ideas on these "old" APIs is actually fairly simple - so I'd be brave enough to say that it is possible to have a very efficient engine design that works equally well on every API - the key is to base it around these modern philosophies though!
Personally, my engine's cross-platform rendering layer is based on a mixture of Mantle and D3D11 ideas.

I've made my API stateless: every "DrawItem" must contain a complete pipeline state (blend/depth/raster/shader programs/etc) and all resource bindings required by those programs - however, the way these states/bindings are described (in client/user code) is very similar to the D3D11 model.
DrawItems can/should be prepared ahead of time and reused, though you can create them every frame if you want... When creating a DrawItem, you need to specify which "RenderPass" it will be used for, which specifies the render-target format(s), etc.

On older APIs, this lets you create your own compact data structures containing all the data required to make the D3D/GL API calls for that draw-call.
On newer APIs, this lets you actually pre-compile the native GPU commands!
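
As a rough sketch of what the client code ends up looking like (the names here are illustrative, not the actual engine API):

// Prepared once, ahead of time (or whenever a material/mesh changes):
DrawItemDesc desc;
desc.renderPass = gbufferPass;                       // render-target formats chosen up front
desc.pipeline   = { blendState, depthState, rasterState, shaderProgram };
desc.bindings   = { cbuffers, samplers, textures };
desc.draw       = { numIndices, firstIndex, baseVertex };
const DrawItem* item = device.CreateDrawItem(desc);

// Every frame -- just collect and submit the pre-prepared items:
for (const DrawItem* i : visibleItems)
    context.Submit(i);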


You'll notice that in the Vulkan slides released so far, when you create a command buffer, you're forced to specify which queue you promise to use when submitting it later. Different queues may exist on different GPUs -- e.g. if you've got an NVidia and an Intel GPU present. The requirement to specify a queue ahead of time means that you're actually specifying a particular GPU ahead of time, which means the Vulkan drivers can convert your commands to that GPU's actual native instruction set ahead of time!

In either case, submitting a pre-prepared DrawItem to a context/command-buffer is very simple/efficient.
As a bonus, you sidestep all the bugs involved in state-machine graphics APIs.