Member Since 14 Feb 2007
Offline Last Active Today, 04:42 PM

#5212552 Data oriented design

Posted by on 23 February 2015 - 04:34 PM

Ironically, it's hard to tell if you're on the right track because you've only shown us the data!

What transformations do you apply to that data?
How does that data change?
What produces it? What consumes it?

The answers to those questions will dictate the best way to organise the data.

As is, there are no processes associated with your data, so the optimal thing to do is delete all your structs for being unused :lol: (j/k)

IndexedVector<Vector3> vector3s;
^^^ Having a collection of vec3s called "vec3s" doesn't seem too useful, as you can't perform any operations/transforms on data unless you know what it represents.

IndexedVector<Vector3> positions;
...makes more DoD sense, as then you can have a process that iterates through each position, tests if it's on screen, and outputs a list of unit ID#s representing the visible units, etc...
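
A rough sketch of that kind of process, just to make it concrete. Everything here is an assumption for illustration: `Vector3`, `Bounds` and `FindVisibleUnits` are made-up names, a plain `std::vector` stands in for `IndexedVector`, and an axis-aligned box stands in for a real frustum test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal types; the thread does not define them.
struct Vector3 { float x, y, z; };

// Axis-aligned box used as a stand-in for a real "is it on screen" test.
struct Bounds { Vector3 min, max; };

inline bool Contains(const Bounds& b, const Vector3& p) {
    return p.x >= b.min.x && p.x <= b.max.x &&
           p.y >= b.min.y && p.y <= b.max.y &&
           p.z >= b.min.z && p.z <= b.max.z;
}

// Input:  one position per unit, where the index is the unit ID.
// Output: the IDs of the units whose position is inside the view bounds.
std::vector<std::uint32_t> FindVisibleUnits(const std::vector<Vector3>& positions,
                                            const Bounds& view) {
    assert(positions.size() <= 0xFFFFFFFFu);  // IDs must fit in 32 bits
    std::vector<std::uint32_t> visible;
    visible.reserve(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        if (Contains(view, positions[i]))
            visible.push_back(static_cast<std::uint32_t>(i));
    }
    return visible;
}
```

The point is that the function only touches the one array it needs, and its output is just another plain array that the next process can consume.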

#5212437 Is it time to upgrade to Dx11?

Posted by on 23 February 2015 - 06:46 AM

If you do make a new system in 9, but think you may want to upgrade at some point, then base your system around cbuffers. They were the big change in 10/11, and they'll be sticking around for 12 too.

I wrote a post years ago on how I emulate cbuffer support on 9:

#5212434 Low-level platform-agnostic video subsystem design: Resource management

Posted by on 23 February 2015 - 06:37 AM

One idea I had was a layered API loosely based off of D3D11,

I've largely cloned D3D11, with the device/context split.
My main difference is that I use integer IDs to refer to all resources, don't do any reference counting at this level, and have cbuffers (aka UBOs) as their own distinct kind of resource, rather than supporting them as a generic buffer resource, as there's a lot of special case ways to deal with cbuffers on different APIs.

For fast streaming textures from disc, I use a file format tweaked for each platform with two segments - a header and the pixel data.
On platforms with "GPU malloc" (where memory allocation and resource creation are not linked), I have the streaming system do a regular (CPU) malloc for the header, and a GPU-malloc (write-combine, uncached, non-coherent) for the contents, and then pass the two pointers into a "device.CreateTexture" call, which has very little work to do, seeing that all the data has already been streamed into the right place.

I treat other platforms the same way, except the "GPU malloc" is just a regular malloc, which temporarily holds the pixel data until D3D/GL copies it into an immutable resource, at which point it's free'ed.

...separate the memory itself (buffer, texture) from how it's going to be used (view).  This way, for example, you can allocate a texture and use it as both a render target and a sampler target.  Of course, this also brings up the issue of making sure access is "exclusive", i.e. you can't sample from it and render to it at the same time.

I treat texture-resources, shader-resource-views, render-target-views, and depth-stencil-views as the same thing: a "texture". At creation time I'll make the resource and all the applicable views, and bundle them into the one structure.
Later if another view is needed - e.g. to render to a specific mip-level, or to alias the same memory allocation as a different format - then I support these alternate views by passing the original TextureId into a CreateTextureView function, which returns a new TextureId (which contains the same resource pointer internally, but new view pointers).

To ensure there are no data hazards (same texture as render target and shader resource), when binding a texture to a shader slot, I loop through the currently bound render targets and assert it's not bound as one. This (costly) checking is only done in development builds - in shipping builds, it's assumed the code is correct, and this validation code is disabled. Lots of other usage errors are treated the same way - e.g. checking if a draw-call will read past the end of a vertex buffer...
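
Roughly what that check could look like. This is only a sketch under assumptions: `TextureId`, `BindingState` and the slot counts are invented for the example, and plain `assert` (which compiles away when `NDEBUG` is defined) stands in for whatever development-only assertion macro an engine would really use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using TextureId = std::uint32_t;             // hypothetical integer handle
constexpr TextureId kInvalidTexture = 0;     // assumption: 0 means "nothing bound"
constexpr std::size_t kMaxRenderTargets = 8; // assumed limits, for illustration
constexpr std::size_t kMaxShaderSlots = 16;

struct BindingState {
    std::array<TextureId, kMaxRenderTargets> renderTargets{};  // zero-filled
    std::array<TextureId, kMaxShaderSlots> shaderTextures{};   // zero-filled
};

// True if 'texture' is currently bound as one of the render targets.
inline bool IsBoundAsRenderTarget(const BindingState& s, TextureId texture) {
    for (TextureId rt : s.renderTargets) {
        if (rt != kInvalidTexture && rt == texture) return true;
    }
    return false;
}

inline void BindShaderTexture(BindingState& s, std::size_t slot, TextureId texture) {
    assert(slot < s.shaderTextures.size());
    // Development-only hazard check: disappears in builds that define NDEBUG.
    assert(!IsBoundAsRenderTarget(s, texture) && "texture is also a render target");
    s.shaderTextures[slot] = texture;
}
```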

Another issue I'm interested in is making the API thread-safe.  While this is relatively trivial to do with D3D11 thanks to the Device/Context separation, with OpenGL it's more complicated since you have to deal with context sharing and currency.

As above, I copy D3D11. I have a boolean property in a GpuCapabilities struct that tells high level code if multithreaded resource creation is going to be fast or will incur a performance penalty.

As you mention, modern drivers are pretty ok with using multiple GL contexts and doing shared resource creation. On some modern GPUs, this is even recommended, as it triggers the driver's magic fast-path of transferring the data to the GPU "for free" via the GPU's underutilised and API-less asynchronous DMA controller!

As a fall-back for single-threaded APIs, you can check the thread-ID inside your resource creation functions, and if it's not the "main thread" ID, then generate a resource ID in a thread-safe manner, and push the function parameters into a queue. Later on the main thread, it can pop the function parameters from the queue and actually create the resource and link it up to the ID you returned earlier. This obviously won't give you any performance boost (perhaps the opposite if you have to do an extra malloc and memcpy in the queuing process...) but it does let you use the same multithreaded resource loading code even on old D3D9 builds.
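
A minimal sketch of that fall-back, assuming a made-up `Device` class. `TextureDesc`, `FlushPending` and `IsCreated` are hypothetical names, and a map insert stands in for the real single-threaded API call. A real version would also have to copy the pixel data into the queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using ResourceId = std::uint32_t;

// Hypothetical creation parameters.
struct TextureDesc { int width; int height; };

class Device {
public:
    explicit Device(std::thread::id mainThread) : mainThread_(mainThread) {}

    // Callable from any thread; always hands back an ID straight away.
    ResourceId CreateTexture(const TextureDesc& desc) {
        const ResourceId id = nextId_.fetch_add(1, std::memory_order_relaxed);
        if (std::this_thread::get_id() == mainThread_) {
            CreateNow(id, desc);
        } else {
            std::lock_guard<std::mutex> lock(queueMutex_);
            pending_.emplace_back(id, desc);  // defer to the main thread
        }
        return id;
    }

    // Main thread only: create everything other threads asked for.
    void FlushPending() {
        assert(std::this_thread::get_id() == mainThread_);
        std::vector<std::pair<ResourceId, TextureDesc>> work;
        {
            std::lock_guard<std::mutex> lock(queueMutex_);
            work.swap(pending_);
        }
        for (const auto& item : work) CreateNow(item.first, item.second);
    }

    // Main thread only: has the resource behind this ID been created yet?
    bool IsCreated(ResourceId id) const { return textures_.count(id) != 0; }

private:
    void CreateNow(ResourceId id, const TextureDesc& desc) {
        textures_[id] = desc;  // stand-in for the real API call
    }

    std::thread::id mainThread_;
    std::atomic<ResourceId> nextId_{1};
    std::mutex queueMutex_;
    std::vector<std::pair<ResourceId, TextureDesc>> pending_;
    std::unordered_map<ResourceId, TextureDesc> textures_;
};
```

The catch is that an ID returned to a worker thread isn't usable until the main thread has flushed the queue, so the main thread would call `FlushPending` once per frame before drawing.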

#5212411 Constant buffer or not?

Posted by on 23 February 2015 - 02:49 AM

3rd alternative would be to transform the vertices on the CPU, so that there's no need to send matrices to the GPU at all. This also means you can draw all the sprites in a single draw-call...
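
Something like the following, as a sketch only. `Sprite`, `Vec2` and `BuildSpriteBatch` are invented names, and the layout (four corners per sprite, position plus rotation plus half-size) is an assumption about what the sprites look like.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical per-sprite data.
struct Sprite {
    Vec2 position;   // world-space centre
    float rotation;  // radians
    Vec2 halfSize;   // half width / half height
};

// Transforms every sprite's four corners on the CPU and appends them to one
// shared vertex array, which can then be uploaded and drawn with one call.
std::vector<Vec2> BuildSpriteBatch(const std::vector<Sprite>& sprites) {
    static const Vec2 corners[4] = {
        {-1.0f, -1.0f}, {1.0f, -1.0f}, {1.0f, 1.0f}, {-1.0f, 1.0f}};

    std::vector<Vec2> vertices;
    vertices.reserve(sprites.size() * 4);
    for (const Sprite& s : sprites) {
        const float c = std::cos(s.rotation);
        const float sn = std::sin(s.rotation);
        for (const Vec2& k : corners) {
            const float lx = k.x * s.halfSize.x;  // corner in local space
            const float ly = k.y * s.halfSize.y;
            vertices.push_back({s.position.x + lx * c - ly * sn,
                                s.position.y + lx * sn + ly * c});
        }
    }
    assert(vertices.size() == sprites.size() * 4);
    return vertices;
}
```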

#5212407 Subtraction Problem in Java

Posted by on 23 February 2015 - 02:18 AM

In math, 0.39999999999...(repeating to infinity) is equal to 0.4
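
One way to see it, writing the repeating nines as a geometric series:

```latex
0.3\overline{9}
  = \frac{3}{10} + \sum_{k=2}^{\infty} \frac{9}{10^{k}}
  = \frac{3}{10} + \frac{9/100}{1 - 1/10}
  = \frac{3}{10} + \frac{1}{10}
  = 0.4
```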

#5212402 what would make vc text unreadable.?

Posted by on 23 February 2015 - 12:54 AM

Are you somehow running at a resolution higher than the native resolution of your monitor??

#5212377 Succesful titles from non AAA studios (recent)

Posted by on 22 February 2015 - 09:29 PM

Thanks, that makes 3.
Any more?

What 3 are you counting???

There's loads! Steam is full of non-AAA content, just go look at the front page!!!

Because you were too lazy to do this, here's what showed up on the Steam front page for me:
"Indie games":
Boring Man
The Escapists
CastleMiner Z
Non AAA, independent studios of around 20 people:
Killing Floor
Medieval Engineers / Space Engineers
Face of Mankind (MMO)
Beasts of Prey
Orion: Prelude
Frozen Cortex
SPORE™ Galactic Adventures
That's a majority of the content on Steam being from small studios, and many more tiny "indie" games than AAA games.

#5212367 Modern 8-bit game

Posted by on 22 February 2015 - 08:01 PM

You could always use the default mode 13h palette, that might be an interesting challenge. It does have nice colour range to make for some nice colourful retro games too...

What do you mean by the default mode? The default mode of what?

He means: default "mode 13h" palette
"Mode 13h" was a graphics mode back in the DOS and VGA era, used by many 80's/90's PC games, with 320x200 resolution and 256 colours.

#5212355 How to disable depth write?

Posted by on 22 February 2015 - 06:24 PM

What happens inside the drawable_ptr->Draw() call? It's not also setting the depth/stencil state, is it?

On phil's post above, you should always initialize descriptors to zero with:
D3D11_DEPTH_STENCIL_DESC depth_blah = {};

#5212219 Modern 8-bit game

Posted by on 21 February 2015 - 11:28 PM

I did a 256 color game not too recently, and I chose the colour palette by making a massive composite image containing lots of screenshots and concept art that I wanted to emulate, and then using Photoshop to compress that image down to 256 colours, and extracting the palette that was generated in the process.

#5212184 Succesful titles from non AAA studios (recent)

Posted by on 21 February 2015 - 06:38 PM

There's loads! Steam is full of non-AAA content, just go look at the front page!!!
Depends how you define success though? Breaking even? Making enough money to continue making games?? Being able to sell your IP for two billion dollars???
The last PC/360/PS3 game that I worked on had a team under 30 staff (not counting executives, publishing and QA), and we just released the PS4/XBone port using probably under 10 staff.
For some perspective -- AAA games these days tend to have budgets in the $50-$100M range.
Independent games tend to be around the $1M to $3M range.
"Indie" games are done on a shoestring.
All the interesting stuff occurs in the ranges in between -- i.e. the shoestring-to-one-million range, and the three-million-to-fifty-million range.
I share an office with a dozen other indie studios (many of the two/three staff variety), and most of them fall into the making-enough-money-to-continue-making-games category of success. You've probably never heard of any of them though, because they're not Notch.

There's also tonnes of "indie" mega-hits that you've probably never heard of. A friend-of-a-friend quit his job and made Antichamber almost solo, and is now a millionaire. A different friend-of-a-friend was part of the two-man team that made Crossy Road and they expect to make 10 million from it... fuckers.

#5212179 How to limit your FPS ?

Posted by on 21 February 2015 - 06:12 PM

A few nitpicks and corrections (which I consider important details nevertheless) on these:

YieldProcessor - Either just a NOP, or an energy efficient NOP on newer CPUs. Basically an incredibly tiny sleep. A must if you're ever building a low-level busy wait (which is something you should probably never be doing...)

According to the official documentation, it's about enhancing performance, not so much about saving energy or a tiny sleep:

The PAUSE instruction improves the performance of IA-32 processors with Hyper-Threading Technology when executing “spin-wait loops” and other routines where one thread is accessing a shared lock or semaphore in a tight polling loop. When executing a spin-wait loop, the processor can suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation and flushes the core processor’s pipeline. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation and prevent the pipeline flush. In addition, the PAUSE instruction de-pipelines the spin-wait loop to prevent it from consuming execution resources excessively. The result of these actions is greatly improved processor performance.
Intel strongly recommends that a PAUSE instruction be placed in all spin-wait loops that will run on Intel Xeon and/or Pentium 4 processors. Software routines that typically use spin-wait loops include multiprocessor synchronization primitives (spin-locks, semaphores, and mutex variables) and idle loops. Such routines keep the processor core busy executing a load-compare-branch loop while a thread waits for a resource to become available. Including a PAUSE instruction in such a loop greatly improves the efficiency of spin-wait routines when executing on Intel Xeon and Pentium 4 processors (see Section, “PAUSE Instruction”).

PAUSE is a kind of NOP, and any NOP is a tiny sleep, on the order of nanoseconds :P You could write a spin loop with no NOP instructions in it if you really didn't want to waste any time, but traditionally you'd have at least one NOP in there (maybe more) just to burn a little time each iteration.

In the context, what does enhancing performance of a NOP or a busy-wait loop mean? It doesn't overly matter how quickly the loop cycles - it's meant to be wasting time.

By de-pipelining the loop, by recognising that the NOP (which older CPUs interpreted as "read x, write to x", and so actually performed useless work) really is a no-op, and by avoiding the pipeline flush (and thus a lot of instruction-decode rework), the processor uses fewer resources, aka is performing better, aka improves efficiency, aka reduces power/thermal requirements temporarily.
Those freed up resources are now also idle and available for the other (Hyper-)thread to make use of if required.
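
For reference, a minimal sketch of a spin-wait with the hint in it. `SpinLock` and `CpuRelax` are made-up names. On x86 the hint is the `_mm_pause` intrinsic (which is what `YieldProcessor` boils down to there); on other architectures this sketch just assumes there's no hint and does nothing.

```cpp
#include <atomic>
#include <cassert>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #include <immintrin.h>
  inline void CpuRelax() { _mm_pause(); }  // emits the PAUSE instruction
#else
  inline void CpuRelax() {}                // assumption: no hint available
#endif

class SpinLock {
public:
    void lock() {
        // Spin-wait loop: the PAUSE hint goes inside the polling loop.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            CpuRelax();
        }
    }

    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

As said above, you should probably still reach for a real mutex before building something like this.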

#5212047 Modeling Light Sources

Posted by on 20 February 2015 - 09:44 PM

Yeah it's pretty confusing, but when you're calculating specular using a microfacet-based BRDF, the macro-surface normal (N) is largely irrelevant.


Microfacet models say that only micro-surfaces that are facing exactly along the H vector are contributing to the specular lobe -- all other micro-surfaces have zero specular contribution.

If you want to calculate Fresnel's equation to find out exactly how reflective those microfacets are (not how reflective a perfectly flat macro-surface would be), you need to use the microfacet normal, which is H.


Most of the specular shading is calculating properties of those microfacets, and then weighting those results based on the probability that these kinds of microfacets actually exist within the macro-surface (which is where N comes in).



Also, physically based BRDFs should always obey Helmholtz reciprocity, which means that if, right at the top of the BRDF code, you swap L and V:

e.g. temp = L; L = V; V = temp;

Then you'll get the exact same results.
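
A small sketch you can use to try that swap test. The BRDF here is an assumption picked to keep the example short: a normalized Blinn-Phong distribution, Schlick's Fresnel approximation evaluated with H, and an implicit visibility term, so it is not any particular engine's shading code. `SpecularBrdf`, `f0` and `shininess` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 Normalize(Vec3 v) {
    const float len = std::sqrt(Dot(v, v));
    assert(len > 0.0f);  // degenerate when L == -V
    return {v.x / len, v.y / len, v.z / len};
}

// Specular BRDF value only (the N.L cosine factor is not part of the BRDF).
float SpecularBrdf(Vec3 N, Vec3 L, Vec3 V, float f0, float shininess) {
    const float kPi = 3.14159265f;
    const Vec3 H = Normalize(L + V);  // microfacet normal
    const float nDotH = std::max(Dot(N, H), 0.0f);
    const float vDotH = std::max(Dot(V, H), 0.0f);

    // Distribution: how many microfacets face along H (this is where N comes in).
    const float D = (shininess + 2.0f) / (2.0f * kPi) * std::pow(nDotH, shininess);
    // Fresnel: evaluated against the microfacet normal H, not N.
    const float F = f0 + (1.0f - f0) * std::pow(1.0f - vDotH, 5.0f);
    // Implicit visibility: G = (N.L)(N.V) cancels the 4(N.L)(N.V) denominator.
    const float Vis = 0.25f;
    return D * F * Vis;
}
```

Calling it as `SpecularBrdf(N, L, V, ...)` and `SpecularBrdf(N, V, L, ...)` should agree up to floating-point rounding, because H is the same either way and V.H equals L.H.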

#5211846 How to limit your FPS ?

Posted by on 20 February 2015 - 02:01 AM

In a small number of games, particularly in certain competitive situations, players want to know information more quickly than that. In those rare cases you can allow the players to disable vsync

Lots of players also like to disable vsync in order to just get smoother performance.
If you're vsync'ing to 60Hz, but the game is running at 17ms per frame, then your framerate is going to alternate between 60fps and 30fps... Many players would prefer to just run at 58fps (with tearing) instead. Forcing vsync on is sending a big middle finger to your players.
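
A rough sketch of the arithmetic, assuming plain double-buffered vsync with a constant frame time, where a finished frame has to wait for the next refresh before it can be shown. `VsyncedFps` is a made-up helper for illustration.

```cpp
#include <cassert>
#include <cmath>

// Effective frame rate when every frame takes 'frameMs' to produce and can
// only be presented on a refresh boundary of a 'refreshHz' display.
double VsyncedFps(double frameMs, double refreshHz) {
    assert(frameMs > 0.0 && refreshHz > 0.0);
    const double periodMs = 1000.0 / refreshHz;             // 16.67ms at 60Hz
    const double intervals = std::ceil(frameMs / periodMs); // refreshes per frame
    return refreshHz / intervals;
}
```

Under that model a steady 17ms frame misses the 16.67ms deadline and gets held for two refreshes, i.e. 30fps, while without vsync the same frame time is 1000/17, or about 58.8fps.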

#5211844 How to limit your FPS ?

Posted by on 20 February 2015 - 01:55 AM

On Windows -
YieldProcessor - Either just a NOP, or an energy efficient NOP on newer CPUs. Basically an incredibly tiny sleep. A must if you're ever building a low-level busy wait (which is something you should probably never be doing...)
SwitchToThread - go for a trip through the kernel to see if there's another thread you can switch to. IIRC, only gives away your timeslice to other threads that are ready to run on the current processor. Probably still conserves a bit of power while it wastes time.
Sleep(0) - very similar to the above, a tiny bit less strict on who it's allowed to give up time to.
Sleep(1) - actually give up your timeslice for sure.

On older Windows kernels, the scheduling quantum defaults to 15ms, but you can override it with timeBeginPeriod/timeEndPeriod (causing worse energy efficiency and degrading system-wide performance). Newer Windows kernels are tickless (as Linux has been for a while), so don't have this problem.

On other OS's you have usleep.

I agree that by default such mechanisms should be disabled for desktop PC games, but that it may be nice to allow the user to choose to enable a CPU limiter.
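
A minimal sketch of an optional CPU limiter. Instead of the Windows calls listed above it uses the portable `std::this_thread::sleep_until` from the C++ standard library, so how precisely it wakes up depends on the OS scheduler. `FrameLimiter` is a made-up class for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

class FrameLimiter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameLimiter(double targetFps)
        : frame_(std::chrono::duration_cast<Clock::duration>(
              std::chrono::duration<double>(1.0 / targetFps))),
          next_(Clock::now() + frame_) {
        assert(targetFps > 0.0);
    }

    // Call once at the end of each frame.
    void Wait() {
        const Clock::time_point now = Clock::now();
        if (now < next_) {
            std::this_thread::sleep_until(next_);  // give the CPU away
        } else {
            next_ = now;  // running late: don't try to catch up
        }
        next_ += frame_;
    }

private:
    Clock::duration frame_;    // target time per frame
    Clock::time_point next_;   // deadline for the current frame
};
```

Usage would be to construct it once with the user's chosen cap (only if they enabled the option) and call `Wait()` after presenting each frame.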