
Member Since 14 Feb 2007

#5212978 GLSL; return statement ...

Posted by on 25 February 2015 - 04:40 PM

On older GPUs, I used the rule of thumb that a branch costs a dozen math instructions, so in the average case you need to be skipping more than a dozen instructions to get any benefit.

On modern GPUs, branching is almost free.

However, on every GPU, branching is done at SIMD granularity.
AMD GPUs process 64 pixels at a time, and NVIDIA GPUs process 32 at a time.
If even one of those pixels enters an if statement, then the whole SIMD unit must enter the branch, meaning that up to 63 pixels (on AMD) will be wasting their time.
So: branches should be coherent in screen space - pixels that are close to each other should be likely to take the same branches.
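To make the "whole SIMD unit enters the branch" rule concrete, here's a toy cost model (my own simplification, not how any particular GPU schedules work): a side of an if/else costs the full amount if any lane needs it, and lanes that don't need it just sit idle. The lane count of 64 models an AMD wavefront; the function names are made up for illustration.

```cpp
#include <bitset>

// Toy model of one if/else executed by a single SIMD unit.
// 64 lanes models an AMD wavefront; an NVIDIA warp would be 32.
constexpr int kLanes = 64;
using LaneMask = std::bitset<kLanes>;

// 'takesThen' has a bit set for every pixel (lane) whose condition is true.
// A side of the branch is executed if ANY lane needs it; the other lanes are
// masked off and idle for that side's duration.
int ifElseCost(const LaneMask& takesThen, int thenCost, int elseCost) {
    int cost = 0;
    if (takesThen.any())  cost += thenCost;  // some lane needs the 'then' side
    if (!takesThen.all()) cost += elseCost;  // some lane needs the 'else' side
    return cost;
}

// Lanes doing no useful work while the 'then' side runs.
int idleLanesInThen(const LaneMask& takesThen) {
    return takesThen.any() ? kLanes - static_cast<int>(takesThen.count()) : 0;
}
```

With a single divergent pixel, the unit pays for both sides (110 "instructions" for a 100/10 split) while 63 lanes idle through the expensive side; if all pixels agree, only one side is paid for. That's why screen-space coherence matters.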

#5212907 How do game engineers pack their data?

Posted by on 25 February 2015 - 06:48 AM

On a lot of the games I've worked on, each game file was individually compressed using LZMA. An archive was then built by appending all the compressed files end-to-end into a giant mega-file, along with a look-up table/dictionary mapping filenames to offsets/sizes within the archive.
To load a file you look it up in the dictionary, then stream 'size' bytes (starting from 'offset') from the archive file into an LZMA decompressor.
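A minimal sketch of that archive layout, assuming an in-memory string stands in for the mega-file (a real loader would seek and stream from disk). The `compress`/`decompress` functions are hypothetical identity placeholders where an LZMA library call would go, so the example runs without dependencies.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical placeholders: a real pipeline would call an LZMA library here.
std::string compress(const std::string& raw) { return raw; }
std::string decompress(const std::string& packed) { return packed; }

struct Entry { std::size_t offset; std::size_t size; };

struct Archive {
    std::string blob;                                  // compressed files, end-to-end
    std::unordered_map<std::string, Entry> directory;  // filename -> location in blob

    void add(const std::string& name, const std::string& contents) {
        std::string packed = compress(contents);       // each file compressed alone
        directory[name] = Entry{blob.size(), packed.size()};
        blob += packed;                                // append to the mega-file
    }

    // Look the name up, then feed 'size' bytes starting at 'offset' to the decompressor.
    std::optional<std::string> load(const std::string& name) const {
        auto it = directory.find(name);
        if (it == directory.end()) return std::nullopt;
        return decompress(blob.substr(it->second.offset, it->second.size));
    }
};
```

Compressing each file individually (rather than the whole archive) is what makes random access possible: any file can be decompressed without touching its neighbours.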

Compression algorithms have a lot of settings that let you trade time taken against compression ratio. Assuming you're loading stuff on a loading screen, you want to tune those settings so that the decompression CPU code takes the same amount of time as the file-reading IO - this way you can queue up a lot of files and keep the IO busy at 100% while getting the best compression ratio possible.
For DVD games, this means high compression settings. On Blu-ray, even higher. On HDD, low to none, as these are way faster - it can be faster to load uncompressed data! If targeting SSDs, compression will most likely just waste time :lol:
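The balancing act can be sketched as a tiny calculation, assuming reading and decompression overlap perfectly so that whichever is slower becomes the bottleneck. The three settings and their throughput numbers below are made up for illustration; you'd measure your own codec and drives.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <vector>

struct Setting {
    std::string name;
    double ratio;           // uncompressed size / compressed size
    double decompressMBps;  // CPU throughput, in uncompressed MB per second
};

// Seconds to load 1 MB of uncompressed data when IO and CPU work overlap:
// the slower of the two determines the total time.
double secondsPerMB(const Setting& s, double diskMBps) {
    double io = 1.0 / (s.ratio * diskMBps);
    double cpu = 1.0 / s.decompressMBps;
    return std::max(io, cpu);
}

std::string bestSetting(const std::vector<Setting>& settings, double diskMBps) {
    const Setting* best = &settings.front();
    for (const Setting& s : settings)
        if (secondsPerMB(s, diskMBps) < secondsPerMB(*best, diskMBps)) best = &s;
    return best->name;
}

// Hypothetical numbers, for illustration only.
const std::vector<Setting> kSettings = {
    {"none", 1.0, std::numeric_limits<double>::infinity()},
    {"fast", 2.0, 500.0},
    {"max",  3.0, 50.0},
};
```

With these made-up numbers, a slow 10 MB/s optical drive picks "max", a 100 MB/s HDD picks "fast", and a 2000 MB/s SSD picks "none", matching the rule of thumb above.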
Many console games will keep assets compressed on disk, but will decompress and cache them on the HDD.

#5212702 Max performance and support

Posted by on 24 February 2015 - 08:00 AM

iOS partially does in certain circumstances, such as when attributes in a vertex buffer are misaligned.

i.e., not aligned to 16 bytes? ES 2? I've seen it thrown around that explicit 16-byte alignment is good for some desktop hardware too, AMD cards apparently. I'm assuming if you're using another API (Mantle? ES 3?) you'd have to do proper alignment in any case.
Every card that I've been able to find specs for in the past decade has required attributes to be 4-byte aligned.
AFAIK, D3D enforces this on you: it doesn't let you define an attribute of data type short3 in an input layout/vertex descriptor, because it has a huge enum of all valid formats (each of which is a type, such as unsigned short integer, combined with a component count, such as 4/RGBA).
GL, on the other hand, lets you declare the type (e.g. short integer) and the component count (4/RGBA) separately, which allows you to specify combinations that no hardware supports - such as a short3, which is 6 bytes, so tightly packed elements aren't 4-byte aligned.
As mentioned by LS, in that particular case the GL driver has to reallocate your buffer and insert the padding bytes after each element itself, wasting a massive amount of CPU time... It would be much better to just fail hard and early, rather than limping on :(
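Here's a rough sketch of the check and of the kind of repacking a driver is forced into for a tightly packed short3 stream. This is my guess at the shape of that work, not any driver's actual code; the names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// GL-style attribute description: component type and count given separately.
struct AttribFormat { std::size_t componentBytes; std::size_t componentCount; };

// The rule above: each attribute should occupy a multiple of 4 bytes.
bool isHardwareFriendly(AttribFormat f) {
    return (f.componentBytes * f.componentCount) % 4 == 0;
}

// Copy every short3 element into a new buffer with 2 bytes of padding
// (a 4th short), so each element starts on a 4-byte boundary.
std::vector<std::int16_t> padShort3ToShort4(const std::vector<std::int16_t>& packed) {
    std::vector<std::int16_t> padded;
    padded.reserve(packed.size() / 3 * 4);
    for (std::size_t i = 0; i + 2 < packed.size(); i += 3) {
        padded.push_back(packed[i]);
        padded.push_back(packed[i + 1]);
        padded.push_back(packed[i + 2]);
        padded.push_back(0);  // padding
    }
    return padded;
}
```

Doing this per buffer, at upload time, on the CPU is exactly the hidden cost being complained about; declaring a short4 (and ignoring .w) up front avoids it.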

As for 16/32-byte alignment - these values generally apply to the stride of your streams, not the size of each element.
E.g. two interleaved float4 attribs (= 32-byte stride) are better than an interleaved float4 attrib & float3 attrib (= 28-byte stride).
This rule is less universal, and varies greatly by GPU. The reason it's important on some GPUs is that they first have instructions to fetch a whole cache line, and then instructions to fetch individual attributes from that line.
If the vertex fetch cache uses 32-byte cache lines and you also have a 32-byte stride, then it's smooth sailing - just fetch one cache line and then fetch the attribs from it!
But if you have a 28-byte stride, then in the general case you have to fetch two cache lines and deal with reassembling attributes that potentially straddle the boundary between those two lines, resulting in a lot more move instructions being generated at the head of the vertex program :(
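A quick way to see the difference is to count how many vertices straddle a line boundary. This assumes the stream starts on a cache-line boundary and uses the 32-byte line size from the example above; real line sizes vary by GPU.

```cpp
#include <cstddef>

// Counts how many of the first 'vertexCount' vertices straddle a cache-line
// boundary, assuming the stream starts at the beginning of a line and each
// vertex occupies exactly 'stride' bytes. A straddling vertex needs two lines.
std::size_t straddlingVertices(std::size_t stride, std::size_t lineBytes,
                               std::size_t vertexCount) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < vertexCount; ++i) {
        std::size_t first = i * stride;        // first byte of vertex i
        std::size_t last = first + stride - 1; // last byte of vertex i
        if (first / lineBytes != last / lineBytes) ++count;
    }
    return count;
}
```

With a 32-byte stride no vertex ever straddles; with a 28-byte stride, 6 out of every 8 vertices do (the pattern repeats every 224 bytes, the least common multiple of 28 and 32).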

#5212691 Moiré pattern brick texture tiling

Posted by on 24 February 2015 - 07:05 AM

Also, does the texture resource / shader resource view actually contain mips?

#5212616 Max performance and support

Posted by on 23 February 2015 - 10:32 PM

In practice OpenGL itself makes no guarantee that any given feature is going to be hardware accelerated.  It's perfectly legal for a driver to advertise a feature and have full support for it, but to drop you back to a software emulated path if you actually try to use it.

Do you know of any concrete example of an implementation that provides an OpenGL 3.2+ core context but emulates some features in software? I keep hearing this, but while it's true according to the spec, it's never followed by examples of situations in which it actually happened.

Not 3.2, but on 2.1 I used dynamic indexing of an array of uniform variables inside a fragment shader, and my FPS dropped from 60 to 1 -- a sure sign that the driver had fallen back to software emulation :(

#5212614 How are the TF2 classes programmed?

Posted by on 23 February 2015 - 10:18 PM

There's also no reason that inheritance would have to be used here... you could use composition where Player has a Class.


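#include <vector>
struct Weapon { const char* name; };
Weapon shotgun{"shotgun"}, bat{"bat"};
struct Class
{
  float runSpeed;
  float maxHealth;
  std::vector<Weapon*> allowedWeapons;
};
Class g_scout = { 200, 100, { &shotgun, &bat } };
struct Player
{
  Class* m_Class; // shared, read-only data describing the class
  float health;   // per-player state
};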
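Building on that composition idea, here's a small self-contained sketch of how the rest of the game might use it: behaviour is driven by the Class data rather than by a subclass. The `spawn` and `canEquip` helpers are hypothetical, just to show the pattern.

```cpp
#include <algorithm>
#include <vector>

struct Weapon { const char* name; };

struct Class {
    float runSpeed;
    float maxHealth;
    std::vector<const Weapon*> allowedWeapons;
};

struct Player {
    const Class* m_Class;  // shared, read-only data describing the class
    float health;          // per-player state
};

// A new player starts with the health its class allows.
Player spawn(const Class& c) { return Player{&c, c.maxHealth}; }

// Weapon restrictions are a data lookup, not a virtual function.
bool canEquip(const Player& p, const Weapon* w) {
    const auto& allowed = p.m_Class->allowedWeapons;
    return std::find(allowed.begin(), allowed.end(), w) != allowed.end();
}
```

Adding a new class then means adding a new `Class` instance (possibly loaded from a data file) rather than writing a new C++ type.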

#5212613 Sculpting vs. Modeling

Posted by on 23 February 2015 - 10:15 PM

1) In big studios, a character's design usually starts with illustration -- a concept artist will paint a whole bunch of pictures of the character. Then a 3D artist will do a high-poly sculpt, then a low-poly mesh will be created, then UVs and textures will be created.

Other times people will make a low-poly mesh first (after the concept/illustration phase), then import it into a sculpting program to add extra details, and then bake those details back onto the low-poly version.


2) You usually convert "sculpted" models (aka ridiculously high-poly-count models) into low-poly models for use in games... There's no reason that hair is any different. However, realistic hair rendering is a tough problem, so there are often very specialized workflows around making animated hair.


3) Sculpted models are generally too complex for use in games directly. It's a matter of optimization to transfer/bake the high poly details onto a lower-poly mesh. Also the topology of sculpted models is often terrible, and a hand-authored low-poly topology can be animated/deformed better without strange twisting/shearing/bending occurring.

After you've made the low-poly version of the model, you then rig it to a skeleton, and then animate the skeleton. There's no point spending time rigging the high-poly version to the skeleton if it's never going to be used in a game.

#5212552 Data oriented design

Posted by on 23 February 2015 - 04:34 PM

Ironically, it's hard to tell if you're on the right track because you've only shown us the data!

What transformations do you apply to that data?
How does that data change?
What produces it? What consumes it?

The answers to those questions will dictate the best way to organise the data.

As is, there are no processes associated with your data, so the optimal thing to do is delete all your structs for being unused :lol: (j/k)

IndexedVector<Vector3> vector3s;
^^^ Having a collection of vec3s called "vector3s" doesn't seem too useful, as you can't perform any operations/transforms on data unless you know what it represents.

IndexedVector<Vector3> positions;
...makes more DoD sense, as then you can have a process that iterates through each position, tests if it's on screen, and outputs a list of unit ID#s representing the visible units, etc...
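That visibility process might look something like this sketch. `std::vector` stands in for the poster's `IndexedVector`, and a 2D rectangle test stands in for a real frustum test. Both are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vector3 { float x, y, z; };
struct Rect { float minX, minY, maxX, maxY; };  // visible region, world units

// Struct-of-arrays unit data: index i in each array refers to the same unit.
struct Units {
    std::vector<std::uint32_t> ids;
    std::vector<Vector3> positions;
};

// One transform with a clear input and output: read positions,
// write the IDs of the units that are inside the view.
std::vector<std::uint32_t> findVisible(const Units& units, const Rect& view) {
    std::vector<std::uint32_t> visible;
    for (std::size_t i = 0; i < units.positions.size(); ++i) {
        const Vector3& p = units.positions[i];
        if (p.x >= view.minX && p.x <= view.maxX &&
            p.y >= view.minY && p.y <= view.maxY)
            visible.push_back(units.ids[i]);
    }
    return visible;
}
```

Once the transform is written down, the data layout follows from it: this loop only touches positions (plus IDs for the hits), so those are the arrays worth keeping tightly packed.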

#5212437 Is it time to upgrade to Dx11?

Posted by on 23 February 2015 - 06:46 AM

If you do make a new system in 9, but think you may want to upgrade at some point, then base your system around cbuffers. They were the big change in 10/11, and they'll be sticking around for 12 too.
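For anyone wondering what "cbuffers on 9" could look like: D3D9 exposes shader constants as float4 registers, uploaded with calls such as `IDirect3DDevice9::SetVertexShaderConstantF(StartRegister, pConstantData, Vector4fCount)`. One possible approach (a sketch, not necessarily the one in the linked post) is to lay each cbuffer out as whole float4 registers and upload it as one contiguous range. The register file and `setConstants` below are stand-ins, so the example runs without D3D9.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

struct Float4 { float x, y, z, w; };

// Hypothetical cbuffer, laid out in whole 16-byte registers.
struct PerObjectCB {
    Float4 world[4];  // 4x4 matrix = 4 registers
    Float4 tint;      // 1 register
};
static_assert(sizeof(PerObjectCB) % sizeof(Float4) == 0, "must fill whole registers");

// Stand-in for the device's constant registers (c0, c1, ...).
std::array<Float4, 256> g_registers{};

// Mimics SetVertexShaderConstantF: copy 'count' float4s starting at 'start'.
// No bounds checking in this sketch.
void setConstants(std::size_t start, const void* data, std::size_t count) {
    std::memcpy(&g_registers[start], data, count * sizeof(Float4));
}

// A "cbuffer slot" is just a fixed base register chosen for that cbuffer.
template <typename CB>
void uploadCBuffer(std::size_t baseRegister, const CB& cb) {
    setConstants(baseRegister, &cb, sizeof(CB) / sizeof(Float4));
}
```

The high-level code then only ever deals in "update this cbuffer, bind it to slot N", which maps directly onto D3D10/11/12 when you upgrade.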

I wrote a post years ago on how I emulate cbuffer support on 9:

#5212434 Low-level platform-agnostic video subsystem design: Resource management

Posted by on 23 February 2015 - 06:37 AM

One idea I had was a layered API loosely based off of D3D11,

I've largely cloned D3D11, with the device/context split.
My main differences are that I use integer IDs to refer to all resources, don't do any reference counting at this level, and have cbuffers (aka UBOs) as their own distinct kind of resource rather than supporting them as a generic buffer resource, as there are a lot of special-case ways to deal with cbuffers on different APIs.
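Integer IDs without ref-counting can be as simple as a typed index into a pool with a free list. This is a sketch under my own assumptions (the type names and pool design are hypothetical): lifetime is explicit, and distinct ID types stop you from passing a texture where a cbuffer is expected.

```cpp
#include <cstdint>
#include <vector>

// Distinct handle types: mixing them up is a compile error.
struct TextureId { std::uint32_t value; };
struct CBufferId { std::uint32_t value; };  // cbuffers are their own resource kind

// No reference counting: the caller creates and destroys explicitly.
template <typename Id, typename Resource>
class ResourcePool {
public:
    Id create(const Resource& r) {
        if (!freeList_.empty()) {           // reuse a released slot
            std::uint32_t index = freeList_.back();
            freeList_.pop_back();
            slots_[index] = r;
            return Id{index};
        }
        slots_.push_back(r);
        return Id{static_cast<std::uint32_t>(slots_.size() - 1)};
    }
    void destroy(Id id) { freeList_.push_back(id.value); }
    Resource& get(Id id) { return slots_[id.value]; }

private:
    std::vector<Resource> slots_;
    std::vector<std::uint32_t> freeList_;
};
```

A production version would likely add a generation counter to each slot so that stale IDs can be detected in development builds.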

For fast streaming textures from disc, I use a file format tweaked for each platform with two segments - a header and the pixel data.
On platforms with "GPU malloc" (where memory allocation and resource creation are not linked), I have the streaming system do a regular (CPU) malloc for the header, and a GPU-malloc (write-combine, uncached, non-coherent) for the contents, and then pass the two pointers into a "device.CreateTexture" call, which has very little work to do, seeing that all the data has already been streamed into the right place.

I treat other platforms the same way, except the "GPU malloc" is just a regular malloc, which temporarily holds the pixel data until D3D/GL copies it into an immutable resource, at which point it's freed.
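A sketch of the two-segment idea, assuming a hypothetical header layout (the real per-platform format isn't described beyond "header + pixel data"). `gpuMalloc` is a placeholder that is plain `malloc` here, i.e. the fallback path, and a byte vector stands in for the streamed file. There's no validation of the file size in this sketch.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical on-disk layout: fixed-size header, then pixel data already in
// the platform's native layout, so nothing needs converting at load time.
struct TextureHeader {
    std::uint32_t width, height, mipCount, format;
    std::uint32_t pixelBytes;  // size of the segment that follows the header
};

struct Texture { TextureHeader header; void* pixels; };

// On "GPU malloc" platforms this would be a write-combined GPU allocation.
void* gpuMalloc(std::size_t bytes) { return std::malloc(bytes); }

// Both segments are already where they need to be, so creation is cheap.
Texture createTexture(const TextureHeader& h, void* pixels) { return Texture{h, pixels}; }

Texture loadTexture(const std::vector<std::uint8_t>& file) {
    TextureHeader header;
    std::memcpy(&header, file.data(), sizeof header);   // segment 1: header (CPU memory)
    void* pixels = gpuMalloc(header.pixelBytes);        // segment 2: pixel data
    std::memcpy(pixels, file.data() + sizeof header, header.pixelBytes);
    return createTexture(header, pixels);
}
```

In a real streaming system the reads would go directly into these two allocations instead of through an intermediate buffer.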

...separate the memory itself (buffer, texture) from how it's going to be used (view).  This way, for example, you can allocate a texture and use it as both a render target and a sampler target.  Of course, this also brings up the issue of making sure access is "exclusive", i.e. you can't sample from it and render to it at the same time.

I treat texture resources, shader-resource-views, render-target-views, and depth-stencil-views as the same thing: a "texture". At creation time I make the resource and all the applicable views, and bundle them into one structure.
Later if another view is needed - e.g. to render to a specific mip-level, or to alias the same memory allocation as a different format - then I support these alternate views by passing the original TextureId into a CreateTextureView function, which returns a new TextureId (which contains the same resource pointer internally, but new view pointers).
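A rough sketch of that bundling, with integers standing in for API object pointers (e.g. an ID3D11Texture2D and its views). `fakeCreateView` and the record layout are hypothetical; the point is that the new ID shares the resource but gets its own views.

```cpp
#include <cstdint>
#include <vector>

using ResourceHandle = std::uintptr_t;  // stand-in for a resource pointer
using ViewHandle = std::uintptr_t;      // stand-in for a view pointer

// Hypothetical: pretends to create an API view object.
ViewHandle fakeCreateView() { static ViewHandle next = 1; return next++; }

// One "texture" = the resource plus every applicable view.
struct TextureRecord {
    ResourceHandle resource;
    ViewHandle srv = 0;      // shader-resource view
    ViewHandle rtv = 0;      // render-target view (0 if not a render target)
    std::uint32_t mip = 0;   // first mip level the views refer to
};

using TextureId = std::uint32_t;
std::vector<TextureRecord> g_textures;

// Returns a new TextureId that shares the original resource but has new views.
TextureId CreateTextureView(TextureId original, std::uint32_t mip) {
    TextureRecord view = g_textures[original];  // copies the resource handle
    view.mip = mip;
    view.srv = fakeCreateView();
    if (view.rtv != 0) view.rtv = fakeCreateView();
    g_textures.push_back(view);
    return static_cast<TextureId>(g_textures.size() - 1);
}
```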

To ensure there are no data hazards (the same texture bound as both render target and shader resource), when binding a texture to a shader slot, I loop through the currently bound render targets and assert that it's not bound as one. This (costly) checking is only done in development builds - in shipping builds, the code is assumed to be correct and this validation code is disabled. Lots of other usage errors are treated the same way - e.g. checking whether a draw call will read past the end of a vertex buffer...
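That development-only hazard check could look like this. The state layout and slot counts are hypothetical, and treating `NDEBUG` as the "shipping build" switch is an assumption; any build-configuration macro would do.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using TextureId = std::uint32_t;
constexpr TextureId kNoTexture = 0xFFFFFFFFu;

// Hypothetical context state: 8 render-target slots, 16 shader texture slots.
struct ContextState {
    std::array<TextureId, 8> renderTargets;
    std::array<TextureId, 16> shaderTextures;
    ContextState() { renderTargets.fill(kNoTexture); shaderTextures.fill(kNoTexture); }
};

bool isBoundAsRenderTarget(const ContextState& s, TextureId tex) {
    for (TextureId rt : s.renderTargets)
        if (rt == tex) return true;
    return false;
}

void bindShaderTexture(ContextState& s, int slot, TextureId tex) {
#ifndef NDEBUG
    // Development builds only: the loop and assert compile away when NDEBUG is set.
    assert(!isBoundAsRenderTarget(s, tex) && "texture is bound as a render target");
#endif
    s.shaderTextures[slot] = tex;
}
```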

Another issue I'm interested in is making the API thread-safe.  While this is relatively trivial to do with D3D11 thanks to the Device/Context separation, with OpenGL it's more complicated since you have to deal with context sharing and currency.

As above, I copy D3D11. I have a boolean property in a GpuCapabilities struct that tells high level code if multithreaded resource creation is going to be fast or will incur a performance penalty.

As you mention, modern drivers are pretty ok with using multiple GL contexts and doing shared resource creation. On some modern GPUs, this is even recommended, as it triggers the driver's magic fast-path of transferring the data to the GPU "for free" via the GPU's underutilised and API-less asynchronous DMA controller!

As a fall-back for single-threaded APIs, you can check the thread ID inside your resource creation functions, and if it's not the "main thread" ID, then generate a resource ID in a thread-safe manner and push the function parameters into a queue. Later, the main thread can pop the function parameters from the queue, actually create the resource, and link it up to the ID you returned earlier. This obviously won't give you any performance boost (perhaps the opposite, if you have to do an extra malloc and memcpy in the queuing process...) but it does let you use the same multithreaded resource loading code even on old D3D9 builds.
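A minimal sketch of that fallback, assuming a buffer-creation call as the example resource type. All names are hypothetical, and `reallyCreate` stands in for the real single-threaded API call.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

class DeferredCreator {
public:
    explicit DeferredCreator(std::thread::id mainThread) : main_(mainThread) {}

    // Callable from any thread. The ID is valid immediately; on worker threads
    // the resource itself only exists after the main thread calls pump().
    std::uint32_t createBuffer(std::vector<std::uint8_t> data) {
        std::uint32_t id = nextId_.fetch_add(1);           // thread-safe ID
        if (std::this_thread::get_id() == main_) {
            reallyCreate(id, data);                        // direct path
        } else {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(Request{id, std::move(data)});  // queue the params
        }
        return id;
    }

    // Main thread, e.g. once per frame: create everything queued so far.
    void pump() {
        std::vector<Request> work;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            work.swap(pending_);
        }
        for (const Request& r : work) reallyCreate(r.id, r.data);
    }

    bool exists(std::uint32_t id) const { return created_.count(id) != 0; }  // main thread only

private:
    struct Request { std::uint32_t id; std::vector<std::uint8_t> data; };

    // Stand-in for the real API call (e.g. a D3D9 resource creation).
    void reallyCreate(std::uint32_t id, const std::vector<std::uint8_t>& data) {
        created_[id] = data.size();
    }

    std::thread::id main_;
    std::atomic<std::uint32_t> nextId_{1};
    std::mutex mutex_;
    std::vector<Request> pending_;
    std::unordered_map<std::uint32_t, std::size_t> created_;  // main thread only
};
```

The extra copy into the queue is the "extra malloc and memcpy" mentioned above; it's the price of keeping the loading code identical across APIs.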

#5212411 Constant buffer or not?

Posted by on 23 February 2015 - 02:49 AM

A 3rd alternative would be to transform the vertices on the CPU, so that there's no need to send matrices to the GPU at all. This also means you can draw all the sprites in a single draw call...
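A sketch of CPU-side sprite batching, assuming 2D sprites described by position, size and rotation (the `Sprite` and `Vertex` layouts are hypothetical). Every sprite's corners are transformed on the CPU and appended to one vertex array, which can then be drawn with a single call and an identity (or view-projection only) matrix.

```cpp
#include <cmath>
#include <vector>

struct Vertex { float x, y, u, v; };

// Hypothetical per-sprite data that would otherwise become a per-draw matrix.
struct Sprite { float x, y, width, height, angle; };

// Transform each sprite's 4 corners on the CPU and append 6 vertices
// (two triangles) to one shared array.
std::vector<Vertex> buildSpriteBatch(const std::vector<Sprite>& sprites) {
    static const float corners[4][2] = {{-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}};
    static const int order[6] = {0, 1, 2, 0, 2, 3};
    std::vector<Vertex> out;
    out.reserve(sprites.size() * 6);
    for (const Sprite& s : sprites) {
        float c = std::cos(s.angle), sn = std::sin(s.angle);
        for (int i : order) {
            float lx = corners[i][0] * s.width, ly = corners[i][1] * s.height;  // scale
            out.push_back({s.x + lx * c - ly * sn,   // rotate, then translate
                           s.y + lx * sn + ly * c,
                           corners[i][0] + 0.5f, corners[i][1] + 0.5f});  // UVs in 0..1
        }
    }
    return out;
}
```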

#5212407 Subtraction Problem in Java

Posted by on 23 February 2015 - 02:18 AM

In math, 0.39999999999...(repeating to infinity) is equal to 0.4
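Writing the repeating decimal as a geometric series shows why:

```latex
0.3\overline{9} = 0.3 + \sum_{k=2}^{\infty} 9 \cdot 10^{-k}
              = 0.3 + 9 \cdot \frac{10^{-2}}{1 - 10^{-1}}
              = 0.3 + 9 \cdot \frac{0.01}{0.9}
              = 0.3 + 0.1
              = 0.4
```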

#5212402 what would make vc text unreadable.?

Posted by on 23 February 2015 - 12:54 AM

Are you somehow running at a resolution higher than the native resolution of your monitor??

#5212377 Successful titles from non AAA studios (recent)

Posted by on 22 February 2015 - 09:29 PM

Thanks, that makes 3.
Any more?

What 3 are you counting???

There's loads! Steam is full of non-AAA content, just go look at the front page!!!

Because you were too lazy to do this, here's what showed up on the Steam front page for me:
"Indie games":
Boring Man
The Escapists
CastleMiner Z
Non-AAA, independent studios of around 20 people:
Killing Floor
Medieval Engineers / Space Engineers
Face of Mankind (MMO)
Beasts of Prey
Orion: Prelude
Frozen Cortex
SPORE™ Galactic Adventures
That's a majority of the content on Steam coming from small studios, and many more tiny "indie" games than AAA games.

#5212367 Modern 8-bit game

Posted by on 22 February 2015 - 08:01 PM

You could always use the default mode 13h palette; that might be an interesting challenge. It does have a nice colour range, which makes for some nice colourful retro games too...

What do you mean by the default mode? The default mode of what?

He means: default "mode 13h" palette
"Mode 13h" was a graphics mode back in the DOS and VGA era, used by many 80s/90s PC games, with 320x200 resolution and 256 colours.
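For anyone emulating the look today, the mode is simple to model: one byte per pixel, each byte an index into a 256-entry palette, and the VGA DAC stored each palette channel as 6 bits (0..63). On real hardware the framebuffer lived at segment A000h; in this sketch a plain array stands in for it, and the helper names are my own.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// 320x200, one palette index per pixel.
constexpr int kWidth = 320, kHeight = 200;
std::array<std::uint8_t, kWidth * kHeight> g_framebuffer{};

std::size_t pixelOffset(int x, int y) { return static_cast<std::size_t>(y) * kWidth + x; }

void putPixel(int x, int y, std::uint8_t paletteIndex) {
    g_framebuffer[pixelOffset(x, y)] = paletteIndex;
}

// Palette channels are 6-bit (0..63). To display them on an 8-bit-per-channel
// screen, shift up and replicate the top bits so 63 maps to 255.
std::uint8_t expand6to8(std::uint8_t v) {
    return static_cast<std::uint8_t>((v << 2) | (v >> 4));
}
```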