
Member Since 14 Feb 2007

#5305951 render huge amount of objects

Posted by on 15 August 2016 - 05:39 AM

The method I've used in the past is to put all the hierarchical transforms into a big array, and make sure it's sorted by hierarchical depth -- i.e. parents always appear in the array before their children. Something like:
struct Node
{
  Matrix4x4 localToParent; //modified when moving the node around
  Matrix4x4 localToWorld;  //computed per frame - this node's world matrix
  int parent;              //-1 for root nodes
};

std::vector<Node> nodes;

//Then the update loop is super simple
for( int i = 0, end = (int)nodes.size(); i != end; ++i )
{
  Node& n = nodes[i];
  assert( i > n.parent ); //assert parents are updated before their children
  if( n.parent == -1 )    //root node
    n.localToWorld = n.localToParent;
  else                    //child node
    n.localToWorld = n.localToParent * nodes[n.parent].localToWorld; //local to parent * parent to world == local to world
}

#5305888 render huge amount of objects

Posted by on 15 August 2016 - 01:28 AM

Put some timing code into the hot-spots that you've found (setTransform, etc) and find out exactly how many microseconds per frame you spend on that logic.
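A minimal sketch of that kind of timing code, using std::chrono (the loop inside is a stand-in workload for whatever setTransform-style logic you actually want to measure; TimeWorkloadMicros is a hypothetical name):

```cpp
#include <cassert>
#include <chrono>

// Time a chunk of work and return the elapsed microseconds.
// The loop below is a stand-in for the logic you actually want to
// measure (e.g. your setTransform calls).
long long TimeWorkloadMicros()
{
    using Clock = std::chrono::steady_clock;
    auto start = Clock::now();

    volatile double sink = 0; // stand-in "hot-spot" workload
    for (int i = 0; i < 1000000; ++i)
        sink += i * 0.5;

    auto end = Clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Log the result once per frame, or accumulate a min/max over many frames, rather than trusting a single sample.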

#5305820 Geometry Optimization Libraries?

Posted by on 14 August 2016 - 05:27 PM

AMD's library: http://gpuopen.com/gaming-product/tootle/

On the commercial side of the fence: https://www.simplygon.com/


Going from non-indexed to indexed is pretty straightforward and you can do it yourself with a dozen lines of code. Initialize a counter to zero and create a dictionary with a key type of your vertex attribute structure, and a value type of an index. For each vertex in the vertex buffer, write its index into the new index buffer. Find its index by looking up the vertex in the dictionary; if it's not found in the dictionary, use the counter value as the index, store the vertex + index into the dictionary for future reference, and increment the counter.
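A minimal sketch of that dictionary pass (Vertex and BuildIndexedMesh are hypothetical names; a real vertex struct would carry more attributes than a position):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical vertex: just a position here, but any attribute struct
// with an ordering (or a hash, if you prefer unordered_map) works.
struct Vertex { float x, y, z; };
bool operator<(const Vertex& a, const Vertex& b)
{
    return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
}

// De-duplicate a non-indexed vertex stream into vertex + index buffers.
void BuildIndexedMesh(const std::vector<Vertex>& input,
                      std::vector<Vertex>& outVerts,
                      std::vector<uint32_t>& outIndices)
{
    std::map<Vertex, uint32_t> seen; // vertex -> its assigned index
    uint32_t counter = 0;            // next unused index
    for (const Vertex& v : input)
    {
        auto it = seen.find(v);
        if (it == seen.end()) // first occurrence: assign the counter value
        {
            it = seen.emplace(v, counter++).first;
            outVerts.push_back(v);
        }
        outIndices.push_back(it->second);
    }
}
```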

Then take your results and give it to one of these optimization libraries to re-order the data :)


You can use 32bit integers for the index... but pretty much all hardware is happier if you use 16bit integers... so, you actually want to iterate through the vertex buffer one triangle at a time, and start a new output mesh if the index reaches 0xFFFF. You can store multiple output meshes in a single vertex/index buffer -- every API has a draw-call argument specifying an offset for reading from the vertex data and the index buffer -- so a "sub-mesh" should store these offset arguments. e.g. you can set 0xFFFF as the vertex buffer offset, and when the GPU reads 0x5 out of the index buffer, it will look up vertex # 0xFFFF+0x5 == 0x10004 -- a 32bit vertex offset computed from a 16bit index.
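The offset lookup in that last sentence is just an addition the GPU performs for you; a tiny illustration (FetchVertexLocation is a hypothetical name for what the hardware does internally):

```cpp
#include <cassert>
#include <cstdint>

// Effective vertex location: the draw call's 32-bit base-vertex offset
// plus the 16-bit value read from the index buffer.
uint32_t FetchVertexLocation(uint32_t baseVertex, uint16_t index)
{
    return baseVertex + index;
}
```

e.g. FetchVertexLocation(0xFFFF, 0x5) gives 0x10004, matching the example above.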

#5305800 Matrix Handedness Confusion

Posted by on 14 August 2016 - 03:24 PM

Say I'm making a math library, and I've decided to use row-major, so I know which components of the matrix hold which data.

If you're using "horizontal matrices", as above, then row major array indexing will keep the data more readable (an array of four basis vectors). But if you're using "vertical matrices" then column major array indexing will keep the data more readable (the exact same array of four basis vectors as the other situation!!).

Visualizing a left handed coordinate system, the X / Y plane aligns with the screen, and the Z axis is just added to it to point positive into the screen.

I label my thumb, index and middle fingers as X, Y and Z, and make them perpendicular like a "finger guns" gesture. On each hand, X is up, Y is aiming at you, but Z direction is swapped :)
But spin your hands around and you get other arrangements. If you fix X as across the screen to the right and Z as into the screen, then right handed has Y as down, while left handed has Y as up. That's basically the convention of whether the bottom left or top left should be the origin of a drawing canvas :)
So there's not a single right handed coordinate system and a single left handed coordinate system - handedness is a property that you can tell about a coordinate system once you know which ways its axes point.
The only place this matters in D3D/GL is the final NDC/clip-space coordinate system, which determines if Y is up or down (or if Z is in or out...). If you're using different conventions, the only matrix that requires any changes is your projection matrix, which transforms from your coordinate system into clip space/NDC.

As for your matrix library:
Most coordinate systems define a positive rotation to be clockwise when looking down the axis of rotation away from the origin. You can implement that definition without knowing anything else about the coordinate system, such as its handedness :)
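One way to make "handedness is a property you can test" concrete in code: the sign of the scalar triple product dot(cross(x,y), z) tells you a basis's handedness, without caring which way any individual axis points. A sketch with a hypothetical Vec3 type standing in for your library's vector:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 Cross(Vec3 a, Vec3 b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
float Dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Positive triple product -> right-handed basis; negative -> left-handed.
bool IsRightHanded(Vec3 x, Vec3 y, Vec3 z)
{
    return Dot(Cross(x, y), z) > 0;
}
```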

#5305734 How Do You Handle Gamma Correction?

Posted by on 14 August 2016 - 07:16 AM

Is the above a correct summary?

Yep :)

The results were 'shocking': most example DDS files are already in sRGB, which would mess up gamma correctness with this approach.
Unless... input sRGB DDS textures are not converted to sRGB because they already are.
Not 100% sure how DDSTextureLoader does that; I'll have to do some testing to see the differences.

With files on disk, you never want to convert between 8bit sRGB and 8bit linear RGB. sRGB acts like a compression scheme, allowing RGB images to be stored with fewer bits. Without sRGB/gamma, we'd have to use 10-16bit linear images to get the same quality (or just full float32). If you ever have data in one of those formats and you convert it into the other, you're performing a lossy operation.
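A quick way to see that lossiness: push all 256 sRGB codes through the standard transfer function (the IEC 61966-2-1 formula) and quantise back to 8-bit linear; many dark codes collapse onto the same byte. A sketch, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <set>

// Standard sRGB -> linear transfer function (IEC 61966-2-1).
float SrgbToLinear(float s)
{
    return s <= 0.04045f ? s / 12.92f
                         : std::pow((s + 0.055f) / 1.055f, 2.4f);
}

// Convert every 8-bit sRGB code to an 8-bit linear code and count how
// many distinct linear values survive; the shortfall is data lost forever.
int CountSurvivingLinearCodes()
{
    std::set<uint8_t> survivors;
    for (int i = 0; i < 256; ++i)
    {
        float linear = SrgbToLinear(i / 255.0f);
        survivors.insert((uint8_t)std::lround(linear * 255.0f));
    }
    return (int)survivors.size();
}
```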

The forceSRGB option doesn't edit the pixel data at all, it just changes the DXGI format. The only difference between the DXGI_*_SRGB and the DXGI_* formats is that the former tells the shader to do automatic sRGB<->linear conversions when reading/writing.

If you have colour data that's been authored by an artist on a typical monitor, then having it use a DXGI_*_SRGB is what you want. If the DDS file doesn't do that, you can use the forceSRGB option to 'fix' the format choice on load. On the other hand, if a texture contains data (not colours) such as a normal map, then its DDS certainly should not be using an SRGB format.

My expectation was that all 3 would give the same result, but that's only the case for A and B. C gives a visually different DDS with BC1_UNORM format, no sRGB (the texture is 'lighter' in its colors). Any idea why this is?

One of these options might actually be editing the pixel data, performing a linear->sRGB conversion on the colours, rather than just setting a bit that says "this data is sRGB".

As above, you really don't want to do that. You just want to tell D3D that your "colour textures" are sRGB, and that your "data textures" are not.

#5305557 Multithreading and dynamic vertex buffers

Posted by on 12 August 2016 - 04:39 PM

Generally speaking, probably :P
If computing the data that goes into the buffer is expensive, then it's good to get that work off the main thread, regardless of API specific details (i.e. whether you use deferred contexts or still do the map/unmap on the main thread).

If it's a staging/dynamic resource, then it's likely allocated in CPU writeable / GPU readable memory (likely regular system RAM), so there is no GPU work to "copy" the data. If using no-overwrite / unsynchronized, there's very little driver overhead either. Your thread writes directly into the resource, and the GPU reads directly from it.

That's fine for write-once/read-once situations, but slow CPU-side RAM is bad if this is write-once/read-many. In that case, you'd make two resources - one STAGING and one DEFAULT, have the CPU write directly into the staging resource, and then schedule the GPU to copy the data to the default resource.
In D3D11, the thread that enqueues the copy command is irrelevant, as it occurs on the GPU timeline.

However, in D3D12/Vulkan, you can control different parts of the GPU independently - so in this case, you could use a copy queue to get the GPU's DMA controller to perform the copy, which will occur in parallel to the GPU's graphics work. This is possible in theory under GL too, but not explicitly - you've got to create a second context on a Tuesday and only call GL functions with odd numbers of letters in a sequence where the number of syllables follows, either up or down, the Fibonacci sequence from the previous call, and hope the driver understands your incantation. All of this is independent of CPU side threading issues though.

#5305554 Article On Texture (Surface) Formats?

Posted by on 12 August 2016 - 04:10 PM

If your normals are in a cone (don't go all the way to the edge), you can also use "partial derivative normal maps", where you store x/z and y/z, and reconstruct in the shader with normalize(float3(x_z, y_z, 1)).
One advantage of this representation is that you get great results by simply adding together several normal maps (e.g. detail mapping) *before* you decode back into a 3D normal. The alternative, adding together several decoded 3D normals (and renormalising), loses a lot of detail / flattens everything.
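A sketch of the decode and the add-before-decode blend (Vec3 is a hypothetical stand-in for the shader's float3; DecodePdn/BlendPdn are made-up names):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Normalize(Vec3 v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Decode a partial-derivative normal map texel, which stores x/z and y/z.
Vec3 DecodePdn(float x_z, float y_z)
{
    return Normalize({ x_z, y_z, 1.0f });
}

// Blend two maps by summing the stored derivatives *before* decoding.
Vec3 BlendPdn(float ax, float ay, float bx, float by)
{
    return DecodePdn(ax + bx, ay + by);
}
```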

#5305549 SSR reprojection

Posted by on 12 August 2016 - 03:48 PM

See http://www.frostbite.com/2015/08/stochastic-screen-space-reflections/

#5305427 Preparing for a 'XBox one' code stream

Posted by on 11 August 2016 - 09:55 PM

It's hard to answer console questions due to NDA's and such...


You can use the Windows 10 SDK to make UWP apps, which work across PC/Xbox, but as an "app", not a native game. That's probably closer to XNA / XBLIG on the Xbox 360.


The Xbox uses Direct3D11.x and Direct3D12.x, so you'd have to carry out some small amount of porting still.


FMOD works on the consoles, but you have to contact them to get your hands on the console version (again, NDA)... you can download the UWP version though :)


I would treat all low-level API's, OS calls, XInput, IO, networking, etc, in the same way that you would for any cross-platform game. Keep your implementations segregated and easy to replace with a different implementation if required. Maybe you'll get lucky and be able to use your code verbatim, but don't assume so. 

#5305415 Is The "entity" Of Ecs Really Necessary?

Posted by on 11 August 2016 - 07:36 PM

I think nowadays ECS's Wikipedia page is quite clear about what an ECS is, what each letter means, and what the difference is with entity-component architecture; it also provides nice links (like the t-machine one) about it: https://en.wikipedia.org/wiki/Entity_component_system

 Note the massive disclaimer at the top though: "This article is written like a personal reflection or opinion essay that states the Wikipedia editor's personal feelings about a topic, rather than the opinions of experts." :wink:

That particular strict form of ECS is basically just relational data modelling, rediscovered, without the rigor or flexibility.

For the most part, whether your logic is in a system or in the component is just syntactic sugar and makes no functional difference (assuming you still have systems of components that get batch-processed in both cases).

BTW The T-Machine blog is one of the many that give a dishonest comparison between (bad/incorrect-)OO and their flavor of ECS, while also failing to compare it with relational.

#5305293 Is The "entity" Of Ecs Really Necessary?

Posted by on 11 August 2016 - 07:14 AM

@agleed Your first example is a hard-coded entity, but doesn't say anything about how the components link to each other.

Here's three examples of how components might communicate -- using the example that Mesh has a function "Foo", which relies on Transform's "Bar" function:
//Explicit, component holds a direct reference:
class Mesh
{
  Mesh( Transform& t ) : transform(t) {}
  void Foo() { transform.Bar(); }
  Transform& transform;
};
//Implicit, component has explicit link to parent entity:
class Mesh
{
  Mesh( Entity& e ) : parent(e) {}
  void Foo() { parent.GetComponent<Transform>().Bar(); }
  Entity& parent;
};
//Implicit, shared ID's between global systems:
class Mesh
{
  void Foo() { EntityId id = MeshSystem::GetId(this); TransformSystem::Get(id).Bar(); }
};
In all three examples, the components might be allocated inside large pools owned by systems, and the entities might be defined in data files instead of code.

Or in an ECS where the logic is in the "systems" not the "components":
//Explicit, component holds a direct reference:
struct Mesh
{
  Transform* transform;
};
class MeshSystem
{
  std::vector<Mesh> data;
  void Foo() { for(const Mesh& m : data) m.transform->Bar(); }
};
//Implicit, component has explicit link to parent entity:
struct Mesh
{
  Entity* parent;
};
class MeshSystem
{
  std::vector<Mesh> data;
  void Foo() { for(const Mesh& m : data) m.parent->GetComponent<Transform>().Bar(); }
};
//Implicit, shared ID's between systems:
struct Mesh {};
class MeshSystem
{
  std::vector<Mesh> data;
  void Foo(TransformSystem& transforms) { for(uint32_t id=0, end=(uint32_t)data.size(); id!=end; ++id) transforms.data[id].Bar(); }
};

Note that in the explicit system, the entity only needs to exist in the data files. It's a template created by a designer, a kind of pre-fab of components that get spawned together and share a lifetime.
In the "Implicit, component has explicit link to parent entity" version, the entity is a "bag of components".
In the "Implicit, shared ID's between systems" version, the entity is "just an ID".

#5305281 Is The "entity" Of Ecs Really Necessary?

Posted by on 11 August 2016 - 06:27 AM

Regarding explicit vs implicit: The problem with explicit is that it's not data-driven at all.

Sure it can be - the system I was describing was meant to load Entity definitions from a data file! And the rest of ...snip... doesn't have to be true; you can still have systems that operate on pools of specific components, with no knowledge of the parent entities whatsoever. i.e. you can use the typical ECS idea of having pools of components being operated on by systems, regardless of whether you use implicit or explicit component links.

Explicit can actually be more data driven than implicit. With implicit, the connections are always hard-coded -- e.g. Mesh will find its transform by asking the parent entity for a Transform component via GetSibling<Transform>(). With explicit, your data files can create inter-component relationships in different ways without needing to change any code.
At a previous job, we used an ECS, which supported implicit links (via something like GetSibling<Transform>()), but also supported explicit component links and data-driven entity definitions. As well as declaring entities in text files, as in my previous post, there was also a GUI tool for editing them and linking them up into interesting gameplay objects.
Designers could build Entities out of components in a node-based GUI tool, dragging and dropping to create components and define the explicit links between them, similar to:
[screenshot: node-based GUI for creating components and dragging explicit links between them] and then you could use any entity as a component in a bigger entity: [screenshot: an entity nested as a component inside a larger entity]
In this system, the components themselves were written using extremely clean C++ code, plus some macros to create the "binding" code for the entity system:

class Mesh
{
  Mesh( Transform& transform );
  void Foo();
  int m_foo;
};

The macros would generate all the required code to serialize/deserialize them, create the component editor GUI nodes including all the draggable connection points, and allow the editor to enforce the construction requirements (e.g. above, a Mesh component must be explicitly linked against a Transform component). Declaring methods to the system allowed designers to write something akin to blueprint visual scripts.


The entities didn't exist as code at all, and were just text files in the data directory (compiled into binary files for shipping builds).

ECS is just an extended form of composition with data oriented and data driven design measures added in. They just counter the typical problems you get from deep inheritance trees, so I think it's natural to assume that stance of discussion and those comparisons are entirely fair.

The reason I say it's a straw man is simply because deep inheritance trees are usually just bad code, so these arguments compare their big entity framework against bad code in order to show that their framework is good, which is not an honest thing to do. Deep inheritance trees are usually not OOP code despite people calling it OOP. Lots of the blogs on ECS make the argument that "OOP is bad because deep inheritance, therefore let's use ECS for composition, as composition is better than inheritance, problem solved!"... However, they've skipped a really important step in the middle there, and have solved a false problem. The fact that composition is better than inheritance is a core rule of OOP, so it should be obeyed there too. You can (should) be writing composition based code without the need to set up a big composition framework in the form of an ECS library.

#5305234 Is The "entity" Of Ecs Really Necessary?

Posted by on 11 August 2016 - 01:20 AM

If the components aren't associated in some way, how would you know which transform component to get the position from when rendering the mesh component (or render component or whatever it may be called)?

The ECS pattern is about having implicit (magic) connections between components. A mesh component can fetch data from a transform component without explicitly being told to, via something like: this->parent->child(typeof(Transform)) / this->sibling(typeof(Transform)).
The alternative to this is having explicit connections between components. When initializing/creating the parent entity, it explicitly plugs child components into each other, via something like: this->mesh->transform = this->transform.
If I had a data format for defining entity types, the only difference between these two systems would be:
Implicit linking: myEntity = { Transform(0,0,0), Mesh('foo.model') }
Explicit linking: myEntity = { transform = Transform(0,0,0), mesh = Mesh('foo.model', transform) }
IMHO, implicit coupling is evil, which is why ECS comes off as an anti-pattern to me, sitting alongside the singleton  :wink:

It's about having different data and functionalities, and bundling that together in a way with minimal memory, runtime and abstraction overhead (the opposite of which would be huge inheritance trees).

Comparing ECS and deep-inheritance is kind of a straw-man argument, because deep-inheritance goes against the rules of OOP as well. Jumping from "deep-inheritance based OOP" to "ECS based composition" is just swapping one extreme for another. IMHO anyone making this leap should stop and actually learn OOP first before throwing the baby out with the bathwater.

#5305219 Instancebasevertex Perf Hit

Posted by on 10 August 2016 - 09:51 PM

creating with persistent/write/coherent flags

That's asking the driver to allocate the buffer in system RAM (e.g. malloc) rather than to allocate it in GPU RAM. Your shader, when reading from the buffer, will be reading system memory via the PCIe bus.
That's fine for data that the GPU will read once (e.g. copying into a GPU-resident buffer), but is not good for data that's randomly accessed by shader code.


You probably don't want to be using the coherent bit.

#5305174 Article On Texture (Surface) Formats?

Posted by on 10 August 2016 - 03:26 PM

^that :)

I'm lazy so I use DDS as my runtime texture file format for Windows at the moment. It's pretty well designed for fast loading, but can be beaten by custom formats if you want to spend the time.
On console we use custom formats that require zero deserialization (they just need to be loaded/memcpy'ed to the right location).

Our asset loading system allows loading from compressed "packs" of files, so our DDS files are optionally 'zipped up' with something like LZMA. That's common across the whole engine file loading system though, not specific to textures.

Your artists generally won't want to work with DDS though, so our asset build system (there's a compile button for art, just like for code) pre-converts from TGA/PNG/etc to DDS as part of the compilation process. This tool also automatically chooses BC formats and mixes channels together based on metadata/instructions in the shader code.
e.g. A shader may tell the material editor that it has two input textures - RGB colour and monochrome translucency - but also tell the asset compiler that it wants these packed together into a single BC3 texture. The tool will get the two PNGs specified in the artist's XML material file, build the packed/compressed DDS, and generate a binary material file that references the DDS.