

Member Since 10 Oct 2005
Offline Last Active Today, 10:28 AM

#5213052 Order of matrix multiplication

Posted by haegarr on 26 February 2015 - 03:04 AM

I forgot to answer this part:

In HLSL this would mean:
float4x4 transform = mul( mul( rotation, scale ), translate);
float4 worldPosition = mul(vertex, transform);
However in GLSL it would be:
mat4 translation = translate * scale * rotate;
vec4 worldPosition = translation * vertex;

That is not correct in so far as neither HLSL nor GLSL prescribes the use of row or column vectors. It is totally legal to use

  HLSL: float4 worldPosition = mul(transform, vertex);

  GLSL: vec4 worldPosition = vertex * translation;

as well.


BUT: Mathematically, neither of the variables in my snippet is the same as its counterpart in your snippet. Instead, one is the transpose of the other. This is very important, because in HLSL/GLSL you cannot see this directly. Moreover, as long as the operand in question is a vector, neither HLSL nor GLSL makes a distinction between row and column form; instead they simply treat a pre-multiplicand as a row vector if it is a vector at all, and a post-multiplicand as a column vector if it is a vector at all. Nevertheless, if the argument is not a vector, you as the programmer have the responsibility to ensure the correct form of the matrix.


For example, suppose you have your own matrix math library that works with column vectors (we leave the memory layout aspect aside here). Then a matrix fetched from the library can be used directly in HLSL with mul(matrix, vector) as well as in GLSL with matrix * vector, but it cannot be used directly in HLSL with mul(vector, matrix) or in GLSL with vector * matrix. However, using the transpose operator, it can be used in HLSL as mul(vector, transpose(matrix)) and in GLSL as vector * transpose(matrix).
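The transpose relation can be checked on the CPU side with a few lines of C++. This is only a sketch: Vec4, Mat4, mulMatVec, mulVecMat, transposeOf, and makeTranslation are hypothetical names made up for the illustration, not the API of any real math library.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types, for illustration only.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>; // indexed as m[row][col]

// Column-vector convention: result = M * v
Vec4 mulMatVec(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Row-vector convention: result = v * M
Vec4 mulVecMat(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            r[j] += v[i] * m[i][j];
    return r;
}

Mat4 transposeOf(const Mat4& m) {
    Mat4 t{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t[j][i] = m[i][j];
    return t;
}

// Translation matrix built for column vectors: the offset sits in the last column.
Mat4 makeTranslation(float x, float y, float z) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[0][3] = x;
    m[1][3] = y;
    m[2][3] = z;
    return m;
}
```

With t = makeTranslation(2, 3, 4) and the point p = (1, 1, 1, 1), mulMatVec(t, p) and mulVecMat(p, transposeOf(t)) both give (3, 4, 5, 1), while mulVecMat(p, t) without the transpose gives something else.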


Hope that helps.

#5213050 Order of matrix multiplication

Posted by haegarr on 26 February 2015 - 02:47 AM

To get a transformation matrix we have to concatenate three matrices: one for translation, one for rotation and one for scaling.

If you want to translate and rotate and scale, then you have to concatenate at least 3 dedicated transformation matrices. If you want additional kinds of transformations then there are more dedicated matrices involved. If you want more freedom (center of scaling, axes of scaling, center of rotation) then you need more dedicated matrices, although then the types of additional matrices are rotation and translation again. More on this at the end of this post.


The order of the concatenation matters, as each operation is relative to the origin of the matrix. This is regardless of handedness.

Correct so far, but I don't know whether "origin of the matrix" is a proper wording. I would say that each particular transformation happens with respect to a space, and the properties of the transformation may cause specific mappings of special points or directions in this space. The interesting rules are:

* The point 0 is always mapped onto itself when using a rotation or a scaling. 

* A point on a space axis is mapped onto the same axis when using a scaling.


The concept of pre v post multiplication is a separate issue from concatenation order.

The concept of pre- and post-multiplication exists because the matrix product is not commutative. However, whether to use pre- or post-multiplication in a particular case depends on whether you use row or column vectors, and on the concatenation order you want to apply.


The correct order of concatenating these matrices is as follows: First Rotate, this will rotate the object around it's point of origin. Next Scale, since we don't want the scaling to affect how far the object is translated from origin it must be scaled first. Finally Translate.

There is nothing like "the correct order of concatenation". Any order is correct w.r.t. a use case. However, there is one order where the particular transformations do not influence one another, and that order is scaling, followed by rotating, followed by translating.


Why? Because of what I've written above: Scaling has the 2 mapping properties, namely the center and the axes. But the axes are altered by a rotation. Hence doing the rotation first would have an influence on the scaling. On the other hand, rotation just maps the origin onto itself, and scaling does so, too, so scaling does not influence rotation.
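A small numeric sketch of this, in 2D and with column vectors (so the matrix on the right is applied first). Mat2, mulMat, makeRotation, makeScaling, and axesDot are hypothetical helpers for the illustration. The idea is to look at the images of the two local axes: if scaling is applied first they stay perpendicular, and if a non-uniform scaling is applied after the rotation they get sheared.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat2 = std::array<std::array<double, 2>, 2>; // indexed as m[row][col]

// Matrix product a * b
Mat2 mulMat(const Mat2& a, const Mat2& b) {
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Counter-clockwise rotation for column vectors
Mat2 makeRotation(double radians) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    Mat2 r{};
    r[0][0] = c;  r[0][1] = -s;
    r[1][0] = s;  r[1][1] = c;
    return r;
}

// Scaling along the coordinate axes
Mat2 makeScaling(double sx, double sy) {
    assert(sx != 0.0 && sy != 0.0); // a zero factor would collapse an axis
    Mat2 r{};
    r[0][0] = sx;
    r[1][1] = sy;
    return r;
}

// Dot product of the two matrix columns, i.e. of the images of the
// local x and y axes. Zero means the axes are still perpendicular.
double axesDot(const Mat2& m) {
    return m[0][0] * m[0][1] + m[1][0] * m[1][1];
}
```

For a rotation R by 45 degrees and a scaling S = makeScaling(2, 1), axesDot(mulMat(R, S)) is zero up to rounding (scaling first), while axesDot(mulMat(S, R)) is -1.5 (rotation first).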


In general, however, and here we come back to the question of whether a combined transformation always consists of 3 matrices, you may want to use a rotation with an arbitrary center, and you may use a scaling with an arbitrary center and axes. In such a case, rotation and/or scaling themselves are no longer represented by pure rotation or scaling matrices, resp., but by combinations of them together with translations and rotations.


For example, the transform node in X3D uses arbitrary scaling axes and an arbitrary common center for rotation and scaling. When using column vectors (hence read it right to left), the decomposed form looks like

    T * C * R * A * S * A^-1 * C^-1

where T, R, and S denote translation, rotation, and scaling, resp., C denotes the center for scaling and rotation, and A denotes the axes for scaling.

#5213048 Handling of modifier keys

Posted by haegarr on 26 February 2015 - 02:13 AM

On top of Aressera's and Strewya's posts:


The problem comes from looking at input as events. There is no need to send input asynchronously to any and all sub-systems, so don't do so. Instead collect (more or less) raw input from the OS, encode it into a unified structure including a time stamp, enqueue it, and let the sub-systems access the queue to investigate the current state and the (short-time) history of input. This allows for arbitrary key press combos as already mentioned, but it also makes it easy to check for temporal dependencies (e.g. key presses in sequence and whether a combo key was pressed in time).
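A minimal sketch of such a queue could look as follows. All names (InputKind, InputRecord, InputQueue, and its member functions) are hypothetical and just illustrate the idea; a real implementation would also discard old records and remember the key state from before the discarded part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <limits>

enum class InputKind { KeyDown, KeyUp };

// Unified, time-stamped input record
struct InputRecord {
    InputKind kind;
    int key;               // unified key code
    std::uint64_t timeMs;  // time stamp in milliseconds
};

class InputQueue {
public:
    // Records must be pushed in chronological order.
    void push(const InputRecord& record) {
        assert(records_.empty() || records_.back().timeMs <= record.timeMs);
        records_.push_back(record);
    }

    // State query: was the key held down at the given point in time?
    bool isDownAt(int key, std::uint64_t timeMs) const {
        bool down = false;
        for (const InputRecord& r : records_) {
            if (r.timeMs > timeMs) break;
            if (r.key == key) down = (r.kind == InputKind::KeyDown);
        }
        return down;
    }

    // State query for "now", i.e. after the whole recorded history.
    bool isDown(int key) const {
        return isDownAt(key, std::numeric_limits<std::uint64_t>::max());
    }

    // Temporal query: was `second` pressed after `first`, at most maxGapMs later?
    bool pressedInSequence(int first, int second, std::uint64_t maxGapMs) const {
        for (std::size_t i = 0; i < records_.size(); ++i) {
            const InputRecord& a = records_[i];
            if (a.kind != InputKind::KeyDown || a.key != first) continue;
            for (std::size_t j = i + 1; j < records_.size(); ++j) {
                const InputRecord& b = records_[j];
                if (b.kind == InputKind::KeyDown && b.key == second &&
                    b.timeMs - a.timeMs <= maxGapMs) {
                    return true;
                }
            }
        }
        return false;
    }

private:
    std::deque<InputRecord> records_;
};
```

A sub-system interested in a combo like Ctrl+S would then ask, when it sees the key-down of S, whether isDownAt(Ctrl, timeOfS) holds, instead of tracking modifier events itself.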

#5212442 [SFML] Distance between random placed & random spawn number each time

Posted by haegarr on 23 February 2015 - 07:26 AM

1.) Randomizing the amount of items.   

static const int MinNumBlocks = 4;
static const int MaxNumBlocks = 6; // must not be less than MinNumBlocks

sf::Sprite leftBlock[MaxNumBlocks];

// yields a value in [MinNumBlocks, MaxNumBlocks], both inclusive
int numBlocks = MinNumBlocks + rand() % ( MaxNumBlocks - MinNumBlocks + 1 );

for (int idxBlock = 0; idxBlock < numBlocks; idxBlock++) {
    // ... set up leftBlock[ idxBlock ], see 2.) ...
}

2.) Ensuring a minimal distance between items by relocating an item if its distance to any already placed item falls below a threshold.

static const float SquaredMinDistance = 20.0f; // compared against squared distances
static const int MaxNumTrials = 10;

for (int idxBlock = 0; idxBlock < numBlocks; idxBlock++) {
    float x = 0.0f;
    float y = 0.0f;
    for (int trial = 0; trial < MaxNumTrials; trial++) {
        x = rand() % 400 + 60;
        y = rand() % 400 + 60;
        // check the candidate position against all blocks placed so far
        bool okay = true;
        for (int idxCheck = 0; idxCheck < idxBlock; idxCheck++) {
            float xDist = x - leftBlock[ idxCheck ].getPosition().x;
            float yDist = y - leftBlock[ idxCheck ].getPosition().y;
            okay = okay && (( xDist * xDist + yDist * yDist ) >= SquaredMinDistance );
        }
        if( okay ) {
            break; // candidate accepted
        }
    }
    // if all trials failed, the last candidate is used anyway
    leftBlock[ idxBlock ].setTexture( BLOCK );
    leftBlock[ idxBlock ].setPosition( x, y );
}

(It's all untested code, but it should show the idea.)

#5212273 Help understanding Component-Entity systems.

Posted by haegarr on 22 February 2015 - 09:25 AM

One more question, about your first example:

struct Entity {
  int Id;
  std::vector<TComponent*> Components;
};

Doesn't that vector cause problems with inheritance? If I try to run a function from a component that inherits from the base component class/struct, won't it only run the base component's function instead of the inheriting component's?

The reason for virtual functions in C++ is just that: Although you have a pointer to an object of the base class, the object may in fact be of any class inheriting from that base class, and invoking a virtual function declared in the base class then in fact invokes the implementation overridden by the derived class. A typical candidate would be Component::update(). BUT ...


... one possible concept of ECS, and that concept is favored by BeerNutts, is to make components data holders only. Any usage (i.e. a function working on that data) is concentrated in sub-systems (see again BeerNutts' first post and look for "MovementSystem" and "EnemySystem", for example). Another concept would be to allow for both data components and behavior components, but still make a distinction between them.


Why is this useful? Look at a component that represents the placement of the entity in the world. It may be manipulated by a controller or animation first, then read by the collision sub-system, and perhaps a collision resolution is needed that again alters the component's value. Later it is read by the graphics rendering to determine the world matrix. Such a data component can best be understood as a (perhaps complex) variable: It has a type (and can/should additionally have a semantic meaning), but how it is used is outside the scope of the variable itself.
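As a rough sketch of the data-only variant: the components below carry no behavior at all, and the sub-systems are plain functions that read or alter them one after another. Placement, Velocity, World, movementSystem, and collisionSystem are hypothetical names, and storing components in parallel arrays indexed by the entity ID is a simplification chosen just for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data-only components: no virtual functions, no update() method.
struct Placement { float x; float y; };
struct Velocity  { float dx; float dy; };

using EntityId = std::size_t;

struct World {
    // Simplification: every entity has both components, indexed by its ID.
    std::vector<Placement> placements;
    std::vector<Velocity>  velocities;

    EntityId createEntity(Placement p, Velocity v) {
        placements.push_back(p);
        velocities.push_back(v);
        return placements.size() - 1;
    }
};

// Sub-system 1: alters the placement based on the velocity.
void movementSystem(World& world, float dt) {
    for (std::size_t i = 0; i < world.placements.size(); ++i) {
        world.placements[i].x += world.velocities[i].dx * dt;
        world.placements[i].y += world.velocities[i].dy * dt;
    }
}

// Sub-system 2: a trivial "collision resolution" that alters the very same
// placement again by keeping entities above the ground plane y = 0.
void collisionSystem(World& world) {
    for (Placement& p : world.placements) {
        if (p.y < 0.0f) p.y = 0.0f;
    }
}
```

Because no function is invoked through a base class pointer here, the question of which override gets called does not come up at all for such components.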

#5211861 Yet Another Procedural Planet (and some shader advice please)

Posted by haegarr on 20 February 2015 - 04:48 AM

My question is: given the lack of #include in GLSL, in a situation with multiple complex shaders (as in Bruneton) where there is a lot of overlap between functions, #defined constants, uniforms, is there any good generic advice on how to structure things? [...]

While GLSL lacks a built-in #include directive, OpenGL allows the shader code to be supplied in several pieces (see glShaderSource()). This is one way to implement an inclusion system yourself, either implicitly (simply by "knowing" the structure) or explicitly (by some superimposed pre-processing).
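The "implicit" variant could be sketched like this. ShaderSourceList is a hypothetical helper that only does the bookkeeping; the actual glShaderSource call is shown as a comment because it needs a current OpenGL context and a valid shader object. Keep in mind that the #version directive has to come first, so it belongs into the first piece.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper that keeps shader source pieces in the order in which
// they are handed over to the GLSL compiler.
class ShaderSourceList {
public:
    void add(const std::string& piece) { pieces_.push_back(piece); }

    // One pointer per piece, in order. The pointers stay valid only as long
    // as this object is neither modified nor destroyed.
    std::vector<const char*> pointers() const {
        std::vector<const char*> result;
        for (const std::string& piece : pieces_) result.push_back(piece.c_str());
        return result;
    }

    // What the compiler effectively sees: all pieces back to back.
    std::string concatenated() const {
        std::string all;
        for (const std::string& piece : pieces_) all += piece;
        return all;
    }

private:
    std::vector<std::string> pieces_;
};

// With a current OpenGL context and a shader object `shader`, the pieces
// would be supplied like this (null-terminated strings, hence no lengths):
//   std::vector<const char*> ptrs = list.pointers();
//   glShaderSource(shader, static_cast<GLsizei>(ptrs.size()), ptrs.data(), nullptr);
```

Shared pieces (common functions, #defined constants, uniform block declarations) can then be added to the list of every shader that needs them.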


[...] Part of me wants to stick all of the uniforms/#defined constants in a big uniform block and include that. But I could use some advice from the pros.

Nowadays uniforms are usually provided by UBOs. As such they are declared in one or more blocks. I don't know how relevant it is for your use case, but in a typical 3D scenario one defines several uniform blocks dependent on the sources and update frequencies: 1 block with pipeline stage parameters, 1 block with camera/view parameters, 1 block with material parameters, and so on. 

#5211856 Remapping barycentric coordinates to barycentric coordinates of a sub-triangle?

Posted by haegarr on 20 February 2015 - 04:09 AM

And the derivation is:


The point does not change its cartesian co-ordinates, so

    p( a,b,c ) = p( a',b',c' )


   p( a,b,c ) = a * p1 + b * p2 + c * p3

   p( a',b',c' ) = a' * p1 + b' * p2 + c' * ( p2 + p3 ) / 2

which gives (by comparing the coefficients)
   a' = a
   b' = b - c' / 2 = b - c
   c' = 2 c
That matches your solution for b > c. It does not hint at the need for a case distinction. Now, if b < c, then b' = b - c is negative, so p lies outside the nominated sub-triangle. In that case a', b', and c' cannot all be non-negative.
So ... I'm not sure why you made the case distinction!?
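The result can be checked numerically with a few lines of C++. Point2, Bary, combine, remapToSubTriangle, and isInside are hypothetical helpers for this check only; the sub-triangle is the one from the derivation, i.e. p1, p2, and the midpoint of p2 and p3.

```cpp
#include <array>
#include <cassert>

struct Point2 { double x; double y; };
using Bary = std::array<double, 3>; // (a, b, c)

// Cartesian point for barycentric weights w with respect to the corners q1, q2, q3
Point2 combine(const Bary& w, Point2 q1, Point2 q2, Point2 q3) {
    return Point2{ w[0] * q1.x + w[1] * q2.x + w[2] * q3.x,
                   w[0] * q1.y + w[1] * q2.y + w[2] * q3.y };
}

// Remap (a, b, c) w.r.t. (p1, p2, p3) to (a', b', c') w.r.t. (p1, p2, (p2 + p3) / 2):
//   a' = a,  b' = b - c,  c' = 2 c
Bary remapToSubTriangle(const Bary& w) {
    return { w[0], w[1] - w[2], 2.0 * w[2] };
}

// A point lies inside (or on the border of) a triangle if no weight is negative.
bool isInside(const Bary& w) {
    return w[0] >= 0.0 && w[1] >= 0.0 && w[2] >= 0.0;
}
```

For p1 = (0, 0), p2 = (4, 0), p3 = (0, 4), the weights (0.25, 0.5, 0.25) map to (0.25, 0.25, 0.5), and both describe the same point (2, 1). The weights (0.25, 0.25, 0.5), where b < c, map to (0.25, -0.25, 1), which still describes the same point (1, 2) but lies outside the sub-triangle.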

#5210841 Disassociate mouse with its position

Posted by haegarr on 15 February 2015 - 09:20 AM

I'm no expert on Windows problems, so there may be a better way. However, you can set the cursor back to the screen's center after receiving any mouse movement, using SetCursorPos or some similar function. IIRC, setting the cursor this way does not generate mouse movement events of its own, so you need not distinguish between regular and irregular movements.


BTW: The issue is not related to OpenGL. It would be better placed into another forum.

#5209417 Light-weight render queues?

Posted by haegarr on 08 February 2015 - 09:29 AM

That's what I don't understand. Constant buffers, texture slots, samplers, drawtypes, depthstencil buffers etc doesn't sound like "high-level data". A texture unit or slot for example sounds like something privy to the renderer rather than a high-level scene object. What am I missing?

Constant buffers, texture slots, depthstencil buffers, ... are operating resources (hence resources not in the sense of assets). If you have "high-level data" like material parameters or viewing parameters or whatever is constant for a draw call, they can be stored within a constant buffer to provide them to the GPU. From a high-level view it's the data within the buffer that is interesting, not the buffer which is only the mechanism to transport it. From a performance point of view, it's the transport mechanism that is interesting, not the data within. Same for textures.


With programmable shaders the meaning of vertex attributes, constant parameters, or texture texels is not pre-defined. It is just how the data is processed within a shader script that gives the data its meaning. To give a clue of how it is processed, the data is marked with a semantic.


Now, does the renderer code need to know what a vertex normal, a bump map, or a viewport is? In exceptional cases perhaps, but in general it need not. It just needs to know which block of data needs to be provided as which resource (by its binding point / slot / whatever). The renderer code does not deal with high-level data, it deals with the operating resources. That is what swiftcoder says.


State parameters for the fixed parts of the GPU pipeline are different since they need to be set by the renderer code explicitly.

#5209394 calculating z coordinate of camera

Posted by haegarr on 08 February 2015 - 06:10 AM

You say that I didn't provide you with w and h but isn't that my 1680 and 1050 or am I missing a step?

You wrote that the texture is 1680 by 1050 pixels, which is a resolution. You wrote that the aspect ratio of the plane is 1680/1050, which is, well, just a ratio. If you meant that the edge lengths of the plane are 1680 by 1050 length units in world space, then all is fine.


Also doesn't happycoders give me the z distance in pixels and not translated to z axis?

Dimensional analysis of the formula:

    [ z ] = [ h ] * [ tan(a) ]

where the tangent is dimensionless,

    [ tan(a) ] = 1

so that

    [ z ] = [ h ]


With respect to my first post above, where I hinted at the need for a plane in world space, you get

    [ z ] = [ h ] = 1 lu  (which means length unit)

So, if you feed that h as world dimension, you get that world dimension back.

#5208687 Generic buffer class

Posted by haegarr on 04 February 2015 - 01:09 PM

Encapsulating resources using classes is what should be done. Every engine uses this method not only for good object-oriented practices but also cross platform support.

Not necessarily "every engine"... A thin wrapper like the one discussed here so far just puts some inconvenient things away and yes, it provides some type safety, at least for OpenGL. However, it does not help with the IMHO more relevant aspect of how to deal with the buffers. Buffer management within OpenGL 3.3 is different from OpenGL 4.4 and probably different from OpenGL 5.x, just to stay with OpenGL. 


IMHO abstracted GPU resource management for cross platform support should give an API that is almost as generic as "reserve vertex memory for this array of groups of attributes with their respective usage pattern, with a capacity for 5000 vertexes", resulting in a handle for further use. When it comes to rendering a frame, an instance of (multi-buffered) GraphicFrameContext is granted. For vertex attribute groups with usage pattern "update frame-by-frame" the GraphicFrameContext holds an array of memory accessors (byte pointer, stride, size, you know that stuff) and the aforementioned handle is used to fetch the accessor of interest. Further, the GraphicFrameContext provides a pool of buffers that allow to transfer data of vertex attribute groups with usage patterns "changes seldom" and "changes virtually never". The accessors to such buffered memory are fetched from the pool and enqueued, together with the handle which denotes where the data should be copied to. When rendering (on this level) is done, the GraphicFrameContext instance is committed and rendering on the low-level starts.


So it's totally up to the implementation how it realizes buffer management. An OpenGL 4.4 based implementation will use persistent mapping, one based on OpenGL 3.3 may use unsynchronized mapping (or glBufferSubData, or whatever). For a transfer buffer they may give a CPU memory block or an OpenGL buffered memory block.


In other words: I do not have a buffer class at all, at least no public one.


Just my 2 cents

#5208369 Best practices for packing multiple objects into buffers

Posted by haegarr on 03 February 2015 - 05:02 AM

keep in mind I'm learning, so I don't have the industry background of even a junior graphics programmer to draw on.

Please understand my answers as explanations and hints. They are not meant to show a requirement or urge to do something. The whole topic is wide and not easy to be done right. It is okay to iterate some implementations until coming to something nice.


I don't understand - what is "NV"?

NVidia. I refer to the PDF "Don't Throw it all Away: Efficient Buffer Management".


What do you mean by "into one set of bigger buffers"? Do you just mean in terms of the fact that indices, vertices and associated uniform data are each in different buffers?

I mean that the vertexes of a single mesh may be put into different buffers because of differing update frequencies. So a general buffer management needs to deal with a set of buffers instead of a single buffer. Okay, the set often consists of a single buffer and sometimes of 2, perhaps with an additional one when using explicit indices.


Not sure I understand, could you rephrase?

Vertex buffer switching is not the only state change you need to do between the draw call of one object and the draw call of the next object. There are also textures, shaders, blend modes, etc. What you really want to do is to find the render order of all objects so that the overall cost of switching state is minimal. Vertex buffer switching is only one part of this cost. There may be switching that in itself costs more performance than vertex buffer switching, for example shader switching. If so, then you would accept a vertex buffer switch if necessary to avoid a shader switch.


There is an approach to handle draw call cost minimization with a bitset based ID for draw calls, namely the article "order your graphic draw calls around". There are a zillion posts and articles after that.


A "perfect" vertex batching would take this into account, but that is not so easy. I just wanted to hint at a caveat that may put you off vertex batching.
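To make the sort key idea a bit more concrete, here is a small sketch. The layout (shader ID on the higher bits, vertex buffer ID on the lower bits) and all names are assumptions for the illustration; a real key usually contains more fields such as pass, material, or depth.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawCall {
    std::uint16_t shaderId;
    std::uint16_t vertexBufferId;
};

// The more expensive a state change is assumed to be, the higher its bits.
std::uint32_t makeSortKey(const DrawCall& call) {
    return (static_cast<std::uint32_t>(call.shaderId) << 16) |
           static_cast<std::uint32_t>(call.vertexBufferId);
}

void sortDrawCalls(std::vector<DrawCall>& calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return makeSortKey(a) < makeSortKey(b);
                     });
}

// Number of shader changes when submitting the calls in the given order
int countShaderSwitches(const std::vector<DrawCall>& calls) {
    int switches = 0;
    for (std::size_t i = 1; i < calls.size(); ++i)
        if (calls[i].shaderId != calls[i - 1].shaderId) ++switches;
    return switches;
}

// Number of vertex buffer changes when submitting the calls in the given order
int countVertexBufferSwitches(const std::vector<DrawCall>& calls) {
    int switches = 0;
    for (std::size_t i = 1; i < calls.size(); ++i)
        if (calls[i].vertexBufferId != calls[i - 1].vertexBufferId) ++switches;
    return switches;
}
```

For the four calls (shader, buffer) = (2, 1), (1, 2), (2, 2), (1, 1), sorting reduces the shader switches from 3 to 1 but raises the vertex buffer switches from 2 to 3. That is exactly the trade-off mentioned above.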


But to remove a single chunk from the buffer would require dynamic access. Aren't dynamic buffers slower to access than immutable buffers though? I mean, if they were just as fast, why bother with immutable buffers at all?

We are speaking about replacing an entire buffer object, or the memory block of a single buffer object, or a region of a single memory block. That in itself has nothing to do with dynamic versus static. Only an immutable buffer would be different. However, immutability is to be used for something that is never changed for the lifetime of the game or at least of a level. You correctly said that terrain chunks are read in on demand. This contradicts the meaning of immutability. At most it is static usage.


I don't think I understand the sentence... could you rephrase?

Oh yep, that wasn't one of my clearest sentences ;)


What I mean is that you can swap buffers in several ways:


1.) You can use multiple buffer objects (each one with its own name generated by glGenBuffers).


2.) You can use a single buffer object, but replace the entire memory block the buffer object is referring to. This is called "orphaning", because OpenGL internally holds the old memory block as long as any still pending OpenGL command needs it.


3.) You can leave the memory block of the buffer object in place, but replace the content within the memory block by overwriting it in part or totally.


Honestly, I'm not entirely sure... I'm in the process of trying to work out how my buffer management code should even look. I'm only at the stage of messing around with collections of primitives on the screen doing different things. I suspect I'm just going to have to write something to manage buffers "properly", get a sense of how my solution fits with what I'm doing and rewrite it until (a) I feel like I really understand what I'm doing and the implications of the buffer management choices I'm making, and (b) the code suits my needs and is performant enough.

Then running a separate rendering thread should not be an option for now, so forget it ;) I suggest identifying use cases and designing the buffer management based on them. I think that is a better way than discussing buffer management in a generic way.


That said, IMHO handling terrain chunks and batching meshes of multiple objects are two distinct use cases. They are so distinct that both require their own solution. I would generate a single vertex buffer for terrain, replacing regions of content when necessary. Why? Because rendering terrain will also happen in situations where more than a single chunk is visible at a time. If you used multiple buffer objects, then you would switch buffers even during a single rendering pass of terrain. Using multiple memory blocks would mean duplicating chunks in GPU memory. On the other hand, buffer memory management would be relatively easy, assuming that the memory footprint of chunks is fixed.

#5208358 Best practices for packing multiple objects into buffers

Posted by haegarr on 03 February 2015 - 02:58 AM

The main criterion to distinguish buffers is the update frequency of the contained vertex data: Is it immutable, static, dynamic, or streamed (other words in use by e.g. NV are transient and temporary). This distinction has an influence not only on the buffer allocation but also on its addressing.


The view from a vertex source (model, mesh generator, ...) is not necessarily restricted to a single vertex buffer. It may happen that only some vertex attributes are dynamic, while others are static, for example. Such a vertex source would use 2 vertex buffers. This, together with the fact that vertexes are addressed using a single index only, has an implication for your own allocation within the buffer memory. It means that the same relative block of memory is to be allocated within all buffers that belong together. If you don't do so, you need to specify different VAOs for different meshes, and that makes batching worse.


Now coming to your questions:


1. Packing related objects of the same type into one big buffer is better than using a lot of smaller buffers as there is a cost to switching VAOs which you can avoid if many of your draw calls are made on the same buffer.

(W.r.t. the above, the generalized formulation would be "into one set of bigger buffers".) To avoid switching, the objects to which the vertexes belong need to be rendered one after another. Hence they need to share all resources with a priority higher than that of vertex buffers (that is to say, any resource type whose ID is placed on higher bits in the draw call sort key than the vertex buffer's ID may defeat the batching's purpose). Otherwise you just get a statistical improvement depending on the number of batched objects.


2. It seems to me that it would be a good idea to choose what goes into a buffer based on the expected lifetime of the buffered objects, so that you can unload groups of things as a whole, without affecting other buffers too much. An example might be terrain chunks in a procedurally-generated world. You'd put them in different buffer objects so you could load a chunk you're moving towards and drop a chunk you're moving away from. Objects which are common to many chunks would go in a different buffer again.

Batching with respect to the lifetime (not of the object but) of the vertex data simplifies management in so far as it makes your own allocator implementation simpler; e.g. it allows for a linear allocator. However, in the case of terrain chunks, this is not necessarily true, because the chunk size is known and the lifetime of all chunks in use can be managed together. Hence a single buffer with N regions would be sufficient.
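The bookkeeping for such a buffer with N equally sized regions is little more than a free list. The following is a CPU-side sketch only; ChunkRegionAllocator and kInvalidOffset are hypothetical names, and the upload itself (e.g. glBufferSubData with the returned byte offset) is left out because it needs an OpenGL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returned when all regions are in use
constexpr std::size_t kInvalidOffset = static_cast<std::size_t>(-1);

// Manages one buffer that is divided into regionCount regions of equal size.
class ChunkRegionAllocator {
public:
    ChunkRegionAllocator(std::size_t regionCount, std::size_t regionSizeBytes)
        : regionSizeBytes_(regionSizeBytes) {
        assert(regionSizeBytes > 0);
        // Fill in reverse order so that region 0 is handed out first.
        for (std::size_t i = regionCount; i > 0; --i) freeRegions_.push_back(i - 1);
    }

    // Returns the byte offset of a free region, or kInvalidOffset if none is left.
    std::size_t acquire() {
        if (freeRegions_.empty()) return kInvalidOffset;
        const std::size_t region = freeRegions_.back();
        freeRegions_.pop_back();
        return region * regionSizeBytes_;
    }

    // Gives a region back, e.g. when the chunk is no longer needed.
    void release(std::size_t byteOffset) {
        assert(byteOffset % regionSizeBytes_ == 0);
        freeRegions_.push_back(byteOffset / regionSizeBytes_);
    }

private:
    std::size_t regionSizeBytes_;
    std::vector<std::size_t> freeRegions_;
};
```

When a chunk is dropped its region is released, and the next chunk that is loaded simply overwrites that region. Whether the GPU may still be reading from the region at that moment is a separate synchronization question which this sketch does not address.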


3. It looks like to construct a write-once/read-many buffer, i.e. the typical scenario for drawing scenery meshes, I have to assemble a pointer to all of the different meshes back to back, as I only get to write the data once (i.e. via glBufferData). So this would mean I need to make a temporary in-memory buffer that represents what I'm going to copy into video memory. I'd copy all of my mesh data into the in-memory buffer and then pass that to glBufferData. After doing so I could discard the temporary buffer.

You need to do this for immutable buffers, yes. For static buffers AFAIK (but I'm not sure) you can use glBufferSubData for each part, but that would be very inefficient.


One other thing; I have read that some people will use more than one buffer for the same thing and cycle between them, filling one while the other is rendering. Given that OpenGL is very state-dependent, how is this possible to do asynchronously?

Yes, this is a common technique, although one needs to distinguish between using more than one buffer object, using one buffer object but more than one memory block, and using one buffer object and one memory block but more than one memory range. In fact, there are many possibilities and aspects (not least the OpenGL version), ranging from OpenGL taking care of everything (keyword "buffer orphaning") down to it being totally your own responsibility (keyword GL_MAP_UNSYNCHRONIZED_BIT with fences). What exactly do you mean? In particular, do you mean to run the graphics backend in its own thread?

#5207698 Matrix decomposition

Posted by haegarr on 30 January 2015 - 09:54 AM

[...]So I gather that in order for an inverse transformation to have a directly inverse effect, it had best be applied right next to the original transformation, on either side like R^-1 * R * S * T or S * T * R * R^-1?

Yes, because for any matrix M

    M * M^-1 = M^-1 * M = I

and

    M * I = I * M = M

so you have e.g.

    R^-1 * R * S * T = ( R^-1 * R ) * S * T = I * S * T = S * T


However, notice that R * S * T is an order one usually doesn't want to use, because scaling happens in the rotated space (if using row vectors), so that the axes of scaling are not what one probably expects, or else scaling happens in the translated space (if using column vectors), which means that the center of scaling is not where one probably expects. See below for details.



I am just a bit unclear about this part. Is the transformation application order important for this decomposition method to work?

What I meant here is that the composed matrix is defined to be (still using row vectors)

   S * R * T

which means that the center and axes of scaling and the center of rotation cannot be chosen freely.


You can use other compositions, too. For example in X3D a transform is defined as

   C^-1 * A^-1 * S * A * R * C * T

where C denotes the center of scaling and rotation, and A denotes the axes (in form of a rotation) of scaling. Decomposing such a matrix means to not only look for S, R, and T, but also for C and A and hence is more complex than what I have shown.

#5207685 Matrix decomposition

Posted by haegarr on 30 January 2015 - 08:44 AM

Please look at my post here since it handles simple decomposition as well as its requisites.



EDIT: To talk about your approach:


When using row vectors, the combined matrix written as composition looks like

   S * R * T

If you multiply with the inverse rotation on the local side you get

   R^-1 * S * R * T

and on the global side you get

   S * R * T * R^-1

so none of this produces the result you are expecting.
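This can be verified with a small 2D example using 3x3 homogeneous matrices and row vectors (translation in the last row). Mat3 and the helper functions are hypothetical names for this check; a quarter turn is used as rotation so that all matrix entries stay exact. The last step, removing T first and R afterwards on the global side, is my addition to show an order that does isolate S.

```cpp
#include <array>
#include <cassert>

// Row-vector convention: p' = p * M, indexed as m[row][col]
using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 mulMat(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat3 makeIdentity() {
    Mat3 m{};
    for (int i = 0; i < 3; ++i) m[i][i] = 1.0;
    return m;
}

Mat3 makeScaling(double sx, double sy) {
    Mat3 m = makeIdentity();
    m[0][0] = sx;
    m[1][1] = sy;
    return m;
}

// Rotation by +90 degrees (or by -90 degrees if inverse is true)
Mat3 makeQuarterTurn(bool inverse) {
    const double s = inverse ? -1.0 : 1.0;
    Mat3 m{};
    m[0][1] = s;    // image of the x axis
    m[1][0] = -s;   // image of the y axis
    m[2][2] = 1.0;
    return m;
}

// With row vectors the translation sits in the last row.
Mat3 makeTranslation(double tx, double ty) {
    Mat3 m = makeIdentity();
    m[2][0] = tx;
    m[2][1] = ty;
    return m;
}
```

With S = makeScaling(2, 3), R = makeQuarterTurn(false), T = makeTranslation(5, 7), and M = S * R * T, neither R^-1 * M nor M * R^-1 equals S * T: the former swaps the scaling factors between the axes, the latter rotates the translation. In contrast, M * T^-1 * R^-1 gives exactly S.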