
#5017723 Passing bone matrices to shader

Posted by Hodgman on 05 January 2013 - 06:57 AM

Yes, that is a common approach.
However, your SetValue call should use vMat.size()*sizeof(D3DXMATRIX), otherwise you're reading past the end of the vector.

#5017323 Model Format: Concept and Loading

Posted by Hodgman on 03 January 2013 - 07:13 PM

I read somewhere that you can't add pointers in C++, only subtract them, but to offset local pointers you needed to add!

Adding two pointers doesn't really make sense, but you're not trying to add two pointers! :-)

You've got a pointer to your memory block, and you've got an offset (which is an integer) that you want to add together. Adding a pointer and an integer is well defined -- adding 42 to a pointer of type "T*" gives you a pointer that's advanced by sizeof(T)*42 bytes, and since sizeof(char)==1, adding integers to char*-type pointers is a fairly intuitive and common way of performing "pointer math".


Usually you'd do something like this to perform "pointer patching" -- converting local file offsets to real pointers:

struct Header
{
	union
	{
		Foo* foo_pointer;
		int foo_offset;
	};
};
static_assert( sizeof(int) == sizeof(void*) );//the int type in the union should be the same size as a pointer type.
//n.b. this means your file generator (C# tool) has to be aware of whether it's generating files for a 64-bit or 32-bit application!

char* memblock = new char[/*size of file*/];//...read the file into memblock...
Header* header = (Header*)memblock;
header->foo_pointer = (Foo*)(memblock + header->foo_offset); //n.b. memblock is a char*

Your code is OK though, because casting a pointer to a ptrdiff_t and back to a pointer works on every compiler I've ever used ;)

However, you can replace "memOffset" with "memblock" and it will work the same, but look a bit more intuitive.

Also, your casting of your "file offset pointers" to integers via e.g. "(ptrdiff_t)model->MeshHeaders" is basically equivalent to my union above, so there's no need to change it if it's more intuitive for you to do it this way.

In my engine, to make my file-formats independent of the actual pointer size (32/64 bit), and to simplify my loading routines, I usually avoid performing pointer-patching on-load, and instead do it on-demand each time the "pointer" is used. To facilitate this, I use offsets that are relative to the position of the offset variable itself, rather than ones that are relative to the beginning of the file. If you're interested, see the Offset class in this header (also, Address is used for offsets that are relative to the beginning of some memory-block).


The next problem arrived when I realized that I needed to somehow delete the model from memory as well. Again not sure, as I had casted a char[] to a model, if I could delete the model. I pretended I could and wrote the destructor. Miraculously it seemed to work!

It might seem to work, but that sounds very bad. If the memory was created with char* buffer = new char [size], then it needs to be deleted with delete [] buffer (where buffer is a char*).


This gets complicated because in your file-failure case, you are allocating the memory with model = new Ruined::Graphics::Model(), in which case it needs to be deleted with delete model (where model is a Model*).

Personally, I'd remove that failure case and return NULL, or change it to allocate the memory consistently, with:

char* buffer = new char [sizeof(Ruined::Graphics::Model)];


Further, you can't just use a regular shared_ptr to clean up for you, because it will use the wrong type of delete -- you need to give it a custom deleter that calls your own "destructor" function and then deletes the char array properly.

#5016592 Strict aliasing rule

Posted by Hodgman on 02 January 2013 - 01:04 AM

I guess I'm taking this part of the rule wording:
"If a program attempts to access the stored value of an object through ...
an aggregate type that includes the dynamic type of the object among its members ... [then the behaviour is well defined]"

In my interpretation, the stored value that I'm accessing is the value of id. In every case, it's accessed through an aggregate that includes the actual type of id.
So basically, I'm confused as to how the wording applies exactly when it comes to aggregates such as structures -- my above interpretation is focussed on the actual primitive type being accessed, whereas Bregma's interpretation focuses on the containing structures being different.
The point of strict-aliasing (and the restrict keyword) is that when I write to an int, the compiler knows that only other int variables might have been invalidated by that write (and may need to be re-read). So I figure that when I write to foo.id, which is a Commands::Type, the compiler knows that cmd->id might have been changed, because it's also a Commands::Type, and the aggregate types that contain these members are irrelevant -- only the types being read/written matter.
Implementation-wise, all systems I know of (so far) will work as expected, because the compiler can see that you cast Foo into Command. However, compile some code with -fstrict-aliasing where the compiler can't see in its scope that you did that, and you'll run into trouble, for example this:
That's a good bit of code to clarify my question -- switching over to C99, which has almost the same rule -- given the code:
struct A { int value; };
struct B { int value; };
void test( A* a, B* b )
{
	a->value = 42;
	print( b->value );
}
int main()
{
	A obj = { 0 };
	test( &obj, (B*)&obj );
}
Is the above code equivalent to test1 or test2 below?
void test1( int* a, int* b )
{
	*a = 42;
	print( *b );
}
void test2( int* restrict a, int* restrict b )
{
	*a = 42;
	print( *b );
}
int main()
{
	int obj1 = 0, obj2 = 0;
	test1( &obj1, &obj1 );//should print 42
	test2( &obj2, &obj2 );//might print 0 or 42 (or do something else)
}
And if the latter, does this mean that we can emulate the restrict keyword simply by creating these kinds of wrapper structs?

And given Matias' example of some broken code, it seems that this workaround would make it well-defined, right?
void test( Command *cmd, Foo *foo )
{
    //cmd->id = 1; //instead of this, which the compiler apparently knows can't change foo->id
    int* id = &cmd->id;//int* is allowed to alias any int
    *id = 1;//now foo->id might possibly be 1, legitimately!?
    print( foo->id );//will print 1 (if (void*)cmd == (void*)foo)
}
Why on earth would one do such a thing? Why not just declare the alignment explicitly through inheritance?
Yeah, in my actual implementation of this command system, I do use inheritance (and in C you could use composition, where the Foo struct begins with an Id member).
However, aliasing structures is a common pattern, so I'd like to understand the rule in its edge cases!

Also, I've seen some compilers where #1 and #2 would fail, but #3/#4/#5 would pass this test (due to padding Base up to 4 bytes), which matters when structures have to match up with externally generated data formats:
struct Base { u8 id; };
struct Derived : public Base { u8 extra[3]; u32 extra2; };
struct DerivedHack { u8 id; u8 extra[3]; u32 extra2; };
#pragma pack(push)
#pragma pack(1)
struct BasePacked { u8 id; };
struct DerivedPacked : public BasePacked { u8 extra[3]; u32 extra2; };
#pragma pack(pop)
static_assert( sizeof(Base) == 1, "#1" );
static_assert( sizeof(Derived) == 8, "#2" );
static_assert( sizeof(DerivedHack) == 8, "#3" );
static_assert( sizeof(BasePacked) == 1, "#4" );
static_assert( sizeof(DerivedPacked) == 8, "#5" );

#5016569 Measuring Latency [Solved]

Posted by Hodgman on 01 January 2013 - 10:48 PM

It will drastically affect performance, but you can draw to a buffer immediately after the present, and then map/lock the buffer, checking the time when the map operation completes on the CPU (as this will cause the driver to stall the CPU until the buffer has been drawn to).

#5016515 Questions about advanced lighting techniques (IBL, SH)

Posted by Hodgman on 01 January 2013 - 06:11 PM

1. SH can be used as a structure to store data that varies around a sphere, much like a cube map. It's basically a 'frequency space cube map'.
Because of the above, it's not explicitly tied to any kind of lighting algorithm (you can use cube maps for countless different purposes). It turns out that SH sometimes happens to be a good tool for storing light in an IBL system.

2. Yes, you can think of traditional 'environment mapping' as a way to produce Phong-specular lighting via IBL. IBL is any technique where your light is sourced from an image.

3. Yes, precomputed lighting will be static. For dynamic probes you'd usually use the centre of the object that you're collecting light for.

4. Your 'specular mask' (usually present in a GBuffer) is a reflectance coefficient.

5. At the moment, I'm excited by deferred irradiance volumes:

#5016367 A Proposal to Add Strong Type Aliases to the Standard Language

Posted by Hodgman on 01 January 2013 - 08:21 AM

You could always write a "boxed type" template class that encapsulates a primitive type, while supporting the same operators the primitive type does. So, essentially reinventing Java's "boxed types" in C++ using templates. I'm not in a position to say whether that would be worth the effort, though. My guess would be that it isn't.
I have one of those in my engine, which acts like the type (operator wise), but with an explicit constructor from that type.
e.g. short version:
template<class T, class Tag> struct P
{
	P() : value() {}
	explicit P( T v ) : value(v) {}
	T value;
};
struct MetreTag {};
struct InchTag {};
typedef P<float, MetreTag> Metre;
typedef P<float, InchTag> Inch;
And then I write conversion functions like:
Inch ToInches(Metre m) { return Inch(m.value * 39.3700787f); }
void test()
{
	Metre m(10);
	Inch i( ToInches(m) );
	Inch error( m );//won't compile
}
I haven't used the boost solution, but I imagine it's similar to this, so I probably suffer from the same issues that SOTL found with boost's version.

#5016362 Strict aliasing rule

Posted by Hodgman on 01 January 2013 - 07:51 AM

This is a spin-off from another thread here.

I was advocating the use of a "polymorphic" design, where two structures that have the same initial member were aliased:
namespace Commands
{
	enum Type { Foo };
}
struct Command
{
	Commands::Type id;
};
struct Foo
{
	Commands::Type id;
	int value;
};
Foo foo = { Commands::Foo, 1337 };
Command* cmd = (Command*)&foo;
switch( cmd->id )
{
	case Commands::Foo:
	{
		Foo* fooCmd = (Foo*)cmd;
		printf( "id = %d, value = %d\n", (int)fooCmd->id, fooCmd->value );
		break;
	}
}
At the time, I thought this violated the strict-aliasing rule, but that the code would produce the intended behaviour anyway. The worst thing that I thought would happen is that the compiler would generate code that redundantly reads "fooCmd->id", even though it already read "cmd->id" just above.
However, the C++03 wording of the rule is:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
• the dynamic type of the object,
• a cv-qualified version of the dynamic type of the object,
• a type that is the signed or unsigned type corresponding to the dynamic type of the object,
• a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
• an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
• a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
• a char or unsigned char type.
Does the aggregate/union bullet mean that I'm not actually breaking the strict aliasing rule here, because the aliased value (i.e. id) is actually the correct type in both structures?

#5016319 Some HDR questions

Posted by Hodgman on 01 January 2013 - 03:27 AM

create a rendertarget texture in D3DFMT_A16B16G16R16 format and render the scene to it. My shaders still outputs a color between 0 and 1 but the precision is a 32bit float for each channel. That 32bit float output must then be truncated down to fit in a 16bit channel in the texture render target?
Yes, your shaders always output 32bit floats, which are then truncated to the precision of the render-target.
The colors are still between 0 and 1 in both situations but in two different precisions, making room for smoother color transients. So even with 100 light sources the intensity will never go beyond the value 1.0f?
D3DFMT_A16B16G16R16 still only stores values from 0-1, but it does so in increments of 1/65535 instead of the usual increments of 1/255.
With this format, you have to be careful with your lighting values so that they never reach 1.0f, because if they do, then you won't be able to represent anything brighter, and you get strange results where that happens.

D3DFMT_A16B16G16R16F is probably what you want -- it stores floating point numbers, but in the compact "half float" format, instead of full 32-bit floats. This lets you store numbers from around 0 to 65504, with a decent amount of fractional precision. This means that if you've got 10 lights with brightness of '100' overlapping, then there will be no clamping of the result. You'll be able to store 1000.0f in the render-target!
I thought that from reading guides that the unexposed value from the HDR texture would be between 0 and infinity..... The K is the Luminance adaption value.
As above, I'd recommend using the 16F format so that your HDR texture is between 0 and big-enough-to-pretend-it's-infinity ;)

If you do use a format that's between 0 and 1, you can use some kind of hand-picked multiplier value as your brightest HDR value. e.g. if you use "unexposed * 1000" in your tone-mapper, and "return output * 0.001" in your lighting shaders, then your render target is basically storing values from 0 to 1000, in increments of 1/65536.
Can K be obtained by having a global variable instead of another render target that each fragment adds its Log2(exposed) value to?
In D3D9, shaders can't write to shared global variables like that. The "unordered access view" makes this possible in D3D11 only.
To get that Bloom effect ... This sums up to 3 or 4 passes?
Usually you'd do your "darken" pass during down-sampling, separate from the blurring. The reason is that you want the input texture to the blur to be low-res, as well as the output texture (otherwise you waste bandwidth).
1) render HDR scene
2) output Log luminance (and mipmap or downsample to 1x1)
3) downsample HDR scene and apply a darkening factor
4) blur downsampled scene vertically
5) blur the previous horizontally
6) sum together the HDR scene and the blur-result, and tone-map the result (output as 0-1 to an 8-bit target)
In my last game, we did it a bit differently though for performance reasons (we were targeting old GPUs where 64bpp textures were slow):
1-5) as above, except #3's "darkening factor" is a tone-mapper (outputting 0-1 to an 8-bit target), and 4/5 are done in 8-bit.
6) tone-map the HDR scene (output as 0-1 to an 8-bit target)
7) screen-blend the blur-result over the scene

#5016310 ID3D11Buffer question...

Posted by Hodgman on 01 January 2013 - 02:30 AM

"AGP memory" is a fairly old term (from when GPU's generally used an AGP-port, instead of a PCIe port).
It's basically refers to regular "main memory" that the OS has allowed the GPU to access over the AGP/PCI channel. This type of memory means you can quickly update it from the CPU (as it's just regular RAM), but the GPU can also access it as if it were GPU-RAM (aka video RAM), albiet it will be a bit slower than reading from local GPU-RAM, depending on the AGP/PCI bus speeds.


The way D3D/GL are made, you can never actually know where your buffers are stored. They might be stored in main memory, or in "AGP memory" (AKA GPU-accessible main memory) or in GPU-memory. All you can do is give the API/Driver the appropriate hints (e.g. DYNAMIC, READ_ONLY, etc) and hope that the driver allocates your memory in the appropriate place. Also, on PC, there's really no reliable way to measure the amount of available RAM in any of these places, or to tell exactly how much RAM your buffers are using.


For the next part, keep in mind that the GPU/CPU are not synchronized. All D3D/GL commands to the GPU are sent through a pipe, which typically has a lot of latency (e.g. 10-100ms). Whenever you ask the GPU to do something, it will do it some time in the future.

When you map a resource that's being used by the GPU -- e.g. 1. put data in buffer, 2. draw from buffer, 3. put new data in same buffer, 4. draw from buffer -- the driver has two choices once the CPU gets up to #3:

1) It introduces a sync point. The CPU stops and waits for the first "draw" (#2) to be complete, and then maps the buffer. This could stall the CPU for dozens of milliseconds.

2) It creates another internal allocation for that buffer, which the CPU writes to during #3. From your code, you still just have the one handle to the buffer, but internally, there can be any number of versions of the buffer "in the pipe", on their way to being consumed by the GPU. Some time after the GPU executes #2, the allocation from #1 will be freed/recycled automatically by the driver.


Specifying a "DISCARD" hint will help the driver choose option 2.

#5016253 Direct3D UYVY texture.

Posted by Hodgman on 31 December 2012 - 09:34 PM

When you copy data into your texture, do you take the pitch into account? D3D tells you the pitch when you lock/map the texture.
e.g. a 19x3 texture might look like this in memory, where in between the rows (0/1/2) there's some padding (P):

0000000000000000000PPPPP
1111111111111111111PPPPP
2222222222222222222PPPPP
you can have only one texture active at shader stage
No, there's many sampler slots that you can bind textures to, per shader stage.

#5016050 What kind of optimization makes C++ faster than C#?

Posted by Hodgman on 31 December 2012 - 07:41 AM

Good engines rely on many different languages.

While there are many engines that are written in more than one language, it's not necessarily a good thing.
(I'm not counting scripting support in this, because scripting is often part of the game, and the engine only provides the possibility to script the game or game objects.)
Also you have to differentiate a bit more. E.g. having a library or engine that has multiple language bindings is a completely different thing from having an engine that's written in a dozen different languages. Having many different bindings is a good thing, especially if the engine is to be licensed to other developers. Having an engine written in many languages sounds like a horror story.

Yeah, it depends on how you define things. I think it would be pretty common to see:
* a language for the engine
* a language for the game
* a language for the GPU-side components
* a language for the tool-chain
* a language for data definitions
* a language for build automation

Some of them might be the same language, and some of the dot points might be several languages.
e.g. For the above dot points, I use C/C++, C++/Lua, HLSL/Cg, C#/VB/Batch/JavaScript+HTML, Lua, CMake/Batch.
That's around 10 languages in a modern engine with no legacy code.
The engine is C++, but contains some C modules, which is fine because the two interop so well.
The engine is bound to Lua, so the game is written in a mixture of C++ and Lua, depending on which is more productive in that area.
The obvious choice for the GPU portions is a standard shading language -- HLSL is great, and Cg is almost the same syntax, which helps when porting HLSL code to GL.
The data-processing parts of the tool-chain are all C# because it's a good language to work with and is very capable. Many GUIs are JavaScript+HTML because they're designed to be remote "web" tools, that are just thin GUIs that connect to either a C# data-cruncher, or the C++ engine in the background. Extensions of our art tools are VB, because they support it for scripting. Microsoft Batch files are sometimes used as glue.
Human editable data files are written in Lua, instead of JSON/XML/et al. because Lua is already the engine's scripting language, and it's also a great, flexible DDL.

Instead of being tied to a specific IDE "project file" format, the code builds are controlled by CMake scripts, which is a simple imperative language, again with some Batch glue.


Personally, I'd include all of the above inside the category of what "the engine" is made up of (not just the engine's runtime library itself).

#5015948 What kind of optimization makes C++ faster than C#?

Posted by Hodgman on 30 December 2012 - 09:37 PM

Certainly, hence why I said general case. I agree it's a design decision. I just think that a design that benefits a "small sub-set of problems" to the detriment of the others is objectively a poor one in a general purpose programming language. I frankly don't see how that is at all contentious, unless you're saying C++ isn't a general purpose programming language :D

C/C++/C# are all "general purpose" languages (which is pretty meaningless; it just means they're Turing complete and are not DSLs...), but you'd typically call the former two "systems programming" languages, and the latter an "application programming" language, as those are the general domains that they're typically best at / used for.

e.g. Python is also a "general purpose" language, but is often called a "scripting language", because it's often used to write small "scripts" that extend the behaviour of a host program.


If you're a systems programmer, then the lack of simple manual resource management in C# turns out to be a "a design that benefits a "small sub-set of problems" to the detriment of the others"... So no, you can't objectively say that in the absolute.

Every sub-set of programming problems is small on a global scale, but on a local scale the size depends on who you are and what your job is.


Given the original topic of this thread (and the additional context of being on a game-dev site), it's clear we're discussing the (globally small) set of problems where specific manual resource management methods (and other options that you have in C++) can be more efficient than C#'s one-size-fits-all alternatives (such as game engines) -- and then also every other set where this isn't the case (such as corporate GUIs).

#5015918 What kind of optimization makes C++ faster than C#?

Posted by Hodgman on 30 December 2012 - 07:03 PM

I disagree. It should be pretty conclusive at this point that (for the general case) 'pay for what you use' has decided detriments to productivity that arise from adapting the limited functionality to the different 'paid' functionality without providing meaningful optimization/performance benefits (in the general case).

Yeah, in the general case (whatever that is, I imagine writing corporate GUIs...), C++ isn't the most productive language, especially for junior staff to be using (they can actually be reducing instead of increasing the project's progress...)


However, "pay for what you use" is exactly what makes C++ the most productive choice for the small sub-set of problems where it is (one of) the most productive language to choose from.

e.g. If it didn't have manual memory management, then it would be a very unproductive choice on memory-constrained embedded systems. Manual memory management (the fact that memory isn't abstracted away and a GC isn't forced upon you) is a key feature of the language that enhances its productivity (in the specific case)!


But I would not use wordings like "cure for the symptom" or "crutch" for the memory management in C++ because that is really not what it is. It's a deliberate design decision.

QFE -- for a particular class of situations, it's a very, very useful decision.



Screw the 'general case'; I'm an engine programmer. I still measure system RAM in MiB, not GiB, and running out of RAM is a real concern, so I need to be able to track every single byte that's used. I need to be able to look at my high-level code and have a pretty good guess at what kind of assembly will be generated from it (and be able to debug with the two interleaved to confirm those guesses). I need to be able to treat memory as bytes and use memcpy. I need to be able to read data off disk and cast it to known structures without parsing/deserializing it. I have to make a lot of low-level optimizations, and make heavy use of the "pay for what you use" idea.

I also need to be able to support a productive "game programming" language, such as Lua or C#, but that comes after getting the foundations right ;)

#5015741 Ouya Compiler

Posted by Hodgman on 30 December 2012 - 07:15 AM

Yeah the standard $99 unit is a "dev kit".

i.e. there are no "dev kits" and "consumer versions", just a single product.


The "pre rooted" versions were part of a specific kickstarter bundle/tier that included early access to hardware and the SDK, as well as other stuff, like marketing for your game...

#5015681 Advanced Render Queue API

Posted by Hodgman on 30 December 2012 - 01:49 AM

Hodgman, wouldn't this violate the strict aliasing rule when you cast a Command reference to a Foo or Bar reference, or vice versa?

Yes. Technically, casting a Foo* to a Command* is undefined behaviour, but in practice, it will work in most situations.
We're never writing to an aliased Command and reading from an aliased Foo (or vice versa) inside the one function, which minimizes the risks.
e.g. this code would be dangerous:

assert( command.id == 0 );//assume the command is actually a "Foo"
command.id = 42;//change the id value
Foo& foo = *(Foo*)&command;
assert( foo.id == 42 );//the id value should be changed on the "Foo" also, but this might fail in optimized builds!

The worst thing in the earlier code is a sub-optimal assertion:

assert( command.id >= Commands::Bar0 && command.id <= Commands::Bar2 );//this will load command.id from RAM
Bar& bar = *(Bar*)&command;
device.SetBarSlot( bar.id - Commands::Bar0, bar.value );//bar.id will generate another "load" instruction here, even though the value was loaded above

Also, the only value that we actually need to "alias" is the first member -- u8 id -- and it doesn't actually need to be aliased as a different type, so it's possible to write this system in a way that doesn't violate strict aliasing if you need to -- e.g.

//Instead of this:
Foo foo = { Commands::Foo, 1337 };
Command* cmd = (Command*)&foo;
SubmitCommand( device, *cmd );

//We could use
Foo foo = { Commands::Foo, 1337 };
u8* cmd = &foo.id;
SubmitCommand( device, cmd );

inline void SubmitCommand(Device& device, u8* command)
{
	g_CommandTable[*command](device, command);
}
void Submit_Foo(Device& device, u8* command)
{
	assert( *command == Commands::Foo );
	Foo& foo = *(Foo*)(command - offsetof(Foo,id));
	device.DoFoo( foo.value );
}

P.S. u8* (my version of unsigned char*) is allowed to alias any other type (strict aliasing rule doesn't apply to it), but the above version will work even if this wasn't true.