

Member Since 14 Feb 2007

#5099543 Shadow Blurring

Posted by Hodgman on 08 October 2013 - 06:54 AM

You can use a bilateral blurring filter.

When taking nearby samples for blurring, you don't (just) use hard-coded weights (Gaussian, etc). You also create a second set of weights based on how "valid" the sample is, optionally multiply these with your hard-coded Gaussian weights, then renormalize the weights for all samples so that they sum to 1.0.


To determine whether a sample is valid or not, you can use a colour threshold (e.g. so if the centre is white, a black sample will be rejected, but a slightly grey sample will be accepted), a depth threshold (so if the samples differ in Z compared to the centre by too much, they're rejected), etc...


Sometimes you'll see a bilateral blur filter that's based on Z values called a Depth Sensitive Filter.
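As a rough illustration of the weighting scheme described above, here is a minimal 1D depth-sensitive blur in C++. It is a sketch under assumptions: the function and parameter names are hypothetical, the validity test is a simple 0/1 depth threshold, and a real SSAO blur would run in a pixel shader over two passes (horizontal, then vertical).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 1D bilateral (depth-sensitive) blur. `value` is the signal being
// softened (e.g. an SSAO term) and `depth` is the matching view-space Z.
std::vector<float> bilateralBlur1D(const std::vector<float>& value,
                                   const std::vector<float>& depth,
                                   int radius, float sigma, float depthThreshold)
{
    assert(value.size() == depth.size());
    const int n = static_cast<int>(value.size());
    std::vector<float> out(value.size());
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        float weightSum = 0.0f;
        for (int k = -radius; k <= radius; ++k) {
            const int j = i + k;
            if (j < 0 || j >= n) continue;
            // Hard-coded spatial weight (Gaussian falloff with distance).
            const float spatial = std::exp(-static_cast<float>(k * k) / (2.0f * sigma * sigma));
            // Validity weight: reject samples whose Z differs too much from the centre.
            const float valid = std::fabs(depth[j] - depth[i]) <= depthThreshold ? 1.0f : 0.0f;
            const float w = spatial * valid;
            sum += value[j] * w;
            weightSum += w;
        }
        // Renormalize so the surviving weights sum to 1. The centre sample
        // (k == 0) always passes the test, so weightSum is never zero.
        out[i] = sum / weightSum;
    }
    return out;
}
```

Swapping the depth test for a colour-difference test turns this into the colour-threshold variant; a smooth falloff instead of a hard 0/1 cut-off is a common alternative.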


I've used this to soften SSAO before: I implemented both the colour-threshold and depth-threshold versions, then tweaked them both before deciding which one worked better for my game.

#5099381 How do I use multithreading?

Posted by Hodgman on 07 October 2013 - 02:19 PM

If this is D3D9, then yes, you can't do that. All D3D9 calls must be made from a single thread; technically, they should all be made from the thread that created your window.

To let other threads help with texture creation, have them only load the data, then coordinate with the main thread, which calls create/lock/unlock.

If you like, you can have the background thread perform the memcpy after the main thread has locked the surface (and before it unlocks the surface), but that gets a bit complex :/
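One way to do that coordination is a mutex-protected hand-off queue: loader threads push decoded pixel data, and the main thread drains the queue each frame and performs the D3D calls itself. This is a minimal sketch; `LoadedImage`, `UploadQueue` and `drainUploads` are hypothetical names, and the `createTexture` callback stands in for the real D3D9 `CreateTexture` / `LockRect` / memcpy / `UnlockRect` sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CPU-side texture produced by a loader thread (no D3D calls).
struct LoadedImage {
    std::string name;
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels; // tightly packed RGBA8
};

// Loader threads push decoded images; the main (D3D9) thread pops them.
class UploadQueue {
public:
    void push(LoadedImage img) {
        assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height * 4);
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(img));
    }

    std::optional<LoadedImage> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        LoadedImage img = std::move(items_.front());
        items_.pop();
        return img;
    }

private:
    std::mutex mutex_;
    std::queue<LoadedImage> items_;
};

// Called on the main thread once per frame; every D3D call stays on this thread.
template <typename CreateFn>
int drainUploads(UploadQueue& queue, CreateFn&& createTexture) {
    int created = 0;
    while (std::optional<LoadedImage> img = queue.tryPop()) {
        createTexture(*img);
        ++created;
    }
    return created;
}
```

The main thread never blocks on file I/O, and the loader threads never touch the device, which keeps the single-thread rule intact.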

#5099290 Something I always wondered... which is faster?

Posted by Hodgman on 07 October 2013 - 01:56 AM

Both have:
Basic vertex transform * num verts +
Basic pixel operations * num pixels.

One then has:
Lighting * num verts +
Colour Interpolation * num pixels.
And the other has:
Normal interpolation * num pixels +
Lighting * num pixels.

As long as the number of pixels covered per vertex isn't very small, then per-vertex will be cheaper. With large numbers of pixels per triangle, it will be much cheaper.
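Writing the costs listed above as a formula makes the break-even point explicit. Here V is the vertex count, P the number of pixels shaded, and c_T, c_B, c_L, c_I the per-item costs of transform, basic pixel work, lighting and interpolation; it assumes (as an approximation) that interpolating a colour costs about the same as interpolating a normal.

```latex
\begin{aligned}
C_{\text{vertex}} &= V\,(c_T + c_L) + P\,(c_B + c_I) \\
C_{\text{pixel}}  &= V\,c_T + P\,(c_B + c_I + c_L) \\
C_{\text{pixel}} - C_{\text{vertex}} &= c_L\,(P - V)
\end{aligned}
```

So per-vertex lighting is cheaper whenever more pixels are shaded than vertices are lit, and the gap grows with the number of pixels per vertex.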

The way pixel shaders work, you never actually want triangles smaller than 2x2 pixels, because the GPU will shade 2x2 pixels and throw away the unneeded ones. This effectively makes shading the pixel covered by a micro-polygon 4x as expensive as the pixels covered by a large-polygon.

Counter-intuitively, this means that with extremely high vertex counts (one triangle per pixel), per-pixel lighting is actually very slow!
This is one reason why LODs are important -- using a lower vertex count reduces your vertex-shading cost, and using larger triangles reduces your pixel-shading cost.

Per-vertex lighting is becoming rarer anyway. Most game characters/worlds do lighting calculations per pixel (with some parts pre-calculated).

#5099242 What makes using high-res textures slower?

Posted by Hodgman on 06 October 2013 - 07:30 PM

I'm not an expert, but "more memory reads", in the form of a higher number of texels that need to be read, is definitely not the case.
One of the reasons why larger textures can be slower, AFAIK, is that smaller ones fit better in the texture cache.

Those are actually both the same issue!
When you read a texel, it (and maybe its neighbors) are cached. More unique texels read == more pressure on the cache == more cache misses == more memory bandwidth required.

When a triangle is distant AND mipmapping is enabled, a smaller mip level is used, which means the resolution is effectively lower, so there are fewer unique texels and the cache works better.
If mipmapping isn't used, then distant triangles are actually huge texture bandwidth hogs! Each screen pixel is fetching texels that are far apart in the texture (e.g. a 1000px texture shown across 10 screen px will skip 100 texels per screen pixel), so the cost of fetches isn't amortized by the cache at all, and every fetch is a cache miss.
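The 1000px-across-10px example can be put into numbers. This sketch computes the texel stride per screen pixel and an approximate mip level as log2 of that ratio; it is a simplification (real hardware derives the level from screen-space UV derivatives and filtering settings), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Texels skipped between adjacent screen pixels when `textureSize` texels
// are stretched across `screenPixels` pixels.
float texelStride(float textureSize, float screenPixels) {
    assert(screenPixels > 0.0f);
    return textureSize / screenPixels;
}

// Approximate mip level: log2 of the texel-to-pixel ratio, clamped to the
// available range. Magnified textures (ratio < 1) use level 0.
float approxMipLevel(float textureSize, float screenPixels, float maxLevel) {
    const float lod = std::log2(texelStride(textureSize, screenPixels));
    return std::clamp(lod, 0.0f, maxLevel);
}
```

For 1000 texels over 10 pixels the stride is 100 and the level is about 6.6; at mip 6 the texture is roughly 16 texels wide, so neighbouring pixels read neighbouring texels again and the cache becomes useful.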

With PC GPUs, there can be a HUGE difference in memory bandwidth between the high-end and low-end hardware. One card might seem to run at the same speed after doubling texture resolutions, while a different card might give a 10x reduction in frame rate!

#5099097 Terrain ms budget

Posted by Hodgman on 06 October 2013 - 06:49 AM

The budget always varies from game to game. You often can't lock down your budgets until you've implemented the whole workload and then started to optimize, cut back and balance the pieces against each other... Or you rely on experience from previous (similar) games to set your starting budgets.

Maybe you want 30% for characters, maybe 10%. Maybe 50% for post, maybe 10% :-/

Often I've seen environments and characters combined at ~25% and post at 50%, but on other games that could be flipped.
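For a sense of scale, turning percentage splits into milliseconds is simple arithmetic; the frame rates and fractions here are only examples. At 60 Hz a 25% share is roughly 4.2 ms, and a 50% post-processing share is roughly 8.3 ms.

```cpp
// Frame time in milliseconds for a target frame rate.
constexpr double frameMs(double fps) { return 1000.0 / fps; }

// Milliseconds available to one category given `fraction` of the frame.
constexpr double budgetMs(double fps, double fraction) { return frameMs(fps) * fraction; }
```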

What kind of game is it, what camera angles/distances, and what else needs to be drawn?

#5099094 Differences between programming for consoles and PC

Posted by Hodgman on 06 October 2013 - 06:41 AM

A very important difference is that if you want to make PS4/Xbone games, you need to be a company, get in touch with MS/Sony, and jump through the required hoops to become a licensed developer. Then, after developing your game, you need to ensure it complies with endless checklists of technical requirements, because MS/Sony don't want buggy/inconsistent software on their platforms, and then you've got to pay tens of thousands of dollars in submission/testing fees so MS/Sony can check that you've ticked all the boxes.

This was the same on PS3/360, except that now they're letting developers "self publish", which means we can do the above steps with MS/Sony directly, instead of going through a publishing company as a middleman.

The 360 also had XBLIG/XNA, and the PS3 had Linux, both of which let the general public make software for free/cheaply without the red tape, but with severe technical limitations instead.

Beginners really shouldn't hope to jump into that deep-end straight away...

#5099055 AMD's Mantle API

Posted by Hodgman on 05 October 2013 - 07:51 PM

Reflexus, how is that sentence connected to the Mantle API?

#5099042 Composite pattern with parent pointer in C++

Posted by Hodgman on 05 October 2013 - 05:43 PM

Do you need the parent pointer?
You can reverse the relationship and use composite-has-many-children, not component-has-a-parent.
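A minimal sketch of the reversed relationship, assuming a simple tree of named nodes (the `Node` class is hypothetical): each node owns its children, and code that needs "parent" context receives it as a parameter while walking down from the root, instead of walking up through a stored pointer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical composite: ownership flows downward only, no parent pointer.
class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}

    // Children live on the heap, so returned references stay valid
    // even when the children_ vector reallocates.
    Node& addChild(std::string name) {
        children_.push_back(std::make_unique<Node>(std::move(name)));
        return *children_.back();
    }

    // Depth-first walk that hands each node its parent (nullptr for the root).
    template <typename Fn>
    void visit(Fn&& fn, const Node* parent = nullptr) const {
        fn(*this, parent);
        for (const auto& child : children_) child->visit(fn, this);
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
    std::vector<std::unique_ptr<Node>> children_;
};
```

Without a back-pointer there is nothing to keep in sync when nodes are moved or re-parented, which is usually the main source of bugs with the parent-pointer design.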

#5098851 How do I use multithreading?

Posted by Hodgman on 04 October 2013 - 07:43 PM

You don't need to protect the flag. A memory barrier in the generating thread just before setting the flag to true is all that's needed. The data is then guaranteed to exist before the main thread sees that the flag is set to true.

A barrier in the writing thread does ensure that the generated data reaches memory before the flag does... However, the reading thread may still reorder its reads, so that it reads the generated data first and the flag second. You also need a barrier in the reading thread to ensure that data is only retrieved after the flag has been retrieved.
Unless you use both barriers, you're no safer than if you used none.

Barriers are also platform-specific. Some CPUs have acquire/release fence instructions, others only have full-fence instructions. x86 does have fence instructions (e.g. MFENCE), but its strong memory model already keeps most loads and stores in order, so a full fence is usually obtained by using a LOCK-prefixed (or implicitly locked) instruction instead, e.g. instead of issuing a barrier then writing, you issue a locked write.

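Then there are also the alignment/size requirements for atomic writes. On x86, naturally aligned 32-bit writes are atomic. On CPUs without byte-sized stores, writing an 8-bit flag becomes a read-modify-write of the surrounding 32-bit word, so you've got a potential race condition with the neighbouring 24 bits of data when updating the flag.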

These are low level details that aren't suitable for beginners. Instead of learning how to align/size your flag, and how to synchronize its updates with these barriers, you could just use a standard pre-made atomic or mutex type...
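For comparison, here is a minimal sketch of the same flag pattern using std::atomic, which takes care of the atomicity and both barriers. The release store in the writer pairs with the acquire load in the reader; `StarSystem` and the helper functions are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical star-system data generated on a worker thread.
struct StarSystem {
    std::vector<int> planets;           // plain (non-atomic) payload
    std::atomic<bool> generated{false}; // publication flag
};

// Writer: fill the payload, then publish with a release store.
// Writes above the store cannot be reordered after it.
void generate(StarSystem& system) {
    system.planets = {1, 2, 3};
    system.generated.store(true, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store, so if the flag
// reads true, the payload writes are guaranteed to be visible.
bool tryReadPlanetCount(const StarSystem& system, std::size_t& count) {
    if (!system.generated.load(std::memory_order_acquire))
        return false;
    count = system.planets.size();
    return true;
}

// Usage: one worker generates while the main thread polls without blocking.
std::size_t generateInBackground() {
    StarSystem system;
    std::thread worker([&system] { generate(system); });
    std::size_t count = 0;
    while (!tryReadPlanetCount(system, count))
        std::this_thread::yield(); // a game loop would do other frame work here
    worker.join();
    return count;
}
```

The compiler emits whatever fence or locked instruction each target needs, and std::atomic<bool> avoids the size/padding concerns of a plain bool flag.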

I just think it's arrogant to claim someone can't possibly comprehend multithreading at all if he hasn't been programming for 10 years and hasn't learned the inner workings of whatever platform they are targeting. Where I studied programming a few years back, background worker threads were considered beginner stuff.

Simple communication between threads using shared memory and mutexes is taught in beginner classes. It's not too hard to get a grip on.
However, you're suggesting the use of low-level platform-specific techniques, which are used internally by std::atomic and std::mutex, and it's shown above that these details are easy to make mistakes with, regardless of experience.

#5098738 List of every rendering technique?

Posted by Hodgman on 04 October 2013 - 07:29 AM

There is no list. James F. Blinn introduced "bump mapping" in a paper published in 1978, and new extensions of his technique are still being published to this day. Computer graphics is an active research field.
For illustration, check out how many research papers you can find on this page: http://kesen.realtimerendering.com/
As a graphics programmer, I read pages like the above, and buy books like the GPU Gems or GPU Pro series, to keep up with new developments.
The Real-Time Rendering book is probably the closest to what you're looking for. It covers most of the "typical" techniques that you'll find in games these days.
These days, bump mapping, lighting, shading, etc. are all done by "shaders". These are tiny programs, written in special "shading languages" (typically HLSL for D3D and GLSL for GL), that are executed on your GPU.
In your C++ application, you'll call something like d3dGpu->SetPixelShader(myShader) to tell the GPU that it should run your program for each pixel that is covered by a triangle, and then you'll call some D3D functions to draw some triangles.
The simplest pixel shader would look something like:

float4 main() : COLOR // this program outputs a pixel colour
{
  return float4(1,0,0,1); // red = 100%, green/blue = 0%, alpha = 100%
}

If you want to implement bump mapping, etc, you need to write the code for it yourself inside your pixel shader.
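To give a rough idea of what that shader code computes, here is a CPU-side C++ sketch of the diffuse term for one pixel with normal mapping (a common form of bump mapping). The `float3` struct and helper functions are illustrative stand-ins for HLSL's built-in types and intrinsics, and the normal-map texel and light direction are assumed to already be in the same tangent space.

```cpp
#include <algorithm>
#include <cmath>

// Stand-ins for HLSL's float3 / dot / normalize.
struct float3 { float x, y, z; };

float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

float3 normalize(float3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Lambert diffuse with a normal read from a normal map.
float shadeBumpMapped(float3 normalMapTexel, float3 lightDirTangent) {
    // Textures store components in [0,1]; remap them to [-1,1].
    const float3 n = normalize({normalMapTexel.x * 2 - 1,
                                normalMapTexel.y * 2 - 1,
                                normalMapTexel.z * 2 - 1});
    const float3 l = normalize(lightDirTangent);
    return std::max(0.0f, dot(n, l)); // N.L, clamped so back-facing light is 0
}
```

In a real pixel shader the same lines would run once per covered pixel, with the texel coming from a texture sample and the light direction interpolated from the vertex shader.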

D3D/GL do not come with any built in implementations of these effects any more.
If you want to be able to write some application code like:
then you want to be using an existing game engine / graphics engine, not D3D/OpenGL. Existing game/graphics engines will usually come with implementations of higher level techniques, and a library of existing shader code, so that you don't have to implement this stuff yourself.

#5098733 How do I use multithreading?

Posted by Hodgman on 04 October 2013 - 07:11 AM

Threads are not as hard as people make them out to be. ...
You only need to have a threadsafe way to tell the main thread if a starsystem is still being generated.

Have a "generated" flag that's defaulted to false, and set it to true after generation is complete, in your star system class or struct or whatever you use to store your star systems in memory. You may add a memory barrier after the star system is generated to make sure all threads see the same data.

And suddenly the above became complicated beyond a beginner's topic.
Boolean flags aren't atomic (so you have to know details of the platform and ensure there's enough padding that other mutable data won't be too close to the flag), and there's no ordering guarantee that the changed flag won't become visible to the main thread before the star system data is actually committed to RAM (which creates a potential race condition where the main thread sees the flag is true, but reads the star system data before it's been written/completed). As you mention, this then requires memory barriers (both compile-time and run-time varieties) to be inserted before writing to the flag and after reading from the flag (not mentioned above). If you're writing that kind of low-level synchronization code though, you really need to understand why those 4 barriers are required, which is not a suitable beginner task.
Beginners should instead use pre-made synchronization primitives, like critical-sections/mutexes, or a higher level parallelism library.
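As one example of a higher-level option from the C++ standard library, std::async and std::future can run the generation on another thread and let the main loop poll for completion without touching any barriers. This is a sketch; `StarSystemData` and `generateStarSystem` are hypothetical stand-ins for the real generation code.

```cpp
#include <chrono>
#include <future>
#include <vector>

// Hypothetical star-system payload.
struct StarSystemData {
    std::vector<int> planets;
};

// Stand-in for the expensive generation work.
StarSystemData generateStarSystem(int seed) {
    StarSystemData s;
    for (int i = 0; i < 3 + seed % 3; ++i)
        s.planets.push_back(seed * 10 + i);
    return s;
}

// Non-blocking check the main loop can call every frame.
bool isReady(const std::future<StarSystemData>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

Typical usage: `auto fut = std::async(std::launch::async, generateStarSystem, seed);` then, once `isReady(fut)` returns true, `fut.get()` hands over the finished data; the future provides the necessary synchronization internally.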

#5098679 C++ inheritance vs C# inheritance

Posted by Hodgman on 04 October 2013 - 12:36 AM

In C++ as well as regular inheritance (foo : public bar), there is virtual inheritance (foo : public virtual bar).

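Virtual inheritance is roughly analogous to implementing an interface in C#/Java: no matter how many paths lead to a virtual base, the derived object contains only one copy of it.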

Interfaces / virtual-inheritance "solves" the multiple inheritance problem.
C#/Java force you to use this solution in order to use multiple inheritance, but in C++, you can choose to use this solution, or you can choose for duplicate base classes to actually be duplicated.
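A small diamond hierarchy shows the difference. With regular inheritance the most-derived object holds two separate `Base` subobjects; with virtual inheritance it holds exactly one. The class names here are purely illustrative.

```cpp
struct Base { int id = 0; };

// Regular inheritance: DiamondDup contains two separate Base subobjects.
struct LeftA : Base {};
struct RightA : Base {};
struct DiamondDup : LeftA, RightA {};

// Virtual inheritance: DiamondShared contains exactly one Base subobject.
struct LeftV : virtual Base {};
struct RightV : virtual Base {};
struct DiamondShared : LeftV, RightV {};
```

With `DiamondDup d;`, a plain `d.id` is ambiguous and you must pick a path (`d.LeftA::id` or `d.RightA::id`), whereas `DiamondShared s;` allows `s.id`, and writing through either path changes the same member.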

#5098677 How to handle keyboard input in racing games

Posted by Hodgman on 04 October 2013 - 12:31 AM

The carx one will be based on something similar to traction control: it will evaluate the amount of slip/grip from the actual tire physics simulation (ask Pacejka what the optimal slip angle is) and determine at which angle the tires will completely lose traction. The steering controller will then limit you so that you don't turn past this angle, and grip/control is maintained.
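A heavily simplified sketch of that idea, under stated assumptions: lateral force follows Pacejka's "Magic Formula" with illustrative coefficients (not taken from any particular game), the peak-grip slip angle is found by sampling, and the hypothetical limiter clamps the steer angle so the front tire's slip angle (steer angle minus the angle of the front axle's velocity, all in radians) never exceeds that peak.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Pacejka "Magic Formula" lateral force for slip angle alpha (radians).
// B, C, D, E defaults are illustrative values only.
double magicFormula(double alpha, double B = 10.0, double C = 1.9,
                    double D = 1.0, double E = 0.97) {
    const double bx = B * alpha;
    return D * std::sin(C * std::atan(bx - E * (bx - std::atan(bx))));
}

// Sample the curve to find the slip angle that gives peak lateral force.
double peakSlipAngle(double maxAlpha = 0.5, double step = 0.001) {
    double best = 0.0;
    double bestForce = magicFormula(0.0);
    for (double a = step; a <= maxAlpha; a += step) {
        const double f = magicFormula(a);
        if (f > bestForce) { bestForce = f; best = a; }
    }
    return best;
}

// Hypothetical steering limiter: keep the front slip angle within +/- peakSlip.
double limitSteering(double requestedSteer, double frontVelocityAngle, double peakSlip) {
    assert(peakSlip >= 0.0);
    return std::clamp(requestedSteer,
                      frontVelocityAngle - peakSlip,
                      frontVelocityAngle + peakSlip);
}
```

With these coefficients the peak lands at roughly 0.18 rad; past it the force drops off, which is the loss of grip the controller is trying to avoid.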

#5098187 Object tries to eat potatoeses

Posted by Hodgman on 01 October 2013 - 09:50 PM

in fact the only person you're quoting as being helpful is being helpful in the way that he is simply responding to your questions rather than attempting to explain to you why your approach is fundamentally wrong.

For the record, while answering the exotic C++ questions (which are only required to implement a container class like std::vector, not in general programming use), I did also include warnings about this approach, and a reminder that existing containers will probably solve the issue out of the box.

If you're still learning C++, I'd really recommend just using a simple/straightforward solution that you're more comfortable with for now. KISS is always a useful principle to follow in order to keep out bugs and get things done.

With all the debugging features disabled, and when used correctly, a std::vector has zero performance overhead.
Regarding performance, you should generally go with the easy to read/write/maintain solution first, and then complicate it with an optimized version if it is demonstrated to be necessary.
The [algorithmic overhead in your other thread] is likely going to be a much bigger performance pitfall.
If you make a post about the actual problem you're trying to solve (a pool allocator?), you'll get some good solutions, rather than just a schooling on alignment and new.

I need to know it, deriving from what I want to be able to accomplish (stating which would be a lot more broad and off-topic)

From looking at your prior thread, you're trying to implement a pool container. If that is the case, then including it here would've let me make much more useful posts, addressing the actual problems instead of just dealing with some specific details that might not even be relevant.

...instead of responses like "don't go lower, it's dangerous and error-prone, use what's already written and problem solved".

Often those replies could be better characterized as "make sure you understand the high level algorithms and structures before studying their low-level implementation", which is valid.
For example, the container classes in std:: and boost:: have been refined by endless users and committees, using them in endless different ways. This results in an API for each class that is as simple as possible while also being as flexible as possible.
If you grab a random 1st-year computer science student and ask them to make you a linked-list class, etc, then the API that they produce will be rigid, inflexible and overcomplicated, (not to mention that their actual implementation is sure to be buggy). If you first teach them the extensive subtleties of std::list, and then ask them to implement their own, then they'll likely produce a much better result.


Also, features like placement new are dangerous and error-prone. Much of C++ is dangerous and error-prone. The key to good C++ code is minimizing your use of all the dangerous and error-prone features. If you're going to use these kinds of features, you should use them once inside a container class/etc, so that you can then reuse that simple class many times without having to use these features manually. If such a class already exists for your problem, then there's no point re-implementing it yourself (and re-testing, re-debugging) except as a learning exercise.

So what you are saying is:
1) The members in a structure/class that follow a member of highest size will be padded or grouped (they are grouped, right?) into an alignment.
2) The members in a structure/class that stand before a member of the highest size will not be grouped but will be padded to the size of that member.
Did I understand that correctly ?

When adding members to a class, the compiler logic will be something like:
a) the alignment-requirement of the structure is the alignment-requirement of the largest member

struct alignment = 1
for each member
  struct alignment = max( struct alignment, alignof(member type) )

b) assume the structure (the this pointer) will always be located at a correctly aligned address (a multiple of the above).
c) each member is appended to the structure. If, when appending a member, it's not correctly aligned, insert some padding so that it is aligned according to its own requirement.
Insert padding at the end of the structure so that its size is a multiple of its alignment.

offset = 0
for each member
  offset = round up offset to next multiple of alignof(member type)
  member offset = offset
  offset += sizeof(member type)
structure size = round up offset to next multiple of structure alignment

n.b. alignment is completely implementation-defined (not defined by the standard), so these rules are allowed to change depending on the platform/compiler.
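The two pseudocode loops above can be turned into a small C++ function that computes a layout from a list of (size, alignment) pairs. It is a sketch of the typical rules just described, not of any particular compiler; since layout is implementation-defined, real results may differ. `MemberDesc`, `Layout` and `computeLayout` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct MemberDesc { std::size_t size; std::size_t align; };
struct Layout { std::vector<std::size_t> offsets; std::size_t size; std::size_t align; };

// Round `value` up to the next multiple of `align` (align > 0).
std::size_t roundUp(std::size_t value, std::size_t align) {
    return (value + align - 1) / align * align;
}

Layout computeLayout(const std::vector<MemberDesc>& members) {
    Layout layout{{}, 0, 1};
    std::size_t offset = 0;
    for (const MemberDesc& m : members) {
        layout.align = std::max(layout.align, m.align); // rule a): largest member alignment
        offset = roundUp(offset, m.align);              // rule c): pad before the member
        layout.offsets.push_back(offset);
        offset += m.size;
    }
    layout.size = roundUp(offset, layout.align);        // tail padding
    return layout;
}
```

For example, `{char, int, char}` described as `{{1,1},{4,4},{1,1}}` gives offsets 0, 4, 8, alignment 4 and size 12, which matches what mainstream compilers produce for that struct.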



At the same time there is this which makes me wonder the difference between malloc and new

If the implementation knows that the largest alignment-requirement is 8 bytes, it can just always return 8-byte aligned memory from new/malloc.

If you ask for 2 bytes of memory though, it knows that the object you're going to place in that memory can only have an alignment requirement of either 1 or 2 bytes, so it can be safe and choose the largest (2).
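The reasoning for small allocations works because sizeof(T) is always a multiple of alignof(T): an object of `size` bytes can never need more alignment than the largest power of two dividing `size`. A custom allocator could use that, capped at `alignof(std::max_align_t)`, as sketched below. `safeAlignmentFor` is a hypothetical helper, and over-aligned types (declared with `alignas` beyond the fundamental alignment) are not covered by it.

```cpp
#include <algorithm>
#include <cstddef>

// Strictest alignment any object of `size` bytes can require, assuming no
// over-aligned types: the largest power of two dividing `size`, capped at
// the platform's fundamental alignment.
std::size_t safeAlignmentFor(std::size_t size) {
    const std::size_t maxAlign = alignof(std::max_align_t);
    if (size == 0) return maxAlign;
    const std::size_t lowestBit = size & (~size + 1); // isolate the lowest set bit
    return std::min(lowestBit, maxAlign);
}
```

So a 2-byte request needs at most 2-byte alignment, a 6-byte request at most 2, and a 12-byte request at most 4, while larger power-of-two sizes fall back to the full fundamental alignment.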

#5098176 Yiddish indentation

Posted by Hodgman on 01 October 2013 - 08:20 PM

Mandatory rant:
This is Hungarian notation:
int numWidgets;
uint rowOffset, colOffset;
float xPos, yPos;
long sizeBytes;
This is not Hungarian notation; this is a MS bastardisation known as "systems Hungarian":
int iWidgets;
uint uiRow, uiCol;
float fx, fy;
long dwSize;
Hungarian is good. MS Systems Hungarian is the one that is oft derided.