
Matias Goldberg

Member Since 02 Jul 2006

#5306604 Shader array limit?

Posted on 18 August 2016 - 01:39 PM

Using an SSBO is overkill. The problem is that you're requesting 90 UBOs instead of a single UBO with 90 elements in it.
Change your code to:

struct CSpotLight
{
    vec3 vColor;
    vec3 vPosition;
    vec3 vDirection;
    float fConeAngle;
    float fConeCosine;
    float fLinearAtt;
    bool Enabled;
};

uniform SpotLightBuffer
{
    CSpotLight SpotLights[90];
} spotLightBuffer;

//Then access it via:
vec3 lightColor = spotLightBuffer.SpotLights[i].vColor; //i = index of the light
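When filling that buffer from C++, the CPU-side struct has to match the GPU layout byte for byte. The block above doesn't specify a layout, so the sketch below assumes it is declared `layout(std140)`; under std140 each vec3 is 16-byte aligned, a following float may use the vec3's last 4 bytes, a bool takes 4 bytes, and the struct's array stride is rounded up to 16. `SpotLightStd140` and `spotLightBufferSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU-side mirror of CSpotLight, ASSUMING the uniform block uses layout(std140).
struct alignas( 16 ) SpotLightStd140
{
    float    vColor[3];     // offset 0  (vec3: 16-byte aligned, 12 bytes)
    float    pad0;          // offset 12 (next vec3 must start on a 16-byte boundary)
    float    vPosition[3];  // offset 16
    float    pad1;          // offset 28
    float    vDirection[3]; // offset 32
    float    fConeAngle;    // offset 44 (a float can fill the vec3's trailing 4 bytes)
    float    fConeCosine;   // offset 48
    float    fLinearAtt;    // offset 52
    uint32_t Enabled;       // offset 56 (GLSL bool occupies 4 bytes; 0 = false)
};                          // 60 bytes of members, padded to 64 by alignas(16)

constexpr std::size_t kNumSpotLights = 90;

// Bytes to upload with glBufferData / glBufferSubData for the whole array.
constexpr std::size_t spotLightBufferSize()
{
    return sizeof( SpotLightStd140 ) * kNumSpotLights;
}

static_assert( sizeof( SpotLightStd140 ) == 64, "std140 array stride mismatch" );
static_assert( offsetof( SpotLightStd140, fConeAngle ) == 44, "fConeAngle offset mismatch" );
static_assert( offsetof( SpotLightStd140, Enabled ) == 56, "Enabled offset mismatch" );
```

At 64 bytes per light, 90 lights come to 5760 bytes, well within the 16 KB that OpenGL guarantees as the minimum GL_MAX_UNIFORM_BLOCK_SIZE.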

#5305582 How Long Will Directx Last For?

Posted on 12 August 2016 - 10:32 PM

The same applies to DirectDraw. In the beginning they were separate, one for 2D the other 3D.

Beginning with DirectX 8, DDraw began a slow deprecation (moving towards doing everything in D3D) until it was completely phased out in DirectX 10.


DDraw applications may or may not have bugs, just like D3D applications, but they are more likely to have fewer. A major advantage is that DDraw acceleration can be turned off, as CPUs should be fast enough for the kind of work we gave DDraw in the 90's. On Windows XP it can be disabled, and on Windows Vista+ it can be turned off via the DirectX Control Panel switch or registry keys (though that's not for beginners).

#5305019 How Long Will Directx Last For?

Posted on 09 August 2016 - 10:17 PM

This is a tricky question:


If you have DirectX 9/10/11 installed, then the older versions are still installed as well. This means that e.g. a DirectX 6 application will still work.

However, starting with Windows Vista, Retained Mode (which DirectX 3.0, DX 5 and DX 6 applications could optionally use) was deprecated. Apps using Retained Mode need lots of workarounds, and not all of them may work on Vista+. But if they didn't use Retained Mode, then they should (in theory) work even on Windows 10.


Furthermore, vendors rarely test their latest GPUs against such old DirectX versions, so an app may not work correctly on certain GPU / driver combinations (especially if it didn't follow the DX spec rigorously, which was very common). Many of these programs had trouble running in D3D mode even back in their day; developers just blamed the GPU, the driver, or Windows 98. Examples off the top of my head are Grim Fandango, Final Fantasy VIII, and Startopia (enabling HW TnL would result in a black floor).

You may get it to work using the RGB software rasterizer, or via a virtual machine with custom drivers.


Furthermore, before DX10, DirectX had backwards compatibility spanning 3 versions. That meant a GPU designed for DirectX 5.0 would still work with an app using DirectX 8, but not with one using DirectX 9.


Starting with DX10, Microsoft introduced Feature Levels. An app only runs if the GPU supports the minimum feature level the app requires. So an app using DirectX 10 with Feature Level 9.1 would probably run on a lot of GPUs, while still using the DX11/10 API.
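In D3D11 this negotiation happens in D3D11CreateDevice: you pass an array of D3D_FEATURE_LEVEL values, and the documentation says the runtime tries them in array order, creating the device with the first one the hardware supports. The sketch below mimics that selection on the CPU so it runs without d3d11.h. `FeatureLevel` and `pickFeatureLevel` are hypothetical stand-ins (the enum values copy the real D3D_FEATURE_LEVEL constants for ordering), and it assumes a GPU supports every level up to its maximum.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for D3D_FEATURE_LEVEL, using the same numeric values.
enum class FeatureLevel : unsigned
{
    FL_9_1  = 0x9100,
    FL_9_3  = 0x9300,
    FL_10_0 = 0xa000,
    FL_11_0 = 0xb000,
};

// Walks the app's requested levels in order and picks the first one the GPU
// supports (ASSUMED: every level <= gpuMax). Returns false if none fit,
// which is the case where the app refuses to run on that GPU.
bool pickFeatureLevel( const FeatureLevel *requested, std::size_t count,
                       FeatureLevel gpuMax, FeatureLevel *chosen )
{
    for( std::size_t i = 0; i < count; ++i )
    {
        if( static_cast<unsigned>( requested[i] ) <= static_cast<unsigned>( gpuMax ) )
        {
            *chosen = requested[i];
            return true;
        }
    }
    return false;
}
```

Listing FL_9_1 as the last entry is what lets a DX11-API app still start on older hardware; dropping it means those GPUs get rejected.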

#5304921 Problem Using Compute Shader To Write To Texture

Posted on 09 August 2016 - 11:26 AM

  • I strongly suggest you create your texture via glTexStorage2D, not glTexImage2D
  • Your glMemoryBarrier call needs to be performed after the dispatch call, not before. Additionally, it should be GL_TEXTURE_FETCH_BARRIER_BIT, not GL_SHADER_IMAGE_ACCESS_BARRIER_BIT: the bit describes how the data will be read after the barrier, and although you're writing to it as an image, you want the subsequent reads as texture fetches to be correct.
  • Beware: local_size_x = 1 and local_size_y = 1 is very inefficient. You'll literally be using one thread per wavefront, leaving the rest of its lanes idle (performance). I suggest local_size_x = 8 and local_size_y = 8, with a glDispatchCompute( 2, 2, 1 );
  • Make sure to use performance keywords, e.g. layout(rgba32f) uniform restrict writeonly image2D destination; (performance)
  • Be careful with glUseProgramStages; it could be disabling your compute / vertex / pixel shaders.
  • Check for GL errors. Enable the Debug extension (you need to create the GL context with Debug flag as well).
  • Use RenderDoc.
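The dispatch size follows from the texture size and the work group size: round up so partial tiles at the edges still get a group. The sketch below assumes the 8x8 group size suggested above and a 16x16 texture (which is what glDispatchCompute( 2, 2, 1 ) covers); `groupCount` and `dispatchFor` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Must match the shader's layout( local_size_x = 8, local_size_y = 8 ) in;
constexpr uint32_t kLocalSizeX = 8;
constexpr uint32_t kLocalSizeY = 8;

// Work groups needed to cover `size` texels with groups of `local` threads,
// rounded up so a partially covered edge still gets a group.
constexpr uint32_t groupCount( uint32_t size, uint32_t local )
{
    return ( size + local - 1u ) / local;
}

struct DispatchSize { uint32_t x, y, z; };

// Arguments for glDispatchCompute( x, y, z ) over a width x height image.
constexpr DispatchSize dispatchFor( uint32_t width, uint32_t height )
{
    return { groupCount( width, kLocalSizeX ), groupCount( height, kLocalSizeY ), 1u };
}
```

When the size isn't a multiple of 8, the extra threads fall outside the image, so the shader should compare gl_GlobalInvocationID.xy against imageSize( destination ) before writing. The glMemoryBarrier( GL_TEXTURE_FETCH_BARRIER_BIT ) call then goes right after the glDispatchCompute call.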

#5304498 Conceptual Question On Separate Ps/vs Files

Posted on 07 August 2016 - 10:19 AM

Hodgman pretty much covered everything; I just have one more thing to add:


You don't have to use separate files for PS/VS if you don't want to -- you can specify which function is the entry-point of the shader, and compile the same file twice using two different targets (ps/vs) and two different entry points (ps_main, vs_main, etc). Personally I like to keep them in the same file.
You can also use #include to pull code from other files into the one that's being compiled.

I'd like to keep them in the same file too, but keep in mind that if you're aiming at cross-platform some day (e.g. OpenGL + GLSL), in GLSL each stage needs its own main(), so the vertex and pixel shaders normally have to live in separate files. You can work around this with macros, e.g.:

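//Macro names are illustrative: define exactly one of them when compiling each stage.
#ifdef VERTEX_SHADER
void main()
{
    //...vertex shader code...
}
#endif

#ifdef PIXEL_SHADER
//GLSL's main must return void, so the colour goes to an out variable.
out vec4 outColour;
void main()
{
    outColour = vec4( 1.0, 1.0, 1.0, 1.0 );
}
#endif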
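To compile that shared file once per stage, one option is to prepend the #version line plus the stage's define before handing the source to glShaderSource. GLSL requires #version to come first, so the define goes right after it. The sketch below only builds the string; `buildStageSource` is a hypothetical helper, the macro names are illustrative, and it assumes the shared file itself has no #version line.

```cpp
#include <cassert>
#include <string>

// Builds the source for one stage from a shared GLSL body.
// #version must be the first line, so the stage define is inserted after it.
std::string buildStageSource( const std::string &version, const std::string &stageDefine,
                              const std::string &sharedBody )
{
    return "#version " + version + "\n" +
           "#define " + stageDefine + "\n" +
           sharedBody;
}
```

Since glShaderSource accepts an array of strings, you could also pass the three pieces separately instead of concatenating them; the result the compiler sees is the same.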

#5304205 Ide For Linux

Posted on 05 August 2016 - 08:50 AM

QtCreator is what you're looking for. Lightweight, fast, and supports everything you asked for.


I'd skip Eclipse. It's the heaviest, slowest IDE ever made, and it hurts productivity a lot.

#5303252 Multithreading Library Preference

Posted on 30 July 2016 - 05:19 PM

Thanks Adam, so it's not the performance overhead that keeps game studios away from using them? (Like with the STL: the reason most studios don't use it is its bad debug performance and overhead.)

Yes, performance is also a reason. Depending on a platform-specific library is asking for problems: e.g. while performance may be decent on one platform, it can turn out to suck on another because the compiler vendor rushed their release to support the latest C++ feature set.


I'm talking about the ones provided by a standard library or compiler. This doesn't apply to TBB, but like Adam_42 said, TBB is not free and adds yet another dependency for what most AAA developers will consider trivial (threading is really hard, but wrapping sync primitives is easy, and the key is keeping things simple; TBB adds more complex functionality that may not be needed or could be specialized for a game).
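To give an idea of how small such a wrapper can be, here is a counting semaphore built only on std::mutex and std::condition_variable. It's a minimal sketch, not a tuned game-ready primitive; the class name and interface are hypothetical, not taken from any engine (C++20 also ships std::counting_semaphore).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore wrapped around standard sync primitives.
class Semaphore
{
public:
    explicit Semaphore( unsigned initialCount = 0 ) : mCount( initialCount ) {}

    // Adds `n` permits and wakes up to `n` waiting threads.
    void release( unsigned n = 1 )
    {
        {
            std::lock_guard<std::mutex> lock( mMutex );
            mCount += n;
        }
        for( unsigned i = 0; i < n; ++i )
            mCondition.notify_one();
    }

    // Blocks until a permit is available, then takes it.
    void acquire()
    {
        std::unique_lock<std::mutex> lock( mMutex );
        mCondition.wait( lock, [this] { return mCount > 0; } );
        --mCount;
    }

    // Takes a permit if one is available; never blocks.
    bool tryAcquire()
    {
        std::lock_guard<std::mutex> lock( mMutex );
        if( mCount == 0 )
            return false;
        --mCount;
        return true;
    }

private:
    std::mutex              mMutex;
    std::condition_variable mCondition;
    unsigned                mCount;
};
```

Keeping the interface this narrow is the point: if a platform's std::condition_variable turns out to be slow, only this one class needs a platform-specific implementation.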

#5302971 How Do You Deal With Errors On Gpus? Do You At All?

Posted on 28 July 2016 - 12:53 PM

The code you posted starts with _texCount uninitialized, which won't work as intended. It won't contain 0s unless you fill it yourself. If you do fill it in your actual code, you need to sync that as well.

#5302955 How Do You Deal With Errors On Gpus? Do You At All?

Posted on 28 July 2016 - 11:01 AM

Later when processing for (uint i = array[i]; i<array[i+1]; i++), because of unsigned numbers the difference overflows and gives a huge number close to 0xFFFFFFFFu.
Now I don't know if long runtime or out of buffer writes cause the blue screen, but I know why it happens.

I'm 99% sure what's happening is that, because you get a bad loop count, your GPU takes too long to respond and thus you run into TDR (Windows' Timeout Detection and Recovery).
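GLSL's uint wraps around modulo 2^32 just like C++'s uint32_t, so an inverted range turns into roughly 4 billion iterations. The sketch below shows the wraparound and a defensive clamp; `rawCount` and `clampedCount` are hypothetical helpers. The clamp can keep a debugging session from hitting TDR, but it only hides the symptom: the real fix is the synchronization.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned subtraction wraps: if begin > end (e.g. due to a missing barrier),
// the "count" becomes a huge number instead of a negative one.
uint32_t rawCount( uint32_t begin, uint32_t end )
{
    return end - begin;
}

// Defensive variant: treat an inverted range as empty.
uint32_t clampedCount( uint32_t begin, uint32_t end )
{
    return end >= begin ? end - begin : 0u;
}
```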

The hardware bug (if so) seems to ignore my barriers, so it can happen that array[i+1] is smaller than array[i] - usually array[i+1] MUST be >= array[i].
(This also happens with work group size of 64, but less often)

GPU threading is hard.

How are you issuing your barrier? Beware: in GLSL, memoryBarrier only does half of the job. You also need a barrier():

//memoryBarrierShared ensures our write is visible to everyone else (must be done BEFORE the barrier)
memoryBarrierShared();
//barrier ensures every thread's execution reached here.
barrier();

#5302931 Anyone Tried Uploading A 5-Byte File To Php Before?

Posted on 28 July 2016 - 08:50 AM

If the host is a cheap hosting service not controlled by you, it's very possible the webhost has lame "anti-hacking" measures enabled via custom Apache/PHP hooks and .htaccess rules that reject certain input when given in a specific order, because it matched malware that used similar input to compromise an unpatched server a long time ago. I've seen that kind of crap before.

#5301847 What Makes A Game Look Realistic?

Posted on 21 July 2016 - 06:09 PM

"Realistic" is a very misleading word, the way we use it. When people say they want a game to look "realistic", what they really want is for it to look like it was filmed by a Hollywood studio, with special lighting setups (which may vary for different shots), makeup, particular hair styles that always highlight the nice places and cover the not-so-pretty ones, special lenses, and particular camera angles with particular camera movement.


You know the phrase "even the girl from the fashion magazine doesn't look like the fashion model in the magazine"? The same applies to "realistic" graphics, because people expect games' graphics to look like the people in magazines, TV shows, and movies. Which isn't realistic at all.


Therefore, to look realistic we have to mimic what they do. Once we figure out the math stuff (proper BRDF, HDR, depth of field, wind effects, noise for a shaky camera effect), we need to set up lighting as in movie production (e.g. 3-point lighting is very popular), have female characters walk like fashion models, use movie-like poses (3-point landing, anyone?), account for the 12 principles of animation, place the camera in strategic places, have camera shots change at the right times, and have the depth of field focus on what's important and leave what's unimportant out of focus.


Of course, high-resolution textures, high polycount, motion capture and global illumination help a lot, but they will only get you so far.

And of course, all of these "rules" can be broken. If you know what you're doing and know when to break them, it still looks good. When you don't, it looks crap. Just like crappy movies or your grandma's pictures (no offense to your grandma!).

#5301048 Dx11 Renderstate

Posted on 16 July 2016 - 11:50 PM

I've been learning DX11 recently and find the DX11 render state strange. Why is the render state designed as an object?

Because of performance. Setting them one at a time takes a lot more cycles than setting all of them at once.
Furthermore many parameters are actually inter-linked with each other, even if they appear completely unrelated:

  • For example modern GPUs require shaders to know the vertex layout (Input Assembly in D3D11 terms).
  • Depth write settings interact with alpha testing (to know whether to disable early Z optimization).
  • Cull mode interacts with shaders if the shader uses SV_IsFrontFace.
  • On certain mobile GPUs, blending modes are patched into the shaders.
  • Cull mode interacts with stencil (e.g. two-sided stencil).

Because of these dependencies at the implementation level with seemingly unrelated features, changing the setting one by one would have to trigger a lot of flushing, whereas changing everything in one go allows for resolving all the dependencies because all of the data is supplied together (and validated beforehand by making the object immutable!).

TL;DR: Performance.

    if( material.context.depthenable == true ) context->setRS( depthenable, true );
    else context->setRS( depthenable, false );
 I don't know how to easily handle the DX11 render state objects. Can someone show me how you use these ridiculous things?

You're approaching it the wrong way, the D3D9 way. In order to approach the D3D11 way, you need to create your object beforehand:

#include <d3d11.h>
#include <cassert>

#ifndef SAFE_RELEASE
#define SAFE_RELEASE( p ) { if( p ) { (p)->Release(); (p) = nullptr; } }
#endif

struct Material
{
    bool mDepthEnabled = true;
    bool mDepthWriteEnabled = true;
    // ... many other settings

    ID3D11RasterizerState *mRasterizerState = nullptr;
    ID3D11BlendState *mBlendState = nullptr;
    ID3D11DepthStencilState *mDepthStencilState = nullptr;

    //Always call after you're done modifying the material.
    void flush( ID3D11Device *device )
    {
        SAFE_RELEASE( mRasterizerState );
        SAFE_RELEASE( mBlendState );
        SAFE_RELEASE( mDepthStencilState );

        //Start from the D3D11 defaults so no field is left uninitialized.
        CD3D11_RASTERIZER_DESC rasterizerDesc( D3D11_DEFAULT );
        CD3D11_BLEND_DESC blendDesc( D3D11_DEFAULT );
        CD3D11_DEPTH_STENCIL_DESC depthStencilDesc( D3D11_DEFAULT );
        depthStencilDesc.DepthEnable = mDepthEnabled;
        depthStencilDesc.DepthWriteMask = mDepthWriteEnabled ? D3D11_DEPTH_WRITE_MASK_ALL : D3D11_DEPTH_WRITE_MASK_ZERO;

        //...Fill ALL the other settings...

        device->CreateRasterizerState( &rasterizerDesc, &mRasterizerState );
        device->CreateBlendState( &blendDesc, &mBlendState );
        device->CreateDepthStencilState( &depthStencilDesc, &mDepthStencilState );
    }
};

void setMaterial( ID3D11DeviceContext *context, const Material *material )
{
    //ASSUME material already has been flushed.
    assert( material->mRasterizerState && "You forgot to call flush!" );
    context->RSSetState( material->mRasterizerState ); //You could check if this is redundant. If it is, avoid calling again
    context->OMSetBlendState( material->mBlendState, nullptr, 0xFFFFFFFF ); //You could check if this is redundant. If it is, avoid calling again
    context->OMSetDepthStencilState( material->mDepthStencilState, 0 ); //You could check if this is redundant. If it is, avoid calling again
}

The idea is simple: Create the material. Once it's set, initialize the D3D11 structures. This is all done at loading time, not every frame. Once that's done, don't modify the material frequently (at least the parts that need flushing). If you need to change parameters very often, it's better to use multiple materials instead.
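The "check if this is redundant" comments can be handled by a small cache in front of the context. Because state objects are immutable, comparing pointers is enough to know whether the bound state already matches (and D3D11 documents that creating a state object with an identical description returns the existing interface, so identical materials end up sharing pointers). The sketch below is hypothetical: `StateCache` uses `const void *` in place of the ID3D11*State types so it compiles without d3d11.h, and counts calls instead of forwarding them to RSSetState / OMSetBlendState / OMSetDepthStencilState.

```cpp
#include <cassert>

// Hypothetical redundant-state filter; the pointers stand in for the D3D11 state objects.
struct StateCache
{
    const void *rasterizer   = nullptr;
    const void *blend        = nullptr;
    const void *depthStencil = nullptr;
    int apiCalls = 0; // state changes that would actually reach the driver

    void set( const void *&current, const void *requested )
    {
        if( current == requested )
            return; // same immutable object already bound: skip the call
        current = requested;
        ++apiCalls; // a real version would call the matching context method here
    }

    void setMaterialStates( const void *rs, const void *bs, const void *dss )
    {
        set( rasterizer, rs );
        set( blend, bs );
        set( depthStencil, dss );
    }
};
```

Sorting draws by material makes this filter even more effective, since consecutive draws then tend to share the same state objects.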

#5301046 Maximum Size Of Vertex Buffer

Posted on 16 July 2016 - 11:22 PM

You're passing vertex_data, which is only 120 bytes in size, and telling D3D11 to read 900,000 bytes from it. Naturally, it will crash.
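One way to avoid this is to derive D3D11_BUFFER_DESC::ByteWidth from the array you pass as D3D11_SUBRESOURCE_DATA::pSysMem instead of hard-coding it, since the runtime reads ByteWidth bytes starting at pSysMem. The sketch below assumes an illustrative 12-byte `Vertex`; `bytesOf` and `initialDataIsLargeEnough` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative vertex layout (position only, 12 bytes); yours will differ.
struct Vertex { float x, y, z; };

// Size in bytes of a C array of vertices, deduced from the array type so it
// can't silently decay to sizeof(pointer).
template <std::size_t N>
constexpr std::size_t bytesOf( const Vertex ( & )[N] )
{
    return sizeof( Vertex ) * N;
}

// The initial data must hold at least ByteWidth bytes, or the copy reads past it.
bool initialDataIsLargeEnough( std::size_t sourceBytes, std::size_t byteWidth )
{
    return sourceBytes >= byteWidth;
}
```

If you really need a 900,000-byte buffer that is mostly empty at creation, create it without initial data (or with a source array that big) and fill it later with Map or UpdateSubresource.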

#5300046 Multiple animations in one .dae file(blender doesn't support this)

Posted on 10 July 2016 - 04:56 PM

Have you tried the OpenGEX exporter? It may or may not support what you need, but OGEX is infinitely superior to Collada.

#5299608 Frame buffer speed, when does it matter?

Posted on 07 July 2016 - 09:29 AM

I presumed that interface width of the memory enables us to transfer more data in a shorter time

It enables transferring more data in the same amount of time, not the same data in less time. It's a very important distinction.
Think of it as a truck travelling 500km, a trip that takes 5 hours. The truck can only hold 1tn of cargo. If you use two trucks, you can send twice the amount of cargo, but the trip still takes 5 hours.

The reason I'm asking (higher-level view) is that I'm interested in why HBM is beneficial and when it stops being so.

It depends on something we call "bottleneck". A game that performs a lot of reads and writes may be bandwidth limited, thus memory that has higher bandwidth will run faster.
But if another game executes a lot of math (which uses the ALU units Hodgman describes) and that's most of what it does, then higher bandwidth won't do jack squat, because that's not the bottleneck.
Going back to the truck example:

You have to transfer 2tn of cargo. You have one truck. This is your bottleneck. You need 5hs to travel 500km and send 1tn, then another 5hs to get back and load the rest. Then 5hs more to travel 500km again. In total all the travelling took 15hs by using one truck.
If you use two trucks, you'll be done in 5hs. Memory bandwidth and bus bandwidth behave more or less the same way: you can send more data in the same amount of time, so when you have a lot of data to send, doubling the amount you can transfer lets you finish sooner, but only if that's the bottleneck. However, you can never go below 5hs for one trip. (Why, you ask? Because GPUs can't send data faster than the speed of light.)
Now let's add the "ALU" to the example: suppose all you have to send in the truck is a machine that weighs only 70kg (that's 0.07tn). However, disassembling the machine for transportation and loading it into the truck takes you 8 hours. The truck then begins its journey, which takes 5hs. Total time = 13hs.
You could use two trucks... but it will still take you 13hs because having an extra truck doesn't help you at all in disassembling the machine. What you need is an extra hand, not another truck. The bottleneck here is in disassembling the machine, not in transportation.
In this example people = ALU; trucks = bandwidth.
More people = you can disassemble and load the machine into the truck faster.
More trucks = you can send more cargo per trip.
More ALU = you can do more math operations in the same amount of time.
More bandwidth = you can do more loads and stores from/to memory in the same amount of time.
So, to answer your question: does an increase of bandwidth make a game run faster? It depends.
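The truck arithmetic can be written down as a toy model: total time = preparation ("ALU") time + travel ("bandwidth") time, and whichever term dominates is the bottleneck. The sketch below is only a model of the analogy, not of real GPU timing; `totalHours` is hypothetical, and it assumes all trucks travel together, drive back empty before the next load, and that preparation splits evenly among workers and finishes before departure.

```cpp
#include <cassert>
#include <cmath>

constexpr double kTripHours = 5.0; // one-way 500km trip, as in the example

// Toy model of the truck analogy; all times in hours.
double totalHours( double cargoTn, int trucks, double capacityTn,
                   double prepHours, int workers )
{
    const double perTrip = trucks * capacityTn; // "bandwidth": cargo moved per trip
    const int loads = cargoTn > 0.0 ? static_cast<int>( std::ceil( cargoTn / perTrip ) ) : 0;
    // Every load needs an outbound trip; every load except the last also needs a return trip.
    const double travel = loads > 0 ? ( 2 * loads - 1 ) * kTripHours : 0.0;
    return prepHours / workers + travel; // "ALU" time + transfer time
}
```

With 2tn of cargo, going from one truck to two cuts 15hs to 5hs. With the 70kg machine, a second truck leaves the total at 13hs, while a second worker brings it down to 9hs: extra capacity only pays off where the bottleneck is.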