Matias Goldberg

Member Since 02 Jul 2006
Offline Last Active Today, 11:14 AM

#5308536 Why so many fence/sync types?

Posted by Matias Goldberg on 29 August 2016 - 01:36 PM

OpenGL has core and extensions.


Extensions start by being proposed by a vendor. GL_NV_sync is one such example (it was proposed and implemented by NVIDIA, although other vendors can also implement it if they want).

When an extension becomes really useful/widespread but needs some tweaking (e.g. a different behavior in edge cases, or a different interface to accommodate certain hardware), it may be promoted to ARB (ARB stands for Architecture Review Board). That is what happened when GL_NV_sync became GL_ARB_sync.

In some cases, an extension is vendor agnostic but it's not popular/stable enough to be ARB, so its name will say EXT.

Once it becomes really useful it can get into core. GL_ARB_sync became core in OpenGL 3.2, which means it's guaranteed to be present starting from OpenGL 3.2.


GL_APPLE_sync (GL ES) & GL_APPLE_fence (GL) are Apple's way of doing it. However, since Apple already supports GL 3.2, you can use GL_ARB_sync there.


Long story short, you only need to aim for three:

  1. GL_ARB_sync (Windows, Linux, OSX, all vendors; unless you don't target GL core >= 3.2)
  2. GL_APPLE_sync (iOS)
  3. EGL_ANDROID_native_fence_sync (Android)

The rest are just historic relics that vendors must support so old games can still run.
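
To make the list above concrete, here is a minimal sketch of the selection logic in C++. The Platform enum and pickSyncApi are hypothetical names made up for this illustration (they are not part of any GL header), and the returned strings are just labels. A real application would still query the extension string at runtime on a pre-3.2 context.

```cpp
#include <cassert>
#include <string>

// Hypothetical platform tag, invented for this sketch.
enum class Platform { Desktop, iOS, Android };

// Returns a label for the sync API to aim for, following the three-item list.
// glMajor/glMinor describe the context version and only matter on desktop.
inline std::string pickSyncApi( Platform platform, int glMajor, int glMinor )
{
    assert( glMajor >= 1 && glMinor >= 0 );

    switch( platform )
    {
    case Platform::iOS:
        return "GL_APPLE_sync";
    case Platform::Android:
        return "EGL_ANDROID_native_fence_sync";
    case Platform::Desktop:
        break;
    }

    // Sync objects are core since OpenGL 3.2; older contexts only have them
    // if the driver advertises the GL_ARB_sync extension.
    const bool isCore = glMajor > 3 || ( glMajor == 3 && glMinor >= 2 );
    return isCore ? "GL_ARB_sync (core)" : "GL_ARB_sync (check extension string)";
}
```

For example, pickSyncApi( Platform::Desktop, 3, 2 ) picks the core path, while a 3.1 context falls back to checking for the extension.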



#5308405 Why can't I print out my std::string vector?

Posted by Matias Goldberg on 28 August 2016 - 07:37 PM

Are you by any chance using Xcode?


Btw, one more thing: it's a security risk to call printf( myCStringVariable ); because any % sequences inside the string get interpreted as format specifiers. Use printf( "%s", myCStringVariable ); instead.
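
A minimal sketch of the safe form, using snprintf so the result can be inspected. formatSafely is a hypothetical helper name, and the 256-byte buffer is an arbitrary choice for the example (longer input is truncated).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats untrusted text safely: the text is passed as an ARGUMENT to "%s",
// so any '%' characters inside it are copied verbatim instead of being
// interpreted as conversion specifiers.
inline std::string formatSafely( const char *untrusted )
{
    assert( untrusted != nullptr );
    char buffer[256]; // arbitrary size for this sketch
    std::snprintf( buffer, sizeof( buffer ), "%s", untrusted );
    return std::string( buffer );
}
```

Passing the same untrusted string as the format argument itself would let sequences such as %s or %n read from or write to memory the caller never supplied, which is exactly the risk mentioned above.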



#5306604 Shader array limit?

Posted by Matias Goldberg on 18 August 2016 - 01:39 PM

Using an SSBO is overkill. The problem is you're requesting 90 UBOs instead of a UBO with 90 elements in it.
Change your code to:

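struct CSpotLight
{
    vec3 vColor;
    vec3 vPosition;
    vec3 vDirection;
    float fConeAngle;
    float fConeCosine;
    float fLinearAtt;
    bool Enabled;
};

//One uniform block holding an array of 90 lights (instead of 90 blocks).
uniform SpotLightBuffer
{
    CSpotLight SpotLights[90];
} spotLightBuffer;

//Then access it via:
//    spotLightBuffer.SpotLights[i].vColor   (i being the index of the light you want)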

#5305582 How Long Will Directx Last For?

Posted by Matias Goldberg on 12 August 2016 - 10:32 PM

The same applies to DirectDraw. In the beginning they were separate: one for 2D, the other for 3D.

Beginning with DirectX 8, DDraw began a slow deprecation (moving towards doing everything in D3D) until it was completely phased out in DirectX 10.


DDraw applications can have bugs just like D3D applications, but they're likely to have fewer. A major advantage is that DDraw hardware acceleration can be turned off, since CPUs should be fast enough for the kind of work we gave DDraw in the 90's. On Windows XP it can be turned off directly, and on Windows Vista+ it can be turned off via the DirectX Control Panel switch or registry keys (though that's not for beginners).

#5305224 Instancebasevertex Perf Hit

Posted by Matias Goldberg on 10 August 2016 - 11:03 PM

  1. Avoid the coherent bit. When you modify a buffer, use glFlushMappedBufferRange to notify the driver which regions are dirty. Make sure you merge your flushes (i.e. don't call glFlushMappedBufferRange 7 times for 7 contiguous chunks; just call it once with one huge chunk, right before submitting your draw calls).
  2. The persistent bit will cause the driver to keep the data in host-visible memory (either system RAM or slower VRAM). This is bad.
  3. Don't use the write bit. It will prevent the driver from keeping the buffer in device-only memory.
  4. The correct way is to create two buffers: one in device-only memory, another with the persistent + write bits. You write to the latter from the CPU. Then you copy the data to the former using glCopyBufferSubData (it's like a GPU->GPU memcpy). The second buffer is commonly referred to as "the staging buffer" because it acts as an intermediary stash between CPU and GPU. Once you're done you can destroy the staging buffer, or keep it around and reuse it for another transfer.
  5. Ignore points 2, 3 & 4 for dynamic buffers (i.e. data that is re-generated every frame in CPU and sent to GPU). In this case just write to a persistently mapped buffer directly.
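
The "merge your flushes" advice from point 1 can be sketched in plain C++ without touching GL. DirtyRange and mergeDirtyRanges are hypothetical names for this illustration; the idea is that each merged range would then be handed to a single flush call instead of one call per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of a modified region of a mapped buffer, in bytes.
struct DirtyRange
{
    std::size_t offset;
    std::size_t length;
};

// Merges overlapping or touching ranges so every contiguous dirty region
// ends up as exactly one entry (i.e. one flush call per region).
inline std::vector<DirtyRange> mergeDirtyRanges( std::vector<DirtyRange> ranges )
{
    std::sort( ranges.begin(), ranges.end(),
               []( const DirtyRange &a, const DirtyRange &b ) { return a.offset < b.offset; } );

    std::vector<DirtyRange> merged;
    for( const DirtyRange &r : ranges )
    {
        if( !merged.empty() && r.offset <= merged.back().offset + merged.back().length )
        {
            // Touches or overlaps the previous range: grow it instead of adding a new one.
            DirtyRange &last = merged.back();
            const std::size_t end = std::max( last.offset + last.length, r.offset + r.length );
            last.length = end - last.offset;
        }
        else
        {
            merged.push_back( r );
        }
    }
    return merged;
}
```

Seven contiguous 16-byte chunks collapse into a single 112-byte range, while a chunk separated by a gap stays on its own.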

#5305019 How Long Will Directx Last For?

Posted by Matias Goldberg on 09 August 2016 - 10:17 PM

This is a tricky question:


If you have DirectX 9/10/11 installed, then you have the older versions still installed. This means e.g. a DirectX 6 application will still work.

However, starting with Windows Vista, Retained Mode was deprecated (it was optionally used by DirectX 3.0, DX 5 and DX 6 applications). Apps using Retained Mode need lots of workarounds, and not all of them may work on Vista+. But if they didn't use Retained Mode, then they should work (in theory) even on Windows 10.


Furthermore, vendors rarely test their latest GPUs against such old DirectX versions, so it's possible an app won't work correctly on certain GPU / driver combinations (especially if the app didn't follow the DX spec rigorously, which was very common). Most of the time, though, these programs already had trouble running in D3D mode back in their day; developers just blamed it on the GPU, the driver, or Windows 98. Examples off the top of my head are Grim Fandango, Final Fantasy VIII and Startopia (enabling HW TnL would result in a black floor).

You may get it to work using the RGB software rasterizer or via a virtual machine with custom drivers.


Furthermore, before DX10, DirectX had a 3-version backwards compatibility. That meant a GPU designed for DirectX 5.0 would still work with an app using DirectX 8, but not with one using DirectX 9.


Starting with DX10, Microsoft introduced Feature Levels. An app only runs if the GPU supports the minimum feature level the app requires. So an app using DirectX 10 Feature Level 9.1 would probably run on a lot of GPUs, while still using the DX11/10 API.

#5304921 Problem Using Compute Shader To Write To Texture

Posted by Matias Goldberg on 09 August 2016 - 11:26 AM

  • I strongly suggest you create your texture via glTexStorage2D, not glTexImage2D
  • Your glMemoryBarrier call needs to be performed after the dispatch call, not before. Additionally, it should be GL_TEXTURE_FETCH_BARRIER_BIT not GL_SHADER_IMAGE_ACCESS_BARRIER_BIT, because you're writing to it as an image, but you want the reads as texture fetch to be correct.
  • Beware that local_size_x = 1 and local_size_y = 1 is very inefficient (performance). You'll literally be using one thread per wavefront. I suggest local_size_x = 8 and local_size_y = 8, with a glDispatchCompute( 2, 2, 1 );
  • Make sure to use performance keywords, i.e. layout(rgba32f) uniform restrict writeonly image2D destination; (performance)
  • Be careful with glUseProgramStages. It could be disabling your compute / vertex / pixel shaders.
  • Check for GL errors. Enable the Debug extension (you need to create the GL context with Debug flag as well).
  • Use RenderDoc.
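
The local size / dispatch suggestion above boils down to a round-up division. A minimal sketch follows, assuming a 16x16 target image (the original texture size isn't shown, so that is an assumption); workGroupCount is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed along one axis so that
// groups * localSize covers at least `size` texels (round-up division).
inline std::uint32_t workGroupCount( std::uint32_t size, std::uint32_t localSize )
{
    assert( localSize > 0 );
    return ( size + localSize - 1u ) / localSize;
}
```

With a local size of 8x8, a 16x16 image needs workGroupCount( 16, 8 ) groups per axis, which matches the glDispatchCompute( 2, 2, 1 ) suggested above. When the image size is not a multiple of the local size, the extra threads must be bounds-checked inside the shader.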

#5304498 Conceptual Question On Separate Ps/vs Files

Posted by Matias Goldberg on 07 August 2016 - 10:19 AM

Hodgman pretty much covered everything; I just have one more thing to add:


You don't have to use separate files for PS/VS if you don't want to -- you can specify which function is the entry-point of the shader, and compile the same file twice using two different targets (ps/vs) and two different entry points (ps_main, vs_main, etc). Personally I like to keep them in the same file.
You can also use #include to pull code from other files into the one that's being compiled.

I like to keep them in the same file too, but keep in mind that if you're aiming at cross platform some day (i.e. OpenGL + GLSL), GLSL has no configurable entry point: every stage's entry point is main(), so the vertex and pixel shaders must be compiled from separate sources. You can still keep both in one file by using macros as a workaround, i.e.

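#ifdef VERTEX_SHADER //Hypothetical macro: your app defines it when compiling the vertex stage
void main() { /* vertex shader body */ }
#else
out vec4 outColour;
void main() { outColour = vec4( 1.0, 1.0, 1.0, 1.0 ); }
#endif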

#5304205 Ide For Linux

Posted by Matias Goldberg on 05 August 2016 - 08:50 AM

QtCreator is what you're looking for. Lightweight, fast, supports everything you asked.


I'd disregard Eclipse. The biggest heavyweight slow IDE ever made. Hurts productivity a lot.

#5303252 Multithreading Library Preference

Posted by Matias Goldberg on 30 July 2016 - 05:19 PM

Thanks Adam, so it's not the performance overhead keeps game studios away from using them? (Like STL the reason most studio doesn't use it is because stl's bad debug performance and overhead)

Yes, performance is also a reason. When you depend on a platform-specific library, you're asking for problems: e.g. performance may be decent on one platform and then turn out to suck on a different one because the compiler vendor rushed their release to support the latest C++ feature set.


I'm talking about the ones provided by a standard library or compiler. This doesn't apply to TBB, but like Adam_42 said, TBB is not free and adds yet another dependency on what most AAA developers will consider trivial (threading is really hard, but wrapping sync primitives is easy, and the key is keeping things simple. TBB adds more complex functionality that may not be needed or could be specialized for a game).

#5302971 How Do You Deal With Errors On Gpus? Do You At All?

Posted by Matias Goldberg on 28 July 2016 - 12:53 PM

The code you posted starts with _texCount uninitialized, which won't work as intended. It doesn't start with 0s unless you fill it. If you do fill it in your actual code, you need to sync that as well.

#5302955 How Do You Deal With Errors On Gpus? Do You At All?

Posted by Matias Goldberg on 28 July 2016 - 11:01 AM

Later when processing for (uint i = array[i]; i<array[i+1]; i++), because of unsigned numbers the difference overflows and gives a huge number close to 0xFFFFFFFFu.
Now i don't know if long runtime or out of buffer writes cause the blue screen, but i know why it happens.

I'm 99% sure what's happening is that because you get a bad loop count, your GPU will take too long to respond and thus you run into TDR.
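
A minimal C++ sketch of how a bad begin/end pair turns into a gigantic count, assuming the count is computed as end - begin with 32-bit unsigned integers as described in the quote. wrappedCount and safeCount are hypothetical helpers, not code from the original shader.

```cpp
#include <cassert>
#include <cstdint>

// Unguarded version: unsigned subtraction wraps modulo 2^32 when end < begin,
// so an inconsistent pair yields a value close to 0xFFFFFFFF.
inline std::uint32_t wrappedCount( std::uint32_t begin, std::uint32_t end )
{
    return end - begin;
}

// Guarded version: treats an inconsistent pair (end < begin) as empty
// instead of letting the subtraction wrap around.
inline std::uint32_t safeCount( std::uint32_t begin, std::uint32_t end )
{
    return end >= begin ? end - begin : 0u;
}
```

A guard like this doesn't fix the missing synchronization; it only keeps a bad read from turning into billions of iterations while you track the real bug down.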

The hardware bug (if so) seems to ignore my barriers, so it can happen that array[i+1] is smaller than array[i] - usually array[i+1] MUST be >= array[i].
(This also happens with work group size of 64, but less often)

GPU threading is hard.

How are you issuing your barrier? Beware that in GLSL, memoryBarrier does only half of the job. You also need a barrier:

//memoryBarrierShared ensures our write is visible to everyone else (must be done BEFORE the barrier)
memoryBarrierShared();
//barrier ensures every thread's execution reached here.
barrier();

#5302931 Anyone Tried Uploading A 5-Byte File To Php Before?

Posted by Matias Goldberg on 28 July 2016 - 08:50 AM

If the host is a cheap hosting service not controlled by you, it's very possible the webhost has lame "anti-hacking" measures enabled via a custom Apache/PHP hook and .htaccess rules that reject certain input when given in a specific order, because it matches malware that used similar input to compromise an unpatched server a long time ago. I've seen that kind of crap before.

#5301847 What Makes A Game Look Realistic?

Posted by Matias Goldberg on 21 July 2016 - 06:09 PM

Realistic is a very misleading word the way we use it. When people say they want a game to look "realistic", what they really want is for it to look like it was filmed by a Hollywood studio with special lighting setups (which may vary between shots), makeup, particular hair styles that always highlight the nice places and cover the not so pretty ones, special lenses, and particular camera angles with a particular camera movement.


You know the phrase "even the girl from the fashion magazine doesn't look like the fashion model in the magazine"? The same applies to "realistic" graphics, because people expect a game's graphics to look like people in magazines, TV shows and movies. Which isn't realistic at all.


Therefore, to look realistic we have to mimic what they do: once we figure out the math stuff (proper BRDF, HDR, depth of field, wind effects, noise for a shaky camera effect), we need to set up lighting as in movie production (i.e. 3-point lighting is very popular), have the characters perform fashion-model-like walks for females and movie-like poses (3 point landing, anyone?), account for the 12 principles of animation, place the camera in strategic places, have camera shots change at the right times, and have the depth of field focus on what's important and leave what's unimportant out of focus.


Of course, high resolution textures, high polycount, motion capture and global illumination help a lot; but they will only get you so far.

And of course, all of these "rules" can be broken. If you know what you're doing and know when to break them, it still looks good. When you don't, it looks crap. Just like crappy movies or your grandma's pictures (no offense to your grandma!).

#5301048 Dx11 Renderstate

Posted by Matias Goldberg on 16 July 2016 - 11:50 PM

I learn dx11 recently, and find the dx11 render state  is strange, why the render state is designed to an object?

Because of performance. Setting one at a time takes a lot more cycles as opposed to setting all of them at once.
Furthermore many parameters are actually inter-linked with each other, even if they appear completely unrelated:

  • For example modern GPUs require shaders to know the vertex layout (Input Assembly in D3D11 terms).
  • Depth write settings interact with alpha testing (to know whether to disable early Z optimization).
  • Cull mode interacts with shaders if the shader uses SV_IsFrontFace.
  • On certain mobile GPUs, blending modes are patched into the shaders.
  • Cull mode interacts with stencil (i.e. two sided stencil)

Because of these dependencies at the implementation level with seemingly unrelated features, changing the setting one by one would have to trigger a lot of flushing, whereas changing everything in one go allows for resolving all the dependencies because all of the data is supplied together (and validated beforehand by making the object immutable!).

TL;DR: Performance.

    if(material.context.depthenable == true);   context->setRS(depthenable, true);
   else context->setRS(depthenable, false);
 I don't know how to easy to handle the dx11 render state object.  Who can show me how you use these ridiculous things?

You're approaching it the wrong way, the D3D9 way. In order to approach the D3D11 way, you need to create your object beforehand:

struct Material
{
    bool mDepthEnabled;
    bool mDepthWriteEnabled;
    // ... many other settings

    ID3D11RasterizerState *mRasterizerState;
    ID3D11BlendState *mBlendState;
    ID3D11DepthStencilState *mDepthStencilState;

    //Always call after you're done modifying the material.
    void flush( ID3D11Device *device )
    {
        SAFE_RELEASE( mRasterizerState );
        SAFE_RELEASE( mBlendState );
        SAFE_RELEASE( mDepthStencilState );

        //Zero-initialize so the fields not shown here don't start with garbage.
        D3D11_RASTERIZER_DESC rasterizerDesc = {};
        D3D11_BLEND_DESC blendDesc = {};
        D3D11_DEPTH_STENCIL_DESC depthStencilDesc = {};

        depthStencilDesc.DepthEnable = mDepthEnabled;
        depthStencilDesc.DepthWriteMask = mDepthWriteEnabled ? D3D11_DEPTH_WRITE_MASK_ALL : D3D11_DEPTH_WRITE_MASK_ZERO;

        //...Fill ALL the other settings...

        //The Create* functions take a pointer to the description.
        device->CreateRasterizerState( &rasterizerDesc, &mRasterizerState );
        device->CreateBlendState( &blendDesc, &mBlendState );
        device->CreateDepthStencilState( &depthStencilDesc, &mDepthStencilState );
    }
};

//States are bound through the device context, not the device.
void setMaterial( ID3D11DeviceContext *context, const Material *material )
{
    //ASSUME material already has been flushed.
    assert( material->mRasterizerState && "You forgot to call flush!" );
    //For each call below you could check if it is redundant. If it is, avoid calling again.
    context->RSSetState( material->mRasterizerState );
    context->OMSetBlendState( material->mBlendState, nullptr, 0xffffffff ); //default blend factor, all samples enabled
    context->OMSetDepthStencilState( material->mDepthStencilState, 0 ); //stencil reference value 0
}

The idea is simple: Create the material. Once it's set, initialize the D3D11 structures. This is all done at loading time, not every frame. Once that's done, don't modify the material frequently (at least the parts that need flushing). If you need to change parameters very often, it's better to use multiple materials instead.
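
The "check if this is redundant" comments in setMaterial can be sketched as a tiny cache, written without the D3D11 headers so it compiles anywhere. StateHandle and StateCache are hypothetical names for this illustration; in real code the handle would be the state object pointer (e.g. an ID3D11RasterizerState*), and you would keep one cache per kind of state.

```cpp
#include <cassert>

// Stand-in for a state object pointer; hypothetical, so the sketch does not
// depend on the D3D11 headers.
using StateHandle = const void *;

// Remembers the last state that was bound and reports whether the real
// API call is needed for a newly requested state.
class StateCache
{
    StateHandle mLastBound = nullptr;
    int mBindCalls = 0;

public:
    // Returns true when the caller should issue the real bind call,
    // false when the requested state is already bound.
    bool needsBind( StateHandle state )
    {
        assert( state != nullptr );
        if( state == mLastBound )
            return false;
        mLastBound = state;
        ++mBindCalls;
        return true;
    }

    // How many binds were actually required so far.
    int bindCalls() const { return mBindCalls; }
};
```

Because the state objects are immutable once created, comparing pointers is enough here: two materials that share the same state object never trigger a second bind.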