Matias Goldberg

Members
  • Content count

    1660

Community Reputation

9563 Excellent

About Matias Goldberg

  • Rank
    Contributor

Personal Information

  1. I agree with frob, plus one more thing: it's really easy nowadays to record pictures, video and audio with just a few swipes of your hand. Most of these teachers have been busted with hard evidence (mostly because they were foolish enough to save pictures of the encounters).
  2. There's an issue with that: you don't have the guarantee that a whole warp will be working on the same primitive. Half of Warp A could be working on triangle X, and the other half of Warp A could be working on triangle Y. GPUs make some effort to keep everything convergent; but if they were to restrict triangle X to one set of warps and triangle Y to another set of warps, it would get very inefficient quickly.

      I am curious: why are you asking these extremely low level questions? Knowing the insides of your GPU is important, especially if you want to squeeze the last drop out of it, both in the technique you want to implement and in the performance you want to achieve. However, without specifying a particular set of HW, GPUs are very heterogeneous. They're not like x86 CPUs, which all work in a relatively similar way because they have to produce perfectly identical results. Although there is some common ground, more than half of these answers will change in 2 years. Specializing in a particular HW is more useful (e.g. GCN is present in PC, XBox One & PS4; PowerVR is present in Metal-capable iOS devices).

      For example you ask about TMUs, yet TMUs no longer exist as such a concept. It's much more complex and very GPU-specific. For instance Mali GPUs do not have threadgroup/LDS memory at all. They emulate it via RAM fetches. Therefore any optimization that relies on the use and reuse of threadgroup data in GCN (and other GPUs) hurts a lot on Mali.

      It's like learning how to drive a car and asking how the atoms of a car battery move from one end to another to power the car's instruments. Yes, if you want to be the best driver perhaps this knowledge could help you become one of the top 3 drivers in the world; however you need to sit in the car and feel the wheel first. Btw this is a nice resource on latency hiding on GCN.
      If you want to learn the deep internals of each HW, I recommend you start by reading their manuals:
        • https://01.org/linuxgraphics/documentation/hardware-specification-prms
        • https://www.x.org/wiki/RadeonFeature/ (Go to "Documentation")
        • https://static.docs.arm.com/100019/0100/arm_mali_application_developer_best_practices_developer_guide_100019_0100_00_en2.pdf
        • http://malideveloper.arm.com/downloads/OpenGLES3.x/arm_mali_gpu_opengl_es_3-x_developer_guide_en.pdf
        • GPUOpen
      The presentations at SIGGRAPH and GDC are also very useful.
  3. First, like Hodgman said, you don't need 3 of everything, only of the resources you would consider "dynamic". You also want "static" resources to be accessible by the GPU only, so that they always get allocated in the fastest memory (GPU device memory), while dynamic resources obviously need CPU access.

      Second, you don't need 3x the number of resources and handles. Most of the things you'll be dealing with are going to be just buffers in memory. This means all you need to do is reserve 3x the memory size, and then have a starting offset:
        currentOffset = baseOffset + (currentFrame % 3) * bufferSize;
      That's it. The "grand design of things" is having an extra variable to store the current offset.

      There is one design issue you need to be careful about: you can only write to that buffer once per frame. However you can violate that rule if you know what you're doing by "regressing" the currentOffset to a range you know is not in use (in GL terms this is the equivalent of mapping with GL_MAP_UNSYNCHRONIZED_BIT|GL_MAP_INVALIDATE_RANGE_BIT, and in D3D11 of doing a map with D3D11_MAP_WRITE_NO_OVERWRITE). In design terms this means you need to delay writing to the buffers as much as possible until you have everything you need, because "writing as you go" is a terrible approach: you may end up advancing the currentOffset too early (i.e. thinking that you're done when you're not), and now you don't know how to regress currentOffset to where it was before, so you need to grab a new buffer (which is also 3x the size, so you end up wasting memory).

      If you're familiar with the concept of render queues, then this should be natural: all you need is for the Render Queues to collect everything and, once you're done, start rendering what's in those queues.

      Last but not least, there are cases where you want to do something as an exception, in which case you may want to implement a "fullStall()" which waits for everything to finish. It's slow, it's not pretty; but it's great for debugging problems and for saving you in a pinch.
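The offset arithmetic from the post above can be sketched in C++ roughly as follows. This is only an illustration under assumptions: `DynamicBufferRing`, `kNumFrames` and the field names are hypothetical and not part of any graphics API; only the formula itself comes from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: describes one allocation of kNumFrames * bufferSize
// bytes. In any given frame the CPU writes to only one bufferSize-sized
// region of it, while the GPU may still be reading the other regions.
struct DynamicBufferRing
{
    static constexpr std::uint32_t kNumFrames = 3u; // triple buffering, as in the post

    std::size_t baseOffset; // where the 3x region starts inside the parent buffer
    std::size_t bufferSize; // bytes the CPU may write during one frame

    // Same formula as in the post:
    // currentOffset = baseOffset + (currentFrame % 3) * bufferSize
    std::size_t currentOffset( std::uint64_t currentFrame ) const
    {
        assert( bufferSize > 0u );
        return baseOffset +
               static_cast<std::size_t>( currentFrame % kNumFrames ) * bufferSize;
    }
};
```

With baseOffset = 256 and bufferSize = 1024, frames 0, 1, 2 and 3 map to offsets 256, 1280, 2304 and 256 again, so a region is only reused once three frames have passed.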
  4. What you're trying to do is known as shader dynamic control flow, and it is forbidden to do in the pixel shader in GLES 2.0 (most ES2 hardware is incapable of doing it), and IIRC also in ES 3.0 (not sure about that last one).
  5. Taken from here.   That is NOT what you described in your original post. You're talking about converting your in-game currency into something that has value in real life outside of your game. What Fyber, Tapjoy, Supersonic, inMobi, and the Google gift card API do works in the opposite direction (turning real life money into in-game currency).
  6. Whoa, it ain't supported in any GL 3 level hardware.    Yeah, NV's notion of "widely-supported ARB_clip_control" must be different from ours.
  7. It does if GL supports the GL_ARB_clip_control extension (mandatory since OGL 4.5), where you can call glClipControl( origin, GL_ZERO_TO_ONE ); to change the default depth range from [-1;1] to [0;1].
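To make the effect of the depth mode concrete, here is a small CPU-side sketch of how an NDC z value maps to window-space depth under each convention, following the depth range transform in the OpenGL 4.5 specification. `ClipDepthMode` and `ndcToWindowDepth` are hypothetical names made up for this illustration; in a real program the mode is selected with a call such as glClipControl( GL_LOWER_LEFT, GL_ZERO_TO_ONE ).

```cpp
#include <cassert>

// Hypothetical stand-in for GL_NEGATIVE_ONE_TO_ONE / GL_ZERO_TO_ONE.
enum class ClipDepthMode { NegativeOneToOne, ZeroToOne };

// Window-space depth for an NDC z value, where n and f are the values
// passed to glDepthRange( n, f ) (defaults 0 and 1).
inline double ndcToWindowDepth( double zNdc, ClipDepthMode mode,
                                double n = 0.0, double f = 1.0 )
{
    const double zMin = ( mode == ClipDepthMode::NegativeOneToOne ) ? -1.0 : 0.0;
    assert( zNdc >= zMin && zNdc <= 1.0 ); // z is already clipped to this range

    if( mode == ClipDepthMode::NegativeOneToOne )
        return ( f - n ) * 0.5 * zNdc + ( n + f ) * 0.5; // default GL behaviour
    return ( f - n ) * zNdc + n; // D3D-style [0;1] clip space, no remapping
}
```

With the default glDepthRange( 0, 1 ), the [-1;1] mode squeezes z through 0.5 * z + 0.5, while the [0;1] mode passes it through unchanged.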
  8.   (cough) Heartbleed (cough) Proprietary's nothing to do with it.

      The difference between what happened in heartbleed is that heartbleed was a bug, while an OS like Windows simply has weak security by default, for "friendliness". Replacing dynamic libraries on Windows with malicious versions is pretty easy; files and folders have a weak permission system. The protocol that this current virus exploits is for network transfer, while there's nothing special about accessing files or folders and then modifying them. Not to mention that even if all that was good, it's still a stupid thing for the government of anywhere to rely on closed-source software.

      You do realize that for every Windows exploit that got leaked from the NSA, there's like 5 leaks for *nix OSes, right? Linux has had extremely bad exploits:
        • Heartbleed
        • Shellshock
        • Debian Fiasco
        • X11: it is impossible to implement a secure lockscreen or screensaver. This is not fixed as of today. Unless you use Wayland... and when is Wayland adoption going to become widespread? I'm tired of waiting...
        • OpenGL drivers (including Mesa) return GPU memory without zero-initializing it first (which is a MASSIVE security hole). This is not fixed as of today.
        • Just today a patch was released for a lightdm bug that allows guest users to access any file.
      I agree that basic infrastructure should be running on FOSS software and not proprietary. But asserting FOSS software is more secure than proprietary just because it's open source is blatantly wrong. Stop trolling.
  9. You're correct on all counts, however you forget the physics is still updated very fast with no lag. To put it another way: play a game blindfolded, with only sound cues or playing from memory, and you'll still be able to react and the physics engine will process your input immediately. Because the visuals are only 1 frame behind at 60fps, it's not that big of a deal (it is, but it's not the end of the world. Now if the framerate is lower...). Another issue you're forgetting is that the distance between physics & graphics may not be an exact frame (because it depends on the graphics' framerate). The visuals may be up to 1 frame behind, but they may be less (i.e. 0.5 frames behind, 0.2, 0.1). If both graphics & physics are updating at exact multiples then you may end up being 1 frame behind. You can also try disabling triple buffering to compensate.
  10. _Silence_ is right, the shader could belong to the FF pipeline. Alternatively, it is also possible the driver is clearing the colour buffer by rendering a fullscreen tri; if that's the case, perhaps that's what's triggering it too.
  11. I'm replying to you here so the answer is available for everyone who is also looking for this: Fortunately you're using the open source driver Mesa3D for Linux, so looking through the source files makes it MUCH easier to understand what it means: https://github.com/mesa3d/mesa/blob/master/src/mesa/drivers/dri/i965/brw_vs.c#L114

      Now it's crystal clear what's going on: the driver is telling you it has to internally recompile a shader because some GL state has changed. You probably called glClampColorARB( GL_CLAMP_VERTEX_COLOR_ARB, TRUE ); and then later on called glClampColorARB( GL_CLAMP_VERTEX_COLOR_ARB, FALSE ); or something like that. But for this chip, there is no fixed function hardware to deal with that, so it is done directly with shader instructions. If you alter this setting, the driver internally has to use two different shaders. That's why Vulkan created PSOs, so that bullsh*t like this wouldn't end up in one shader actually mapping to multiple shaders depending on settings such as vertex format and render targets.

      If you changed the setting by accident and you don't need it, then avoid it. It's costing you CPU cycles. If you cannot do that, then you'll have to ignore the warning. However with this knowledge, you may take advantage of that. If you have a loading screen, draw with vertex clamping disabled & then enabled, so that you give the driver the chance to compile both shaders and avoid a hiccup later on during rendering.

      Edit: Another way to avoid this warning is to have two copies of the shader with identical source code, using one exclusively with colour clamping and the other exclusively without it.
  12. Depends. If you do:
        vec4 val0 = texture( tex0, uv );
        vec4 val1 = texture( tex1, uv );
        finalVal = uv.y > 0.5 ? val1 : val0;
      There is no branch, and the cost is only slightly less performance, and higher power consumption (since you're consuming twice the bandwidth). But if you do:
        // Derivatives are taken outside the branch so they stay well defined.
        vec2 uvDdx = dFdx( uv );
        vec2 uvDdy = dFdy( uv );
        vec4 val;
        if( uv.y > 0.5 )
            val = textureGrad( tex1, uv, uvDdx, uvDdy );
        else
            val = textureGrad( tex0, uv, uvDdx, uvDdy );
      Then the performance impact could be quite serious (the cards you're targeting are very bad at hiding latency) but the power consumption should stay in check. It boils down to whether the performance decrease puts you below the minimum you are targeting, in which case you decide to consume more power instead.

      Smaller resolutions mean fewer bits in the texture unit to perform addressing and filtering. Fewer bits mean fewer transistors, which translates to lower costs and less power consumption, and therefore less heat and increased battery life.
  13. You can. What he is saying is that there's a very high chance that input.Tex.x & input.Tex.y are floats in range [0; 1] when they should be uint in range [0; texWidth) and range [0; texHeight). So basically when you write to pixel location (0.75; 0.75) you end up writing to (0, 0) (the first pixel) instead of writing to (768; 384) of a 1024x512 texture.
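The conversion the post above describes, from normalized coordinates to integer pixel locations, could look roughly like this on the CPU side. `PixelCoord` and `uvToPixel` are hypothetical names for this sketch, and clamping UV = 1.0 to the last texel is an assumption about the desired edge behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct PixelCoord
{
    std::uint32_t x;
    std::uint32_t y;
};

// Maps a normalized UV in [0; 1] to an integer texel location in
// [0; texWidth) x [0; texHeight).
inline PixelCoord uvToPixel( float u, float v,
                             std::uint32_t texWidth, std::uint32_t texHeight )
{
    assert( texWidth > 0u && texHeight > 0u );

    // Clamp first so the float -> unsigned conversion is always well defined.
    u = std::min( std::max( u, 0.0f ), 1.0f );
    v = std::min( std::max( v, 0.0f ), 1.0f );

    std::uint32_t x = static_cast<std::uint32_t>( u * static_cast<float>( texWidth ) );
    std::uint32_t y = static_cast<std::uint32_t>( v * static_cast<float>( texHeight ) );

    // UV == 1.0 would land one past the end; keep it on the last texel.
    x = std::min( x, texWidth - 1u );
    y = std::min( y, texHeight - 1u );
    return PixelCoord{ x, y };
}
```

For a 1024x512 texture, uvToPixel( 0.75f, 0.75f, 1024, 512 ) yields (768; 384), whereas truncating 0.75 directly to an integer gives 0, which is the bug described in the post.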
  14. You don't need to create multiple contexts. You can have multiple HDCs associated with the same HGLRC (GL context). When rendering to each window, you need to do:
        wglMakeCurrent( hdcWindow0, context0 );
        // ... Draw ...
        wglMakeCurrent( hdcWindow1, context0 );
        // ... Draw ...
        wglMakeCurrent( hdcWindow2, context0 );
        // ... Draw ...
        wglMakeCurrent( hdcWindow3, context0 );
        // ... Draw ...
      The only caveats are:
        • All windows must have the same pixel format (i.e. 32bpp/16bpp, MSAA settings).
        • Only use VSync on one of the windows (i.e. the last one), otherwise 4 windows with VSync in all of them will get you 15 fps (assuming a 60hz monitor).
      This is far more stable and easier to work with than the clusterf*** that is working with shared contexts.
  15. Although I agree with L. Spiro, Blender has a toggle that allows you to switch between them (and converts the numbers being displayed); which is very convenient for us programmers when we need a value in linear space (such as for example the clear colour to pass to glClearColor).
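If the tool at hand only shows sRGB values, the linear value for something like the clear colour can also be computed by hand. Below is a minimal sketch of the standard sRGB decoding function for one channel; `srgbToLinear` is a hypothetical helper name, and the input is assumed to already be normalized to [0; 1].

```cpp
#include <cassert>
#include <cmath>

// Standard sRGB -> linear decoding for a single colour channel in [0; 1].
// Alpha is not gamma encoded and must be left untouched.
inline float srgbToLinear( float c )
{
    assert( c >= 0.0f && c <= 1.0f );
    return ( c <= 0.04045f ) ? c / 12.92f
                             : std::pow( ( c + 0.055f ) / 1.055f, 2.4f );
}
```

An sRGB mid grey of 0.5 decodes to roughly 0.214 in linear space, so passing 0.5 straight to glClearColor when the render target expects linear values would give a noticeably brighter clear colour than intended.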