Matias Goldberg

  1. DX11 Constant buffer and names?

    No. That declaration is just fine. What the alignment means is that if you've got:

```hlsl
float3 a;
float2 b;
```

    then the address of b, when you write the data from C++, starts at 0x00000010 instead of 0x0000000C, because there are 4 bytes of padding between a and b. Please read the MSDN article BrentMorris left you. It has plenty of examples of how the padding works.
  2. DX11 Constant buffer and names?

    Normally we explicitly define the register slots. So for const buffers you would do:

```hlsl
cbuffer MyBuffer0 : register(b0) { /* Declarations... */ };
cbuffer MyBuffer1 : register(b1) { /* Declarations... */ };
cbuffer MyBuffer2 : register(b2) { /* Declarations... */ };
```

    If you do not explicitly set the register slots, the compiler will assign them for you and you have to retrieve them via HLSL reflection (which is cumbersome and error-prone). When you call VSSetConstantBuffers( 0, ... ), the 0 will correspond to MyBuffer0; VSSetConstantBuffers( 1, ... ) will correspond to MyBuffer1; and so on. In the case of your buffer:

```hlsl
cbuffer MatrixBuffer
{
    matrix worldMatrix;
    matrix viewMatrix;
    matrix projectionMatrix;
};
```

    If the buffer you bind via VSSetConstantBuffers is smaller than the 192 bytes required for this structure (4x4 floats x 4 bytes per float x 3 matrices), the debug layer will complain, but you are guaranteed that reading const buffers out of bounds will return 0.
  3. Getting Real Instruments?

    If the project is commercial / for profit / will generate revenue, I'd advise creating some form of contract that determines what happens in the eventuality of making profits. The local orchestra may do you a favour if it thinks your project is cool, but if you end up making loads of money they'll definitely sue for a chunk of it. Of course, if you put up some form of contract in advance, then they likely won't do it "as a favour" either. I like and am thrilled at the idea of using small local orchestras, but if you make money and they don't see a penny, they will rightfully feel ripped off.
  4. Depth-only pass

    MJP explained it well: your VS math has to match exactly due to floating point inconsistencies. What I wanted to add is that a depth prepass provides no performance benefit to a deferred shader. A depth prepass has a cost (CPU-wise, all the commands are issued twice; GPU-wise, the vertex shader runs twice and the rasterizer works twice as hard), and it is only an optimization if this cost is lower than the cost of running expensive pixel shaders multiple times for the same pixel. This can be true for forward shading, but for deferred shading it is rarely the case, as the pixel shaders tend to be very cheap (sample albedo, sample some other parameters, calculate normals, finally export to the render target).
  5. DX11 24bit depthbuffer is a sub-optimal format?

    Yes, but GL defaults to depth values being in the range [-1; 1], which breaks it (a reversed Z buffer works if the range is [0; 1]). You can make it work by overriding that default by calling glClipControl with GL_ZERO_TO_ONE. But you need OpenGL 4.5 or the extension GL_ARB_clip_control for that to work. If neither the extension nor OpenGL 4.5 is present, you can't call glClipControl and the reversed Z buffer trick won't improve precision.
  6. "Optimal read performance" is relative. Reading the data once and never again during the frame is not the same as reading it over and over a thousand times. Your mileage may vary. The best approach is to structure your engine's code in such a way that toggling between strategies becomes easy.
  7. DX11 24bit depthbuffer is a sub-optimal format?

    ajmiles has already explained it, but I just wanted to put it in clearer words: the GPU only has to pretend you get what you ask for via DirectX. It doesn't have to do it exactly that way internally, as long as it produces the same results.
  8. OpenGL AMD horrible OpenGL performance

    I haven't seen a GPU have to fall back to software emulation in a decade. Unless his GPU is very old (e.g. ATI Radeon X1800 series) this shouldn't be the problem. The reasons your GL is slow on AMD could be several:

    • Your code could be highly inefficient. Google "AZDO".
    • You're abusing glMap* calls. Likely you don't discard correctly, you're reading from a write-only buffer, or you're reading from memory that is not optimized for reading.
    • You indeed hit a driver bug (doubtful).

    Check with a profiler if something obvious pops out. Edit: I've checked, and one of your posts talks about VSM shadow mapping. I implemented ESM (Exponential Shadow Maps) myself and noticed the AMD Radeon HD 7770 has no problem with it, but my older ATI Radeon HD 4650 Mobility has severe performance problems with it; most likely these older GPUs are not good at sampling 16-bit float texture formats with bilinear filtering.
  9. I want to clarify many things because you're confusing a lot of concepts. All the games you just mentioned are long, story-driven narrative games. The difference between these games (DAI, GTA & Fallout vs Uncharted, Halo, The Last of Us, Half Life 2 & Persona 5) is that the former are open-world games with lots of side quests, while the latter are not open world (but rather area-based, level-based or chapter-based) and have few side quests. Technically, making a long, linear, story-driven narrative game is literally the easiest part. It's just writing a very long script, like writing a book (note: writing a good book people want to read is hard, but in comparison it's the easiest part of making the game). Early text-based games fit that description. Graphic adventure games also fit that description. What's hard is making a game feel fun to play, keeping the game balanced, making all the art assets, animating the cutscenes, setting up the pacing correctly (cutscenes vs gameplay), ensuring the voice acting matches the animation, setting up lots of side quests that are bug free (this is VERY hard) and don't contradict the main story (i.e. you can't be acting all Superman against an optional boss, reviving a forgettable character in a side quest, and then have the main story treat you like you're just a regular mortal and toss in the perma-death of a main character), and many more details that make a game feel like an AAA game. Zelda: Breath of the Wild is not a "story-driven narrative game" at all; however, it is very similar in terms of scope, length, difficulty and look to DAI, GTA & Fallout because it's open world with lots of sidequests.
  10. Oh, btw, on loops: if your loop is based on floating-point equality, then there could be issues. For example:

```cpp
for( float x = 0; x != 0.3; x += 0.1 ) { }
```

    may spin forever due to precision issues.
  11. I'm not sure what you mean by "bad values to an intrinsic function". As for your loops, if they look like these:

```hlsl
for( int i = 0; i < 4; ++i ) { }

// Or this:
#define LOOP_COUNT 4
for( int i = 0; i < LOOP_COUNT; ++i ) { }
```

    it's fine. But if it looks like this:

```hlsl
uniform int myConstValue;
for( int i = 0; i < myConstValue; ++i ) { }
```

    then the value you pass to myConstValue is potentially dangerous (you'd better never send an insanely huge value).

    Normally yes, but lots of things can happen that report the wrong enum (driver bugs, the GPU actually hanging while switching).

    Both. Multithreading is hard to get right. You said you properly put a mutex around the immediate context... but do you really put the mutex around every single usage of the immediate context? Is it also possible the mutex is malfunctioning (i.e. unlocking from a thread without locking it first)? Additionally, a driver may be reading data from the immediate context assuming it's fully single-threaded, so it begins to read the data from a worker thread while you're actually still writing to it from a secondary thread. Technically, this would be a driver bug. It may even be fixed by now, but your user could be running an old driver.
  12. Usually this problem happens because you have:

    • Corrupted memory or a similar memory error, e.g. setting a dangling pointer as a texture SRV is bad.
    • A shader being used with uninitialized data (const buffer, tex buffer, vertex buffer, etc.).
    • An infinite loop inside a shader, be it vertex, compute, or pixel shader. This is often caused by uninitialized variables, or variables with very large values or NaNs.
    • Some very obscure API usage the debug layer didn't catch.

    However, you ruled out most of these (except points #1 & #4). #1 is debugged the same way you debug any kind of memory corruption (either use a third-party tool or override malloc and hook your own sanitizer). Other causes for these issues are:

    • Out-of-date drivers. Seriously. This happens very often. Ask for the driver version. If it's very old, ask them to update their drivers. GPU problems that magically go away after updating drivers happen more often than you think.
    • Overclocked / overheating systems. Simple games will often allow everything in the GPU to run at 100%, something AAA games often don't achieve (because there's usually a huge bottleneck somewhere). It would explain why using VSync helps with the problem.
    • Switchable graphics. Some notebooks may have an Intel + NVIDIA GPU combination (or Intel + AMD, but that's less common) and for some reason the system may have decided to switch the GPU (e.g. battery, thermal throttling).
    • Monitor issues. The user literally detached / unplugged the monitor the active GPU was rendering to. More common on laptops and Win 10 tablets.
    • Third-party applications. Apps like MSI Afterburner or Mumble hook themselves into D3D11 to intercept the game's calls and either capture video or render overlays on top of it. They can also cause problems. Having a dump of all active processes when the game hung the GPU can be a good way to rule this out. If you find a common third-party app among a large percentage of these users, ask them to turn it off.

Errr, unless you're really, really good at multithreading, you shouldn't be making D3D API calls from other threads. It's asking for a lot of problems. This also explains why VSync diminishes the problem, since you're likely in a race condition and now the access patterns have changed. IIRC accessing the immediate context from two threads is not allowed, even if protected by a mutex. Update: it's allowed, but you're still asking for trouble, and you'd better be synchronizing your context absolutely perfectly.
  13. GitHub Public Repo Password Protection

    When you push to your repo, one of three things should happen (assuming you rebooted your computer):

    • The tool you're using to push to GitHub asks for your password.
    • The tool you're using to push to GitHub asks for your SSH key's passphrase to decrypt it.
    • You're not asked. That means the tool either has your GitHub password stored in plain text somewhere on your hard drive, or the SSH key is stored unencrypted somewhere on your drive; which I dislike, because anyone with access to your computer (i.e. someone steals your PC, breaks into your home, or infects your system with a virus/trojan) could steal your password and/or SSH key.

    If you're in option 3, I'd advise looking into the settings of your tool so that it doesn't save the password, or so that your SSH key is encrypted. Every time you reboot your PC, your tool should ask you for the password if security is what concerns you.
  14. GitHub Public Repo Password Protection

    Depends on what you mean by "password protected" when it comes to public repos. GitHub uses passwords to prevent unauthorized users from modifying your repository (e.g. writing to it, making changes, pushing to it). However, anyone can see everything you pushed to your public repos, download it (either by cloning or pulling), and fork your repository, all without needing a password. They can also make their own changes and upload those changes to their own forks.
  15. You're still confusing things. In a lockstep environment, the server receives client inputs and must apply them in the order of their frames. Anything else will cause a desync. This means the server can't simulate too far ahead, because it must wait for everyone's input. And this is why it doesn't scale to many users.

    In a prediction-based, server-authoritative network model (aka Quake's multiplayer), client inputs can be applied in any order. But typically, for responsiveness reasons, you'll want to apply them in the order they're received (inputs aren't frame-numbered, but packets are still sequenced) and discard inputs belonging to past packets. For example, if you receive packet 0, packet 2, and packet 1, in that order, then packet 1 should be ignored (unless you're receiving all those packets at the same time, in which case you sort them first and apply them in order). This potentially means that if the user hit a button for one frame and its packet gets lost or reordered, the server will never see that he pushed that button. But that's rarely an issue because:

    • In a UDP model, most packets actually arrive just fine most of the time.
    • The user isn't fast enough to push a button for just 16.66ms.
    • Button presses that need to be held down (like firing a weapon in a shooter, or moving forward) aren't a problem.
    • Worst case scenario, you can send this "button pressed" message repeated in several packets, and the server gives it a small cooldown to prevent acting on the button push twice; or instead of a cooldown, the message is sent with an "I hit this important button 2 frames ago", and the server keeps a record to see if that was done. If it wasn't, then we do it now. Alternatively, worst case scenario, the user will push that button again.

    To put it bluntly, a client-server Quake-style model is like a mother and her child. The child has a toy gun, but the toy only makes a sound when the mother pushes a button on a remote control in her hand. The kid fires his toy gun but nothing happens; then suddenly, 5 seconds later, the toy gun begins making sound. The child says "Why, mom!?!? I pressed this button 5 seconds ago! Why is it only reacting now!?" And the mother replies: BECAUSE I SAY SO. Client/server models are the same. The client says what it wants, but the server ends up doing what it wants. (Have you ever played a shooter where you're clearly shooting at an enemy but he doesn't die? And suddenly you're dead???)

    Now, the internet is unreliable, but it isn't that unreliable. It's not chaos. Normally most packets arrive, and they arrive in order; when they don't, it's hard to notice (either because nothing relevant was happening, or because the differences between what the client said it wanted and what the server ended up doing are hard to spot), and this is further masked via client-side prediction (i.e. the weapon-firing animation begins when the client pushes the button so it looks immediate, but enemies won't be hit until the server says so). Errors only get really obvious when the ping is very high (> 400ms) or your connection goes really bad for a noticeable amount of time (e.g. lots of noise in the DSL line, overheated router/modem, overloaded ISP, overloaded server, WiFi connectivity issues, etc.) and thus lots of packets start getting dropped or reordered until the connection quality improves again. For more information read Gaffer On Games' networking series, and read it several times (start from the bottom, then move to the top articles).