About Adam_42

  1. The default stack size on Windows is 1MB.

    1. The stack size doesn't change between computers. Maybe some people don't use the feature that consumes lots of stack space, or its stack consumption depends on the input data? 64-bit code also consumes more stack space than 32-bit code.

    2. Windows supports any stack size that will fit in RAM. You can change it in the linker settings, or as a parameter when creating a thread. I guess 1MB is a reasonable trade-off between how much space the program gets to use and the memory that's consumed by creating a thread, but if you have different requirements you can change it. Also, if your stack is too big, buggy recursive functions could take a long time before they fail with a stack overflow.

    3. When it has crashed in the debugger, add the esp (or rsp for x64) register to the watch window. That's the stack pointer, and you can watch it change as you move up and down the call stack in the debugger. The amount it changes by tells you how much stack is used. Alternatively, just look at the size of the local variables in each function on the call stack until you find the big ones; you can use sizeof(variableName) in the watch window to see how big something is.
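A rough sketch of the "look at the size of the locals" approach from the stack size post, in portable C++. The 1MB figure is the default quoted there; `Frame`, its 64KB scratch buffer and `maxRecursionDepth` are hypothetical names made up for illustration. Real stack frames also hold return addresses, saved registers and padding, so treat the result as an upper bound rather than an exact figure.

```cpp
#include <cassert>
#include <cstddef>

// Default Windows stack size quoted in the post (1 MB).
const std::size_t kDefaultStackBytes = 1024 * 1024;

// Hypothetical locals of one recursive call: a large scratch buffer.
struct Frame {
    char scratch[64 * 1024];
    int depth;
};

// Upper bound on how many frames of `frameBytes` fit in `stackBytes`.
std::size_t maxRecursionDepth(std::size_t stackBytes, std::size_t frameBytes) {
    assert(frameBytes > 0);
    return stackBytes / frameBytes;
}
```

With these assumed numbers (and a 4-byte int), `maxRecursionDepth(kDefaultStackBytes, sizeof(Frame))` comes out at 15, so a recursive function carrying that buffer would overflow the default stack after roughly 15 levels.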
  2. I'd also recommend turning on the compiler warning that will catch this (C4706, assignment within conditional expression), by increasing the warning level to 4 in Visual Studio. You could even go as far as turning that warning into an error by doing:

        #pragma warning(error : 4706)
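For reference, C4706 is Visual C++'s "assignment within conditional expression" warning. Here is a minimal sketch of the kind of mistake it flags. The function `isFive` is a hypothetical example, and the buggy form is kept in a comment so the snippet compiles cleanly on any compiler; the pragma is guarded because it is specific to Visual C++.

```cpp
#include <cassert>

#ifdef _MSC_VER
// Promote C4706 (assignment within conditional expression) to an error.
#pragma warning(error : 4706)
#endif

// Buggy form that C4706 flags:
//     if (value = 5) { ... }   // assigns 5, then tests it: always true
// Intended form, using a comparison:
bool isFive(int value) {
    return value == 5;
}
```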
  3. I've reproduced the behaviour, and simplified the case that goes wrong. Here's my minimal failing case:

        groupshared uint tempData[1];

        void MyFunc(inout uint inputData[1])
        {
        }

        [numthreads(2, 1, 1)]
        void CSMain()
        {
            MyFunc(tempData);
        }

    It looks like just passing the argument to the function is enough to make it fail to compile.

    Here's a workaround for the problem - don't pass the array as a function argument:

        #define ElementsCount 256

        groupshared uint tempData[ElementsCount];

        void MyFunc(in uint3 gtID)
        {
            tempData[gtID.x] = 0;
        }

        [numthreads(ElementsCount/2, 1, 1)]
        void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
        {
            MyFunc(gtID);
        }
  4. explains all the details of how the compiler DLL works.

    If you want to check which version of the DLL your program is using, then just pause it in the debugger and look through the modules window for the DLL. I believe the latest version is D3dcompiler_47.dll.

    Have you tried compiling the shader using fxc.exe?
  5. Based on a quick search I found these:

    Based on those it sounds like there might be a bug in certain versions of the compiler. I'd suggest trying either the command line fxc.exe or a more recent version of the d3dcompiler DLL to see if it makes any difference.
  6. It sounds like you really want to be using DXT compressed textures. DXT1 is 4 bits per pixel (i.e. 8 times smaller than 32-bit RGBA). You do lose a bit of image quality, but it's usually not significant. If you need an alpha channel, then you want DXT5 instead of DXT1, which doubles the size to 1 byte per pixel.

    Note that they don't only save space on disc, they also stay compressed in video memory. Because of that, rendering with them can also be quicker than with uncompressed textures.
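To make the size arithmetic in the DXT post concrete, here is a small sketch in C++. It assumes the standard block layout (4x4 pixel blocks, 8 bytes per block for DXT1 and 16 for DXT5) and a single mip level; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for an uncompressed 32-bit RGBA texture (4 bytes per pixel).
std::size_t rgba8Bytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}

// Bytes for a block-compressed texture: 4x4 pixel blocks,
// 8 bytes per block for DXT1, 16 bytes per block for DXT5.
std::size_t blockCompressedBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerBlock) {
    assert(bytesPerBlock == 8 || bytesPerBlock == 16);
    const std::size_t blocksWide = (width + 3) / 4;  // round up to whole blocks
    const std::size_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}
```

For a 1024x1024 texture this gives 4MB uncompressed, 512KB as DXT1 and 1MB as DXT5, which matches the 8x and 4x ratios described in the post.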
  7. DX11

    Picking is often done entirely on the CPU. You construct a 3D ray and intersect it with objects / triangles in the scene.

    I found a basic tutorial at:

    If you're using a physics engine of some kind, it should be able to handle the hit testing of the ray for you. Here's an example that uses the Bullet Physics library:
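As an illustration of the "intersect a ray with triangles" step from the picking post, here is a minimal CPU-side sketch using the Möller-Trumbore ray/triangle test. This is one common technique, not necessarily what the linked tutorial uses. `Vec3` and the helper functions are hypothetical stand-ins for whatever maths library you already have, and building the ray from the mouse position is not shown.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Moller-Trumbore ray/triangle test. Returns true and writes the distance
// along the ray to `t` when the ray hits the triangle (v0, v1, v2).
bool rayHitsTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t) {
    const float kEpsilon = 1e-6f;
    const Vec3 edge1 = sub(v1, v0);
    const Vec3 edge2 = sub(v2, v0);
    const Vec3 p = cross(dir, edge2);
    const float det = dot(edge1, p);
    if (std::fabs(det) < kEpsilon) return false;  // ray parallel to the triangle
    const float invDet = 1.0f / det;
    const Vec3 s = sub(origin, v0);
    const float u = dot(s, p) * invDet;           // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, edge1);
    const float v = dot(dir, q) * invDet;         // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(edge2, q) * invDet;
    return t > kEpsilon;                          // hit must be in front of the origin
}
```

To pick an object you would run this against each candidate triangle and keep the hit with the smallest `t`; a physics engine's ray test does the same job with acceleration structures on top.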
  8. While you can write code to load any image format you want to, I'd strongly recommend using .dds files. They support features that standard image file formats like JPEG often don't handle:

    - Mip maps. You want to generate high quality ones offline, and not try to generate them at load time, especially if your textures are compressed.
    - Texture compression (BC1-BC7 formats in DX11, DXT1/DXT5 in D3D9). These can make your textures up to 8 times smaller in video memory as well as reducing the file size.
    - Floating point textures.
    - Cubemaps.
    - Alpha channels.

    If you want to convert images to DDS format programmatically, then you can use

    If you prefer a Photoshop plugin, then Nvidia have made one:
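As a small aside on the mip map point in the .dds post, this sketch counts how many levels a full mip chain has, assuming the usual convention of halving each dimension (rounding down, never below 1) until the texture is 1x1. `mipLevelCount` is a hypothetical helper, not part of any DDS library.

```cpp
#include <cassert>

// Number of levels in a full mip chain, from the top level down to 1x1.
unsigned mipLevelCount(unsigned width, unsigned height) {
    assert(width > 0 && height > 0);
    unsigned levels = 1;  // the full-size image is level 0
    while (width > 1 || height > 1) {
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        ++levels;
    }
    return levels;
}
```

A 1024x1024 texture has 11 levels under this convention, all of which an offline tool would generate and store in the .dds file.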
  9. It varies. In some cases it can speed things up, and in others it can reduce performance. You need to test for your specific usage, on the hardware that you care about, and see what happens.

    One way it can speed things up is when you're hitting memory bandwidth limits on writing to the frame buffer. In my experience this mostly affects lower end graphics cards. In those cases using discard to implement an alpha test, so you don't write out fully transparent pixels, can improve performance. This might apply to the rendering of a particle system, for example.

    On the other hand, in shaders that write to the depth buffer, using discard can hurt performance. This is because using discard has a side effect of disabling some of the hardware's depth buffer optimizations, because it makes the depth output of the shader impossible to predict before the shader runs. Disabling those optimizations can make future draw calls go more slowly, especially ones that would fail the depth test.

    In addition, note that enabling alpha blending can also have a performance cost - it uses more memory bandwidth than opaque rendering, because it has to do a read-modify-write of the frame buffer instead of just a write.
  10. You need to be able to reproduce the bug yourself to identify the problem, and be confident that any fix you make has solved the problem.

    If you can't reproduce the problem another way, you could always buy yourself a second-hand 9500 GT. A graphics card that old won't be expensive.
  11. You can mix shaders with non-shader draw calls. They both go through DrawIndexedPrimitive(). It's the setup you do before the draw call that's different (i.e. you call SetVertexShader() and SetPixelShader() to activate shaders, which you can set back to NULL again to go back to non-shader mode).
  12. You might be able to reproduce the problem yourself by explicitly requesting a 10.0 or 10.1 feature level when you initialize the device.

    WARP is also a feature level 10.1 device, so it might be worth testing with that.
  13. As a general guideline, you want the smallest number of bytes per pixel that you can use to represent your data. While there's no guarantee that smaller textures will be faster, they are very unlikely to be slower.

    For textures that generally means that the block compressed formats like DXT1/BC1 are ideal. DXT1 is essentially 4 bits per pixel, which is smaller than any uncompressed format. Of course you lose a bit of image quality to get the size so small, but in almost all cases it's a good trade-off.

    Having small textures helps for several reasons:

    1. You can fit more stuff in video memory. If you run out of memory on the card, then performance will tend to suffer as the driver is forced to move data between the GPU and main memory.
    2. GPUs have texture caches. The more pixels that fit in the cache, the better the performance should be.
    3. Small textures use less of the available memory bandwidth.
    4. You can load small textures from disc faster, and they take less storage space.

    Having said that, GPUs do a lot of work to try to hide the bandwidth and latency costs of memory accesses, so how much performance impact a different texture format has depends very much on exactly what you're doing.
  14. There are three standard approaches to the problem here:

    1. Redefine the units so there aren't any decimals, and use a sufficiently large integer type. For example, don't store time as 0.3333333... days. Use 8 hours, or 480 minutes, or 28800 seconds, etc. The same goes for currency values - don't store it as $12.34 - store it as 1234 cents instead. Some languages have special decimal types for this.

    2. Use an arbitrary precision maths library that will give you enough accuracy for the operations you care about. This can be fully exact for basic operations on rational numbers. Of course infinite precision can also mean infinite memory and performance costs.

    3. Decide that a standard floating point type is good enough, and use it. If you're worrying about precision then double is probably what to go for - it has about 15 decimal digits of precision. If you're making a computer game then float (~7 digit precision) should be good enough for almost everything (one significant exception is total elapsed game time).
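A short C++ sketch of approaches 1 and 3 from the precision post, assuming IEEE 754 float and double (which is what mainstream desktop compilers use). All function names are hypothetical. It shows integer cents staying exact, a double drifting when 0.1 is added repeatedly, and a float that holds a large elapsed time silently swallowing a small time step.

```cpp
#include <cassert>
#include <cstdint>

// Approach 1: store currency as whole cents in an integer type.
std::int64_t addCents(std::int64_t totalCents, std::int64_t priceCents, int quantity) {
    assert(quantity >= 0);
    return totalCents + priceCents * quantity;
}

// Approach 3 pitfall: 0.1 has no exact binary representation, so
// repeatedly adding it in double drifts away from the exact decimal sum.
double sumTenths(int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) total += 0.1;
    return total;
}

// Elapsed-time pitfall: once a float holds a large time value, a small
// step can be below its resolution and the addition changes nothing.
bool floatAbsorbsStep(float elapsedSeconds, float stepSeconds) {
    volatile float next = elapsedSeconds + stepSeconds;  // force rounding to float
    return next == elapsedSeconds;
}
```

For example, `addCents(0, 1234, 3)` is exactly 3702, whereas `sumTenths(10)` does not compare equal to 1.0. After about 100000 seconds (roughly 28 hours) a float timer can no longer register a 1 millisecond step, which is why total elapsed game time is the exception mentioned in the post.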
  15. You might want to consider going for more CPU cores. One option would be something like the i7 6800K, which has 6 cores. The extra cores can help significantly with compile times, as the compilation process is easy to split across multiple cores (although it won't help link times).

    You do sacrifice clock speed to get the extra cores though, so they aren't always faster, and they are more expensive.

    Here's a review of the latest set of desktop CPUs with 6+ cores: