
About Zoner

  1. Multiple OBJ files foil inlining unless you use link-time code generation (/LTCG). If you turn off function-level linking, the whole OBJ gets linked in if ANY function or data in the OBJ is referenced by the rest of the program. This can be an issue if you care about space (as we definitely do on consoles). Multiple OBJs can also be a problem if a header file defines static variables: the variable will exist 'without a name' in each of the OBJ files and get linked into the final binary multiple times (another space issue, and one whose cause can be hard to track down).
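A minimal sketch of the header-static problem and the fix described above, collapsed into one file for illustration (the names `kBigTable`/`kTable` are made up; in real code the `static` array would live in a header and the `extern` declaration/definition pair would be split between a header and one .cpp):

```cpp
#include <cassert>

// What a header-level "static const" does: every translation unit that
// includes the header gets its own anonymous copy of this array, so the
// bytes are duplicated in the final binary once per including .cpp.
static const unsigned char kBigTable[256] = { 1, 2, 3 }; // rest zero-filled

// The space-friendly alternative: declare it extern in the header and define
// it exactly once in a single .cpp, so the linker emits one named copy that
// also shows up in map files and debuggers.
extern const unsigned char kTable[256];
const unsigned char kTable[256] = { 1, 2, 3 };
```

The trade-off, as the post notes, is that the extern version can't be inlined at use sites without LTCG.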
  2. When compiling for 32 bits you need to pass SIMD data by reference or pointer. The 64-bit ABI allows them to be passed by value at the language level, but if you compile for both targets you need to do it the 32-bit way. The simple reason is that the 32-bit ABI does not align the stack to 16 bytes. You may ask: what about local variables, then? Functions that have __m128 local variables cause the compiler to generate additional code to align the stack so they can be stored there. Note that even on x64, [b]__m128 arguments are not passed via xmm registers[/b]. They are written to the stack and passed by reference behind the scenes, although your code will still compile if you write it to pass by value. Scalar floats and doubles (i.e. anything not using __m128 as its data type) DO get passed in xmm registers, but the ABI does not handle SIMD data types. Weird, I know, but that's the way it is specced at the moment. The way to deal with this problem is to forceinline all the code passing by value, but that has some rather practical limits.
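A small sketch of the 32-bit-safe convention described above: take SIMD arguments by const reference and force-inline the helper. `VEC_CALL` is a hypothetical macro standing in for whatever your codebase uses (`__forceinline` on MSVC, `always_inline` elsewhere):

```cpp
#include <xmmintrin.h>

// Hypothetical force-inline macro; pick the right spelling per compiler.
#if defined(_MSC_VER)
#define VEC_CALL __forceinline
#else
#define VEC_CALL inline __attribute__((always_inline))
#endif

// Passing by const reference avoids relying on a 16-byte-aligned argument
// area, which the 32-bit ABI does not guarantee. Inlining makes the
// reference indirection free in practice.
static VEC_CALL __m128 AddPacked(const __m128& a, const __m128& b) {
    return _mm_add_ps(a, b);
}
```

Writing every SIMD helper this way keeps one source tree compiling identically for x86 and x64.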
  3. Specular Power = 0

    Zero bases break pow on GPUs as well, since pow is basically: float pow(float base, float exponent) { return exp2(log2(base) * exponent); } and log2(0) is -infinity. The fix is to either call max(base, very_small_number) first, or do something like: float mypow(float base, float exponent) { return base > very_small_number ? exp2(log2(base) * exponent) : 0; } And you get to sit down and tune your own very_small_number for whatever you are working on.
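The branching fix above can be sketched in plain C++ (the shader version is the same shape; `1e-6f` is an arbitrary placeholder epsilon, to be tuned per project as the post says):

```cpp
#include <cmath>

// Guarded pow built the same way GPU pow is: exp2(log2(base) * exponent).
// Without the branch, base == 0 produces log2(0) = -inf and garbage results.
static float SafePow(float base, float exponent) {
    const float kVerySmall = 1e-6f; // placeholder; tune for your content
    return (base > kVerySmall) ? std::exp2(std::log2(base) * exponent) : 0.0f;
}
```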
  4. I'm dying to see how well modern hardware handles true branches when you can disregard old hardware, because we work extra hard to flatten our shaders and build custom permutations for all of them in order to minimize shader ALU work.
  5. The games I am working on this year are probably the last two big SM3 projects we will be doing. It's all D3D11 with SM5 and SM4 profiles going forward . . .
  6. Stabilizing the shadowmap requires a few steps:
    • Pad the shadowmap by 1 additional texel beyond what is required, then translate the shadowmap projection by an offset so that the center texel stays centered at all times. This stops the crawling when you translate the camera.
    • Map the visible part of the view frustum to a sphere before projecting it into the texture; this protects against crawl caused by rotating the camera.
    • If your camera's field of view animates, the shadows will also crawl, because the view frustum dynamically makes the fit sphere larger or smaller. This can be hidden by rounding the FOV up into buckets of increments that affect the sphere fitting, or you can just live with it if you don't change the FOV much, or at all. The bucket strategy avoids the crawl, but you get pops when changing buckets instead. If you constrain the min and max FOV well enough you could probably use a single value and never see a pop.
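The first step above (texel snapping) can be sketched as a one-liner: quantize the shadowmap's light-space origin to whole texels so camera translation never moves shadow samples by sub-texel amounts. The `worldUnitsPerTexel` value is assumed to come from the sphere fit, i.e. roughly (2 × fit-sphere radius) / shadowmap resolution:

```cpp
#include <cmath>

// Snap a light-space coordinate of the shadowmap origin to the texel grid.
// Applied per axis before building the shadow projection, this is what stops
// the edge crawl when the camera translates.
static float SnapToTexel(float lightSpaceCoord, float worldUnitsPerTexel) {
    return std::floor(lightSpaceCoord / worldUnitsPerTexel) * worldUnitsPerTexel;
}
```

The 1-texel padding from the list exists precisely so this snap can shift the projection without cutting off the edge of the cascade.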
  7. Braindump inc:
    • 'C' programs, or C++ written 'like C', should prefer enums for constants.
    • Global variables are almost always re-read from memory. If they are pointers, the pointer is re-loaded and then dereferenced on almost every access. This can be a perf problem, even when the pointer is just some virtual base class. Compilers treat globals as volatile for the most part.
    • Global const variables should be externed and defined in a single file. This also means the compiler can't inline their value into the code like it can with an enum or a define. On the plus side, they are visible in map files and debuggers and easy to change even in optimized and release builds.
    • If you make the variable static in an attempt to make the compiler inline its value where it is used, you will generally fail; the linker map won't show the symbol (it is static), and it will take up space in your executable for every .cpp file that includes your header (which can be pretty bad if your static is a big array of bytes or something else large).
    • Class member static const variables are OK to use; they work more like enums (but can't be anything but integer types until C++11 is supported in your environment).
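The constant styles from the braindump, side by side in one sketch (names are made up for illustration):

```cpp
// Enum constant: folded into the code at each use site, costs no storage.
enum { kMaxPlayers = 4 };

// Extern const: declare in a header, define in exactly one .cpp. One named
// symbol, visible in map files and debuggers, but not inlinable across
// translation units without LTCG. (Collapsed into one file here.)
extern const int kMaxBots;
const int kMaxBots = 8;

// Class-scope static const integer: behaves enum-like, fine pre-C++11.
struct Limits {
    static const int kMaxTeams = 2;
};
const int Limits::kMaxTeams; // out-of-line definition, in one .cpp
```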
  8. Looking at the MSDN site, XInput 1.4 is part of Windows 8 and won't be on Win7/Vista until there is a patch, Windows Update, or service pack (however they deploy DirectX 11.1 will be the way it lands on your machine). Changing the project to link against the 1.3 import lib and DLLs should do the trick, but might require installing or downgrading to the Windows 7.0A SDK (for Vista) or the 7.1 SDK (for Win7). Rebuilding in Visual Studio 2010 should suffice, as it is set up to use the 7.0A SDK out of the box.
  9. The tricks with directional lights are about making the shadows stable when rotating or translating the camera. Translation is a matter of padding the shadowmap by 1 or 2 texels, computing how far off a shadowmap-texel center the view origin is, and adjusting by that amount. Rotation is harder: you need to compute the convex chunk of the view frustum going into the shadowmap cascade and treat it as a sphere, so it becomes rotationally invariant with respect to resolution. This has side effects, as changing the field of view will cause the shadowmap to dance. That can be worked around by not allowing FOV changes, or by rounding the field of view up into buckets and inflating the verts making up the view frustum to the worst-case field of view of the bucket it is in. The shadows will still pop when transitioning buckets, but that can be hidden by constraining how far the FOV can change.
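The sphere treatment above can be sketched as follows. Centering on the centroid of the 8 frustum-slice corners is a simple approximation (not the true minimal enclosing sphere, which would fit tighter); the point is only that a sphere's footprint is the same from every view direction:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Fit an approximate bounding sphere around the 8 corners of a cascade's
// frustum slice: center = centroid, radius = distance to the farthest corner.
// Because a sphere looks identical from any direction, the shadowmap's
// world-space texel size stays constant as the camera rotates.
static float FitSphereRadius(const Vec3 corners[8], Vec3* outCenter) {
    Vec3 c = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 8; ++i) {
        c.x += corners[i].x; c.y += corners[i].y; c.z += corners[i].z;
    }
    c.x /= 8.0f; c.y /= 8.0f; c.z /= 8.0f;

    float maxDistSq = 0.0f;
    for (int i = 0; i < 8; ++i) {
        const float dx = corners[i].x - c.x;
        const float dy = corners[i].y - c.y;
        const float dz = corners[i].z - c.z;
        const float d2 = dx * dx + dy * dy + dz * dz;
        if (d2 > maxDistSq) maxDistSq = d2;
    }
    *outCenter = c;
    return std::sqrt(maxDistSq);
}
```

The resulting center and radius feed the orthographic shadow projection; combined with the texel snapping from point 6, this is what makes the cascade rotation- and translation-stable.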
  10. We use traits and algorithms only (in particular sort). The generic containers are more or less banned, though a bitset has managed to make its way into our codebase, since it's such a pain to write one that scales up into the hundreds of bits.
  11. Bit Flag class

    Using one of the built-in integer types, with enums or defines for the fields, is much safer and more portable, and generally runs fastest. Most CPUs are horrible at bit-shifting by a variable amount (i.e. an amount chosen at runtime instead of hardcoded at compile time).
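A minimal sketch of the plain-integer flag style described above (flag names are made up): every mask is a compile-time constant, so the compiler emits simple and/or/test instructions with hardcoded shift amounts.

```cpp
// Enum constants as bit masks over a plain unsigned integer.
enum {
    kFlagVisible  = 1u << 0,
    kFlagDirty    = 1u << 1,
    kFlagSelected = 1u << 2
};

// Demo of the three basic operations: set, clear, test.
static unsigned DemoFlags() {
    unsigned flags = 0;
    flags |= kFlagVisible | kFlagDirty; // set two flags
    flags &= ~kFlagDirty;               // clear one
    return flags;                       // test with (flags & kFlagX)
}
```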
  12. I would download the Intel Optimization Manuals from Intel's website. There is a lot of information, but Chapter 7 (Optimizing Cache Usage) should have most of your answers. x64/x86 CPUs have extremely sophisticated hardware predictive prefetching, so generally you shouldn't need to explicitly prefetch data in your code. The first iteration of something can be an exception, since code can frequently 'surprise' the hardware prefetcher, and you would need to issue the prefetch much farther in advance in the codebase yourself. That is frequently not very practical, and you have to eat the first L3 miss.
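For the rare case where an explicit prefetch is worth it, a sketch of the usual pattern: prefetch a fixed distance ahead of the current element (16 elements here is an arbitrary tuning choice, not a recommendation). `_mm_prefetch` is only a hint, so correctness is unchanged if the hardware ignores it:

```cpp
#include <xmmintrin.h>

// Sum an array while prefetching a fixed distance ahead. The prefetch
// distance must be tuned so the cache line arrives before it is needed.
static int SumWithPrefetch(const int* data, int count) {
    int sum = 0;
    for (int i = 0; i < count; ++i) {
        if (i + 16 < count)
            _mm_prefetch(reinterpret_cast<const char*>(&data[i + 16]), _MM_HINT_T0);
        sum += data[i];
    }
    return sum;
}
```

On a simple linear walk like this the hardware prefetcher wins anyway, which is the post's point; explicit prefetching pays off mainly on pointer-chasing or first-touch patterns the hardware can't predict.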
  13. FYI, timeGetTime jumps in very large increments (especially relative to QPC calls).
  14. This is more from knowledge and some intuition than experience: make a thread for each display device, give each its own HWND and message pump, and call Present from those threads. The actual rendering can happen on any thread, but presenting on the pump thread requires a bit of synchronization finesse (it is very much worth it, and more or less required in multithreaded rendering anyway).
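A portable sketch of the synchronization pattern above, with the Windows-specific HWND/message-pump details omitted: each display gets a thread that owns presentation, and other threads hand it completed frames through a small queue, the same way a render thread would hand work to the pump thread for Present(). This is an illustrative standard-library version, not the actual D3D plumbing:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// One of these per display device. Submit() is called from render threads;
// the worker (standing in for the HWND/message-pump thread) runs each frame
// callback, which is where Present() would be issued.
class PresentThread {
public:
    PresentThread() : done_(false), worker_(&PresentThread::Run, this) {}

    ~PresentThread() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_one();
        worker_.join(); // drain remaining frames, then stop
    }

    void Submit(std::function<void()> frame) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(frame)); }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> frame;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return; // done_ set and queue drained
                frame = std::move(q_.front());
                q_.pop();
            }
            frame(); // Present() would run here, on the owning thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_;
    std::thread worker_;
};
```

In the real thing each PresentThread would also create its window and pump messages in Run(), since Win32 delivers messages to the thread that created the HWND.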
  15. Dealing with driver bugs

    I would argue flushing the pipeline improves performance, as it dramatically reduces latency. The cost is making the rendering less async-y, so the CPU spins its wheels a while waiting on the GPU to finish, but then you get things like the most up-to-date user input before rendering the next frame and whatnot. We've long since stopped looking at the adapter identifier string except for logging/troubleshooting purposes. In D3D9 land you can get most of the useful information from AMD and NVIDIA format extensions, and you can figure out which kind of card you have (more or less) from that. Here is some code I can post to explain things a bit more quickly:

[CODE]
///////////////////////////////////////////////////////////////////////
// Radeon defines from Advanced DX9 Capabilities for ATI Radeon Cards
//
#define ATI_FOURCC_INTZ  ((D3DFORMAT)(MAKEFOURCC('I','N','T','Z')))
#define ATI_FOURCC_NULL  ((D3DFORMAT)(MAKEFOURCC('N','U','L','L')))
#define ATI_FOURCC_RESZ  ((D3DFORMAT)(MAKEFOURCC('R','E','S','Z')))
#define ATI_FOURCC_DF16  ((D3DFORMAT)(MAKEFOURCC('D','F','1','6')))
#define ATI_FOURCC_DF24  ((D3DFORMAT)(MAKEFOURCC('D','F','2','4')))
#define ATI_FOURCC_ATI1N ((D3DFORMAT)MAKEFOURCC('A','T','I','1'))
#define ATI_FOURCC_ATI2N ((D3DFORMAT)MAKEFOURCC('A','T','I','2'))
#define ATI_ALPHA_TO_COVERAGE_ENABLE  (MAKEFOURCC('A','2','M','1'))
#define ATI_ALPHA_TO_COVERAGE_DISABLE (MAKEFOURCC('A','2','M','0'))
#define ATI_FETCH4_ENABLE  ((DWORD)MAKEFOURCC('G','E','T','4'))
#define ATI_FETCH4_DISABLE ((DWORD)MAKEFOURCC('G','E','T','1'))

///////////////////////////////////////////////////////////////////////
// Nvidia defines (GPU Programming Guide G80)
//
#define NVIDIA_FOURCC_INTZ ((D3DFORMAT)(MAKEFOURCC('I','N','T','Z')))
#define NVIDIA_FOURCC_RAWZ ((D3DFORMAT)(MAKEFOURCC('R','A','W','Z')))
#define NVIDIA_FOURCC_NULL ((D3DFORMAT)(MAKEFOURCC('N','U','L','L')))
#define NVIDIA_DEPTH_BOUND ((D3DFORMAT)(MAKEFOURCC('N','V','D','B')))

void d3d9Render::DetectHardwareSpecificOptions()
{
    D3DDISPLAYMODE mode;
    VERIFYD3D9RESULT(D3D->GetAdapterDisplayMode(D3DADAPTER_DEFAULT, &mode));

    HRESULT HasBC4 = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        0, D3DRTYPE_TEXTURE, ATI_FOURCC_ATI1N);
    DeviceSupports_BC4 = SUCCEEDED(HasBC4);

    HRESULT HasBC5 = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        0, D3DRTYPE_TEXTURE, ATI_FOURCC_ATI2N);
    DeviceSupports_BC5 = SUCCEEDED(HasBC5);

    HRESULT HasFetch4 = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_TEXTURE, ATI_FOURCC_DF24);
    DeviceSupports_Fetch4 = SUCCEEDED(HasFetch4);

    RENDER_COMPILE_ASSERT(ATI_FOURCC_NULL == NVIDIA_FOURCC_NULL, ATI_And_NVIDIA_FOURCC_For_NULL_IsIdentical);
    HRESULT HasNULL = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        D3DUSAGE_RENDERTARGET, D3DRTYPE_SURFACE, ATI_FOURCC_NULL);
    DeviceSupports_NullColorBuffer = SUCCEEDED(HasNULL);

    HRESULT HasDepthBounds = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        0, D3DRTYPE_SURFACE, NVIDIA_DEPTH_BOUND);
    DeviceSupports_DepthBounds = SUCCEEDED(HasDepthBounds);

    // GBX:Zoner - NVDB is the best test for hardware PCF, since the alternatives
    // are to guess and scan the adapter id string
    HRESULT HasHardwarePCF = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        0, D3DRTYPE_SURFACE, NVIDIA_DEPTH_BOUND);
    DeviceSupports_HardwarePCF = SUCCEEDED(HasHardwarePCF);

    HRESULT FilteringFP16 = D3D->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, mode.Format,
        D3DUSAGE_RENDERTARGET | D3DUSAGE_QUERY_FILTER, D3DRTYPE_TEXTURE, D3DFMT_A16B16G16R16F);
    DeviceSupports_FilteringFP16 = SUCCEEDED(FilteringFP16);
}
[/CODE]

    Basically: Fetch4 = Radeon (but not some of the older ones), depth bounds = NVIDIA.