JoeJ

Members
  • Content count

    641
  • Joined

  • Last visited

Community Reputation

2586 Excellent

About JoeJ

  • Rank
    Advanced Member

Personal Information

  • Interests
    Art
    Programming
  1. 3D I like to learn about BSP rendering

    In DirectX you code shaders in HLSL, which is like C. DirectX translates this to bytecode, and the driver to ASM (each vendor and GPU has a different instruction set). There are different shaders to process vertices or pixels, so a minimal vertex shader transforms vertices by model and camera matrices, and a typical pixel shader has access to interpolated values (UVs, vertex colors, custom data) and outputs the final color for each pixel. You can still do lighting in the vertex shader like the fixed function pipeline did, but any GPU is powerful enough to do it per pixel. If you are happy with vertex lighting, yes, you can do it on the CPU yourself using vertex colors to get around the 8 lights limit.

    HSR per pixel is what GPUs do using the Z-buffer. A software rasterizer (Quake / Doom) can do it e.g. per span (a scanline from a triangle), which is much less work than per pixel. But GPUs have lots of parallel processing power, and work-efficient HSR algorithms are hard to parallelize. Also triangles became much smaller, so the Z-buffer wins on GPU. Using GPUs you don't care about fine grained HSR, but you may want to avoid rendering one half of the level if it is hidden behind a wall. Here BSP can be used in combination with a precomputed Potentially Visible Set (PVS): each cell of the BSP knows what other cells may be visible if the camera is inside. This idea works with other data structures too, but it is limited to static geometry (and it's hard, so see how far you can get without a need for such things).

    No, only if you need to work with angles. You need trig for a first person camera once per frame, but you can animate a character and transform its entire skeleton hierarchy without trig. You mostly use dot and cross products instead of sin and cos. Probably your hardware can do much more math than you think.

    You use one matrix for the camera, and one matrix for each dynamic object. By multiplying both matrices you need to transform each vertex with only one matrix and do the perspective projection to get screen coords (DirectX does this for you). However, both CPU and GPU are built to do math on data - no need to avoid it. Today it's more important to optimize for efficient memory access than to limit math work (a small sketch of the matrix folding follows below). I'd like to add your target hardware to Infinisearch's list.
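    To make the matrix point concrete, here is a minimal C++ sketch with my own toy types (not DirectX code; in practice you would use DirectXMath or similar): fold the object and camera matrices together once per object, then each vertex needs only a single matrix transform before the perspective divide.

        #include <cstddef>

        struct Vec4 { float x, y, z, w; };

        struct Mat4
        {
            float m[4][4]; // row-major in this sketch

            // r = a * b
            static Mat4 Mul (const Mat4 &a, const Mat4 &b)
            {
                Mat4 r = {};
                for (int i = 0; i < 4; i++)
                    for (int j = 0; j < 4; j++)
                        for (int k = 0; k < 4; k++)
                            r.m[i][j] += a.m[i][k] * b.m[k][j];
                return r;
            }

            Vec4 Transform (const Vec4 &v) const
            {
                return {
                    m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w,
                    m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w,
                    m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w,
                    m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w };
            }
        };

        // once per object and frame: combine object and camera (view * projection) matrices,
        // then every vertex needs only one matrix transform and no trig at all
        void TransformObject (const Mat4 &viewProj, const Mat4 &objectToWorld,
                              const Vec4 *verts, Vec4 *out, size_t count)
        {
            const Mat4 objectToClip = Mat4::Mul (viewProj, objectToWorld);
            for (size_t i = 0; i < count; i++)
                out[i] = objectToClip.Transform (verts[i]);
        }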
  2. 3D I like to learn about BSP rendering

    You can lift that limitation in two ways:

    1. Do per pixel shading with shaders (the modern approach).
    2. Implement your own vertex lighting and submit lit vertex colors to DX.

    If you still want hardware lighting from the fixed function GPU for more than 8 lights, you can make sure any triangle is lit by at most 8 and change the lights accordingly while rendering (see the sketch below). Partitioning geometry will be useful for this, but you can use any method you want. BSP is static, so it may not be the best choice here.

    Doom / Quake had hidden surface removal (HSR) on top of BSP sorted drawing order to reduce overdraw (it can achieve zero overdraw for static scenes). The combination of those things made BSP so attractive and efficient back in the day. But GPUs can't do that kind of HSR. They still profit from front to back order due to early Z, but a less restricted and more flexible way of sorting makes more sense for the GPU. I recommend you read the chapters, it's interesting anyways, and also the chapters about baked radiosity (with that, 8 dynamic lights should suffice!)
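    A rough sketch of the "at most 8 lights per chunk" idea (my own made-up structures, not code from the thread): before drawing a chunk of partitioned geometry, score the lights with a crude relevance metric and enable only the best 8 in the fixed function pipeline.

        #include <algorithm>
        #include <vector>

        struct Light { float pos[3]; float intensity; };
        struct Chunk { float center[3]; /* plus vertex / index data */ };

        // returns the indices of the (at most) 8 most relevant lights for this chunk;
        // set these as device lights 0..7 right before drawing the chunk
        std::vector<int> PickLightsForChunk (const Chunk &chunk,
                                             const std::vector<Light> &lights,
                                             int maxLights = 8)
        {
            std::vector<int> indices (lights.size ());
            for (size_t i = 0; i < lights.size (); i++) indices[i] = (int)i;

            auto score = [&] (int i) // crude metric: intensity over squared distance
            {
                float dx = lights[i].pos[0] - chunk.center[0];
                float dy = lights[i].pos[1] - chunk.center[1];
                float dz = lights[i].pos[2] - chunk.center[2];
                return lights[i].intensity / (dx*dx + dy*dy + dz*dz + 1.0f);
            };
            std::sort (indices.begin (), indices.end (),
                       [&] (int a, int b) { return score (a) > score (b); });
            if ((int)indices.size () > maxLights) indices.resize (maxLights);
            return indices;
        }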
  3. 3D I like to learn about BSP rendering

    To learn about BSP in Doom, read the chapter in Michael Abrash's Black Book, which is online everywhere. But I did not find the source code and hope it's ok to attach it here. (It's OpenGL or software rasterizer, but I guess it still compiles.) BSP has nothing to do with the graphics API or shaders anyways. Doom uses BSP for front to back sorted rendering of level geometry and frustum culling. The idea is to generate small chunks of geometry by cutting space in two halves recursively. Those chunks are static and there is no need to touch vertex buffers at runtime (see the traversal sketch below). Edit: Why do you think you need BSP nowadays? ddjbsp2.zip
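    This is not Doom's actual code, just a minimal sketch of the traversal idea described above: recursively visit the camera side of every splitting plane first, which yields front to back order of the static chunks for a given camera position.

        // splitting plane: dot(normal, p) + dist = 0
        struct BspNode
        {
            float normal[3];
            float dist;
            BspNode *front = nullptr;   // child on the positive side of the plane
            BspNode *back  = nullptr;   // child on the negative side
            int chunk = -1;             // index of a static geometry chunk, -1 for interior nodes
        };

        void DrawFrontToBack (const BspNode *node, const float cam[3])
        {
            if (!node) return;
            if (node->chunk >= 0)
            {
                // frustum-cull and draw the static chunk here
                return;
            }
            const float side = node->normal[0]*cam[0] + node->normal[1]*cam[1]
                             + node->normal[2]*cam[2] + node->dist;
            const BspNode *nearChild = (side >= 0) ? node->front : node->back;
            const BspNode *farChild  = (side >= 0) ? node->back  : node->front;
            DrawFrontToBack (nearChild, cam);   // camera side first -> front to back order
            DrawFrontToBack (farChild,  cam);
        }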
  4. I have experience mainly with compute and not much rendering yet, but I assume it's the same for both: you can record a very complex system to just one command buffer. In my case that means prerecording indirect dispatches that may do zero work most of the time, but that approach is a lot faster than making such decisions on the CPU. Vulkan is almost twice as fast as OpenCL for me because of this, although both use the same shader code. AFAIK, you can also put a whole graphics engine into one CB if you want. (A small sketch of the prerecording idea follows below.)

    Using only one command buffer is usually the fastest way. (Enqueueing multiple command buffers has a noticeable cost!) But there are two exceptions:

    1. Multithreaded command buffer recording on the CPU. Interesting if you do not want to prerecord your whole pipeline and make a lot of decisions on the CPU every frame.
    2. Async compute. As long as you have no pipeline barriers in your queue, AMD does it automatically. But otherwise you can use multiple queues to keep working while e.g. a memory barrier stalls a queue.

    I made this test case for AMD: https://github.com/JoeJGit/Vulkan-Async-Compute-Test
    It shows some interesting behaviours and may help to get a sense of when async compute is worth it or not. You can also see the cost of using multiple queues and CBs as small gaps in the profiler output. Those gaps are similar if you use just one queue but enqueue multiple CBs to it. So in both cases the reason to use multiple command buffers is either CPU or GPU parallelization, simply because there is no other way. But other than that you want to put all your work into one buffer and avoid fragmentation most of the time.
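    As a hedged illustration of the prerecording idea (not code from my repo, and names like indirectBuffer are placeholders): every compute pass is recorded once as an indirect dispatch, earlier passes write the work group counts into a GPU buffer, and if a pass has nothing to do the counts are simply zero.

        #include <vulkan/vulkan.h>

        void RecordAllPassesOnce (VkCommandBuffer cmd, uint32_t passCount,
                                  const VkPipeline *pipelines, VkPipelineLayout layout,
                                  const VkDescriptorSet *sets, VkBuffer indirectBuffer)
        {
            VkCommandBufferBeginInfo beginInfo = {};
            beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            vkBeginCommandBuffer (cmd, &beginInfo);

            for (uint32_t pass = 0; pass < passCount; pass++)
            {
                vkCmdBindPipeline (cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipelines[pass]);
                vkCmdBindDescriptorSets (cmd, VK_PIPELINE_BIND_POINT_COMPUTE, layout,
                                         0, 1, &sets[pass], 0, nullptr);

                // work group counts live in indirectBuffer, written by earlier passes on the GPU;
                // zero counts make this a no-op, so the CPU never has to decide anything per frame
                vkCmdDispatchIndirect (cmd, indirectBuffer,
                                       pass * sizeof (VkDispatchIndirectCommand));

                // barrier between passes (also the kind of thing that prevents automatic async compute)
                VkMemoryBarrier barrier = {};
                barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
                barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
                barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
                vkCmdPipelineBarrier (cmd,
                    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
                    0, 1, &barrier, 0, nullptr, 0, nullptr);
            }

            vkEndCommandBuffer (cmd);
            // recorded once, then submitted every frame with a single vkQueueSubmit
        }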
  5. My MMORPG Character

    Awesome
  6. You can wrap the Fourier transform around a circle and treat the circle as a square. You can then decide how many bands you need: the 1st band is a constant term, the 2nd can encode a lobe towards a single direction, and adding more bands means you can approximate multiple lights more accurately. SH is similar: the smallest 2 band version has one number for the constant term (1st band), and a 3D vector (2nd band) for a directional bump. (This band tells you the dominant light direction if you gather many samples, similar to my curvature example.) So for 2D you should need just 3 numbers: the constant term, and a 2D direction (or angle and amplitude like I did, but a direction avoids the trig functions when decoding). A small sketch of that 3-number version follows below.
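    Here is a rough sketch of that 3-number 2D representation as I read it (the struct and the 0.5 weights are my own choice; proper band weights depend on the normalization you pick): band 0 is the constant term, band 1 is a 2D direction, and decoding is just a dot product, no trig.

        struct CircularLight2D
        {
            float constant = 0;          // band 0: total energy
            float dirX = 0, dirY = 0;    // band 1: directional bump

            // (dx, dy) must be a normalized direction towards the light
            void AddLight (float dx, float dy, float intensity)
            {
                constant += intensity;
                dirX += dx * intensity;
                dirY += dy * intensity;
            }

            // approximate light arriving from normalized direction (dx, dy)
            float Eval (float dx, float dy) const
            {
                return 0.5f * constant + 0.5f * (dirX * dx + dirY * dy);
            }
        };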
  7. If you want it only for 2D, SH is more than you need, but I'm unsure what you're trying to do. However, I use the code below to calculate the curvature directions of a mesh. The problem here is that a curvature direction is the same on opposite sides, e.g. vec2(0.7, 0.7) equals vec2(-0.7, -0.7), so I cannot simply add vectors to get an average curvature direction. Instead I express directions with a sine wave that has two lobes pointing forwards and backwards; the phase is the direction and the amplitude is the intensity. Adding two sine waves always results in another single sine wave, and this way I get an accurate result from summing any number of samples. (The same principle is used in SH and the Fourier transform.)

    So, if this sounds interesting to you, you could do the same for lighting, but you would want the lobe pointing only in one direction and not the opposite as well, which means replacing factors of 2 with 1 and adjusting some other things as well. But: for lighting I would just sum up vector wise and accept the error coming from that. Also note that my approach does not have a constant band like SH, so the same amount of light coming from right and left would sum to zero - might be worth adding this for lighting.

        // PI and max are assumed to be defined elsewhere
        struct Sinusoid
        {
            float phase;
            float amplitude;

            Sinusoid () { phase = 0; amplitude = 0; }

            Sinusoid (const float phase, const float amplitude)
            {
                this->phase = phase;
                this->amplitude = amplitude;
            }

            // build from a 2D direction; opposite directions map to the same sinusoid
            Sinusoid (const float *dir2D, const float amplitude)
            {
                this->amplitude = amplitude;
                phase = PI + atan2 (dir2D[1], dir2D[0]) * 2.0f;
            }

            float Value (const float angle) const
            {
                return cos(angle * 2.0f + phase) * amplitude;
            }

            // the sum of two sinusoids of the same frequency is again a single sinusoid
            void Add (const Sinusoid &op)
            {
                float a = amplitude;
                float b = op.amplitude;
                float p = phase;
                float q = op.phase;
                phase = atan2(a*sin(p) + b*sin(q), a*cos(p) + b*cos(q));
                float t = a*a + b*b + 2*a*b * cos(p-q);
                amplitude = sqrt(max(0,t));
            }

            float PeakAngle () const { return phase * -0.5f; }
            float PeakValue () const { return Value(PeakAngle ()); }

            // turn the wave back into a 2D direction at the given angle
            void Direction (float *dir2D, const float angle) const
            {
                float scale = (amplitude + Value (angle)) * 0.5f;
                dir2D[0] = sin(angle) * scale;
                dir2D[1] = cos(angle) * scale;
            }
        };
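    A hypothetical usage example for the struct above (my own, assuming PI and max are defined): two opposite sample directions reinforce each other instead of cancelling out, and the peak gives the dominant direction back.

        void Example ()
        {
            float dirA[2] = {  0.7f,  0.7f };
            float dirB[2] = { -0.7f, -0.7f };          // opposite side of the same curvature direction

            Sinusoid sum (dirA, 1.0f);
            sum.Add (Sinusoid (dirB, 1.0f));           // amplitude becomes 2 instead of cancelling to 0

            float avgDir[2];
            sum.Direction (avgDir, sum.PeakAngle ());  // dominant direction, scaled by the summed intensity
        }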
  8. Shadow Mapping

    Nice, you already have experience with all kinds of shadow optimizations I have in mind. But there are two more... Does this make sense also for the scrolling cascades of a directional sun, with some kind of streaming? The second question is about updating shadow maps at a lower frequency, say every 4 frames. I assume the easiest way to achieve this would be to transform the sample point back in time for dynamic objects. Anyone tried something similar already?
  9. Shadow Mapping

    But that's the point of my suggestion: you never get closer to the texture edge than 0.5 texels, so you don't need to worry about disconnected UV space. Of course you still need to select the proper UV offset to address the atlas, but only once and not 3 times, and this should be very cheap anyways and can be made branchless (see the sketch below). The only question is how bad the artefacts are, and how this depends on the shadow technique (PCF, VSM, ...)
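    A small sketch of how I picture that lookup (the atlas layout and names are my assumptions, not anything agreed on in the thread): clamp the face-local UV so filtering taps can never cross the face border, then apply the per-face offset into the atlas exactly once.

        #include <algorithm>

        struct Vec2 { float x, y; };

        // faceUv in [0,1] on the selected cube face, faceIndex in [0,5],
        // faceRes = face resolution in texels, atlas assumed to lay the 6 faces out in a row
        Vec2 AtlasUv (Vec2 faceUv, int faceIndex, float faceRes)
        {
            const float halfTexel = 0.5f / faceRes;
            // never closer than half a texel to the face border, so bilinear / PCF taps stay inside
            const float u = std::min (std::max (faceUv.x, halfTexel), 1.0f - halfTexel);
            const float v = std::min (std::max (faceUv.y, halfTexel), 1.0f - halfTexel);
            // branchless offset into the 6x1 atlas
            return { (u + (float)faceIndex) / 6.0f, v };
        }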
  10. Shadow Mapping

    What faces do you mean? Geometry? Or related to the 6 projections?
  11. Shadow Mapping

    What if you increase the fov for each face by a small amount so you get one 'overlapping' border of texels? You would always need to sample just one shadow map, and I assume artefacts at cube corners / edges would be negligible for shadows? (A quick sketch of the enlarged fov is below.)
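    Quick sketch of how that enlarged per-face fov could be computed (my own derivation, worth double checking): a 90 degree face covers a half extent of tan(45) = 1 at distance 1, mapped to res/2 texels, so growing it by a border of b texels gives a half extent of 1 + 2b/res.

        #include <cmath>

        // returns the enlarged fov in radians; about 90.11 degrees for a 1024 face and a 1 texel border
        float FaceFovWithBorder (int faceRes, float borderTexels)
        {
            const float halfExtent = 1.0f + 2.0f * borderTexels / (float)faceRes;
            return 2.0f * std::atan (halfExtent);
        }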
  12. Geometry vs Texturing in Game Art

    Maybe as an artist you should not distinguish between geometry and texture too much at all. There may be a preprocessing toolchain that takes your art and converts it to efficient game assets. It may resample textures, merge materials, decide what details remain geometry or become normal maps, and generate LODs. It may even remesh your geometry and turn details into displacement maps, geometry images, voxels or whatever turns out to be most efficient. So thinking ahead, it's probably most important that you always create art with very high detail, so it still looks as intended even if everything gets resampled and downscaled. E.g. a tool may rotate UVs and straight lines of texels could become jagged, or a tool may remove long and thin features of geometry, etc. Being aware of such issues may become important, while caring about low poly count may become obsolete for the artist.
  13. DX12 Split Barrier Question

    If there is no other work in between, there should be no advantage from using a split barrier. (But I can't tell from experience. In Vulkan you can do this with events, but I have not used it yet.) A hedged sketch of what the DX12 version looks like is below.
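    For reference, this is roughly what a split barrier looks like in DX12 as far as I understand it (the resource and states here are made-up examples): the BEGIN_ONLY half announces the transition, unrelated work goes in between, and the END_ONLY half is where the new state must be valid.

        #include <d3d12.h>

        void SplitBarrierSketch (ID3D12GraphicsCommandList *cmd, ID3D12Resource *shadowMap)
        {
            D3D12_RESOURCE_BARRIER barrier = {};
            barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
            barrier.Transition.pResource   = shadowMap;
            barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
            barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_DEPTH_WRITE;
            barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

            barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY;
            cmd->ResourceBarrier (1, &barrier);   // begin: the transition may start here

            // ... record unrelated work here - this is the part that makes the split worthwhile ...

            barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_END_ONLY;
            cmd->ResourceBarrier (1, &barrier);   // end: the resource must be in the new state now
        }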
  14. Compute-Shader: InterlockedAdd: buggy?

    If you do 3 atomic operations in sequence:

        InterlockedAdd(_cnt[0].trianglecount, 1, idxTriangle1);
        InterlockedAdd(_cnt[0].trianglecount, 1, idxTriangle2);
        InterlockedAdd(_cnt[0].trianglecount, 1, idxTriangle3);

    ...then other threads running in parallel can execute their instructions in between them, so the idxTriangle variables will not be an ascending sequence like 4, 5, 6 but more likely something random like 4, 12, 40. Your second attempt is also much better because you use fewer expensive atomic instructions.
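    The same idea in C++ terms, just to illustrate the interleaving (not the original HLSL, and not necessarily what the second attempt in the thread did): one atomic add of 3 reserves a private, consecutive index range, while three separate adds can interleave with other threads.

        #include <atomic>

        std::atomic<int> triangleCount {0};

        void ThreeSeparateAtomics (int *idx)
        {
            // other threads can add between these calls, so the three indices
            // are usually not consecutive
            idx[0] = triangleCount.fetch_add (1);
            idx[1] = triangleCount.fetch_add (1);
            idx[2] = triangleCount.fetch_add (1);
        }

        void OneAtomicForThree (int *idx)
        {
            // reserve three slots at once: base, base+1, base+2 are guaranteed
            // to belong only to this thread, and it is a single atomic operation
            const int base = triangleCount.fetch_add (3);
            idx[0] = base; idx[1] = base + 1; idx[2] = base + 2;
        }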
  15. Depth-only pass

    You may also need to use the precise keyword to prevent the compiler from reordering your math differently between shaders, which would result in slightly different depth values: https://msdn.microsoft.com/en-us/library/windows/desktop/hh447204(v=vs.85).aspx