ADDMX

Member
  • Content Count

    17
  • Joined

  • Last visited

Community Reputation

281 Neutral

About ADDMX

  • Rank
    Member

Personal Information

  • Interests
    Programming
  1. OK - thanks! Indeed I'm experimenting with raycasting, and I didn't have this problem on my older hardware (GTX 960) - but should the copy work regardless of the error? Or, with the debug layer enabled, will the copy not work at all on RTX hardware? (That would really suck :/)
  2. Hi
     Is it possible at all to copy descriptors between _two different_ heaps? The documentation says so.
     The first (source) heap is:

         D3D12_DESCRIPTOR_HEAP_DESC HeapDesc;
         HeapDesc.NumDescriptors = 256;
         HeapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER;
         HeapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
         HeapDesc.NodeMask       = 0;

     and SrcHeap->GetCPUDescriptorHandleForHeapStart() ==> Handle.ptr == 4 (a strange value indeed; I'd expected a pointer, as with GPU handles).
     The second (destination, shader-visible) heap is:

         HeapDesc.NumDescriptors = 128;
         HeapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER;
         HeapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
         HeapDesc.NodeMask       = 0;

     and DstHeap->GetCPUDescriptorHandleForHeapStart() ==> Handle.ptr == 9.
     I want to copy elements 5, 6 and 7 from the first heap to the start of the second one:

         auto Increment = Device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER); // returns 32
         CD3DX12_CPU_DESCRIPTOR_HANDLE Src = CD3DX12_CPU_DESCRIPTOR_HANDLE(SrcHeap->GetCPUDescriptorHandleForHeapStart(), 5, Increment);
         CD3DX12_CPU_DESCRIPTOR_HANDLE Dst = CD3DX12_CPU_DESCRIPTOR_HANDLE(DstHeap->GetCPUDescriptorHandleForHeapStart(), 0, Increment);
         Device->CopyDescriptorsSimple(3, Dst, Src, D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);

     and the debug layer says:

         D3D12 ERROR: ID3D12Device::CopyDescriptors: Source ranges and dest ranges overlap, which results in undefined behavior. [ EXECUTION ERROR #653: COPY_DESCRIPTORS_INVALID_RANGES]

     and indeed the samplers are not copied to the shader-visible descriptor heap ... why? I have Win10 1809 (x64), the latest NVIDIA drivers, and an RTX 2080 (I do not have any other cards, and the device is initialized on the RTX 2080). I've compiled ModelViewer from the DX samples' MiniEngine ... and it spits out the same error from within its DynamicDescriptorHeap implementation :/ (A self-contained restatement of this setup follows below.)
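     For reference, here is a self-contained sketch of the same setup - an illustrative restatement, not the original code. A CPU-only staging heap is the copy source and a shader-visible heap is the destination, which is the combination CopyDescriptorsSimple requires (the source heap must NOT be shader-visible). Error checking is abbreviated, and the sampler descriptors in the source heap are assumed to have been created elsewhere.

         #include <d3d12.h>
         #include <wrl/client.h>
         using Microsoft::WRL::ComPtr;

         void CopySamplers(ID3D12Device* Device)
         {
             ComPtr<ID3D12DescriptorHeap> SrcHeap, DstHeap;

             D3D12_DESCRIPTOR_HEAP_DESC Desc = {};
             Desc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER;
             Desc.NumDescriptors = 256;
             Desc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_NONE; // CPU-only staging heap
             Device->CreateDescriptorHeap(&Desc, IID_PPV_ARGS(&SrcHeap));

             Desc.NumDescriptors = 128;
             Desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE; // GPU-visible heap
             Device->CreateDescriptorHeap(&Desc, IID_PPV_ARGS(&DstHeap));

             // ... CreateSampler into slots of SrcHeap here (omitted) ...

             UINT Increment = Device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);

             // Copy elements 5..7 of the staging heap into slots 0..2 of the
             // shader-visible heap; both handles here are CPU handles.
             D3D12_CPU_DESCRIPTOR_HANDLE Src = SrcHeap->GetCPUDescriptorHandleForHeapStart();
             Src.ptr += 5 * Increment;
             D3D12_CPU_DESCRIPTOR_HANDLE Dst = DstHeap->GetCPUDescriptorHandleForHeapStart();

             Device->CopyDescriptorsSimple(3, Dst, Src, D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);
         }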
  3. Hi
     Just a simple question about compute shaders (CS 5.0, DX11). Should atomic operations (InterlockedAdd in my case) work without any issues on a RWByteAddressBuffer and be globally coherent? I've come back from the CUDA world and committed a fairly simple kernel that does some job; the pseudo-code is as follows (both kernels use the same RWByteAddressBuffer).
     The first kernel does some job and sets Result[0] = 0 (using Result.Store(0, 0)). I've checked with the debugger, and indeed the value stored at dword 0 is 0. Now my second kernel:

         RWByteAddressBuffer Result;

         [numthreads(8, 8, 8)]
         void main()
         {
             for (int i = 0; i < 5; i++)
             {
                 uint4 v0 = DoSomeCalculations1();
                 uint4 v1 = DoSomeCalculations2();
                 uint4 v2 = DoSomeCalculations3();
                 if (v0.w == 0 && v1.w == 0 && v2.w == 0)
                     continue;

                 // Increment the counter by 3 and get its previous value;
                 // this basically allocates space for 3 uint4 values in the buffer.
                 uint prev;
                 Result.InterlockedAdd(0, 3, prev);

                 // Fill the buffer with 3 uint4 values (+1 is here because the
                 // first 16 bytes are occupied by the DrawInstancedIndirect data).
                 Result.Store4((prev + 0 + 1) * 16, v0);
                 Result.Store4((prev + 1 + 1) * 16, v1);
                 Result.Store4((prev + 2 + 1) * 16, v2);
             }
         }

     Now I invoke it with Dispatch(4, 4, 4) and then use DrawInstancedIndirect to draw the buffer, but occasionally there is a missed triangle here and there for a frame, as if the atomic counter does not work as expected. Do I need any additional synchronization there? I've tried AllMemoryBarrierWithGroupSync at the end of the kernel, but without effect. If I do not use the atomic counter and instead just output empty vertices (which will turn into degenerate triangles), then all is OK - as if I'm missing some form of synchronization, but I do not see such a thing in DX11. I've tested on both old and new NVIDIA hardware (680M and 1080; the behaviour is the same).
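     For context, here is a minimal sketch of the host-side sequence this post implies (all names are illustrative assumptions): clear the counter, run the fill kernel, then draw indirect. The D3D11 runtime tracks UAV hazards between Dispatch calls and the subsequent draw itself, so there is no explicit barrier API to call between them.

         #include <d3d11.h>

         // Assumes ResultBuffer was created with D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS
         // and ResultUAV is a raw (DXGI_FORMAT_R32_TYPELESS + D3D11_BUFFER_UAV_FLAG_RAW)
         // view of it. IA/VS setup for the draw is omitted.
         void FillAndDraw(ID3D11DeviceContext* Ctx,
                          ID3D11ComputeShader* ClearCS,
                          ID3D11ComputeShader* FillCS,
                          ID3D11UnorderedAccessView* ResultUAV,
                          ID3D11Buffer* ResultBuffer)
         {
             // Kernel 1: writes the DrawInstancedIndirect args and zeroes the counter.
             Ctx->CSSetShader(ClearCS, nullptr, 0);
             Ctx->CSSetUnorderedAccessViews(0, 1, &ResultUAV, nullptr);
             Ctx->Dispatch(1, 1, 1);

             // Kernel 2: the 8x8x8 fill kernel from the post, 4x4x4 groups.
             Ctx->CSSetShader(FillCS, nullptr, 0);
             Ctx->Dispatch(4, 4, 4);

             // Unbind the UAV before the buffer is consumed by the draw.
             ID3D11UnorderedAccessView* NullUAV = nullptr;
             Ctx->CSSetUnorderedAccessViews(0, 1, &NullUAV, nullptr);

             // The first 16 bytes of the buffer hold the indirect draw arguments.
             Ctx->DrawInstancedIndirect(ResultBuffer, 0);
         }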
  4. If you use a system such as this, it will be a lot slower than not using it at all.
     Your first problem is that you read data back from the GPU in THE SAME frame in which you submitted the queries - this way your CPU and GPU work in full sync, which is really bad.
     You cannot use the bounding box that encloses a shape as its occlusion proxy; for an occlusion proxy you need the real object or a simplified version of it. It's not that easy to develop a working occlusion system with occlusion queries.
     Rather than thinking of individual objects, think of 'zones'. For example, if you are rendering a big city and you are standing on the street, divide your city into zones and assign each an occlusion proxy (some simple - probably convex - shape that encloses the zone). Then render your city normally (in the first frame), THEN render the occlusion queries, THEN in the next frame read back their results and from those results decide whether or not to render each zone. You should ideally have 2-3 frames of delay between pushing an occlusion query and fetching its result - otherwise a GPU-CPU sync occurs and performance is lost. (A sketch of this delayed readback follows below.)
     (Of course your camera moves, so you need to take that into account and use a LOT of heuristics to avoid artifacts.)
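     A minimal sketch of that delayed-readback pattern, assuming D3D11 and one D3D11_QUERY_OCCLUSION query per in-flight frame per zone (the names, ring size, and warm-up handling are illustrative assumptions):

         #include <d3d11.h>

         static const int kLatency = 3; // frames between issuing a query and reading it

         struct Zone
         {
             ID3D11Query* Query[kLatency] = {};
             bool         Visible = true; // last known result; default to visible
         };

         void CreateZoneQueries(ID3D11Device* Dev, Zone& Z)
         {
             D3D11_QUERY_DESC QD = {};
             QD.Query = D3D11_QUERY_OCCLUSION;
             for (int i = 0; i < kLatency; ++i)
                 Dev->CreateQuery(&QD, &Z.Query[i]);
         }

         // Call once per frame per zone: wrap the proxy draw in this frame's
         // query, then poll the oldest in-flight query without stalling.
         void UpdateZone(ID3D11DeviceContext* Ctx, Zone& Z, unsigned Frame)
         {
             ID3D11Query* Issue = Z.Query[Frame % kLatency];
             Ctx->Begin(Issue);
             // ... render the zone's occlusion proxy here (depth test on,
             // depth/color writes off) ...
             Ctx->End(Issue);

             if (Frame + 1 < kLatency)
                 return; // warm-up: nothing old enough to read back yet

             ID3D11Query* Fetch = Z.Query[(Frame + 1) % kLatency]; // oldest query
             UINT64 Samples = 0;
             if (Ctx->GetData(Fetch, &Samples, sizeof(Samples),
                              D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK)
                 Z.Visible = (Samples > 0); // else keep the previous answer
         }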
  5. Yes ... I got around 20 ms on a GTX 680 a while back (full-HD screen capture) - it's slow but universal. Faster alternatives exist but more code is needed: if you have the application's code and can access the backbuffer, then reading back the backbuffer will be WAY faster; otherwise you need DLL injection into the application (or a similar technique) to inject your code just before the 'Present' call (inside a 'fake' Present, actually). If you need to capture the whole desktop, I'm afraid you are doomed :|
  6. If you need to grab the entire desktop, you may use DX9Ex GetFrontBufferData - it's fast (see the sketch below). Be more specific about what you want to grab: the entire desktop, or just your app window's content? (If the content is rendered via D3D, you can simply read back the backbuffer - which, with a proper swapchain setup, should be very fast.)
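     A minimal sketch of the GetFrontBufferData path, assuming an already-created IDirect3DDevice9Ex (device creation and the final LockRect readback are omitted):

         #include <d3d9.h>

         // Captures the primary display into a system-memory surface that can
         // then be locked and read on the CPU. Returns nullptr on failure.
         IDirect3DSurface9* GrabDesktop(IDirect3DDevice9Ex* Device,
                                        UINT Width, UINT Height)
         {
             IDirect3DSurface9* Surface = nullptr;

             // The destination must be an A8R8G8B8 system-memory surface
             // the size of the desktop.
             if (FAILED(Device->CreateOffscreenPlainSurface(
                     Width, Height, D3DFMT_A8R8G8B8, D3DPOOL_SYSTEMMEM,
                     &Surface, nullptr)))
                 return nullptr;

             if (FAILED(Device->GetFrontBufferData(0, Surface)))
             {
                 Surface->Release();
                 return nullptr;
             }

             // LockRect on the surface now gives CPU access to the pixels.
             return Surface;
         }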
  7. That's not true: in DX11 the Device and the immediate context are still bound together; what is independent is the swapchain, so you can create a Device/Context combo without an HWND (see the sketch below). AFAIK this is not possible in OpenGL: there, every context needs a window, since there is no concept of a swapchain separated from the rendering context / resource management (but more than one context CAN share the same window).
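     A minimal sketch of such a windowless device/context pair (no HWND, no swapchain); presentation simply isn't available until a swapchain is created separately through DXGI:

         #include <d3d11.h>

         bool CreateHeadlessDevice(ID3D11Device** OutDevice,
                                   ID3D11DeviceContext** OutContext)
         {
             // No DXGI swapchain is involved; offscreen rendering and compute
             // work fine on the resulting device/immediate-context pair.
             return SUCCEEDED(D3D11CreateDevice(
                 nullptr,                   // default adapter
                 D3D_DRIVER_TYPE_HARDWARE,
                 nullptr, 0,                // no software rasterizer, no flags
                 nullptr, 0,                // default feature levels
                 D3D11_SDK_VERSION,
                 OutDevice, nullptr, OutContext));
         }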
  8. ADDMX

    Bad Red Robot

    Hi there! Bad Red Robot is a programming puzzle game and an easy way to introduce you to the world of programming. The game uses mechanics drawn from programming itself: it lets players grasp basic concepts such as the flow of instructions, sequences, procedures, and loops. The player's task is to plan the commands that get the robot to the exit. Levels get progressively harder, and you need to use the minimum number of commands. Along the way, additional features appear, such as moving boxes, shooting, opening doors, and even teleports or multi-threaded robots. The game has 24 stages and is intended for players of all ages. In addition, the game features an editor which allows you to create your own levels, with the possibility of uploading them to the server.
    The game is available for Android and Windows Phone 8.1:
    https://play.google.com/store/apps/details?id=pl.ewbm.roboticalite
    http://windowsphone.com/s?appId=e1cfcad8-6ad3-4ea6-a188-9e56de16bfcd
  9. ADDMX

    GPU ray tracing in real time

      To write a realtime ray tracing engine you need deep knowledge of the subject (both the algorithms and how the GPU works), but it is perfectly doable on current consumer hardware. Here is an example: http://brigade3.com/ You can also look for Arauna 2 - another pathtracer. See my old engine's source code at http://mxadd.org/bin/RayCasterCode.rar (binaries are on http://mxadd.org - see the projects section). First read a few books and papers, then prepare for a year or so of hard work, and then you'll be able to write another pathtracer (unless you give up ;))
  10. Hi
      Some friends and I wrote (again) a game in our spare time (indie, crazy carts racing with guns and chainsaws ;). Now that the game is almost finished from the code point of view (more tracks and cars need to be modelled and added), I need to add a leaderboard/achievements/micro-transactions system to it. We have our in-engine local achievements/leaderboards tables, but from the marketing point of view it will be a lot better to have global ones. The game will be free, with a micro-transactions system (to buy updates, unlock new levels, etc., but balanced so as not to be pay-to-win). The game is written on our own engine (mainly C++ with an Objective-C/Java mixup for iOS/Android); the supported platforms are Android (2.3 and up), iOS, Linux and Windows.
      So here is my real question: is there any micro-transaction system that supports ALL of those platforms? (On our previous projects we used Scoreloop, but it is Android/iOS only; same with Google Play services.) Am I doomed to use Steam on Win/Linux and Scoreloop on Android/iOS, or is there some 'magic' platform that spans all 4? I've come across Facebook game services - it seems to be platform independent and to support what I need. Any other popular solutions?
  11. In each cell of the grid you need to store a LIST of ALL the triangles that collide with that cell (this will probably lead to many lists sharing one triangle; see the sketch below). The other approach is to take each triangle and clip it to each cell, then store the resulting triangles in that cell - this step is costly and, due to numerical rounding errors, can lead to seams in the final image (but most of the time it's OK).
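     A minimal sketch of the first approach, binning each triangle into every cell its bounding box touches (a conservative test; an exact triangle-box test would produce tighter lists). The flat-grid layout and all names are illustrative assumptions.

         #include <algorithm>
         #include <vector>

         struct Vec3 { float x, y, z; };
         struct Tri  { Vec3 v[3]; };

         struct Grid
         {
             int   Nx, Ny, Nz;        // cells per axis
             Vec3  Min;               // world-space origin of the grid
             float CellSize;          // uniform cell edge length
             std::vector<std::vector<int>> Cells; // per-cell triangle index lists
         };

         void Insert(Grid& G, const std::vector<Tri>& Tris)
         {
             G.Cells.assign((size_t)G.Nx * G.Ny * G.Nz, {});
             for (int t = 0; t < (int)Tris.size(); ++t)
             {
                 // Triangle AABB.
                 Vec3 lo = Tris[t].v[0], hi = Tris[t].v[0];
                 for (int i = 1; i < 3; ++i)
                 {
                     const Vec3& p = Tris[t].v[i];
                     lo.x = std::min(lo.x, p.x); hi.x = std::max(hi.x, p.x);
                     lo.y = std::min(lo.y, p.y); hi.y = std::max(hi.y, p.y);
                     lo.z = std::min(lo.z, p.z); hi.z = std::max(hi.z, p.z);
                 }

                 // World coordinate -> clamped cell index on one axis.
                 auto cell = [&](float w, float org, int n)
                 { return std::max(0, std::min(n - 1, (int)((w - org) / G.CellSize))); };

                 int x0 = cell(lo.x, G.Min.x, G.Nx), x1 = cell(hi.x, G.Min.x, G.Nx);
                 int y0 = cell(lo.y, G.Min.y, G.Ny), y1 = cell(hi.y, G.Min.y, G.Ny);
                 int z0 = cell(lo.z, G.Min.z, G.Nz), z1 = cell(hi.z, G.Min.z, G.Nz);

                 // One triangle index typically lands in several cell lists.
                 for (int z = z0; z <= z1; ++z)
                     for (int y = y0; y <= y1; ++y)
                         for (int x = x0; x <= x1; ++x)
                             G.Cells[(z * G.Ny + y) * G.Nx + x].push_back(t);
             }
         }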
  12. "That seems quite low in terms of performance, but hey, context and unoptimized. Thanks for sharing, a doubling in performance seems pretty clear."
      In terms of performance IMHO it's quite good: 12M triangles for the shadowmaps is quite a lot for a 680M (accounting for the number of batches, and the fact that 50% of the meshes HAVE a pixel shader - vegetation needs alpha-kill in the PS).
  13. I've already done some experiments in my own engine and here are the results. Everything was tested on a complex scene (a large amount of small/medium-scale vegetation objects, most of them instanced, about 3 million vertices casting shadows, 30% of them skinned; NVIDIA 680M, i7):

      1] Draw everything once, use the GS to replicate vertices, and use a 2048x2048 texture with four 1024x1024 quarters - SLOOOW. You need two custom clip planes to clip to the quarter of the atlas, and the GS is the main bottleneck (whole-frame FPS = 21.1).

      2] Draw everything once, use the GS to replicate vertices, and use a 1024x1024x4 texture array - SLOOW, but better than the previous one since no clip planes are needed; the GS is still the main bottleneck (FPS = 22.4).

      3] Draw everything once into a 2048x2048 texture with four 1024x1024 quarters; this time, for every draw call, multiply the instance count by 4 and in the VERTEX shader use (InstanceIndex & 0x3) to output into a specific quarter of the atlas (again, two custom clip planes used) - FAST, FPS = 42.7 !!! (twice as fast as the GS path). This time the bottleneck is the vertex shader for all those skinned vertices. (A sketch of this submission path follows below.)

      4] Use a texture array, but for each cascade submit its own set of draw calls. THERE _IS_ an opportunity to clip them independently, so the win is the total number of vertices processed by the VS (for 1 and 2 the total was 3M, for 3 it was 12M!, for 4 it was 6M), but the loss is the total number of batches (for 1, 2, 3 it was 972; for 4 it was 1944). FPS = 42.1 if everything is submitted to the base context, 44.1 if 4 deferred contexts are used, each created on a different scheduler task, and then all of them are submitted at once into the base context.

      For 3] there is probably a chance to outperform 4] if some neat way of clipping is introduced, but for now I have no time for this and I'm sticking with it as-is, since I need to keep the batch count as low as possible - other parts of the engine demand it.
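     A minimal sketch of path 3]'s submission side (names are illustrative assumptions; the (InstanceIndex & 0x3) cascade selection lives in the vertex shader and is only described in the comments):

         #include <d3d11.h>

         static const UINT kCascades = 4;

         // The 2048x2048 atlas with four 1024x1024 quarters is bound once as
         // the depth target; each shadow batch is drawn a single time with its
         // instance count multiplied by the cascade count. The vertex shader
         // derives the cascade from SV_InstanceID & 3 and places the output in
         // that cascade's quarter, with two user clip planes fencing it in.
         void DrawShadowBatch(ID3D11DeviceContext* Ctx,
                              UINT IndexCount, UINT InstanceCount)
         {
             Ctx->DrawIndexedInstanced(IndexCount,
                                       InstanceCount * kCascades, // 4x replication
                                       0, 0, 0);
         }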
  14. ADDMX

    Render queue ids/design?

      Hi
      Well, I'm using a different design. Let's say we have a scenegraph where each renderable object holds a pointer to a material, and each material holds a pointer to a shader. When it comes to rendering, we traverse the scenegraph, adding each renderable object to its material's internal list. Then, to render solid things, we traverse the list of solid shaders, bind the program and 'global scale' uniforms, then traverse each shader's material list, bind the per-material state (textures, 'material scale' uniforms), and then go through the material's list of renderable objects, bind their uniforms, and finally render them. (A sketch of these buckets follows below.)
      By shader I mean: shader program + any shader-global uniforms (for example bumpShader or parallaxShader).
      By material I mean: a set of textures + any material-specific uniforms (for example diffuse_map.dds, bump_map.dds, specular_power, specular_color, etc.).
      By object I mean: VertexBuffer, IndexBuffer, ToWorld matrix, optionally a SkinningPalette.
      Instancing is easy: you just sort each material's renderables list by object_id (assuming objects that have exactly the same VertexBuffer and IndexBuffer share the same object_id), then put all the ToWorld matrices of objects with the same object_id into an instancing table and draw them at once.
      In reality, when you traverse the scenegraph you will clip to the frustum, shadow_frustum, reflection_frustum(s), etc., and keep a bit flag with one bit set per frustum in which the object is visible; at render time you just check whether visibility_mask & clip_bits is nonzero, so you render only what's needed.
      For objects with transparency there's a problem with sorting, as with this design you can only sort within one material bucket, but for me this was not an issue since I have only one transparent shader and all the transparent objects share the same texture atlas.
      Any comments on this design are welcome.
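      A minimal sketch of those shader -> material -> renderable buckets (all names are illustrative assumptions):

          #include <vector>

          struct Renderable
          {
              int      ObjectId;     // shared by objects with identical VB/IB
              float    ToWorld[16];
              unsigned ClipBits;     // one bit per frustum the object is visible in
          };

          struct Material
          {
              // textures + 'material scale' uniforms live here
              std::vector<Renderable*> Items; // refilled each frame during traversal
          };

          struct Shader
          {
              // program + 'global scale' uniforms live here
              std::vector<Material*> Materials;
          };

          void RenderSolid(std::vector<Shader*>& Shaders, unsigned VisibilityMask)
          {
              for (Shader* S : Shaders)
              {
                  // bind program + global uniforms once per shader
                  for (Material* M : S->Materials)
                  {
                      // bind textures + per-material uniforms once per material
                      for (Renderable* R : M->Items)
                      {
                          if ((R->ClipBits & VisibilityMask) == 0)
                              continue;  // not visible in this pass
                          // bind ToWorld (or gather into an instancing table
                          // keyed on ObjectId) and issue the draw call
                      }
                      M->Items.clear();  // lists are rebuilt every frame
                  }
              }
          }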
  15. ADDMX

    Baking GI

      OK, that will give me DIRECT light on the probes - that's the easy part ... now the INDIRECT light (bounces) :)) The trick is that I want to calculate the DIRECT light in the conventional way (for quality) and add only the INDIRECT light from the probes.
      If you are familiar with Unity Pro, this is the mechanism applied to lights with the 'auto' setting when baking light probes: only the indirect lighting (bounces) is baked into the probes, and then at render time the direct lighting is calculated as usual (shadowmaps rendered, diffuse term evaluated, etc.) and the indirect light is added on top.
      The real question would be: HOW TO BAKE THE INDIRECT LIGHT (without writing my own Monte Carlo/Metropolis tracer ;))), or what middleware to use?