Showing results for tags 'DX12'.

Found 307 results

  1. I'm trying to share resources across different command queues. Obviously I want to keep the sharing to a minimum, but at some point something needs to be shared. As far as I know there's no mutex object in DX12, but does using fences work? Here's my current plan, which seems to work, but I'm wondering if it could be better or what other people do. I want the compute queue to run as fast as possible, and as many times as it can, but I also want the draw queue to be able to stop it when drawing needs to happen.

     Initial state:
     F0 <- 1
     F1 <- 0

     Compute loop:
     * queue->Set F1 = 1
     * queue->Wait for F0 == 1
     * Exec command list
     * queue->Set F1 = 0

     Draw loop:
     * cpu->Set F0 = 0, which should stop new computes from starting
     * queue->Wait for F1 == 0, which should wait until the currently executing compute command list is finished
     * Exec command list; the compute queue should not be executing anything at this point
     * queue->Set F0 = 1, so the compute queue can run again

     I'm pretty sure there's a deadlock in there somewhere. Should/could I do this with one fence instead of two? If I have a bunch of command lists pending inside the compute queue, I want the draw queue to take priority, so that I can stack a bunch of work into compute but maintain priority for drawing. Obviously this will fall apart if a compute command list takes too long and the draw queue has to wait for it, so I want compute command lists to be pretty fast relative to draws.

     I think setting the queue priority would let me do this -- i.e. if two command queues are waiting on the same fence, the one with higher priority should get it? But the documentation says very little about this, so I'm not sure.
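     For reference, a minimal sketch of GPU-side cross-queue synchronization with a single fence and a monotonically increasing value, using ID3D12CommandQueue::Signal and ID3D12CommandQueue::Wait. This fully serializes access to the shared resource; the queue, list and fence names here are hypothetical and it assumes the usual d3d12.h/device setup:

     // computeLists / drawLists: arrays of ID3D12CommandList* recorded elsewhere (hypothetical).
     ID3D12Fence* fence = nullptr;
     device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
     UINT64 fenceValue = 0;

     // Compute queue: run a batch, then signal completion.
     computeQueue->ExecuteCommandLists(1, computeLists);
     computeQueue->Signal(fence, ++fenceValue);

     // Graphics queue: don't start drawing until that batch is done (GPU-side wait, the CPU is not blocked).
     graphicsQueue->Wait(fence, fenceValue);
     graphicsQueue->ExecuteCommandLists(1, drawLists);
     graphicsQueue->Signal(fence, ++fenceValue);

     // Compute queue: wait for the draw before touching the shared resource again.
     computeQueue->Wait(fence, fenceValue);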
  2. Hi. I started to learn DX12 and I have a question about updating a constant buffer.

     I need to write only to certain parts of my constant buffer, by offset. In D3D11 I could do it like this: BackingBuffer is of type DataBuffer, and I could write my data into it separately by offset:

     BackingBuffer.Set(offset, ref value);

     And then update my constant buffer:

     // Set up the dest region inside the buffer
     if ((this.Description.BindFlags & BindFlags.ConstantBuffer) != 0)
     {
         device.UpdateSubresource(new DataBox(BackingBuffer.Pointer, 0, 0), constantBuffer);
     }
     else
     {
         var destRegion = new ResourceRegion(offsetInBytes, 0, 0, offsetInBytes + BackingBuffer.Size, 1, 1);
         device.UpdateSubresource(new DataBox(BackingBuffer.Pointer, 0, 0), constantBuffer, 0, destRegion);
     }

     How can I do the same in D3D12?
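     In D3D12 the common pattern is to place the constant buffer (or a per-frame staging buffer) on an upload heap, keep it persistently mapped, and memcpy into it at the desired byte offset. A minimal C++ sketch, assuming an already-created upload-heap resource named constantBuffer (hypothetical name) and the d3dx12.h helpers:

     // Map once after creation; upload-heap resources can stay mapped for their lifetime.
     UINT8* mappedData = nullptr;
     CD3DX12_RANGE readRange(0, 0);   // we do not intend to read this buffer on the CPU
     constantBuffer->Map(0, &readRange, reinterpret_cast<void**>(&mappedData));

     // Later, write just the piece you care about at a byte offset.
     memcpy(mappedData + offsetInBytes, &value, sizeof(value));

     Because the GPU may still be reading the previous contents when you write, engines usually keep one buffer (or one region) per frame in flight rather than overwriting a single copy in place.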
  3. Hey guys,

     I have a very simple profiling system in my little DX12 engine which can visualize the time spent on the GPU for each task using timestamps. This is a good way to tell whether we have GPU bubbles, or to identify suspiciously time-consuming passes. But the problem is that it can't tell us the GPU usage: for example, my 'fancy' post-process pass may be bandwidth limited, or my compute shader may be register limited, either of which could result in very low GPU utilization that isn't clearly reflected by timestamps alone. So I'd really like to be able to visualize GPU usage, so I can tell whether the GPU is fully saturated or not and then do better optimization.

     I also believe that being able to visualize GPU usage per task is a very important way to place your async compute work wisely.

     It would be greatly appreciated if someone could enlighten me on this.

     Thanks,
     Peng
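     For context, the timestamp mechanism described above usually looks something like this in D3D12 (a sketch; queryHeap, readbackBuffer and passIndex are hypothetical names):

     // Record a timestamp before and after the pass.
     commandList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 2 * passIndex);
     // ... record the pass ...
     commandList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 2 * passIndex + 1);

     // Resolve the pair into a readback buffer; convert to milliseconds after mapping it.
     commandList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP,
                                   2 * passIndex, 2, readbackBuffer, 2 * passIndex * sizeof(UINT64));

     UINT64 gpuFrequency = 0;
     commandQueue->GetTimestampFrequency(&gpuFrequency);   // ticks per second
     // elapsedMs = (end - begin) * 1000.0 / gpuFrequency

     Per-unit utilization (bandwidth, occupancy, register pressure) generally isn't exposed by the core D3D12 API; that kind of data usually comes from vendor profilers or their performance-counter interfaces.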
  4. I have two questions.

     As I understand it, you generally want to keep your GPU a frame or two behind your CPU. While the GPU is rendering a frame, the CPU is generating the draw calls for the next frame, so that there isn't a bottleneck between the two.

     1) How does this work in practical terms? Is the CPU side, "generating the draw calls", just building the command lists with calls to DrawIndexedInstanced and the like? And then, to actually perform the rendering on the GPU side, you call ExecuteCommandLists?

     2) In terms of multi-threaded rendering, is that a misnomer? Are the other threads just generating draw calls, with the main rendering thread being the only thing that actually calls ExecuteCommandLists? Or can you simultaneously render to various textures, and then your main rendering thread uses them to generate a frame for the screen?
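     A sketch of the typical frames-in-flight pattern that keeps the CPU one or two frames ahead; frameIndex, commandLists, fence and fenceEvent are hypothetical names:

     // One fence value per frame slot; the CPU only blocks when it gets too far ahead.
     const UINT FrameCount = 2;
     UINT64 frameFenceValues[FrameCount] = {};
     UINT64 nextFenceValue = 1;

     // Per frame: record command lists on the CPU (possibly across worker threads),
     // submit them, then signal the fence for this slot.
     commandQueue->ExecuteCommandLists(1, commandLists);
     commandQueue->Signal(fence, nextFenceValue);
     frameFenceValues[frameIndex] = nextFenceValue++;

     // Before reusing the next slot's allocators/resources, make sure the GPU is done with it.
     UINT nextFrame = (frameIndex + 1) % FrameCount;
     if (fence->GetCompletedValue() < frameFenceValues[nextFrame])
     {
         fence->SetEventOnCompletion(frameFenceValues[nextFrame], fenceEvent);
         WaitForSingleObject(fenceEvent, INFINITE);
     }
     frameIndex = nextFrame;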
  5. Rendering to the first layer works just fine, but the second layer looks like this:

     resource descs: http://pastebin.com/zetRzV9t
     rtv desc: http://pastebin.com/905tW9Lx
     dsv desc: http://pastebin.com/YtY6rLL2
     vertex shader: http://pastebin.com/8UYhzNKW
     geometry shader: http://pastebin.com/t5nBAWFA
     pixel shader: http://pastebin.com/Ljbchrre

     For rendering the gbuffer:
     srv desc: http://pastebin.com/guDMnWqP
     screenquad vs: http://pastebin.com/qUz0YRiy
     screenquad gs: http://pastebin.com/1LymhJ4W
     screenquad ps: http://pastebin.com/yBehwuBz
  6. Hello. My question is as in the topic title. Thanks in advance.
  7. Anyone had any success reading back texture data? I'm getting all black.

     ...
     ThrowIfFailed(device->CreateCommittedResource(
         &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_READBACK),
         D3D12_HEAP_FLAG_NONE,
         &CD3DX12_RESOURCE_DESC::Buffer(readbackBufferSize),
         D3D12_RESOURCE_STATE_COPY_DEST,
         nullptr,
         IID_PPV_ARGS(&textureReadback)));

     D3D12_TEXTURE_COPY_LOCATION source;
     source.pResource = texture.Get();
     source.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
     source.SubresourceIndex = 0;

     D3D12_TEXTURE_COPY_LOCATION dest;
     dest.pResource = textureReadback.Get();
     dest.Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
     dest.PlacedFootprint.Offset = 0;
     dest.PlacedFootprint.Footprint.Format = GetPixelFormat(descriptor);
     dest.PlacedFootprint.Footprint.Width = descriptor.width;
     dest.PlacedFootprint.Footprint.Height = descriptor.height;
     dest.PlacedFootprint.Footprint.Depth = 1;
     dest.PlacedFootprint.Footprint.RowPitch = descriptor.width * BytesPerPixel;

     list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(texture.Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_COPY_SOURCE));
     list->CopyTextureRegion(&dest, 0, 0, 0, &source, nullptr);
     list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(texture.Get(), D3D12_RESOURCE_STATE_COPY_SOURCE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
     ...

     After this I use a fence and wait for it to complete, then I call Map on textureReadback and memcpy the data to the buffer I want it in. And it's all zeros.
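     One thing worth checking with placed-footprint copies: RowPitch must be aligned to D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256 bytes). A sketch of letting the device compute the footprint instead of filling it by hand (textureDesc stands for the texture's D3D12_RESOURCE_DESC, a hypothetical name here):

     D3D12_PLACED_SUBRESOURCE_FOOTPRINT footprint = {};
     UINT numRows = 0;
     UINT64 rowSizeInBytes = 0;
     UINT64 totalBytes = 0;

     // The device returns a correctly aligned footprint and the required readback buffer size.
     device->GetCopyableFootprints(&textureDesc, 0, 1, 0, &footprint, &numRows, &rowSizeInBytes, &totalBytes);

     D3D12_TEXTURE_COPY_LOCATION dest = {};
     dest.pResource = textureReadback.Get();
     dest.Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
     dest.PlacedFootprint = footprint;
     // When reading the mapped data back, walk it row by row using footprint.Footprint.RowPitch,
     // which may be larger than width * BytesPerPixel.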
  8. Hi there,

     EDIT: oh, I didn't see that there's a newer driver; installing 355.82 solved it.

     I tried to run the DX12 hello window graphics sample (https://github.com/Microsoft/DirectX-Graphics-Samples/blob/master/Samples/D3D12HelloWorld/src/HelloWindow/D3D12HelloWindow.cpp), but I can only run it if I set m_useWarpDevice to true, meaning I can only create a WARP device.

     I have 64-bit Windows 10 installed, and I tried to run the sample on a GTX 660 using the latest driver (353.62). The Nvidia control panel says I have the DX12 runtime and API version, but only feature level 11_0. DxDiag reports DirectX version 11.3 and says that the driver is WDDM 2.0 capable, but again, it only lists feature levels up to 11_0.

     While I can still develop using a WARP device, I'll need the hardware device later, so it'd be great if I could solve this. Any idea how to fix it? I already tried reinstalling the display driver, with no success.

     Best regards,
     Yours3!f
  9. Hi, I was playing around with DX12 using the SharpDX beta. I'm able to render simple models now, but I ran into trouble when trying to render models with several sub-parts: the texture I uploaded is not displaying for the second part.

     This is how I do it:

     1. Create a root signature:

     var rootSignatureDesc = new RootSignatureDescription(
         RootSignatureFlags.AllowInputAssemblerInputLayout,
         // Root Parameters
         new[]
         {
             new RootParameter(ShaderVisibility.All,
                 new []
                 {
                     new DescriptorRange()
                     {
                         RangeType = DescriptorRangeType.ShaderResourceView,
                         DescriptorCount = 2,
                         OffsetInDescriptorsFromTableStart = -1,
                         BaseShaderRegister = 0
                     },
                     new DescriptorRange()
                     {
                         RangeType = DescriptorRangeType.ConstantBufferView,
                         DescriptorCount = 1,
                         OffsetInDescriptorsFromTableStart = -1,
                         BaseShaderRegister = 0
                     }
                 }),
             new RootParameter(ShaderVisibility.Pixel,
                 new DescriptorRange()
                 {
                     RangeType = DescriptorRangeType.Sampler,
                     DescriptorCount = 1,
                     OffsetInDescriptorsFromTableStart = -1,
                     BaseShaderRegister = 0
                 }),
         });

     2. Create a descriptor heap for the SRVs and CBV:

     var srvCbvHeapDesc = new DescriptorHeapDescription()
     {
         DescriptorCount = 3,
         Flags = DescriptorHeapFlags.ShaderVisible,
         Type = DescriptorHeapType.ConstantBufferViewShaderResourceViewUnorderedAccessView
     };
     srvCbvHeap = device.CreateDescriptorHeap(srvCbvHeapDesc);

     3. Upload the textures:

     var tex = Texture.LoadFromFile("../../models/test.tga");
     byte[] textureData = tex.Data;
     TextureWidth = tex.Width;
     TextureHeight = tex.Height;

     var textureDesc = ResourceDescription.Texture2D(Format.B8G8R8A8_UNorm, TextureWidth, TextureHeight, 1, 1, 1, 0, ResourceFlags.None, TextureLayout.Unknown, 0);
     texture = device.CreateCommittedResource(new HeapProperties(HeapType.Upload), HeapFlags.None, textureDesc, ResourceStates.GenericRead, null);
     texture.Name = "Texture";

     var handle = GCHandle.Alloc(textureData, GCHandleType.Pinned);
     var ptr = Marshal.UnsafeAddrOfPinnedArrayElement(textureData, 0);
     texture.WriteToSubresource(0, null, ptr, TextureWidth * 4, textureData.Length);
     handle.Free();

     4. Create a shader resource view for each texture by offsetting from the heap start:

     var Step = device.GetDescriptorHandleIncrementSize(DescriptorHeapType.ConstantBufferViewShaderResourceViewUnorderedAccessView);
     ......
     device.CreateShaderResourceView(texture[i], srvDesc, srvCbvHeap.CPUDescriptorHandleForHeapStart + i * Step);

     5. When building the command list, set the descriptor table for each texture resource by offsetting:

     commandList.SetGraphicsRootSignature(rootSignature);
     DescriptorHeap[] descHeaps = new[] { srvCbvHeap, samplerViewHeap };
     commandList.SetDescriptorHeaps(descHeaps.GetLength(0), descHeaps);
     commandList.SetGraphicsRootDescriptorTable(0, srvCbvHeap.GPUDescriptorHandleForHeapStart);
     commandList.SetGraphicsRootDescriptorTable(1, samplerViewHeap.GPUDescriptorHandleForHeapStart);
     ...... draw first ......
     commandList.SetGraphicsRootDescriptorTable(0, srvCbvHeap.GPUDescriptorHandleForHeapStart + Step * i);
     ...... draw second ......

     But only the first part gets a texture. I'm still trying to understand how the whole resource management works, but it looks so complex. Please help; I checked the C++ sample code on GitHub but can't find what I did wrong.

     Thanks in advance.
  10. I am working on getting my DX12 renderer working, but I have encountered a perplexing issue. The pipeline state shows the correct VS and PS and creates correctly, but nothing renders. The debugger shows the IA stage correctly and the VS stage correctly, then shows "stage not bound" for the PS stage. I don't understand: the PS is set, the debugger shows it in the state object, and the VS output and PS input signatures are the same, so why is it not bound?
  11. arjansingh00

    DX12 Good DirectX Books

    I picked up Frank Luna's DirectX 12 book a month ago, but to be honest I am finding graphics programming hard as hell. I covered everything up to the drawing part, but the book skips over a lot of the code after a while, leaving me clueless about how to even initialize DirectX. I did look at the code samples provided, but he uses his own framework, which can make things complicated and hard to understand. I can't even enable 4x MSAA without running into an error. I was expecting it to walk you through making your own engine step by step, making you write the code yourself, but instead it covers a little code and then hands you all of it in the form of samples. Some of the topics, like shadow mapping, are very well explained, but when it comes to implementation he does it all through his framework, which confuses the hell out of me and makes it hard to implement myself. Could someone recommend any other books to try, or maybe graphics programming isn't for me...
  12. So the title pretty much sums it up: I want to know how to set the HWND for my DirectX engine to a C# Panel. I know it involves making a C++/CLI wrapper. I'm learning DirectX from Frank Luna's DirectX 12 book, so the engine is here: https://github.com/d3dcoder/d3d12book. If someone could download his source code and make a C++/CLI wrapper out of Chapter 4, that would be great. All you need to do is go to Chapter 4 and open it in Visual Studio and you'll have all the files there. I know this is a lot to ask, but I've been trying to do this for days and any help would be much appreciated.
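     For illustration, a minimal C++/CLI wrapper sketch along these lines. EngineWrapper and the native D3DApp interface used here are hypothetical names, not taken from the book's code; on the C# side you would pass myPanel.Handle into Initialize:

     // Managed wrapper a C# WinForms app can reference; the Panel's Handle (an IntPtr)
     // is converted back to a native HWND and handed to the native engine.
     public ref class EngineWrapper
     {
     public:
         EngineWrapper()  { m_app = new D3DApp(); }
         ~EngineWrapper() { this->!EngineWrapper(); }
         !EngineWrapper() { delete m_app; m_app = nullptr; }

         void Initialize(System::IntPtr panelHandle)
         {
             m_app->Initialize(static_cast<HWND>(panelHandle.ToPointer()));
         }

         void RenderFrame() { m_app->Draw(); }

     private:
         D3DApp* m_app;   // native engine class (hypothetical interface)
     };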
  13. In DX11 and prior, to draw to an off-screen texture (render target) you would do something like this:

     dx11_dev_context->OMSetRenderTargets(1, &_color_view, _depth_view);

     Any draw calls after that would get sent to that texture until another OMSetRenderTargets() call was made. This allowed a bind() in a high-level function and then drawing all objects in a lower scene-graph draw().

     My question is: in DX12, how is this type of scenario accomplished? Does every command list need to make its own cl->OMSetRenderTargets() call, so that if a Draw call is made it knows where to go?

     Is it an option to do:

     CL0 {
         Transition::RenderTarget
         OMSetRenderTargets()
         ClearRenderTargetView
     }
     CL1 {
         IASetPrimTop
         IASetVertexBuffer
         ....
         Draw scene objects
     }
     CL2 {
         Transition::Present
     }

     CommandQueue->execute(CL0, CL1, CL2);
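     In D3D12, render target bindings (like most command list state) belong to the command list and reset when the list is reset, so each list that actually draws typically rebinds them; a sketch, with rtvHandle/dsvHandle/viewport/scissorRect as hypothetical objects:

     // Command list state does not carry over between lists, so CL1 has to bind its own
     // targets even though CL0 already transitioned and cleared them.
     cl1->OMSetRenderTargets(1, &rtvHandle, FALSE, &dsvHandle);
     cl1->RSSetViewports(1, &viewport);
     cl1->RSSetScissorRects(1, &scissorRect);
     // ... IASetVertexBuffers, DrawIndexedInstanced, etc. ...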
  14. Hi, I wonder if there is a chart available (xls or whatever) showing the different IHVs' GPU support for the optional Direct3D 11 features (http://msdn.microsoft.com/en-us/library/ff476124.aspx), especially the feature level 11.0 optional features that are required (at least most of them) at feature level 11.1 (http://msdn.microsoft.com/en-us/library/hh404457.aspx). And yes I know, brace yourself, DX12 is coming (and it will probably inherit most of the DX11 caps bits)...
  15. "In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten." link

     Does this mean we will be able to bind lots of textures to a shader and then, based on a drawable's material (a constant buffer variable), pick the correct texture (an index into the bound textures)?

     Say I have a 2D game and I managed to fit all my images onto only 3 texture atlases. Can I bind the 3 textures and never worry about textures again, with each sprite's material holding an index into them, and so draw everything with a single draw call?

     Did I get it all wrong?
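     As a sketch of the C++ side of that idea: a single descriptor table covering several SRVs, which the shader then indexes with a value read from the material constant buffer (the HLSL side would declare something like Texture2D gTextures[3]). The range/param names are hypothetical and the d3dx12.h helpers are assumed:

     // One range covering registers t0..t2; per-sprite data only changes the index in the
     // constant buffer, so many sprites can share one draw call (e.g. via instancing).
     CD3DX12_DESCRIPTOR_RANGE range;
     range.Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 3 /*NumDescriptors*/, 0 /*BaseShaderRegister*/);

     CD3DX12_ROOT_PARAMETER param;
     param.InitAsDescriptorTable(1, &range, D3D12_SHADER_VISIBILITY_PIXEL);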
  16. When implementing subpixel precision in a software rasterizer I found the following code. It's just too bad the author doesn't explain any of it. Can someone explain to me how shifting these integer variables gives us 4 bits of subpixel precision, i.e. 16 extra steps of precision? Not to mention it's not working in my code :/

     // 28.4 fixed-point coordinates
     const int Y1 = iround(16.0f * v1.y);
     const int Y2 = iround(16.0f * v2.y);
     const int Y3 = iround(16.0f * v3.y);
     const int X1 = iround(16.0f * v1.x);
     const int X2 = iround(16.0f * v2.x);
     const int X3 = iround(16.0f * v3.x);

     // Fixed-point deltas
     const int FDX12 = DX12 << 4;
     const int FDX23 = DX23 << 4;
     const int FDX31 = DX31 << 4;
     const int FDY12 = DY12 << 4;
     const int FDY23 = DY23 << 4;
     const int FDY31 = DY31 << 4;

     // Bounding rectangle
     int minx = (min(X1, X2, X3) + 0xF) >> 4;
     int maxx = (max(X1, X2, X3) + 0xF) >> 4;
     int miny = (min(Y1, Y2, Y3) + 0xF) >> 4;
     int maxy = (max(Y1, Y2, Y3) + 0xF) >> 4;

     int CY1 = C1 + DX12 * (miny << 4) - DY12 * (minx << 4);
     int CY2 = C2 + DX23 * (miny << 4) - DY23 * (minx << 4);
     int CY3 = C3 + DX31 * (miny << 4) - DY31 * (minx << 4);

     for(int y = miny; y < maxy; y++)
     {
         int CX1 = CY1;
         int CX2 = CY2;
         int CX3 = CY3;

         for(int x = minx; x < maxx; x++)
         {
             if(CX1 > 0 && CX2 > 0 && CX3 > 0)
             {
                 colorBuffer[x] = 0x00FFFFFF;
             }

             CX1 -= FDY12;
             CX2 -= FDY23;
             CX3 -= FDY31;
         }

         CY1 += FDX12;
         CY2 += FDX23;
         CY3 += FDX31;
     }
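     For what it's worth, a small worked sketch of the 28.4 idea the snippet relies on: multiplying by 16 keeps 4 fractional bits, so coordinates are stored at 1/16-pixel resolution. (DX12, DY12, C1 and friends come from the edge deltas and half-edge constants in the full listing, which the snippet above doesn't show.)

     // 28.4 fixed point: the low 4 bits are the fraction, so one unit == 1/16 of a pixel.
     const int a = (int)(2.75f * 16.0f);   // 2.75 px -> 44 (integer part 2, fraction 12/16 = 0.75)
     const int b = (int)(2.00f * 16.0f);   // 2.00 px -> 32

     // The half-edge functions multiply two 28.4 values (e.g. DY12 * (x << 4)), giving a 24.8 result.
     // Stepping x by one whole pixel changes that product by DY12 << 4, which is exactly why the
     // per-pixel steps FDY12/FDX12 are the deltas shifted left by 4.
     const int DY12  = a - b;        // 0.75 px expressed in 28.4 == 12
     const int FDY12 = DY12 << 4;    // the same step expressed in the 24.8 edge-function scale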
  17. Hi all! Last weekend I took the time to code a class which can rasterize triangles basing my work on some code I could find on the net, particularly this one over at devmaster. Over the course of this week I extended it and now I would say it is a complete solution. Features: The rasterizer can support an arbitrary number of render targets (you will most likely use two, a color buffer and a depth buffer) The rasterization is completely decoupled from the actual shading of the visible pixels. You can configure the rasterizer with a pixel shader which does the actual work of computing and assigning a color value. It does only use integer math because i intend to use it on the GP2X which does not have an FPU. The rasterizer is tile based. Currently it uses blocks of 8x8 pixels. It interpolates an arbitrary number of integer varyings across the triangle (so you can use fixed point here). This is done perspectively correct for the corners of each 8x8 block and affine within each block to avoid the costly per pixel divide. It supports a clipping rectangle. It provides a means for the pixel shader to compute the derivative of the interpolated varyings. This is needed for example to compute the texture mimmap level from the texture coordinates. It allows for an early depth test. For example the shader could store the minimum depth value for each 8x8 block and than discad a whole block if the minimum expected depth value for this block is greater than the one stored. The source code is actually quite short ~600 lines with a lot of comments. The only problem I can see right now is with small or large but thin triangles. Because the rasterizer is tile based it must at least scan a whole 8x8 block and test each pixel if it is inside the triangle or not. Large triangles are handled quite efficiently since for the inner part only the corners are tested for inout. What do you think about this. How big a performance problem might this be when targeting the GP2X? I include the code here for everyoune to look at. I would be very thankful for any input of possible improvements. Header file: /* Copyright (c) 2007, Markus Trenkwalder All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #ifndef RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3 #define RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3 class Rasterizer { public: // some constants static const int MAX_RENDER_TARGETS = 2; static const int MAX_VARYING = 8; static const int BLOCK_SIZE = 8; public: // Type definitions struct Vertex { int x, y; // in 28.4 fixed point int z; // range from 0 to 0x7fffffff int w; // in 16.16 fixed point int varyings[MAX_VARYING]; }; // holds the pointers to each render target. // These will be passed to the fragment shader which then can write to // the pointed to location. Note: Only the first n pointers will be // valid where n is the current number of render targets struct BufferPointers { void* ptr[MAX_RENDER_TARGETS]; }; // This is the data the fragment shader gets struct FragmentData { int z; int varying[MAX_VARYING]; }; typedef FragmentData PixelBlock[BLOCK_SIZE][BLOCK_SIZE] ; class RenderTarget { public: virtual int width() = 0; virtual int height() = 0; virtual int stride() = 0; virtual void *buffer_pointer() = 0; virtual int element_size() = 0; virtual void clear(int x, int y, int w, int h) = 0; }; class FragmentShader { public: // This provides a means for an early depth test. // x and y are the coordinates of the upper left corner of the current block. // If the shader somewhere stores the minimum z of each block that value // can be compared to the parameter z. // returns false when the depth test failed. In this case the whole block // can be culled. virtual bool early_depth_test(int x, int y, int z) { return true; } // This notifies the shader of any render target clears. // This is meant to be used in conjunction with the early depth test to update // any buffers used virtual void clear(int target, int x, int y, int w, int h) {} // To compute the mipmap level of detail one needs the derivativs in x and y of // the texture coordinates. These can be computed from the values in the pixel // block since all the fragment values have alredy been computed for this block // when this is called virtual void prepare_for_block(int x, int y, PixelBlock b) {} // This tells the rasterizer how many varyings this fragment shader needs virtual int varying_count() = 0; // This is called once for each visible fragment inside the triangle // x and y are the coordinates within the block [0, BLOCK_SIZE[ // the pixel block is indexed with p[y][x] !!! virtual void shade(const BufferPointers&, const PixelBlock& b, int x, int y) = 0; }; private: // Variables struct RenderTargetParams { int count; int minwidth, minheight; // cache these params to avoid too // many virtual function calls int stride[MAX_RENDER_TARGETS]; int element_size[MAX_RENDER_TARGETS]; } rendertarget_params_; RenderTarget *rendertargets_[MAX_RENDER_TARGETS]; FragmentShader *fragment_shader_; struct { int x0, y0, x1, y1; } clip_rect_; private: bool setup_valid(); public: // constructor Rasterizer(); public: // main interface // Set the render targets. 
// This resets the clipping rectangle void rendertargets(int n, RenderTarget* rt[]); // set the fragment shader void fragment_shader(FragmentShader *fs); void clear(); void clear(int target); void clip_rect(int x, int y, int w, int h); // The triangle must be counter clockwise in screen space in order to be // drawn. void draw_triangle(const Vertex &v1, const Vertex &v2, const Vertex &v3); }; #endif Implementation file: /* Copyright (c) 2007, Markus Trenkwalder All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include "rasterizer.h" #include <cmath> #include <cassert> #include <algorithm> #ifndef _MSC_VER #include <stdint.h> #else #include "stdint.h" #endif //////////////////////////////////////////////////////////////////////////////// // utility functions namespace { inline int min(int a, int b, int c) { return std::min(std::min(a,b), c); } inline int max(int a, int b, int c) { return std::max(std::max(a,b), c); } inline void compute_plane( int v0x, int v0y, int v1x, int v1y, int v2x, int v2y, int z0, int z1, int z2, int64_t plane[4]) { const int px = v1x - v0x; const int py = v1y - v0y; const int pz = z1 - z0; const int qx = v2x - v0x; const int qy = v2y - v0y; const int qz = z2 - z0; /* Crossproduct "(a,b,c):= dv1 x dv2" is orthogonal to plane. */ const int64_t a = (int64_t)py * qz - (int64_t)pz * qy; const int64_t b = (int64_t)pz * qx - (int64_t)px * qz; const int64_t c = (int64_t)px * qy - (int64_t)py * qx; /* Point on the plane = "r*(a,b,c) + w", with fixed "r" depending on the distance of plane from origin and arbitrary "w" parallel to the plane. */ /* The scalar product "(r*(a,b,c)+w)*(a,b,c)" is "r*(a^2+b^2+c^2)", which is equal to "-d" below. 
*/ const int64_t d = -(a * v0x + b * v0y + c * z0); plane[0] = a; plane[1] = b; plane[2] = c; plane[3] = d; } inline int solve_plane(int x, int y, const int64_t plane[4]) { assert(plane[2] != 0); return (int)((plane[3] + plane[0] * x + plane[1] * y) / -plane[2]); } template <int denominator> inline void floor_divmod(int numerator, int &floor, int &mod) { assert(denominator > 0); if(numerator >= 0) { // positive case, C is okay floor = numerator / denominator; mod = numerator % denominator; } else { // Numerator is negative, do the right thing floor = -((-numerator) / denominator); mod = (-numerator) % denominator; if(mod) { // there is a remainder floor--; mod = denominator - mod; } } } // Fixed point division template <int p> inline int32_t fixdiv(int32_t a, int32_t b) { #if 0 return (int32_t)((((int64_t)a) << p) / b); #else // The following produces the same results as the above but gcc 4.0.3 // generates fewer instructions (at least on the ARM processor). union { int64_t a; struct { int32_t l; int32_t h; }; } x; x.l = a << p; x.h = a >> (sizeof(int32_t) * 8 - p); return (int32_t)(x.a / b); #endif } // Perform a fixed point multiplication using a 64-bit intermediate result to // prevent overflow problems. template <int p> inline int32_t fixmul(int32_t a, int32_t b) { return (int32_t)(((int64_t)a * b) >> p); } } // end anonymous namespace //////////////////////////////////////////////////////////////////////////////// Rasterizer::Rasterizer() : fragment_shader_(0) { rendertarget_params_.count = 0; } bool Rasterizer::setup_valid() { return rendertarget_params_.count >= 1 && fragment_shader_ != 0; } void Rasterizer::clear() { for (int i = 0; i < rendertarget_params_.count; ++i) clear(i); } void Rasterizer::clear(int target) { assert(target <= rendertarget_params_.count); rendertargets_[target]->clear(0, 0, rendertarget_params_.minwidth, rendertarget_params_.minheight); // notify shader about clear (might want to update internal data structutes) if (fragment_shader_) fragment_shader_->clear(target, 0, 0, rendertarget_params_.minwidth, rendertarget_params_.minheight); } void Rasterizer::clip_rect(int x, int y, int w, int h) { if (rendertarget_params_.count == 0) return; clip_rect_.x0 = std::max(0, x); clip_rect_.y0 = std::max(0, y); clip_rect_.x1 = std::min(x + w, rendertarget_params_.minwidth); clip_rect_.y1 = std::min(y + h, rendertarget_params_.minheight); } //////////////////////////////////////////////////////////////////////////////// // main interface void Rasterizer::rendertargets(int n, RenderTarget* rt[]) { assert(n <= MAX_RENDER_TARGETS); RenderTargetParams &rtp = rendertarget_params_; rtp.count = n; if (n == 0) return; rtp.minwidth = rt[0]->width(); rtp.minheight = rt[0]->height(); for (int i = 0; i < n; ++i) { rendertargets_ = rt; rtp.minwidth = std::min(rtp.minwidth, rt->width()); rtp.minheight = std::min(rtp.minheight, rt->height()); // cache these to avoid too many virtual function calls later rtp.element_size = rt->element_size(); rtp.stride = rt->stride(); } clip_rect_.x0 = 0; clip_rect_.y0 = 0; clip_rect_.x1 = rtp.minwidth; clip_rect_.y1 = rtp.minheight; } void Rasterizer::fragment_shader(FragmentShader *fs) { assert(fs != 0); fragment_shader_ = fs; } void Rasterizer::draw_triangle(const Vertex &v1, const Vertex &v2, const Vertex &v3) { if (!setup_valid()) return; int64_t zPlane[4]; int64_t wPlane[4]; int64_t vPlane[MAX_VARYING][4]; compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, v1.z, v2.z, v3.z, zPlane); compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, // interpolate 
1/w across triangle fixdiv<16>(1 << 16, v1.w), fixdiv<16>(1 << 16, v2.w), fixdiv<16>(1 << 16, v3.w), wPlane); int varying_count = fragment_shader_->varying_count(); for (int i = 0; i < varying_count; ++i) compute_plane( v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, fixdiv<16>(v1.varyings, v1.w), fixdiv<16>(v2.varyings, v2.w), fixdiv<16>(v3.varyings, v3.w), vPlane ); // Deltas const int DX12 = v1.x - v2.x; const int DX23 = v2.x - v3.x; const int DX31 = v3.x - v1.x; const int DY12 = v1.y - v2.y; const int DY23 = v2.y - v3.y; const int DY31 = v3.y - v1.y; // Fixed-point deltas const int FDX12 = DX12 << 4; const int FDX23 = DX23 << 4; const int FDX31 = DX31 << 4; const int FDY12 = DY12 << 4; const int FDY23 = DY23 << 4; const int FDY31 = DY31 << 4; // Bounding rectangle int minx = (min(v1.x, v2.x, v3.x) + 0xF) >> 4; int maxx = (max(v1.x, v2.x, v3.x) + 0xF) >> 4; int miny = (min(v1.y, v2.y, v3.y) + 0xF) >> 4; int maxy = (max(v1.y, v2.y, v3.y) + 0xF) >> 4; // consider clipping rectangle minx = std::max(minx, clip_rect_.x0); maxx = std::min(maxx, clip_rect_.x1); miny = std::max(miny, clip_rect_.y0); maxy = std::min(maxy, clip_rect_.y1); // Start in corner of 8x8 block minx &= ~(BLOCK_SIZE - 1); miny &= ~(BLOCK_SIZE - 1); BufferPointers buffers; for (int i = 0; i < rendertarget_params_.count; ++i) buffers.ptr = (char*)rendertargets_->buffer_pointer() + miny * rendertargets_->stride(); // Half-edge constants int C1 = DY12 * v1.x - DX12 * v1.y; int C2 = DY23 * v2.x - DX23 * v2.y; int C3 = DY31 * v3.x - DX31 * v3.y; // Correct for fill convention if(DY12 < 0 || (DY12 == 0 && DX12 > 0)) C1++; if(DY23 < 0 || (DY23 == 0 && DX23 > 0)) C2++; if(DY31 < 0 || (DY31 == 0 && DX31 > 0)) C3++; // Loop through blocks for(int y = miny; y < maxy; y += BLOCK_SIZE) { for(int x = minx; x < maxx; x += BLOCK_SIZE) { // Corners of block int x0 = x << 4; int x1 = (x + BLOCK_SIZE - 1) << 4; int y0 = y << 4; int y1 = (y + BLOCK_SIZE - 1) << 4; // Evaluate half-space functions bool a00 = C1 + DX12 * y0 - DY12 * x0 > 0; bool a10 = C1 + DX12 * y0 - DY12 * x1 > 0; bool a01 = C1 + DX12 * y1 - DY12 * x0 > 0; bool a11 = C1 + DX12 * y1 - DY12 * x1 > 0; int a = (a00 << 0) | (a10 << 1) | (a01 << 2) | (a11 << 3); bool b00 = C2 + DX23 * y0 - DY23 * x0 > 0; bool b10 = C2 + DX23 * y0 - DY23 * x1 > 0; bool b01 = C2 + DX23 * y1 - DY23 * x0 > 0; bool b11 = C2 + DX23 * y1 - DY23 * x1 > 0; int b = (b00 << 0) | (b10 << 1) | (b01 << 2) | (b11 << 3); bool c00 = C3 + DX31 * y0 - DY31 * x0 > 0; bool c10 = C3 + DX31 * y0 - DY31 * x1 > 0; bool c01 = C3 + DX31 * y1 - DY31 * x0 > 0; bool c11 = C3 + DX31 * y1 - DY31 * x1 > 0; int c = (c00 << 0) | (c10 << 1) | (c01 << 2) | (c11 << 3); // Skip block when outside an edge if(a == 0x0 || b == 0x0 || c == 0x0) continue; #define CLIP_TEST(X, Y) ((X) >= clip_rect_.x0 && (X) < clip_rect_.x1 && (Y) >= clip_rect_.y0 && (Y) < clip_rect_.y1) // test for the clipping rectangle bool clip00 = CLIP_TEST(x, y); bool clip10 = CLIP_TEST(x + 7, y); bool clip01 = CLIP_TEST(x, y + 7); bool clip11 = CLIP_TEST(x + 7, y + 7); // skip block if all is clippled if (!clip00 && !clip10 && !clip01 && !clip11) continue; bool clip_all_in = clip00 && clip10 && clip01 && clip11; //! 
compute attribute interpolants at corners FragmentData f00; FragmentData f10; FragmentData f01; FragmentData f11; int xx1 = (x + BLOCK_SIZE) << 4; int yy1 = (y + BLOCK_SIZE) << 4; f00.z = solve_plane(x0, y0, zPlane); f10.z = solve_plane(xx1, y0, zPlane); f01.z = solve_plane(x0, yy1, zPlane); f11.z = solve_plane(xx1, yy1, zPlane); if (!fragment_shader_->early_depth_test(x, y, std::min(std::min(std::min(f00.z, f10.z), f01.z), f11.z))) continue; int w00 = fixdiv<16>(1 << 16, solve_plane(x0, y0, wPlane)); int w10 = fixdiv<16>(1 << 16, solve_plane(xx1, y0, wPlane)); int w01 = fixdiv<16>(1 << 16, solve_plane(x0, yy1, wPlane)); int w11 = fixdiv<16>(1 << 16, solve_plane(xx1, yy1, wPlane)); for (int i = 0; i < varying_count; ++i) { f00.varying = fixmul<16>(solve_plane(x0, y0, vPlane), w00); f10.varying = fixmul<16>(solve_plane(xx1, y0, vPlane), w10); f01.varying = fixmul<16>(solve_plane(x0, yy1, vPlane), w01); f11.varying = fixmul<16>(solve_plane(xx1, yy1, vPlane), w11); } //! compute attribute step y left and right struct varying_step_t { struct step_info_t { int step; int rem; int error_term; step_info_t():error_term(0){} int dostep() { int r = step; error_term += rem; if (error_term >= BLOCK_SIZE) { error_term -= BLOCK_SIZE; r++; } return r; } }; step_info_t z; step_info_t varying[MAX_VARYING]; varying_step_t(FragmentData& p1, FragmentData& p2, int vc) { floor_divmod<BLOCK_SIZE>(p2.z - p1.z, z.step, z.rem); for (int i = 0; i < vc; ++i) { floor_divmod<BLOCK_SIZE>(p2.varying - p1.varying, varying.step, varying.rem); } } }; varying_step_t step_left(f00, f01, varying_count); varying_step_t step_right(f10, f11, varying_count); BufferPointers block_buffers = buffers; #define RENDER_TARGET_LOOP for (int i = 0; i < rendertarget_params_.count; ++i) #define STEP_POINTERS_BY_ELEMENTSIZE(VAR, FACTOR) { RENDER_TARGET_LOOP (char*&)VAR.ptr += FACTOR * rendertarget_params_.element_size; } #define STEP_POINTERS_BY_STRIDE(VAR) { RENDER_TARGET_LOOP (char*&)VAR.ptr += rendertarget_params_.stride; } #define STEP_FRAGMENTDATA(FDVAR, STEPVAR) { FDVAR.z += STEPVAR.z.dostep(); for (int i = 0; i < varying_count; ++i) FDVAR.varying += STEPVAR.varying.dostep(); } // only copy the neccessary varyings #define EFFICIENT_COPY(SRC, DST) { DST.z = SRC.z; for (int i = 0; i < varying_count; ++i) DST.varying = SRC.varying; } #define BLOCK_BEGIN fragment_shader_->prepare_for_block(x, y, pixel_block); for (int iy = 0; iy < BLOCK_SIZE; ++iy) { BufferPointers inner = block_buffers; STEP_POINTERS_BY_ELEMENTSIZE(inner, x); for (int ix = 0; ix < BLOCK_SIZE; ++ix) { #define BLOCK_END STEP_POINTERS_BY_ELEMENTSIZE(inner, 1); } STEP_POINTERS_BY_STRIDE(block_buffers); } PixelBlock pixel_block; bool skip_flag[BLOCK_SIZE][BLOCK_SIZE]; memset(skip_flag, 0, sizeof(skip_flag)); if (!clip_all_in) { for (int iy = 0; iy < BLOCK_SIZE; ++iy) for (int ix = 0; ix < BLOCK_SIZE; ++ix) if (!CLIP_TEST(ix + x, iy + y)) skip_flag[iy][ix] = true; } // Accept whole block when totally covered if(a == 0xF && b == 0xF && c == 0xF) { // first compute all fragment data for(int iy = 0; iy < BLOCK_SIZE; iy++) { //! compute attribute step x for this scanline varying_step_t stepx(f00, f10, varying_count); FragmentData fragment_data = f00; for(int ix = 0; ix < BLOCK_SIZE; ix++) { EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]); STEP_FRAGMENTDATA(fragment_data, stepx); } //! step left and right attrib y STEP_FRAGMENTDATA(f00, step_left); STEP_FRAGMENTDATA(f10, step_right); } //! 
fragment_shader_block (can now use derivatives of attributes) if (clip_all_in) { BLOCK_BEGIN fragment_shader_->shade(inner, pixel_block, ix, iy); BLOCK_END } else { BLOCK_BEGIN if (!skip_flag[iy][ix]) fragment_shader_->shade(inner, pixel_block, ix, iy); BLOCK_END } } else // Partially covered block { int CY1 = C1 + DX12 * y0 - DY12 * x0; int CY2 = C2 + DX23 * y0 - DY23 * x0; int CY3 = C3 + DX31 * y0 - DY31 * x0; for(int iy = 0; iy < BLOCK_SIZE; iy++) { int CX1 = CY1; int CX2 = CY2; int CX3 = CY3; //! compute attribute step x for this scanline varying_step_t stepx(f00, f10, varying_count); FragmentData fragment_data = f00; for(int ix = 0; ix < BLOCK_SIZE; ix++) { if(!(CX1 > 0 && CX2 > 0 && CX3 > 0)) skip_flag[iy][ix] = true; // we still need to do this since the fragment shader might want // to compute the derivative of attibutes EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]); CX1 -= FDY12; CX2 -= FDY23; CX3 -= FDY31; STEP_FRAGMENTDATA(fragment_data, stepx); } CY1 += FDX12; CY2 += FDX23; CY3 += FDX31; //! step left and right attrib y STEP_FRAGMENTDATA(f00, step_left); STEP_FRAGMENTDATA(f10, step_right); } //! fragment_shader_block (can now use derivatives of attributes) BLOCK_BEGIN if (!skip_flag[iy][ix]) fragment_shader_->shade(inner, pixel_block, ix, iy); BLOCK_END } } for (int i = 0; i < rendertarget_params_.count; ++i) (char*&)buffers.ptr += BLOCK_SIZE * rendertargets_->stride(); } } Test program: /* Copyright (c) 2007, Markus Trenkwalder All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include "SDL.h" #include "rasterizer.h" #include <cmath> #include <algorithm> class SDL_SurfaceRenderTarget : public Rasterizer::RenderTarget { SDL_Surface *surface; public: SDL_SurfaceRenderTarget(SDL_Surface *s):surface(s) {} virtual int width() { return surface->w; } virtual int height() { return surface->h; } virtual int stride() { return surface->pitch; } virtual int element_size() { return sizeof(int); } virtual void* buffer_pointer() { return surface->pixels; } virtual void clear(int x, int y, int w, int h) {} }; class TestFragmentShader : public Rasterizer::FragmentShader { public: virtual int varying_count() { return 3; } virtual void shade(const Rasterizer::BufferPointers& ptrs, const Rasterizer::PixelBlock& block, int x, int y) { unsigned int* color_buffer = (unsigned int*)ptrs.ptr[0]; // unfortunaltely at the corners of the triangle we can get negative // values for the interpolants -> std::max int r = std::max(0, block[y][x].varying[0]); int g = std::max(0, block[y][x].varying[1]); int b = std::max(0, block[y][x].varying[2]); int color = r << 16 | g << 8 | b; *color_buffer = color; } }; int main(int ac, char *av[]) { SDL_Init(SDL_INIT_VIDEO); SDL_Surface *screen = SDL_SetVideoMode(320, 240, 32, SDL_SWSURFACE); Rasterizer r; SDL_SurfaceRenderTarget color_target(screen); TestFragmentShader fragment_shader; Rasterizer::RenderTarget* rendertargets[] = { &color_target}; r.rendertargets(1, rendertargets); r.fragment_shader(&fragment_shader); r.clip_rect(45, 70, 100, 100); Rasterizer::Vertex v[3]; v[0].x = (int)(120.0f * 16.0f); v[0].y = (int)(50.0f * 16.0f); v[0].z = 0; v[0].w = 1 << 16; v[0].varyings[0] = 255; v[0].varyings[1] = 0; v[0].varyings[2] = 0; v[1].x = (int)(20.0f * 16.0f); v[1].y = (int)(100.0f * 16.0f); v[1].z = 0x7fffffff; v[1].w = 1 << 16; v[1].varyings[0] = 0; v[1].varyings[1] = 255; v[1].varyings[2] = 0; v[2].x = (int)(150.0f * 16.0f); v[2].y = (int)(220.0f * 16.0f); v[2].z = 0x7fffffff >> 1; v[2].w = 1 << 16; v[2].varyings[0] = 0; v[2].varyings[1] = 0; v[2].varyings[2] = 255; SDL_Rect rect; rect.x = 45; rect.y = 70; rect.w = 100; rect.h = 100; SDL_FillRect(screen, &rect, 0xffffffff); r.draw_triangle(v[0], v[1], v[2]); SDL_Flip(screen); SDL_Event e; while (SDL_WaitEvent(&e) && e.type != SDL_QUIT); SDL_Quit(); return 0; } I don't include the stdint.h file. You can get it yourself if needed. OK, i now have a domain www.trenki.net where I uploaded a vastly improved version of the renderer. The above code is obsolete, the features are retained. [Edited by - Trenki on August 22, 2007 5:11:28 AM]
  18. Finally the ray tracing geekiness starts: https://blogs.msdn.microsoft.com/directx/2018/03/19/announcing-microsoft-directx-raytracing/

     Let's collect some interesting articles. I'll start with: https://www.remedygames.com/experiments-with-directx-raytracing-in-remedys-northlight-engine/
  19. Hey all,

     I'm trying to understand implicit state promotion for DirectX 12, as well as its intended use case: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899226(v=vs.85).aspx#implicit_state_transitions

     I'm attempting to use copy queues and finding that there's a lot of book-keeping I need to do: first "pre-transition" from my graphics/compute read-only state (P-SRV | NP-SRV) to COMMON, then COMMON to COPY_DEST, perform the copy on the copy command list, transition back to COMMON, and then find another graphics command list to do the final COMMON -> (P-SRV | NP-SRV) transition again. With state promotion, it would seem that I can skip the COMMON -> COPY_DEST and COPY_DEST -> COMMON bits on the copy queue easily enough, but I'm curious whether I could just keep all of my read-only buffers and images in the COMMON state and effectively not perform any barriers at all. This seems to be encouraged by the docs, but I'm not sure I fully understand the implications. Does this sound right? Thanks.
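     A sketch of what that pattern tends to look like when relying on promotion and decay (resource and queue names hypothetical): resources that start in COMMON are promoted to the copy states on first use in the copy command list, and decay back to COMMON once the copy queue's submission completes.

     // No explicit barriers: COMMON resources are implicitly promoted to COPY_DEST / COPY_SOURCE
     // when the copy command list first accesses them, and decay back to COMMON after this
     // submission finishes on the copy queue.
     copyCommandList->CopyBufferRegion(destBuffer, 0, uploadBuffer, 0, byteCount);
     copyCommandList->Close();

     ID3D12CommandList* lists[] = { copyCommandList };
     copyQueue->ExecuteCommandLists(1, lists);
     copyQueue->Signal(copyFence, ++copyFenceValue);
     // Once the fence is reached, destBuffer is back in COMMON and can be promoted again to a
     // read state (e.g. NON_PIXEL_SHADER_RESOURCE) on a graphics or compute queue.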
  20. I am working on a rendering framework. We have adopted the DX12 style where you create all pipelines for all permutations at load time. I am just wondering whether there is a limit to the number of pipelines you can create, or whether you pay some hidden cost for having pipelines just lying around until you actually need them (for example: choosing the MSAA x2 pipeline after the user picks it from the settings menu). Or should I only create the pipelines I need at load time and then re-create them whenever necessary?
  21. I'm using fxc.exe from the Win10 SDK (10.0.10586.0) to build my HLSL code. I've got some code in different pixel shaders that uses interlocked instructions on UAVs, such as:

     RWBuffer<uint> FragmentCount : register(u2);
     RWTexture2D<uint> HeadIndex : register(u3);
     ...
     uint newNode = 0;
     /*!!*/InterlockedAdd(FragmentCount[0], 1U, newNode);/*!!*/
     ...
     uint previousTail = 0;
     /*!!*/InterlockedExchange(HeadIndex[xy], newNode+1, previousTail); /*!!*/
     ...
     uint previousHead = 0;
     /*!!*/InterlockedAdd(HeadIndex[xy], 0x01000000, previousHead);/*!!*/

     This compiles fine with the ps_5_0, cs_5_0 and cs_5_1 targets, but with ps_5_1 the compiler gives error x4532: cannot map expression to ps_5_1 instruction set, on the lines indicated with /*!!*/

     Anyone else experienced this? What the hell, right?
  22. Hi, I'm reading https://software.intel.com/en-us/articles/sample-application-for-direct3d-12-flip-model-swap-chains to figure out the best way to set up my swap chain, but I don't understand the following:

     1 - In "Classic Mode" in the link above, what causes the GPU to wait for the next VSync before starting work on the next frame (e.g. the orange bar in the second column doesn't start executing right after the previous blue bar)? It's clear it has to wait, because the new frame will render to the render target currently on screen, but is there an explicit wait one must do in code? Or does the driver force the wait? If so, how does the driver know? Does it check which RT is bound, so that if I were rendering to the GBuffer no wait would happen?

     2 - When VSync is off, what does it mean that a frame is dropped, and what causes it?

     Thanks!!
  23. Simple question: what is best practice for drawing overlay graphics in DirectX 12? For now, all I want to do is draw a semi-transparent rectangle in the upper left corner of my view. Is there a shortcut, or do I need to set up more shaders, vertex buffers, constant buffers, root signatures, etc.? Since we are talking about DX12, I guess it's the latter. Any small example project out there?
  24. I wanted to see how others are currently handling descriptor heap updates and management. I've read a few articles, and there tend to be three major strategies, compared below with a sketch afterwards:

     1) Split up descriptor heaps per shader stage (i.e. one for vertex shader, pixel, hull, etc.)

     2) Have one descriptor heap for an entire pipeline

     3) Split up descriptor heaps per update frequency (i.e. EResourceSet_PerInstance, EResourceSet_PerPass, EResourceSet_PerMaterial, etc.)

     The benefit of the first two approaches is that they make it easier to port current code, and descriptor management and updating tend to be easier, but they seem to be less efficient. The benefit of the third approach seems to be that it's the most efficient, because you only manage and update objects when they change.
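     Whichever split is chosen, a common low-level building block is staging descriptors in a CPU-only (non-shader-visible) heap and copying the ones a draw needs into a shader-visible heap at record time; a sketch with hypothetical heap objects and offsets:

     // Copy a contiguous run of descriptors from the CPU staging heap into the
     // shader-visible heap that the command list will reference.
     UINT increment = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

     CD3DX12_CPU_DESCRIPTOR_HANDLE dst(shaderVisibleHeap->GetCPUDescriptorHandleForHeapStart(), destOffset, increment);
     CD3DX12_CPU_DESCRIPTOR_HANDLE src(stagingHeap->GetCPUDescriptorHandleForHeapStart(), srcOffset, increment);

     device->CopyDescriptorsSimple(descriptorCount, dst, src, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

     // The table handed to the command list points at the shader-visible copy.
     CD3DX12_GPU_DESCRIPTOR_HANDLE table(shaderVisibleHeap->GetGPUDescriptorHandleForHeapStart(), destOffset, increment);
     commandList->SetGraphicsRootDescriptorTable(rootParameterIndex, table);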
  25. I have a vertex buffer on a default heap. I need a CPU pointer to that buffer in order to loop through the vertices and change one value in some of them (the color value). In the past this was possible by creating the buffer with the D3DUSAGE_DYNAMIC/D3D11_USAGE_DYNAMIC flag and using IDirect3DVertexBuffer9::Lock or ID3D11DeviceContext::Map to get a pointer.

     What is the correct way to do the same in DX12? As far as I understand, ID3D12Resource::Map cannot be used on a default heap, because default heaps cannot be accessed directly from the CPU. The documentation says that upload heaps are intended for CPU-write-once, GPU-read-once usage, so I don't think these are equivalent to the "dynamic" buffers. Is the readback heap equivalent to what was called a dynamic buffer? Or should I create a custom heap?

     I am thinking of doing the following:
     - Create a temporary readback heap.
     - Copy the data from the default heap to the readback heap using UpdateSubresources.
     - Get a CPU pointer to the readback heap using Map and edit the data.
     - Copy the data back to the default heap using UpdateSubresources.

     What do you think about this?
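     For comparison, a common alternative is to keep a CPU-side copy of the vertex data (or just the changed range), write it into an upload-heap buffer, and copy that into the default-heap buffer on the GPU timeline; a sketch with hypothetical resource names and offsets, assuming the d3dx12.h helpers:

     // Write the modified vertices into a mapped upload buffer...
     void* mapped = nullptr;
     CD3DX12_RANGE noRead(0, 0);
     uploadBuffer->Map(0, &noRead, &mapped);
     memcpy(static_cast<UINT8*>(mapped) + dstOffset, modifiedVertices, byteCount);
     uploadBuffer->Unmap(0, nullptr);

     // ...then copy just that range into the default-heap vertex buffer.
     commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer,
         D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER, D3D12_RESOURCE_STATE_COPY_DEST));
     commandList->CopyBufferRegion(vertexBuffer, dstOffset, uploadBuffer, dstOffset, byteCount);
     commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer,
         D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER));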