Search the Community

Showing results for tags 'DX12'.



Found 297 results

  1. So the title pretty much sums it up: I want to know how to set the HWND for my DirectX engine to a C# Panel. I know it involves writing a C++/CLI wrapper. I'm learning DirectX from Frank Luna's DirectX 12 book, so the engine is here: https://github.com/d3dcoder/d3d12book. If someone could download his source code and turn Chapter 4 into a C++/CLI wrapper, that would be great: just open the chapter 4 project in Visual Studio and all the files are there. I know this is a lot to ask, but I've been trying to do this for days and any help would be much appreciated.
  2. I am working on getting my DX12 renderer going, but I have run into a perplexing issue. The pipeline state object shows the correct VS and PS and is created successfully, but nothing renders. The graphics debugger shows the IA stage correctly and the VS stage correctly, then reports "stage not bound" for the PS stage. I don't understand: the PS is set, the debugger shows it in the state object, and the VS output signature matches the PS input signature, so why is it not bound?
  3. Hello. My question is as in the topic title. Thanks in advance.
  4. Hey guys, I have a very simple profiling system in my little DX12 engine that can visualize the time spent on the GPU for each task by using timestamp queries. This is a good way to tell whether we have GPU bubbles, and to identify suspiciously time-consuming passes. But the problem is that it can't tell us GPU usage: for example, my 'fancy' post-process pass may be bandwidth-limited, or my compute shader may be register-limited, either of which can cause very low GPU usage that isn't clearly reflected by a timestamp. So I would really like to be able to visualize GPU usage, so I can tell whether the GPU is fully saturated and then do better optimization. I also believe that being able to visualize GPU usage per task is a very important way to place your async compute work wisely. It would be greatly appreciated if someone could enlighten me on this. Thanks, Peng
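For anyone cross-checking the timestamp side of a profiler like the one described above: raw tick deltas only become meaningful once divided by the frequency that ID3D12CommandQueue::GetTimestampFrequency reports. A minimal sketch of that conversion (TicksToMilliseconds is a made-up helper name, not a D3D12 API):

```cpp
#include <cstdint>

// Convert a GPU timestamp delta (two ticks read back from a query heap)
// to milliseconds, given the queue's ticks-per-second frequency.
double TicksToMilliseconds(uint64_t begin, uint64_t end, uint64_t ticksPerSecond)
{
    if (ticksPerSecond == 0) return 0.0; // avoid divide by zero
    return (double)(end - begin) * 1000.0 / (double)ticksPerSecond;
}
```

Note this still only measures elapsed time, not saturation; occupancy and bandwidth limits generally need vendor tooling (PIX, Nsight, Radeon GPU Profiler) rather than timestamps.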
  5. Hello guys, I'm coding a simple D3D12 program and have many command lists with hundreds of prerecorded commands (commands are recorded once at initialization and never reset again). The problem is that commands that reference the backbuffer cannot be prerecorded, because I'm using triple buffering: when a command recorded for the current backbuffer is executed on the next frame, the program hangs. For example, I can record this but can't execute it without hanging:

    m_command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(
        m_render_targets[m_frame_index].Get(),
        D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET));
    m_command_list->OMSetRenderTargets(1, &m_rtv_handle[m_frame_index], false, nullptr);
    m_command_list->ClearRenderTargetView(m_rtv_handle[m_frame_index], clearColor, 0, nullptr);
    m_command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(
        m_render_targets[m_frame_index].Get(),
        D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT));

This totally breaks my prerecording model. One option would be to get the handle to the current backbuffer and record a separate command list every frame. Another would be to prerecord a set of command lists (one for each backbuffer) with the commands above and execute the corresponding one before or after my other prerecorded command lists (the ones with draw submissions). But what if I'd like to set a resource barrier or clear the backbuffer in the middle of my command lists? (It makes no sense to clear the backbuffer in the middle of a frame, but it's just an example.) In D3D11 this was easy to do with deferred contexts when creating the swap chain with DXGI_SWAP_EFFECT_DISCARD, because the current writeable backbuffer was only accessible through index 0. In D3D12 I can't even set the backbuffer count below 2, no matter which swap effect I create the swap chain with. Do you have a programming model to overcome this?
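The second option described above, one prerecorded command list per swap-chain buffer selected by the current backbuffer index, can be sketched like this (CommandList and PerBackbufferLists are stand-in names, not D3D12 types; the real lists would be ID3D12GraphicsCommandList pointers):

```cpp
#include <array>
#include <cstddef>

// One prerecorded command list per swap-chain buffer (3 for triple buffering).
constexpr std::size_t kBufferCount = 3;

struct CommandList { int id; }; // placeholder for ID3D12GraphicsCommandList*

struct PerBackbufferLists {
    std::array<CommandList, kBufferCount> lists;

    // Pick the list that was recorded against the backbuffer that
    // IDXGISwapChain3::GetCurrentBackBufferIndex() reports this frame.
    const CommandList& forFrame(std::size_t frameIndex) const {
        return lists[frameIndex % kBufferCount];
    }
};
```

Each frame you would submit forFrame(currentBackBufferIndex) alongside the backbuffer-independent prerecorded lists; anything that must touch the backbuffer mid-frame needs its own per-backbuffer variant in the same way.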
  6. After looking through the docs about DX12 resource binding for a while, I still get confused when the following terms come together: D3D12_GPU_VIRTUAL_ADDRESS, D3D12_CPU_DESCRIPTOR_HANDLE, D3D12_GPU_DESCRIPTOR_HANDLE. From what I know, a D3D12_GPU_VIRTUAL_ADDRESS works as a GPU pointer to an actual GPU resource (in VRAM); once you have that pointer you can access the real data (though you may need information about the memory layout, etc.). D3D12_CPU_DESCRIPTOR_HANDLE and D3D12_GPU_DESCRIPTOR_HANDLE are handles to your resource descriptors (one for CPU, one for GPU, and they reside only in descriptor heaps), which describe your GPU resource (its layout and configuration) and are linked to an ID3D12Resource by calling a CreateXXXView API, through which you essentially get the D3D12_GPU_VIRTUAL_ADDRESS. The first question I have is: why do we have two versions of the descriptor handle (one for CPU, one for GPU)? My guess (from the DX12 sample code) is: if you want to use a handle in a command list, you need the GPU version; otherwise, use the CPU version. But why are they designed like this? Wouldn't it be easier and more straightforward if the driver maintained a map between the CPU and GPU handle versions, so the developer could focus on a single descriptor handle? What's the catch? Also, from the DX12 resource binding API, I found that only D3D12_GPU_DESCRIPTOR_HANDLE is used when setting descriptor tables. So if I understand correctly, I can safely ignore all the descriptor stuff if I don't use descriptor tables (for a really simple demo, theoretically); I wouldn't even need to call SetDescriptorHeaps. Sorry for the mess; I'm just not very clear on all of this. It would be greatly appreciated if someone could elaborate, both from an API-design perspective and from an application-benefits point of view. Thanks
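As a rough illustration of how the two handle types relate: both are effectively opaque addresses into a descriptor heap, advanced by the same per-heap increment returned by ID3D12Device::GetDescriptorHandleIncrementSize; the CPU handle is used on the CPU timeline (CreateXXXView, CopyDescriptors) and the GPU handle in command lists (SetGraphicsRootDescriptorTable). A sketch with stand-in types (CpuHandle/GpuHandle are hypothetical, mirroring D3D12_CPU_DESCRIPTOR_HANDLE and D3D12_GPU_DESCRIPTOR_HANDLE):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the real handle structs: both are opaque addresses
// into a descriptor heap, but they index different address spaces.
struct CpuHandle { std::size_t   ptr; };
struct GpuHandle { std::uint64_t ptr; };

// Advance a handle by `index` descriptor slots; incrementSize comes
// from GetDescriptorHandleIncrementSize for the heap type.
CpuHandle OffsetCpu(CpuHandle start, unsigned index, unsigned incrementSize) {
    return { start.ptr + std::size_t(index) * incrementSize };
}
GpuHandle OffsetGpu(GpuHandle start, unsigned index, unsigned incrementSize) {
    return { start.ptr + std::uint64_t(index) * incrementSize };
}
```

The offset arithmetic is identical for both; what differs is which timeline may dereference them, which is why the API keeps them as distinct types rather than one driver-mapped handle.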
  7. I'm trying to share resources across different command queues. Obviously I want to keep the sharing to a minimum, but at some point something needs to be shared. AFAIK there's no mutex object in DX12, but does using fences work? Here's my current plan, which seems to work, but I'm wondering if it could be better, or what other people do. I want the compute queue to run as fast as possible, and as many times as it can, but I also want the draw queue to be able to stop it when drawing needs to happen.

    F0 <- 1
    F1 <- 0

    Compute loop:
    * queue->Set F1 = 1
    * queue->Wait for F0 == 1
    * Exec command list
    * queue->Set F1 = 0

    Draw loop:
    * cpu->Set F0 = 0, this should stop new computes from starting
    * queue->Wait for F1 == 0, this should wait until the currently executing compute command list is finished
    * Exec command list, compute queue should not be executing anything at this point
    * queue->Set F0 = 1, compute queue can run again

I'm pretty sure there's a deadlock in there somewhere, but whatever. Should/could I do this with one fence instead of two? If I have a bunch of command lists pending inside the compute queue, I want the draw queue to take priority, so that I can stack a bunch of stuff into compute but maintain priority for drawing. Obviously this will fall apart if a compute command list takes too long and the draw queue has to wait for it, so I want compute command lists to be pretty fast relative to draws. I think setting the queue priority would let me do this, i.e. if two command queues are waiting on the same fence, the one with higher priority should get it? But all the documentation says is ..., so idk :(.
  8. Hi. I started to learn DX12 and I have a question regarding constant buffer updates. I need to write only to certain parts of my constant buffer, by offset. In D3D11 I could do it like this: BackingBuffer is of type DataBuffer, and I could write my data there separately by offset:

    BackingBuffer.Set(offset, ref value);

And then update my constant buffer:

    // Set up the dest region inside the buffer
    if ((this.Description.BindFlags & BindFlags.ConstantBuffer) != 0)
    {
        device.UpdateSubresource(new DataBox(BackingBuffer.Pointer, 0, 0), constantBuffer);
    }
    else
    {
        var destRegion = new ResourceRegion(offsetInBytes, 0, 0, offsetInBytes + BackingBuffer.Size, 1, 1);
        device.UpdateSubresource(new DataBox(BackingBuffer.Pointer, 0, 0), constantBuffer, 0, destRegion);
    }

How can I do the same in D3D12?
  9. In DX11 and prior, to draw to an off-screen texture (render target) you would do something like this:

    dx11_dev_context->OMSetRenderTargets(1, &_color_view, _depth_view);

Any draw calls after that would be sent to that texture until another OMSetRenderTargets() call. This allowed a bind() in a high-level function and then drawing all objects in a lower scene-graph draw(). My question is: how is this type of scenario accomplished in DX12? Does every command list need to make a cl->OMSetRenderTargets() call so that any draw call knows where to go? Is it an option to do:

    CL0 {
        Transition::RenderTarget
        OMSetRenderTargets()
        ClearRenderTargetView
    }
    CL1 {
        IASetPrimTop
        IASetVertexBuffer
        ...
        Draw scene objects
    }
    CL2 {
        Transition::Present
    }

    CommandQueue->execute(CL0, CL1, CL2);
  10. Has anyone had any success reading back texture data? I'm getting all black.

    ...
    ThrowIfFailed(device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_READBACK), D3D12_HEAP_FLAG_NONE,
        &CD3DX12_RESOURCE_DESC::Buffer(readbackBufferSize),
        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&textureReadback)));

    D3D12_TEXTURE_COPY_LOCATION source;
    source.pResource = texture.Get();
    source.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
    source.SubresourceIndex = 0;

    D3D12_TEXTURE_COPY_LOCATION dest;
    dest.pResource = textureReadback.Get();
    dest.Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
    dest.PlacedFootprint.Offset = 0;
    dest.PlacedFootprint.Footprint.Format = GetPixelFormat(descriptor);
    dest.PlacedFootprint.Footprint.Width = descriptor.width;
    dest.PlacedFootprint.Footprint.Height = descriptor.height;
    dest.PlacedFootprint.Footprint.Depth = 1;
    dest.PlacedFootprint.Footprint.RowPitch = descriptor.width * BytesPerPixel;

    list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(texture.Get(),
        D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_COPY_SOURCE));
    list->CopyTextureRegion(&dest, 0, 0, 0, &source, nullptr);
    list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(texture.Get(),
        D3D12_RESOURCE_STATE_COPY_SOURCE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
    ...

After this I use a fence and wait for it to complete, then I call Map on textureReadback and memcpy the data to the buffer I want it in. And it's all zeros.
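One thing worth checking in code like the above: D3D12 requires a placed footprint's RowPitch to be a multiple of D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256 bytes), and `descriptor.width * BytesPerPixel` only happens to satisfy that for certain widths. A sketch of the alignment computation (AlignedRowPitch is a made-up helper name):

```cpp
#include <cstdint>

// D3D12_TEXTURE_DATA_PITCH_ALIGNMENT: a placed footprint's RowPitch
// must be a multiple of this value.
constexpr std::uint32_t kPitchAlignment = 256;

// Round width * bytesPerPixel up to the next multiple of 256.
std::uint32_t AlignedRowPitch(std::uint32_t width, std::uint32_t bytesPerPixel)
{
    std::uint32_t unaligned = width * bytesPerPixel;
    return (unaligned + kPitchAlignment - 1) & ~(kPitchAlignment - 1);
}
```

If the pitch is padded, the rows in the readback buffer are RowPitch bytes apart, so the final memcpy has to copy row by row rather than in one block. (ID3D12Device::GetCopyableFootprints will compute the whole footprint for you.)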
  11. Hi there, EDIT: oh, I didn't see that there's a newer driver; installing 355.82 solved it... I tried to run the DX12 HelloWindow graphics sample (https://github.com/Microsoft/DirectX-Graphics-Samples/blob/master/Samples/D3D12HelloWorld/src/HelloWindow/D3D12HelloWindow.cpp), but I can only run it if I set m_useWarpDevice to true, meaning I can only create a WARP device. I have 64-bit Windows 10 installed, and I tried to run the sample on a GTX 660 using the latest driver (353.62). The Nvidia control panel says I have the DX12 runtime and API version, but only feature level 11_0. DxDiag reports DirectX version 11.3 and says that the driver is WDDM 2.0 capable, but again, it only lists feature levels up to 11_0. While I can still develop using a WARP device, I'll need the hardware device later, so it would be great if I could solve this. Any idea how to fix it? I tried reinstalling the display driver, with no success. Best regards, Yours3!f
  12. Hi, I was playing around with DX12 using the SharpDX beta. I'm able to render simple models now, but I ran into trouble when trying to render models with several sub-parts: the texture I uploaded is not displayed for the second part. This is how I do it:

1. Create a root signature:

    var rootSignatureDesc = new RootSignatureDescription(
        RootSignatureFlags.AllowInputAssemblerInputLayout,
        // Root parameters
        new[]
        {
            new RootParameter(ShaderVisibility.All,
                new[]
                {
                    new DescriptorRange()
                    {
                        RangeType = DescriptorRangeType.ShaderResourceView,
                        DescriptorCount = 2,
                        OffsetInDescriptorsFromTableStart = -1,
                        BaseShaderRegister = 0
                    },
                    new DescriptorRange()
                    {
                        RangeType = DescriptorRangeType.ConstantBufferView,
                        DescriptorCount = 1,
                        OffsetInDescriptorsFromTableStart = -1,
                        BaseShaderRegister = 0
                    }
                }),
            new RootParameter(ShaderVisibility.Pixel,
                new DescriptorRange()
                {
                    RangeType = DescriptorRangeType.Sampler,
                    DescriptorCount = 1,
                    OffsetInDescriptorsFromTableStart = -1,
                    BaseShaderRegister = 0
                }),
        });

2. Create a descriptor heap for the SRVs and CBV:

    var srvCbvHeapDesc = new DescriptorHeapDescription()
    {
        DescriptorCount = 3,
        Flags = DescriptorHeapFlags.ShaderVisible,
        Type = DescriptorHeapType.ConstantBufferViewShaderResourceViewUnorderedAccessView
    };
    srvCbvHeap = device.CreateDescriptorHeap(srvCbvHeapDesc);

3. Upload the textures:

    var tex = Texture.LoadFromFile("../../models/test.tga");
    byte[] textureData = tex.Data;
    TextureWidth = tex.Width;
    TextureHeight = tex.Height;
    var textureDesc = ResourceDescription.Texture2D(Format.B8G8R8A8_UNorm,
        TextureWidth, TextureHeight, 1, 1, 1, 0,
        ResourceFlags.None, TextureLayout.Unknown, 0);
    texture = device.CreateCommittedResource(new HeapProperties(HeapType.Upload),
        HeapFlags.None, textureDesc, ResourceStates.GenericRead, null);
    texture.Name = "Texture";
    var handle = GCHandle.Alloc(textureData, GCHandleType.Pinned);
    var ptr = Marshal.UnsafeAddrOfPinnedArrayElement(textureData, 0);
    texture.WriteToSubresource(0, null, ptr, TextureWidth * 4, textureData.Length);
    handle.Free();

4. Create a shader resource view for each texture by offsetting from the heap start:

    var Step = device.GetDescriptorHandleIncrementSize(
        DescriptorHeapType.ConstantBufferViewShaderResourceViewUnorderedAccessView);
    ......
    device.CreateShaderResourceView(texture[i], srvDesc,
        srvCbvHeap.CPUDescriptorHandleForHeapStart + i * Step);

5. When building the command list, set the descriptor table for each texture resource by offsetting:

    commandList.SetGraphicsRootSignature(rootSignature);
    DescriptorHeap[] descHeaps = new[] { srvCbvHeap, samplerViewHeap };
    commandList.SetDescriptorHeaps(descHeaps.GetLength(0), descHeaps);
    commandList.SetGraphicsRootDescriptorTable(0, srvCbvHeap.GPUDescriptorHandleForHeapStart);
    commandList.SetGraphicsRootDescriptorTable(1, samplerViewHeap.GPUDescriptorHandleForHeapStart);
    ...... draw first ......
    commandList.SetGraphicsRootDescriptorTable(0, srvCbvHeap.GPUDescriptorHandleForHeapStart + Step * i);
    ...... draw second ......

But only the first part gets a texture. I'm still trying to understand how the whole resource management works, but it looks so complex. Please help; I checked the C++ sample code on GitHub but can't find what I did wrong. Thanks in advance.
  13. Hi, I wonder if there is a chart available (xls or whatever) showing the different IHVs' GPU support for the optional Direct3D 11 features (http://msdn.microsoft.com/en-us/library/ff476124.aspx), especially the feature level 11.0 optional features that are required (at least most of them) in feature level 11.1 (http://msdn.microsoft.com/en-us/library/hh404457.aspx). And yes I know, brace yourself, DX12 is coming (and it will probably inherit most of the DX11 caps bits)...
  14. "In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten." (link)

Does this mean we will be able to bind lots of textures to a shader and then, based on a drawable's material (a cb variable), pick the correct texture (an index into the textures)? Say I have a 2D game and I managed to fit all my images onto only 3 texture atlases: can I bind those 3 textures and never worry about textures again, with each sprite's material holding an index into them, so I can draw everything with a single draw call? Did I get it all wrong?
  15. Well, I decided to make a DirectX 12 quick reference guide... The main issue is dealing with page formatting. It's not just about some crazy MS nomenclature like VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation (one of the best features of D3D12, though); I have simply never written this type of document before. Currently I am using Notepad++ for the raw text and InDesign for the layout and formatting. I am using the Vulkan quick reference guide as a reference. Here I attach a sample draft; it doesn't have final colours, and annotations and structures are missing, but I would like to receive some feedback and tips, especially on layout and text organization. I noted that the current font sizes are good enough for printing but terrible on non-high-density screens. I would also like to know if you have a better model to use as a reference (for both layout and text grouping/organization). Finally, are external URLs or internal links useful to you? Thank you. UPDATE: it is now on GitHub, raw data included: https://github.com/alessiot89/D3D12QuickRef/
  16. While implementing subpixel precision in a software rasterizer, I found the following code. It's just too bad the author doesn't explain any of it. Can someone explain to me how shifting these integer variables gives us 4 bits of subpixel precision, i.e. 16 more values of precision? Not to mention this isn't working in my code :/

    // 28.4 fixed-point coordinates
    const int Y1 = iround(16.0f * v1.y);
    const int Y2 = iround(16.0f * v2.y);
    const int Y3 = iround(16.0f * v3.y);
    const int X1 = iround(16.0f * v1.x);
    const int X2 = iround(16.0f * v2.x);
    const int X3 = iround(16.0f * v3.x);

    // Fixed-point deltas
    const int FDX12 = DX12 << 4;
    const int FDX23 = DX23 << 4;
    const int FDX31 = DX31 << 4;
    const int FDY12 = DY12 << 4;
    const int FDY23 = DY23 << 4;
    const int FDY31 = DY31 << 4;

    // Bounding rectangle
    int minx = (min(X1, X2, X3) + 0xF) >> 4;
    int maxx = (max(X1, X2, X3) + 0xF) >> 4;
    int miny = (min(Y1, Y2, Y3) + 0xF) >> 4;
    int maxy = (max(Y1, Y2, Y3) + 0xF) >> 4;

    int CY1 = C1 + DX12 * (miny << 4) - DY12 * (minx << 4);
    int CY2 = C2 + DX23 * (miny << 4) - DY23 * (minx << 4);
    int CY3 = C3 + DX31 * (miny << 4) - DY31 * (minx << 4);

    for (int y = miny; y < maxy; y++)
    {
        int CX1 = CY1;
        int CX2 = CY2;
        int CX3 = CY3;
        for (int x = minx; x < maxx; x++)
        {
            if (CX1 > 0 && CX2 > 0 && CX3 > 0)
            {
                colorBuffer[x] = 0x00FFFFFF;
            }
            CX1 -= FDY12;
            CX2 -= FDY23;
            CX3 -= FDY31;
        }
        CY1 += FDX12;
        CY2 += FDX23;
        CY3 += FDX31;
    }
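As a small illustration of the 28.4 convention used above: multiplying a coordinate by 16 (2^4) before rounding keeps 4 fractional bits, i.e. 16 sub-pixel positions per pixel instead of snapping to whole pixels; `>> 4` recovers the integer pixel and `& 0xF` the sub-pixel fraction. The `DX << 4` deltas exist because stepping by one whole pixel corresponds to 16 units in 28.4. A runnable sketch (the helper names are mine, not from the quoted code):

```cpp
#include <cmath>

// 28.4 fixed point: 28 integer bits, 4 fractional bits.
// Rounding after scaling by 16 preserves 1/16-pixel positions.
int ToFixed28_4(float v)      { return (int)std::lround(v * 16.0f); }
int IntegerPart(int fixed)    { return fixed >> 4; }   // whole pixels
int FractionalPart(int fixed) { return fixed & 0xF; }  // 0..15 sixteenths
```

So a vertex at x = 2.5 becomes 40 in fixed point: pixel 2, plus 8/16 of a pixel, information that plain integer coordinates would have thrown away.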
  17. Hello, I'm currently integrating DirectX 12 into an engine we've been developing over the last few months. I now have a basic renderer that can render models and textures using shaders, but I'm running into a problem I can't figure out. Since last week, D3D12CreateDevice() has been failing. The error happens when I loop through the adapters to find one with DX12 support. The code is very similar to the samples:

    ComPtr<IDXGIAdapter1> adapter;
    *ppAdapter = nullptr;
    for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != pFactory->EnumAdapters1(adapterIndex, &adapter); ++adapterIndex)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
        {
            // Don't select the Basic Render Driver adapter.
            continue;
        }
        // Check whether the adapter supports Direct3D 12, but don't create the actual device yet.
        HRESULT r = D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0, _uuidof(ID3D12Device), nullptr);
        if (SUCCEEDED(r))
        {
            break;
        }
        else
        {
            // Log why it can't create:
            IS_FAILED(r);
        }
    }
    *ppAdapter = adapter.Detach();

The return value of r is E_POINTER ("Invalid pointer"), which is unusual enough that I don't know what to do with it. My GPU is a GeForce GT 650M, which should be able to run DirectX 12 to some extent (it also runs the DirectX 12 samples and MiniEngine). If more information is needed, please tell me and I will provide it. Many thanks, Joris
  18. I am trying to render a textured quad. If I just use a heap of type D3D12_HEAP_TYPE_UPLOAD, then I can see the texture perfectly. But I want to use a heap of type D3D12_HEAP_TYPE_DEFAULT and use a D3D12_HEAP_TYPE_UPLOAD heap to copy the data into it. Everything works when I call:

    CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD)... IID_PPV_ARGS(&textureUpload));
    textureUpload->WriteToSubresource(...
    device->CreateDescriptorHeap(...
    device->CreateShaderResourceView(textureUpload.Get()...

With this I see my texture just fine. But when I try to do it like so, I just get a black quad:

    CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT)... IID_PPV_ARGS(&texture));
    CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD)... IID_PPV_ARGS(&textureUpload));
    UpdateSubresources(commandList, texture.Get(), textureUpload.Get()...
    commandList->ResourceBarrier(...
    device->CreateDescriptorHeap(...
    device->CreateShaderResourceView(texture.Get()...

Is there something blatantly wrong that I'm doing?
  19. Hello, ladies and gentlemen. Today I've been trying to create a swap chain, to no avail, and I really can't figure out why. When I call CreateSwapChain(), it fails with 0x887a0001 (DXGI_ERROR_INVALID_CALL) and the debug layer doesn't report anything. The way I create my swap chain is no different from the official samples (the samples run just fine on my computer and my GPU fully supports DX12). In fact, the code below is almost the same as the code found in the samples, so where could the problem be? Any help is appreciated. Here's the code (I've fixed the originally missing braces around the final FAILED check):

    bool Renderer::InitializePipeline(HWND hWnd)
    {
        HRESULT HR;
        ID3D12Debug *debugController;
        HR = D3D12GetDebugInterface(IID_PPV_ARGS(&debugController));
        if (SUCCEEDED(HR))
        {
            debugController->EnableDebugLayer();
        }

        HR = D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));
        if (FAILED(HR))
            return false;

        D3D12_COMMAND_QUEUE_DESC commandQueueDesc = {};
        commandQueueDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
        commandQueueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
        HR = device->CreateCommandQueue(&commandQueueDesc, IID_PPV_ARGS(&commandQueue));
        if (FAILED(HR))
            return false;

        DXGI_SWAP_CHAIN_DESC swapChainDesc = {};
        swapChainDesc.BufferCount = 2; // Used to be 1, but changed it to 2 thanks to Alessio1989. There's still a problem I haven't figured out...
        swapChainDesc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        swapChainDesc.BufferDesc.Height = 600;
        swapChainDesc.BufferDesc.Width = 800;
        swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        swapChainDesc.OutputWindow = hWnd;
        swapChainDesc.SampleDesc.Count = 1;
        swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
        swapChainDesc.Windowed = true;

        IDXGIFactory4 *factory;
        HR = CreateDXGIFactory1(IID_PPV_ARGS(&factory));
        if (FAILED(HR))
        {
            return false;
        }

        HR = factory->CreateSwapChain(commandQueue, &swapChainDesc, &swapChain); // Fails every single time.
        factory->Release();
        if (FAILED(HR))
        {
            debugController->Release();
            return false;
        }
        debugController->Release();
        return true;
    }
  20. How is one supposed to use root signatures for rendering lots of different things? I've been reading through the documentation and sample code for DX12, but everything so far is set up for a single application that has one specific rendering purpose. This page on root signatures says "Currently, there is one graphics and one compute root signature per app." Yet the first sentence of the paragraph is "The root signature defines what resources are bound to the graphics pipeline." If a root signature links the command list to the resources required by the shader (textures and cbuffers), then it would seem like I want a root signature per distinct shader (e.g. one that has diffuse/normal textures, a different one that has 3 cbuffers, etc.). Or am I instead supposed to make an uber root signature that contains the maximum number of SRVs and CBVs I could ever use?
  21. I have two questions.   As I understand it, you generally want to keep your GPU a frame or two behind your CPU.  While your GPU is rendering a frame, the CPU is generating the draw calls for the next frame, so that there isn't a bottleneck between the two.   1) How does this work in practical terms?  Is the CPU side, "generating the draw calls", just building the command lists with calls to DrawIndexedInstanced and the like?  And then to actually perform the rendering, the GPU side, you call ExecuteCommandLists?   2) In terms of multi-threaded rendering, is that a misnomer?  Are the other threads just generating draw calls, with the main rendering thread being the only thing that actually calls ExecuteCommandLists?  Or can you simultaneously render to various textures, and then your main rendering thread uses them to generate a frame for the screen?
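Regarding the first question above, the CPU/GPU overlap is usually enforced with a fence: the CPU signals an increasing value after each submit (ExecuteCommandLists/Present) and stalls once it gets too far ahead of what ID3D12Fence::GetCompletedValue reports. A sketch of just that throttling decision (CpuMustWait and the two-frames-in-flight constant are illustrative, not a D3D12 API):

```cpp
#include <cstdint>

// Let the CPU run at most this many frames ahead of the GPU.
constexpr std::uint64_t kMaxFramesInFlight = 2;

// cpuSubmittedFrame: the fence value signaled after the latest submit.
// gpuCompletedFrame: what ID3D12Fence::GetCompletedValue would report.
// Returns true when the CPU should block (e.g. via SetEventOnCompletion)
// before recording the next frame's command lists.
bool CpuMustWait(std::uint64_t cpuSubmittedFrame, std::uint64_t gpuCompletedFrame)
{
    return cpuSubmittedFrame > gpuCompletedFrame + kMaxFramesInFlight;
}
```

On the second question: the worker threads record command lists in parallel, but submission order is up to you; having one thread call ExecuteCommandLists is the common pattern, and it is also legal for different queues or threads to submit independent work (e.g. renders to off-screen textures) as long as fences order any dependencies.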
  22. Rendering to the first layer works just fine, but the second layer comes out wrong. My setup:

    resource descs: http://pastebin.com/zetRzV9t
    rtv desc: http://pastebin.com/905tW9Lx
    dsv desc: http://pastebin.com/YtY6rLL2
    vertex shader: http://pastebin.com/8UYhzNKW
    geometry shader: http://pastebin.com/t5nBAWFA
    pixel shader: http://pastebin.com/Ljbchrre

For rendering the g-buffer:

    srv desc: http://pastebin.com/guDMnWqP
    screenquad vs: http://pastebin.com/qUz0YRiy
    screenquad gs: http://pastebin.com/1LymhJ4W
    screenquad ps: http://pastebin.com/yBehwuBz
  23. Hi all! Last weekend I took the time to code a class which can rasterize triangles, basing my work on some code I found on the net, particularly this one over at devmaster. Over the course of this week I extended it, and now I would say it is a complete solution. Features:

  • The rasterizer can support an arbitrary number of render targets (you will most likely use two: a color buffer and a depth buffer).
  • The rasterization is completely decoupled from the actual shading of the visible pixels. You can configure the rasterizer with a pixel shader which does the actual work of computing and assigning a color value.
  • It only uses integer math, because I intend to use it on the GP2X, which does not have an FPU.
  • The rasterizer is tile based. Currently it uses blocks of 8x8 pixels.
  • It interpolates an arbitrary number of integer varyings across the triangle (so you can use fixed point here). This is done perspective-correct at the corners of each 8x8 block and affine within each block, to avoid the costly per-pixel divide.
  • It supports a clipping rectangle.
  • It provides a means for the pixel shader to compute the derivative of the interpolated varyings. This is needed, for example, to compute the texture mipmap level from the texture coordinates.
  • It allows for an early depth test. For example, the shader could store the minimum depth value for each 8x8 block and then discard a whole block if the minimum expected depth value for this block is greater than the one stored.

The source code is actually quite short, ~600 lines with a lot of comments. The only problem I can see right now is with small, or large but thin, triangles. Because the rasterizer is tile based, it must scan at least a whole 8x8 block and test each pixel for being inside the triangle or not. Large triangles are handled quite efficiently, since for the inner part only the corners are tested for in/out. What do you think about this? How big a performance problem might this be when targeting the GP2X?
I include the code here for everyoune to look at. I would be very thankful for any input of possible improvements. Header file: /* Copyright (c) 2007, Markus Trenkwalder All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/

#ifndef RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3
#define RASTERIZER_EE4F8987_BDA2_434d_A01F_BB3446E535C3

class Rasterizer {
public:
    // some constants
    static const int MAX_RENDER_TARGETS = 2;
    static const int MAX_VARYING = 8;
    static const int BLOCK_SIZE = 8;

public:
    // Type definitions
    struct Vertex {
        int x, y; // in 28.4 fixed point
        int z;    // range from 0 to 0x7fffffff
        int w;    // in 16.16 fixed point
        int varyings[MAX_VARYING];
    };

    // Holds the pointers to each render target.
    // These will be passed to the fragment shader, which then can write to
    // the pointed-to location. Note: only the first n pointers will be
    // valid, where n is the current number of render targets.
    struct BufferPointers {
        void* ptr[MAX_RENDER_TARGETS];
    };

    // This is the data the fragment shader gets
    struct FragmentData {
        int z;
        int varying[MAX_VARYING];
    };

    typedef FragmentData PixelBlock[BLOCK_SIZE][BLOCK_SIZE];

    class RenderTarget {
    public:
        virtual int width() = 0;
        virtual int height() = 0;
        virtual int stride() = 0;
        virtual void* buffer_pointer() = 0;
        virtual int element_size() = 0;
        virtual void clear(int x, int y, int w, int h) = 0;
    };

    class FragmentShader {
    public:
        // This provides a means for an early depth test.
        // x and y are the coordinates of the upper left corner of the
        // current block. If the shader somewhere stores the minimum z of
        // each block, that value can be compared to the parameter z.
        // Returns false when the depth test failed; in this case the whole
        // block can be culled.
        virtual bool early_depth_test(int x, int y, int z) { return true; }

        // This notifies the shader of any render target clears.
        // This is meant to be used in conjunction with the early depth test
        // to update any buffers used.
        virtual void clear(int target, int x, int y, int w, int h) {}

        // To compute the mipmap level of detail one needs the derivatives in
        // x and y of the texture coordinates. These can be computed from the
        // values in the pixel block, since all the fragment values have
        // already been computed for this block when this is called.
        virtual void prepare_for_block(int x, int y, PixelBlock b) {}

        // This tells the rasterizer how many varyings this fragment shader needs
        virtual int varying_count() = 0;

        // This is called once for each visible fragment inside the triangle.
        // x and y are the coordinates within the block [0, BLOCK_SIZE[.
        // The pixel block is indexed with b[y][x]!
        virtual void shade(const BufferPointers&, const PixelBlock& b, int x, int y) = 0;
    };

private:
    // Variables
    struct RenderTargetParams {
        int count;
        int minwidth, minheight;
        // cache these params to avoid too many virtual function calls
        int stride[MAX_RENDER_TARGETS];
        int element_size[MAX_RENDER_TARGETS];
    } rendertarget_params_;

    RenderTarget* rendertargets_[MAX_RENDER_TARGETS];
    FragmentShader* fragment_shader_;

    struct { int x0, y0, x1, y1; } clip_rect_;

private:
    bool setup_valid();

public:
    // constructor
    Rasterizer();

public:
    // main interface

    // Set the render targets. This resets the clipping rectangle.
    void rendertargets(int n, RenderTarget* rt[]);

    // set the fragment shader
    void fragment_shader(FragmentShader* fs);

    void clear();
    void clear(int target);
    void clip_rect(int x, int y, int w, int h);

    // The triangle must be counter-clockwise in screen space in order to be drawn.
    void draw_triangle(const Vertex& v1, const Vertex& v2, const Vertex& v3);
};

#endif

Implementation file:

/*
Copyright (c) 2007, Markus Trenkwalder
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors
  may be used to endorse or promote products derived from this software
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/

#include "rasterizer.h"

#include <cmath>
#include <cassert>
#include <cstring>   // for memset (used in draw_triangle; missing in the original)
#include <algorithm>

#ifndef _MSC_VER
#include <stdint.h>
#else
#include "stdint.h"
#endif

////////////////////////////////////////////////////////////////////////////////
// utility functions

namespace {

inline int min(int a, int b, int c) { return std::min(std::min(a, b), c); }
inline int max(int a, int b, int c) { return std::max(std::max(a, b), c); }

inline void compute_plane(
    int v0x, int v0y,
    int v1x, int v1y,
    int v2x, int v2y,
    int z0, int z1, int z2,
    int64_t plane[4])
{
    const int px = v1x - v0x;
    const int py = v1y - v0y;
    const int pz = z1 - z0;
    const int qx = v2x - v0x;
    const int qy = v2y - v0y;
    const int qz = z2 - z0;

    /* Cross product "(a,b,c) := dv1 x dv2" is orthogonal to the plane. */
    const int64_t a = (int64_t)py * qz - (int64_t)pz * qy;
    const int64_t b = (int64_t)pz * qx - (int64_t)px * qz;
    const int64_t c = (int64_t)px * qy - (int64_t)py * qx;

    /* Point on the plane = "r*(a,b,c) + w", with fixed "r" depending on the
       distance of the plane from the origin and arbitrary "w" parallel to
       the plane. */
    /* The scalar product "(r*(a,b,c)+w)*(a,b,c)" is "r*(a^2+b^2+c^2)",
       which is equal to "-d" below. */
    const int64_t d = -(a * v0x + b * v0y + c * z0);

    plane[0] = a;
    plane[1] = b;
    plane[2] = c;
    plane[3] = d;
}

inline int solve_plane(int x, int y, const int64_t plane[4])
{
    assert(plane[2] != 0);
    return (int)((plane[3] + plane[0] * x + plane[1] * y) / -plane[2]);
}

template <int denominator>
inline void floor_divmod(int numerator, int& floor, int& mod)
{
    assert(denominator > 0);
    if (numerator >= 0) {
        // positive case, C is okay
        floor = numerator / denominator;
        mod = numerator % denominator;
    } else {
        // Numerator is negative, do the right thing
        floor = -((-numerator) / denominator);
        mod = (-numerator) % denominator;
        if (mod) {
            // there is a remainder
            floor--;
            mod = denominator - mod;
        }
    }
}

// Fixed point division
template <int p>
inline int32_t fixdiv(int32_t a, int32_t b)
{
#if 0
    return (int32_t)((((int64_t)a) << p) / b);
#else
    // The following produces the same results as the above, but gcc 4.0.3
    // generates fewer instructions (at least on the ARM processor).
    union {
        int64_t a;
        struct { int32_t l; int32_t h; };
    } x;

    x.l = a << p;
    x.h = a >> (sizeof(int32_t) * 8 - p);
    return (int32_t)(x.a / b);
#endif
}

// Perform a fixed point multiplication using a 64-bit intermediate result to
// prevent overflow problems.
template <int p>
inline int32_t fixmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> p);
}

} // end anonymous namespace

////////////////////////////////////////////////////////////////////////////////

Rasterizer::Rasterizer() : fragment_shader_(0)
{
    rendertarget_params_.count = 0;
}

bool Rasterizer::setup_valid()
{
    return rendertarget_params_.count >= 1 && fragment_shader_ != 0;
}

void Rasterizer::clear()
{
    for (int i = 0; i < rendertarget_params_.count; ++i)
        clear(i);
}

void Rasterizer::clear(int target)
{
    assert(target <= rendertarget_params_.count);
    rendertargets_[target]->clear(0, 0, rendertarget_params_.minwidth,
        rendertarget_params_.minheight);

    // notify the shader about the clear (it might want to update internal
    // data structures)
    if (fragment_shader_)
        fragment_shader_->clear(target, 0, 0, rendertarget_params_.minwidth,
            rendertarget_params_.minheight);
}

void Rasterizer::clip_rect(int x, int y, int w, int h)
{
    if (rendertarget_params_.count == 0) return;
    clip_rect_.x0 = std::max(0, x);
    clip_rect_.y0 = std::max(0, y);
    clip_rect_.x1 = std::min(x + w, rendertarget_params_.minwidth);
    clip_rect_.y1 = std::min(y + h, rendertarget_params_.minheight);
}

////////////////////////////////////////////////////////////////////////////////
// main interface

void Rasterizer::rendertargets(int n, RenderTarget* rt[])
{
    assert(n <= MAX_RENDER_TARGETS);
    RenderTargetParams& rtp = rendertarget_params_;
    rtp.count = n;
    if (n == 0) return;

    rtp.minwidth = rt[0]->width();
    rtp.minheight = rt[0]->height();
    for (int i = 0; i < n; ++i) {
        rendertargets_[i] = rt[i];
        rtp.minwidth = std::min(rtp.minwidth, rt[i]->width());
        rtp.minheight = std::min(rtp.minheight, rt[i]->height());

        // cache these to avoid too many virtual function calls later
        rtp.element_size[i] = rt[i]->element_size();
        rtp.stride[i] = rt[i]->stride();
    }

    clip_rect_.x0 = 0;
    clip_rect_.y0 = 0;
    clip_rect_.x1 = rtp.minwidth;
    clip_rect_.y1 = rtp.minheight;
}

void Rasterizer::fragment_shader(FragmentShader* fs)
{
    assert(fs != 0);
    fragment_shader_ = fs;
}
void Rasterizer::draw_triangle(const Vertex& v1, const Vertex& v2, const Vertex& v3)
{
    if (!setup_valid()) return;

    int64_t zPlane[4];
    int64_t wPlane[4];
    int64_t vPlane[MAX_VARYING][4];

    compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y, v1.z, v2.z, v3.z, zPlane);

    compute_plane(v1.x, v1.y, v2.x, v2.y, v3.x, v3.y,
        // interpolate 1/w across the triangle
        fixdiv<16>(1 << 16, v1.w),
        fixdiv<16>(1 << 16, v2.w),
        fixdiv<16>(1 << 16, v3.w),
        wPlane);

    int varying_count = fragment_shader_->varying_count();

    for (int i = 0; i < varying_count; ++i)
        compute_plane(
            v1.x, v1.y, v2.x, v2.y, v3.x, v3.y,
            fixdiv<16>(v1.varyings[i], v1.w),
            fixdiv<16>(v2.varyings[i], v2.w),
            fixdiv<16>(v3.varyings[i], v3.w),
            vPlane[i]);

    // Deltas
    const int DX12 = v1.x - v2.x;
    const int DX23 = v2.x - v3.x;
    const int DX31 = v3.x - v1.x;

    const int DY12 = v1.y - v2.y;
    const int DY23 = v2.y - v3.y;
    const int DY31 = v3.y - v1.y;

    // Fixed-point deltas
    const int FDX12 = DX12 << 4;
    const int FDX23 = DX23 << 4;
    const int FDX31 = DX31 << 4;

    const int FDY12 = DY12 << 4;
    const int FDY23 = DY23 << 4;
    const int FDY31 = DY31 << 4;

    // Bounding rectangle
    int minx = (min(v1.x, v2.x, v3.x) + 0xF) >> 4;
    int maxx = (max(v1.x, v2.x, v3.x) + 0xF) >> 4;
    int miny = (min(v1.y, v2.y, v3.y) + 0xF) >> 4;
    int maxy = (max(v1.y, v2.y, v3.y) + 0xF) >> 4;

    // consider the clipping rectangle
    minx = std::max(minx, clip_rect_.x0);
    maxx = std::min(maxx, clip_rect_.x1);
    miny = std::max(miny, clip_rect_.y0);
    maxy = std::min(maxy, clip_rect_.y1);

    // Start in corner of 8x8 block
    minx &= ~(BLOCK_SIZE - 1);
    miny &= ~(BLOCK_SIZE - 1);

    BufferPointers buffers;
    for (int i = 0; i < rendertarget_params_.count; ++i)
        buffers.ptr[i] = (char*)rendertargets_[i]->buffer_pointer() +
            miny * rendertargets_[i]->stride();

    // Half-edge constants
    int C1 = DY12 * v1.x - DX12 * v1.y;
    int C2 = DY23 * v2.x - DX23 * v2.y;
    int C3 = DY31 * v3.x - DX31 * v3.y;

    // Correct for fill convention
    if (DY12 < 0 || (DY12 == 0 && DX12 > 0)) C1++;
    if (DY23 < 0 || (DY23 == 0 && DX23 > 0)) C2++;
    if (DY31 < 0 || (DY31 == 0 && DX31 > 0)) C3++;

    // Loop through blocks
    for (int y = miny; y < maxy; y += BLOCK_SIZE) {
        for (int x = minx; x < maxx; x += BLOCK_SIZE) {
            // Corners of block
            int x0 = x << 4;
            int x1 = (x + BLOCK_SIZE - 1) << 4;
            int y0 = y << 4;
            int y1 = (y + BLOCK_SIZE - 1) << 4;

            // Evaluate half-space functions
            bool a00 = C1 + DX12 * y0 - DY12 * x0 > 0;
            bool a10 = C1 + DX12 * y0 - DY12 * x1 > 0;
            bool a01 = C1 + DX12 * y1 - DY12 * x0 > 0;
            bool a11 = C1 + DX12 * y1 - DY12 * x1 > 0;
            int a = (a00 << 0) | (a10 << 1) | (a01 << 2) | (a11 << 3);

            bool b00 = C2 + DX23 * y0 - DY23 * x0 > 0;
            bool b10 = C2 + DX23 * y0 - DY23 * x1 > 0;
            bool b01 = C2 + DX23 * y1 - DY23 * x0 > 0;
            bool b11 = C2 + DX23 * y1 - DY23 * x1 > 0;
            int b = (b00 << 0) | (b10 << 1) | (b01 << 2) | (b11 << 3);

            bool c00 = C3 + DX31 * y0 - DY31 * x0 > 0;
            bool c10 = C3 + DX31 * y0 - DY31 * x1 > 0;
            bool c01 = C3 + DX31 * y1 - DY31 * x0 > 0;
            bool c11 = C3 + DX31 * y1 - DY31 * x1 > 0;
            int c = (c00 << 0) | (c10 << 1) | (c01 << 2) | (c11 << 3);

            // Skip block when outside an edge
            if (a == 0x0 || b == 0x0 || c == 0x0) continue;

#define CLIP_TEST(X, Y) \
    ((X) >= clip_rect_.x0 && (X) < clip_rect_.x1 && \
     (Y) >= clip_rect_.y0 && (Y) < clip_rect_.y1)

            // test against the clipping rectangle
            bool clip00 = CLIP_TEST(x, y);
            bool clip10 = CLIP_TEST(x + 7, y);
            bool clip01 = CLIP_TEST(x, y + 7);
            bool clip11 = CLIP_TEST(x + 7, y + 7);

            // skip the block if it is completely clipped
            if (!clip00 && !clip10 && !clip01 && !clip11) continue;

            bool clip_all_in = clip00 && clip10 && clip01 && clip11;
            //! compute attribute interpolants at corners
            FragmentData f00;
            FragmentData f10;
            FragmentData f01;
            FragmentData f11;

            int xx1 = (x + BLOCK_SIZE) << 4;
            int yy1 = (y + BLOCK_SIZE) << 4;

            f00.z = solve_plane(x0, y0, zPlane);
            f10.z = solve_plane(xx1, y0, zPlane);
            f01.z = solve_plane(x0, yy1, zPlane);
            f11.z = solve_plane(xx1, yy1, zPlane);

            if (!fragment_shader_->early_depth_test(x, y,
                std::min(std::min(std::min(f00.z, f10.z), f01.z), f11.z)))
                continue;

            int w00 = fixdiv<16>(1 << 16, solve_plane(x0, y0, wPlane));
            int w10 = fixdiv<16>(1 << 16, solve_plane(xx1, y0, wPlane));
            int w01 = fixdiv<16>(1 << 16, solve_plane(x0, yy1, wPlane));
            int w11 = fixdiv<16>(1 << 16, solve_plane(xx1, yy1, wPlane));

            for (int i = 0; i < varying_count; ++i) {
                f00.varying[i] = fixmul<16>(solve_plane(x0, y0, vPlane[i]), w00);
                f10.varying[i] = fixmul<16>(solve_plane(xx1, y0, vPlane[i]), w10);
                f01.varying[i] = fixmul<16>(solve_plane(x0, yy1, vPlane[i]), w01);
                f11.varying[i] = fixmul<16>(solve_plane(xx1, yy1, vPlane[i]), w11);
            }

            //! compute attribute step y left and right
            struct varying_step_t {
                struct step_info_t {
                    int step;
                    int rem;
                    int error_term;

                    step_info_t() : error_term(0) {}

                    int dostep() {
                        int r = step;
                        error_term += rem;
                        if (error_term >= BLOCK_SIZE) {
                            error_term -= BLOCK_SIZE;
                            r++;
                        }
                        return r;
                    }
                };

                step_info_t z;
                step_info_t varying[MAX_VARYING];

                varying_step_t(FragmentData& p1, FragmentData& p2, int vc) {
                    floor_divmod<BLOCK_SIZE>(p2.z - p1.z, z.step, z.rem);
                    for (int i = 0; i < vc; ++i) {
                        floor_divmod<BLOCK_SIZE>(p2.varying[i] - p1.varying[i],
                            varying[i].step, varying[i].rem);
                    }
                }
            };

            varying_step_t step_left(f00, f01, varying_count);
            varying_step_t step_right(f10, f11, varying_count);

            BufferPointers block_buffers = buffers;

#define RENDER_TARGET_LOOP \
    for (int i = 0; i < rendertarget_params_.count; ++i)

#define STEP_POINTERS_BY_ELEMENTSIZE(VAR, FACTOR) { \
    RENDER_TARGET_LOOP \
        (char*&)VAR.ptr[i] += FACTOR * rendertarget_params_.element_size[i]; }

#define STEP_POINTERS_BY_STRIDE(VAR) { \
    RENDER_TARGET_LOOP \
        (char*&)VAR.ptr[i] += rendertarget_params_.stride[i]; }

#define STEP_FRAGMENTDATA(FDVAR, STEPVAR) { \
    FDVAR.z += STEPVAR.z.dostep(); \
    for (int i = 0; i < varying_count; ++i) \
        FDVAR.varying[i] += STEPVAR.varying[i].dostep(); }

// only copy the necessary varyings
#define EFFICIENT_COPY(SRC, DST) { \
    DST.z = SRC.z; \
    for (int i = 0; i < varying_count; ++i) \
        DST.varying[i] = SRC.varying[i]; }

#define BLOCK_BEGIN \
    fragment_shader_->prepare_for_block(x, y, pixel_block); \
    for (int iy = 0; iy < BLOCK_SIZE; ++iy) { \
        BufferPointers inner = block_buffers; \
        STEP_POINTERS_BY_ELEMENTSIZE(inner, x); \
        for (int ix = 0; ix < BLOCK_SIZE; ++ix) {

#define BLOCK_END \
            STEP_POINTERS_BY_ELEMENTSIZE(inner, 1); \
        } \
        STEP_POINTERS_BY_STRIDE(block_buffers); \
    }

            PixelBlock pixel_block;
            bool skip_flag[BLOCK_SIZE][BLOCK_SIZE];
            memset(skip_flag, 0, sizeof(skip_flag));

            if (!clip_all_in) {
                for (int iy = 0; iy < BLOCK_SIZE; ++iy)
                    for (int ix = 0; ix < BLOCK_SIZE; ++ix)
                        if (!CLIP_TEST(ix + x, iy + y))
                            skip_flag[iy][ix] = true;
            }

            // Accept whole block when totally covered
            if (a == 0xF && b == 0xF && c == 0xF) {
                // first compute all fragment data
                for (int iy = 0; iy < BLOCK_SIZE; iy++) {
                    //! compute attribute step x for this scanline
                    varying_step_t stepx(f00, f10, varying_count);
                    FragmentData fragment_data = f00;
                    for (int ix = 0; ix < BLOCK_SIZE; ix++) {
                        EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]);
                        STEP_FRAGMENTDATA(fragment_data, stepx);
                    }

                    //! step left and right attrib y
                    STEP_FRAGMENTDATA(f00, step_left);
                    STEP_FRAGMENTDATA(f10, step_right);
                }

                //! fragment_shader_block (can now use derivatives of attributes)
                if (clip_all_in) {
                    BLOCK_BEGIN
                    fragment_shader_->shade(inner, pixel_block, ix, iy);
                    BLOCK_END
                } else {
                    BLOCK_BEGIN
                    if (!skip_flag[iy][ix])
                        fragment_shader_->shade(inner, pixel_block, ix, iy);
                    BLOCK_END
                }
            } else { // Partially covered block
                int CY1 = C1 + DX12 * y0 - DY12 * x0;
                int CY2 = C2 + DX23 * y0 - DY23 * x0;
                int CY3 = C3 + DX31 * y0 - DY31 * x0;

                for (int iy = 0; iy < BLOCK_SIZE; iy++) {
                    int CX1 = CY1;
                    int CX2 = CY2;
                    int CX3 = CY3;

                    //! compute attribute step x for this scanline
                    varying_step_t stepx(f00, f10, varying_count);
                    FragmentData fragment_data = f00;
                    for (int ix = 0; ix < BLOCK_SIZE; ix++) {
                        if (!(CX1 > 0 && CX2 > 0 && CX3 > 0))
                            skip_flag[iy][ix] = true;

                        // we still need to do this since the fragment shader
                        // might want to compute the derivative of attributes
                        EFFICIENT_COPY(fragment_data, pixel_block[iy][ix]);

                        CX1 -= FDY12;
                        CX2 -= FDY23;
                        CX3 -= FDY31;
                        STEP_FRAGMENTDATA(fragment_data, stepx);
                    }

                    CY1 += FDX12;
                    CY2 += FDX23;
                    CY3 += FDX31;

                    //! step left and right attrib y
                    STEP_FRAGMENTDATA(f00, step_left);
                    STEP_FRAGMENTDATA(f10, step_right);
                }

                //! fragment_shader_block (can now use derivatives of attributes)
                BLOCK_BEGIN
                if (!skip_flag[iy][ix])
                    fragment_shader_->shade(inner, pixel_block, ix, iy);
                BLOCK_END
            }
        }

        for (int i = 0; i < rendertarget_params_.count; ++i)
            (char*&)buffers.ptr[i] += BLOCK_SIZE * rendertargets_[i]->stride();
    }
}

Test program:

/*
Copyright (c) 2007, Markus Trenkwalder
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the <ORGANIZATION> nor the names of its contributors
  may be used to endorse or promote products derived from this software
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "SDL.h"
#include "rasterizer.h"

#include <cmath>
#include <algorithm>

class SDL_SurfaceRenderTarget : public Rasterizer::RenderTarget {
    SDL_Surface* surface;

public:
    SDL_SurfaceRenderTarget(SDL_Surface* s) : surface(s) {}

    virtual int width() { return surface->w; }
    virtual int height() { return surface->h; }
    virtual int stride() { return surface->pitch; }
    virtual int element_size() { return sizeof(int); }
    virtual void* buffer_pointer() { return surface->pixels; }
    virtual void clear(int x, int y, int w, int h) {}
};

class TestFragmentShader : public Rasterizer::FragmentShader {
public:
    virtual int varying_count() { return 3; }

    virtual void shade(const Rasterizer::BufferPointers& ptrs,
        const Rasterizer::PixelBlock& block, int x, int y)
    {
        unsigned int* color_buffer = (unsigned int*)ptrs.ptr[0];

        // unfortunately at the corners of the triangle we can get negative
        // values for the interpolants -> std::max
        int r = std::max(0, block[y][x].varying[0]);
        int g = std::max(0, block[y][x].varying[1]);
        int b = std::max(0, block[y][x].varying[2]);

        int color = r << 16 | g << 8 | b;
        *color_buffer = color;
    }
};

int main(int ac, char* av[])
{
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Surface* screen = SDL_SetVideoMode(320, 240, 32, SDL_SWSURFACE);

    Rasterizer r;
    SDL_SurfaceRenderTarget color_target(screen);
    TestFragmentShader fragment_shader;

    Rasterizer::RenderTarget* rendertargets[] = { &color_target };
    r.rendertargets(1, rendertargets);
    r.fragment_shader(&fragment_shader);
    r.clip_rect(45, 70, 100, 100);

    Rasterizer::Vertex v[3];

    v[0].x = (int)(120.0f * 16.0f);
    v[0].y = (int)(50.0f * 16.0f);
    v[0].z = 0;
    v[0].w = 1 << 16;
    v[0].varyings[0] = 255;
    v[0].varyings[1] = 0;
    v[0].varyings[2] = 0;

    v[1].x = (int)(20.0f * 16.0f);
    v[1].y = (int)(100.0f * 16.0f);
    v[1].z = 0x7fffffff;
    v[1].w = 1 << 16;
    v[1].varyings[0] = 0;
    v[1].varyings[1] = 255;
    v[1].varyings[2] = 0;

    v[2].x = (int)(150.0f * 16.0f);
    v[2].y = (int)(220.0f * 16.0f);
    v[2].z = 0x7fffffff >> 1;
    v[2].w = 1 << 16;
    v[2].varyings[0] = 0;
    v[2].varyings[1] = 0;
    v[2].varyings[2] = 255;

    SDL_Rect rect;
    rect.x = 45; rect.y = 70; rect.w = 100; rect.h = 100;
    SDL_FillRect(screen, &rect, 0xffffffff);

    r.draw_triangle(v[0], v[1], v[2]);
    SDL_Flip(screen);

    SDL_Event e;
    while (SDL_WaitEvent(&e) && e.type != SDL_QUIT);

    SDL_Quit();
    return 0;
}

I don't include the stdint.h file. You can get it yourself if needed. OK, I now have a domain www.trenki.net where I uploaded a vastly improved version of the renderer. The above code is obsolete, but the features are retained. [Edited by - Trenki on August 22, 2007 5:11:28 AM]
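For readers unfamiliar with fixed point, here is a standalone illustration of the 16.16 helpers the rasterizer relies on. Only the portable variants are shown (the posted fixdiv additionally has an ARM-friendly path); the example values in the comments follow the 16.16 convention where 1.0 is represented as 1 << 16.

```cpp
#include <cstdint>
#include <cassert>

// Fixed point division: shift the numerator up by p bits first so the
// quotient keeps p fractional bits.
template <int p>
int32_t fixdiv(int32_t a, int32_t b)
{
    return (int32_t)((((int64_t)a) << p) / b);
}

// Fixed point multiplication with a 64-bit intermediate to avoid overflow.
template <int p>
int32_t fixmul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> p);
}

// Examples in 16.16:
//   2.0 -> 2 << 16 = 131072, 1.5 -> 98304, 0.5 -> 32768
//   fixmul<16>(131072, 98304) == 196608  (2.0 * 1.5 == 3.0)
//   fixdiv<16>(1 << 16, 2 << 16) == 32768 (1.0 / 2.0 == 0.5)
```

This is also why the vertex setup above divides each varying by w up front (`fixdiv<16>(v1.varyings[i], v1.w)`) and multiplies the interpolated result by the reconstructed w per block corner: the perspective divide happens in 16.16 fixed point, not floating point.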
  24. Taking a pause from handling files and the editor is good, if only to update something in rendering and keep motivation high. So I went ahead and implemented voxel cone tracing global illumination (and reflections, of course). Anyway, image time: Although quite dark, secondary shadows are visible. Note that global illumination is fully integrated into the editor. Reflective box, global illumination debug buffer (in the Debug window), and color bleeding from the spotlight are visible. So much for the show - how is it done? In short:
  - The scene is voxelized; during this phase lights and shadows are injected.
  - The reflection pass performs cone tracing; the cone angle is defined by the material properties.
  - The GI pass performs cone tracing for global illumination.
  - The lighting pass uses one fullscreen quad for indirect light (and reflections), and then one for each light (which I'd like to replace with a tile-based method).
  The resolution of the reflection and GI passes can be anything (so even sub-sampling is possible). In the images, the scene is voxelized into a 512x512x512 buffer, the reflection & GI passes are done at FullHD with 1x MSAA, and the lighting pass is done with 8x MSAA. The original G-buffer generation is done at 8x MSAA. Everything is resolved later (actually even after the tone mapping pass). I have an option to switch from the voxel texture to a sparse voxel octree, but it is still heavily unoptimized (and slower), although it has a much smaller memory footprint. When I manage to find some more time for that, I'd like to switch over to the sparse voxel octree only. If possible, I'd like to revisit resource management and dynamic reloading, which would be a bit less of a 'showcase' topic and more of a 'coding' one. Other than that, virtual shadow maps and virtual textures are going to be visited and attempted by me, hopefully in the next weeks. Side note: If you're trying to implement VXGI or voxelization on the GPU and have some questions - I'll gladly answer them. That should be it for today, thanks for reading!
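The cone-tracing step in the passes above can be sketched on the CPU as follows. This is a minimal illustration under assumed conventions (volume spanning [0,1]^3, a caller-supplied sampler standing in for the prefiltered, mipmapped 3D voxel texture; all names are mine), not the author's actual GPU code: the core idea is front-to-back alpha compositing while the sample footprint (mip level) grows with the cone diameter.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// Premultiplied color + opacity returned by the voxel volume lookup.
struct VoxelSample { float r, g, b, a; };

// Stand-in for a hardware trilinear 3D texture fetch at a given mip level.
using VoxelSampler = std::function<VoxelSample(Vec3 pos, float level)>;

VoxelSample cone_trace(const VoxelSampler& sample, Vec3 origin, Vec3 dir,
                       float tan_half_angle, float voxel_size, float max_dist)
{
    VoxelSample acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    float dist = voxel_size; // start slightly in front to avoid self-sampling

    while (dist < max_dist && acc.a < 0.99f) {
        // cone diameter grows linearly with distance from the apex
        float diameter = std::max(voxel_size, 2.0f * tan_half_angle * dist);

        // pick the mip level whose voxel footprint matches the cone diameter
        float level = std::log2(diameter / voxel_size);

        Vec3 p = { origin.x + dir.x * dist,
                   origin.y + dir.y * dist,
                   origin.z + dir.z * dist };
        VoxelSample s = sample(p, level);

        // front-to-back compositing: later samples are attenuated by
        // the opacity accumulated so far
        float w = 1.0f - acc.a;
        acc.r += w * s.r;
        acc.g += w * s.g;
        acc.b += w * s.b;
        acc.a += w * s.a;

        // step size proportional to the current cone diameter
        dist += diameter * 0.5f;
    }
    return acc;
}
```

A wide cone (large tan_half_angle) climbs the mip chain quickly and gives cheap, blurry diffuse GI; a narrow cone stays at fine mips and yields sharper reflections, which is how one trace routine serves both passes.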
  25. I hope this is in the right forum. I'm new to this community, but I've been programming engines for many years as a hobby. I'm now writing a simple Direct3D 12 game engine with audio and multithreading support for Windows (GNU GPL license). I know there are several ready-to-use engines out there, but my goal is not to compete with Unity or any of the others. I mainly want to explore the issues involved in creating these engines, and that's why I'm posting about it here. I also want to help beginners, so this can be used as a learning tool; everyone should feel free to copy code (I'll be posting it on SourceForge and putting sample output on YouTube). Also, if anyone asks me to add a feature to the engine, I will try to implement it. Finally, I want to become familiar with the terminology and industry practices involved, since I am self-taught. Most of the engine is already planned out, but I want to write some code before divulging the plan, to show that I am capable of creating this. I'm going to finish this post and write some initialization code. If anyone has any comments or ideas, please reply; otherwise I'll be back with a YouTube video. --237