Question about per-frame resources in Vulkan/DX12


Hi,

In older APIs (OpenGL, DX11, etc.) you have been able to access textures, buffers, render targets and so on pretty much like you access CPU resources. You bind a buffer, draw using it, then update the buffer with new data, draw that data, etc. It has all just worked.

In new low-level APIs such as Vulkan or DX12 you no longer have this luxury, but instead you have to take into account the fact that the GPU will be using the buffer long after you have called "draw" followed by "submit to queue".

Most Vulkan texts I have read suggest having resources created for three frames in a ring buffer, i.e. you have three sets of command pools, command buffers and framebuffers, plus any semaphores and/or fences you need to sync the graphics and present queues, etc. AFAIK it works the same in DX12. With this system you can continue with rendering the next frame immediately, and only have to wait if the GPU cannot keep up and you have already recorded all three frames.
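For concreteness, here is a minimal sketch of what I mean by that ring (the names FrameResources and kFramesInFlight are just mine for illustration):

#include <cstdint>
#include <vulkan/vulkan.h>

// One set of sync/command objects per in-flight frame.
static const uint32_t kFramesInFlight = 3;

struct FrameResources
{
    VkCommandPool   commandPool;
    VkCommandBuffer commandBuffer;
    VkFence         inFlightFence;  // created with VK_FENCE_CREATE_SIGNALED_BIT
    VkSemaphore     imageAvailable; // swap chain image acquired
    VkSemaphore     renderFinished; // rendering done, safe to present
};

FrameResources frames[kFramesInFlight];
uint32_t       frameIndex = 0;

void beginFrame(VkDevice device)
{
    FrameResources& f = frames[frameIndex];
    // Only blocks if the GPU is still working on the frame we
    // submitted kFramesInFlight frames ago.
    vkWaitForFences(device, 1, &f.inFlightFence, VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &f.inFlightFence);
    vkResetCommandPool(device, f.commandPool, 0);
}

void endFrame()
{
    frameIndex = (frameIndex + 1) % kFramesInFlight;
}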

My question is, since there are obviously many more resources you need to keep around "per-frame", how do you structure your code? Do you simply allocate three of everything that might get written to during a frame and then pass around the current frame index whenever you need to access one of those resources? Is there a more elegant way to handle these resources? Also, where do you draw the line of what you need several of? E.g. a vertex buffer that gets updated every frame obviously needs its own instance per frame. But what about a texture that is read from a file into device local memory? Sounds to me like you only need one of those, but is there some case where you need several?

Is there some grand design I am missing?

Thanks!

---

> Also, where do you draw the line of what you need several of? E.g. a vertex buffer that gets updated every frame obviously needs its own instance per frame. But what about a texture that is read from a file into device local memory? Sounds to me like you only need one of those, but is there some case where you need several?
In older APIs you still made these decisions via the flags/hints you passed to the API - e.g. old GL had GL_STATIC_DRAW, GL_STREAM_DRAW and GL_DYNAMIC_DRAW, and D3D11 has D3D11_USAGE_IMMUTABLE and D3D11_USAGE_DYNAMIC.

IMHO you should force your users to declare at creation time how they will be using the resource. Will it be immutable? Will they be updating it from the CPU once per frame? Will they be updating it from the CPU many times per frame? Will they be updating it from the CPU once per many frames? Will they be reading data back to the CPU from the resource?
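For example, a wrapper API can force that declaration at creation time - something like this sketch (the enum and the names are purely illustrative, not from any particular engine):

#include <cstddef>
#include <cstdint>

// Hypothetical usage declaration, required at buffer creation.
enum class BufferUsage
{
    Immutable,        // written once at creation, GPU-only afterwards
    DynamicPerFrame,  // rewritten by the CPU once per frame
    Streaming,        // rewritten by the CPU many times per frame
    Readback          // written by the GPU, read back by the CPU
};

struct BufferDesc
{
    size_t      size;
    BufferUsage usage; // the backend picks memory type / ring size from this
};

struct BufferHandle { uint32_t id; };

BufferHandle createBuffer(const BufferDesc& desc); // backend-specific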

3x isn't always the limit. If you write to a constant buffer 100 times per frame, then you need 300x its size in storage capacity!

Also, for the vertex streaming case -- a buffer that gets updated every frame. On old APIs you can just update it every frame and let the driver work things out for you... but that doesn't mean that you should. It's common on older APIs (even D3D9) to implement vertex streaming via a buffer that is 3x bigger than the per-frame capacity and streaming data into it (e.g. with MAP_NOOVERWRITE in D3D). If the game is already doing stuff like this for dynamic resources, then it will port to Vulkan/D3D12 just fine :D
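In D3D11 terms that streaming pattern looks roughly like this (just a sketch; it assumes the buffer was created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE, and the wrap/orphan policy shown is one common choice):

#include <d3d11.h>

// Ring buffer ~3x the worst-case per-frame size, filled with NO_OVERWRITE.
struct StreamingBuffer
{
    ID3D11Buffer* buffer;
    size_t        capacity; // ~3x worst-case bytes written per frame
    size_t        head;     // next free byte
};

void* allocate(ID3D11DeviceContext* ctx, StreamingBuffer& sb, size_t bytes)
{
    D3D11_MAP mapType = D3D11_MAP_WRITE_NO_OVERWRITE;
    if (sb.head + bytes > sb.capacity)
    {
        sb.head = 0;                       // wrap around
        mapType = D3D11_MAP_WRITE_DISCARD; // orphan: driver hands us fresh memory
    }
    D3D11_MAPPED_SUBRESOURCE mapped;
    ctx->Map(sb.buffer, 0, mapType, 0, &mapped);
    void* ptr = static_cast<char*>(mapped.pData) + sb.head;
    sb.head += bytes;
    return ptr; // caller writes its data, then calls ctx->Unmap(sb.buffer, 0)
}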

See this old talk for old APIs, which is still super relevant: http://gamedevs.org/uploads/efficient-buffer-management.pdf

---

I'm just finishing up a simple Vulkan wrapper library for some hobby projects, so take anything I say with a grain of salt. Also, I've only played with Vulkan, not DX12...

I've focused my fine-grained synchronization around VkFence. I have a VkFence wrapper with a function ExecuteOnReset(): I can pass it any function object, and when I reset the fence, all the stored functions get executed. Whenever I have resources that need to be released/recycled at a later time (when they are no longer in use), I simply add the cleanup function to their associated fence. At some point in the future I check/wait on that fence; when the fence is signaled, I present the associated swap-chain image and reset the fence, which causes all the associated cleanup functions to execute.
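In sketch form the wrapper is basically this (simplified for illustration, error handling omitted):

#include <functional>
#include <vector>
#include <vulkan/vulkan.h>

// VkFence wrapper that runs deferred cleanup work when the fence is reset.
class Fence
{
public:
    explicit Fence(VkDevice device) : mDevice(device)
    {
        VkFenceCreateInfo info = { VK_STRUCTURE_TYPE_FENCE_CREATE_INFO };
        vkCreateFence(device, &info, nullptr, &mFence);
    }

    // Queue a callback to run the next time this fence is reset.
    void ExecuteOnReset(std::function<void()> fn)
    {
        mOnReset.push_back(std::move(fn));
    }

    // Wait for the GPU to signal the fence, then reset it and flush the
    // deferred work - the GPU is guaranteed to be done with those resources.
    void WaitAndReset()
    {
        vkWaitForFences(mDevice, 1, &mFence, VK_TRUE, UINT64_MAX);
        vkResetFences(mDevice, 1, &mFence);
        for (auto& fn : mOnReset)
            fn();
        mOnReset.clear();
    }

    VkFence handle() const { return mFence; }

private:
    VkDevice mDevice;
    VkFence  mFence;
    std::vector<std::function<void()>> mOnReset;
};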

It's surprisingly simple and efficient, and handles nearly 95% of all synchronization. I tried a couple of other methods, and found this was by far the easiest to both implement and use. It was really one of those 'ah-ha' moments. All the other attempts at making a full-blown, all-bells-and-whistles resource manager were either very complex, inefficient, or awkward; and I found that no matter what I did I was always passing around VkFences to synchronize on anyway. So I eventually just decided to stick it all in the fence and be done with it.

I also cache/re-use command pools. My Device class allows manual creation/destruction of pools, but also lets you pull/return pools from a cache so I'm not constantly re-creating them every frame. Coupled with the above Fence class, drawing is usually as simple as: request a command pool, create command buffers, fill buffers, submit buffers, pass the pool to the fence to be recycled. If I want to store/reuse the command buffers for later, that's trivial as well. I know a lot of people online talk about creating the command buffers once and then using indirect draws. I have a hard time believing that will be a better option, but I could be wrong and have no data to go on. I'd love to see a proper benchmark comparing the two styles: dynamic/reused command buffers vs. static command buffers with the dynamic draw data uploaded manually.

The problem I find with fixing command pools or resources ahead of time is that you really don't know what/how many you'll need beforehand. If you're managing each thread 'by hand' it can probably work (i.e. I need 3 command pools for each thread to rotate through: one thread for physics, one for foreground objects, one for UI, etc...), but I'd rather just throw everything at a thread pool and let things work themselves out. On top of that, sometimes you want to re-use command pools and other times you want to recycle them; I found it quickly became impractical to manage. So the cache system works great. Any thread can pull from the cache, any thread can recycle to the cache. I can just toss all the rendering jobs at a thread/job pool without any pre-planning or thought; the command pools are recycled using the fences, and the command buffers are returned via futures from the threads. It's stupidly simple to use and implement, and I like that.
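The cache itself is nothing fancy - roughly this (a simplified sketch):

#include <cstdint>
#include <mutex>
#include <vector>
#include <vulkan/vulkan.h>

// Thread-safe cache of command pools: any thread can pull, any thread
// (or a fence callback) can recycle.
class CommandPoolCache
{
public:
    CommandPoolCache(VkDevice device, uint32_t queueFamily)
        : mDevice(device), mQueueFamily(queueFamily) {}

    VkCommandPool Acquire()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (!mFree.empty())
            {
                VkCommandPool pool = mFree.back();
                mFree.pop_back();
                return pool;
            }
        }
        // Cache empty: create a new pool.
        VkCommandPoolCreateInfo info = { VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO };
        info.queueFamilyIndex = mQueueFamily;
        VkCommandPool pool = VK_NULL_HANDLE;
        vkCreateCommandPool(mDevice, &info, nullptr, &pool);
        return pool;
    }

    // Typically called from a Fence's ExecuteOnReset() callback.
    void Recycle(VkCommandPool pool)
    {
        vkResetCommandPool(mDevice, pool, VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT);
        std::lock_guard<std::mutex> lock(mMutex);
        mFree.push_back(pool);
    }

private:
    VkDevice                   mDevice;
    uint32_t                   mQueueFamily;
    std::mutex                 mMutex;
    std::vector<VkCommandPool> mFree;
};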

As far as updating dynamic data goes (apart from using push constants whenever possible), for the vast majority of buffer updates (matrices, shader constants, etc...) I'm using vkCmdUpdateBuffer(); this means I only need to allocate the buffer/memory once and can re-use it each frame (no buffer rotation necessary, but you do need pipeline barriers). For the rather rare cases where I actually need to dynamically upload data each frame and can't use push constants or vkCmdUpdateBuffer(), I'm writing two dynamic memory allocators. The first is a very simple slab/queue allocator designed to handle situations where allocations/frees occur in order. The second is a buddy allocator for situations where allocations/frees happen randomly.
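For example, a small uniform-buffer update with its barrier looks roughly like this (a sketch; note vkCmdUpdateBuffer is limited to 65536 bytes, the size must be a multiple of 4, and it has to be recorded outside a render pass):

#include <vulkan/vulkan.h>

// Record an in-command-buffer update of a small uniform buffer, then
// make the transfer write visible to subsequent shader reads.
void updateUniforms(VkCommandBuffer cmd, VkBuffer ubo,
                    const void* data, VkDeviceSize size)
{
    vkCmdUpdateBuffer(cmd, ubo, 0, size, data);

    VkBufferMemoryBarrier barrier = { VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER };
    barrier.srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask       = VK_ACCESS_UNIFORM_READ_BIT;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.buffer              = ubo;
    barrier.offset              = 0;
    barrier.size                = size;

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_TRANSFER_BIT,
        VK_PIPELINE_STAGE_VERTEX_SHADER_BIT, // adjust to the consuming stage(s)
        0,
        0, nullptr,
        1, &barrier,
        0, nullptr);
}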

I'm not claiming that what I've done is optimal, just thought I'd throw it up for discussion/idea purposes.  I'm interested as well to see what others have done/are planning to do.

---

Thanks guys! I have now implemented a resource pooling system where resources are returned to the pool on fence release, as per Ryan_001's suggestion. Works great!

The modern low-level APIs are great in the sense that they make you a better programmer whether you want it or not. I ported my old GUI rendering system, which I originally wrote for DX11 four years ago, to Vulkan, and I now realize how many hoops the driver had to jump through to get my GUI on the screen.

---

Where possible, I try to do all my resource tracking with a single fence per frame. I tag items with the frame number on which they were last submitted to the GPU, and then use a single per-frame fence to count which frame the GPU has most recently completed. This scales really well, as you can track any number of resources with one fence.
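In sketch form (Resource and destroy() are placeholders for whatever your engine uses):

#include <cstdint>
#include <deque>

struct Resource;           // hypothetical engine resource type
void destroy(Resource* r); // hypothetical destruction function

// Deferred destruction driven by a monotonically increasing frame counter.
class DeletionQueue
{
public:
    // Tag the resource with the frame it was last submitted on.
    void Defer(Resource* r, uint64_t currentFrame)
    {
        mPending.push_back({ currentFrame, r });
    }

    // completedFrame = most recent frame the GPU has finished, read back
    // via the single per-frame fence.
    void Collect(uint64_t completedFrame)
    {
        while (!mPending.empty() && mPending.front().frame <= completedFrame)
        {
            destroy(mPending.front().resource);
            mPending.pop_front();
        }
    }

private:
    struct PendingDelete
    {
        uint64_t  frame;
        Resource* resource;
    };
    std::deque<PendingDelete> mPending;
};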

I do use more fine-grained fences for operations where you want to be more aggressive about recovering unused memory quickly, or for things that don't complete on a per-frame timeframe, such as the upload queue.

> The modern low-level APIs are great in the sense that they make you a better programmer whether you want it or not. I ported my old GUI rendering system, which I originally wrote for DX11 four years ago, to Vulkan, and I now realize how many hoops the driver had to jump through to get my GUI on the screen.
Yeah, I learned so much from having to do graphics programming on consoles, which have always had these low-level APIs, and because they're all secretive and NDA'd it's created a divide in the graphics programming community. It's great for the PC to finally have low-level APIs available to everyone so they can learn this stuff :D

---

> My question is, since there are obviously many more resources you need to keep around "per-frame", how do you structure your code? Do you simply allocate three of everything that might get written to during a frame and then pass around the current frame index whenever you need to access one of those resources? Is there a more elegant way to handle these resources? Also, where do you draw the line of what you need several of? E.g. a vertex buffer that gets updated every frame obviously needs its own instance per frame. But what about a texture that is read from a file into device local memory? Sounds to me like you only need one of those, but is there some case where you need several?
> Is there some grand design I am missing?
> Thanks!

First, like Hodgman said, you don't need three of everything - only of the resources you would consider "dynamic".
Also, you want "static" resources to be GPU-only accessible, so that they always get allocated in the fastest memory (GPU device memory), while dynamic resources obviously need CPU access.

Second, you don't need 3x the number of resources and handles. Most of the things you'll be dealing with are going to be just buffers in memory.
This means all you need to do is reserve 3x the memory size and then compute a starting offset:
 

currentOffset = baseOffset + (currentFrame % 3) * bufferSize;

That's it. The "grand design of things" is having an extra variable to store the current offset.
There is one design issue you need to be careful: you can only write to that buffer once per frame. However you can violate that rule if you know what you're doing by "regressing" the currentOffset to a range you know its not in use (in GL terms this is the equivalent of doing GL_MAP_UNSYNCHRONIZED_BIT|GL_MAP_INVALIDATE_RANGE_BIT and in D3D11 of doing a map with D3D11_MAP_WRITE_NO_OVERWRITE).
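As a concrete sketch of that layout with a persistently mapped Vulkan buffer (the names are illustrative):

#include <cstdint>
#include <vulkan/vulkan.h>

// One buffer with 3x the worst-case per-frame capacity, kept mapped.
struct DynamicBuffer
{
    VkBuffer     buffer;
    void*        mappedBase; // from vkMapMemory, never unmapped
    VkDeviceSize frameSize;  // worst-case bytes written per frame
};

// CPU write pointer for this frame's third of the buffer.
void* cpuPtrForFrame(const DynamicBuffer& db, uint64_t currentFrame)
{
    VkDeviceSize currentOffset = (currentFrame % 3) * db.frameSize;
    return static_cast<char*>(db.mappedBase) + currentOffset;
}

// Matching GPU-side offset, e.g. passed via pDynamicOffsets in
// vkCmdBindDescriptorSets for a dynamic uniform buffer.
VkDeviceSize gpuOffsetForFrame(const DynamicBuffer& db, uint64_t currentFrame)
{
    return (currentFrame % 3) * db.frameSize;
}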

In design terms this means you should delay writing to the buffers as much as possible, until you have everything you need, because "writing as you go" is a terrible approach: you may end up advancing currentOffset too early (i.e. thinking you're done when you're not), and then you don't know how to regress currentOffset to where it was before, so you need to grab a new buffer (which is also 3x size, so you end up wasting memory).

 

If you're familiar with the concept of render queues, then this should come naturally: all you need is for the render queues to collect everything, and once you're done, start rendering what's in those queues.

 

Last but not least, there are cases where you want to do something as an exception, in which case you may want to implement a "fullStall()" that waits for everything to finish. It's slow and it's not pretty, but it's great for debugging problems and for saving you in a pinch.

Edited by Matias Goldberg

---

Thanks guys, this is all really good stuff. Currently I am still working on wrapping the APIs (DX11, DX12 and Vulkan) under a common interface. DX11 and Vulkan are now both rendering my GUI, and the next piece of work is to get DX12 to that point. My plan is to rewrite large parts of the high-level renderer to make better use of the GPU, but leave other parts as-is for now, e.g. the GUI and debug rendering. It would be nice to go the route of allocating larger buffers and offsetting based on the frame, but for now I am using a pool, à la Ryan_001's suggestion, where I can acquire temporary buffers and command buffers. The buffers are still as small as they used to be; there are just more of them. This is probably not the most performant way, but it gets the job done.

Regarding the "full stall" I actually had to implement something like that already for shutdown (ie. you want to wait until all GPU work is done before destroying resources) and for swap chain recreations. In Vulkan this is easy, you can just do:

void RenderDeviceVulkan::waitUntilDeviceIdle()
{
    // Blocks until every queue on the device has finished all submitted work.
    vkDeviceWaitIdle(mDevice);
}

However, I am a little confused about how to do that on DX12. This is what I have come up with but it has not been tested yet. What do you think?

void RenderDevice12::waitUntilDeviceIdle()
{
    // Ask the queue to signal the fence once all previously submitted work is done.
    mCommandQueue->Signal(mFullStallFence.Get(), ++mFullStallFenceValue);

    // If the GPU hasn't reached that value yet, block this thread until it does.
    if(mFullStallFence->GetCompletedValue() < mFullStallFenceValue)
    {
        HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
        mFullStallFence->SetEventOnCompletion(mFullStallFenceValue, eventHandle);
        WaitForSingleObject(eventHandle, INFINITE);
        CloseHandle(eventHandle);
    }
}

That would obviously only stall the one queue, but I think that might be enough for now. Is there an easier way to wait until the GPU has finished all work on DX12?

Cheers!

---
That's pretty much what I do when shutting down a queue:
	//TODO - look into this
	u64 frameCount = m_frameCount + 1;
	m_mainQueue->Signal(m_mainFence, frameCount);
	{
		// YieldThreadUntil is a helper that yields the calling thread
		// until the predicate returns true.
		YieldThreadUntil([this, frameCount](){ return (s64)m_mainFence->GetCompletedValue() >= ((s64)frameCount); });
	}
