
OpenGL: Fast Way To Determine If All Pixels In OpenGL Depth Buffer Were Drawn At Least Once?

This topic is 557 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hello, I am programming an FPS game and I simply want to make it faster. I have tried a lot of things, and only a few of them worked. My test map has 15k vertices and 26k triangles, and I partition space into an X*Y*Z grid of orthogonal cubes. That works fine. The next thing that helped a lot was ordering the partitions by their distance from the partition I am in and drawing them nearest first, which made OpenGL overdraw much less. I also use face culling, my own per-triangle frustum clipping, and an overdraw check that makes sure the same triangle is rendered only once. Before a map is loaded, the triangle list of each partition is sorted by texture to minimize the number of glBegin/glEnd switches, which were causing a big slowdown. I also tried arranging triangles into triangle strips, but they suck arse, and I tried using VBOs instead of glBegin and glEnd, but that didn't help as much as the internet promised. My next, not yet implemented idea is to compute all of this only when the player's position or rotation changes, and otherwise just render the same set as before without any computation.
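A minimal sketch of the CPU-side steps described above (grid partitioning, nearest-first ordering, per-texture sorting), assuming a uniform grid of equal cubes and a per-triangle texture id. All names (`Cell`, `cellIndexFor`, `sortCellsFrontToBack`, `sortByTexture`) are hypothetical and not taken from the poster's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

struct Triangle {
    uint32_t textureId;    // texture used by this triangle
    uint32_t firstVertex;  // offset of its 3 vertices in a shared vertex array
};

struct Cell {
    Vec3 center;                      // world-space centre of the cube
    std::vector<Triangle> triangles;  // triangles assigned to this cube
};

// Maps a world position to a flat index in an nx * ny * nz grid of cubes
// with edge length `size`, whose minimum corner is `origin`.
// Positions outside the grid are clamped to the nearest border cell.
int cellIndexFor(const Vec3& p, const Vec3& origin, float size, int nx, int ny, int nz) {
    assert(size > 0.0f && nx > 0 && ny > 0 && nz > 0);
    auto cellCoord = [size](float v, float o, int n) {
        int i = static_cast<int>(std::floor((v - o) / size));
        return std::max(0, std::min(i, n - 1));
    };
    int ix = cellCoord(p.x, origin.x, nx);
    int iy = cellCoord(p.y, origin.y, ny);
    int iz = cellCoord(p.z, origin.z, nz);
    return (iz * ny + iy) * nx + ix;
}

// Returns cell indices ordered nearest-first by squared distance from the
// camera, so nearby geometry fills the depth buffer before distant geometry.
std::vector<int> sortCellsFrontToBack(const std::vector<Cell>& cells, const Vec3& eye) {
    std::vector<int> order(cells.size());
    std::iota(order.begin(), order.end(), 0);
    auto dist2 = [&](int i) {
        float dx = cells[i].center.x - eye.x;
        float dy = cells[i].center.y - eye.y;
        float dz = cells[i].center.z - eye.z;
        return dx * dx + dy * dy + dz * dz;
    };
    std::sort(order.begin(), order.end(), [&](int a, int b) { return dist2(a) < dist2(b); });
    return order;
}

// Load-time step: group a cell's triangles by texture so each texture is
// bound once per cell (stable_sort keeps the original order within a texture).
void sortByTexture(Cell& cell) {
    std::stable_sort(cell.triangles.begin(), cell.triangles.end(),
                     [](const Triangle& a, const Triangle& b) { return a.textureId < b.textureId; });
}
```

Squared distances are compared instead of distances because the ordering is the same and no square root is needed.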


Anyway, every tick I count the number of triangles actually being drawn. As a test map I use Hell Gate from Quake III (the one with the crazy mouth in a room, if anybody knows it :D), loaded from an exported .obj file. When I am at the end of the map looking outside (into the mouth), 36 triangles are drawn, which seems fair to me. However, turning 180 degrees makes me look INTO the map, my frustum then contains nearly all partitions, and 22k of the 26k triangles are drawn. Now I get to my idea: draw a few partitions, then CHECK IF ALL PIXELS WERE DRAWN; if not, draw some more partitions, and so on. That could make it really fast, because it would cut off everything except the first room. The problem is that reading back the depth buffer and checking all 1920x1080 of those little guys is so slow that it is counterproductive (I tried it).


So my question is: is there actually a FAST way to check whether every pixel has been rendered at least once (i.e. whether the depth value is not 127 anywhere)? I did about three hours of research and didn't find an answer. Also, if people here say "no", it will encourage me to write my own rasterizer (at least I would have total control).


There's a hardware feature called 'occlusion queries', which does exactly what you're looking for: it gives a yes/no answer to whether something was drawn or not. To find out if there are "holes" in the depth buffer, you can draw a quad that's very far away inside an occlusion query, and check whether the result is "yes, the quad was visible".
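To make the query's answer concrete, here is a CPU emulation of what the GPU counts, not a way to implement it. On the GPU you would wrap the far quad in `glBeginQuery(GL_ANY_SAMPLES_PASSED, id)` / `glEndQuery(GL_ANY_SAMPLES_PASSED)` with depth and color writes disabled (`glDepthMask(GL_FALSE)`, `glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE)`) and read the result with `glGetQueryObjectuiv(id, GL_QUERY_RESULT, &result)`. The sketch assumes the depth buffer was cleared to 1.0 and the quad is drawn at depth 1.0 with `GL_LEQUAL`; with the default `GL_LESS` a quad exactly at 1.0 would never pass. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Emulates GL_SAMPLES_PASSED for a full-screen quad at depth `quadDepth`
// tested with GL_LEQUAL: a sample passes where quadDepth <= stored depth.
// With quadDepth = 1.0 and a clear depth of 1.0, that is exactly the set of
// pixels no geometry has covered yet.
std::size_t samplesPassed(const std::vector<float>& depthBuffer, float quadDepth) {
    std::size_t passed = 0;
    for (float stored : depthBuffer)
        if (quadDepth <= stored) ++passed;
    return passed;
}

// Emulates GL_ANY_SAMPLES_PASSED, which only answers yes/no and may stop early.
bool anySamplesPassed(const std::vector<float>& depthBuffer, float quadDepth) {
    for (float stored : depthBuffer)
        if (quadDepth <= stored) return true;
    return false;
}
```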

Quoting the original post: "Now I get to my idea - display few partitions, then CHECK IF ALL PIXELS WERE DISPLAYED, if not, display some more partitions, and so on. That could make it really fast, cause it would cut of everything excpet the first room. Problém is that extracting depth buffer and checking all the 1920x1080 of that little guys is so slow that it would be contraproductive (proven by try)."

A bigger problem is that the CPU and GPU have a very large latency between them. When you call any glDraw function, the driver actually writes a command packet into a queue (like networking!), and the GPU might not execute that command until, say, 30ms later. This is perfectly fine in most situations, as the CPU and GPU form a pipeline with huge throughput but long latency.
For example, a healthy timeline looks like this:

CPU: | Frame 1 | Frame 2 | Frame 3 | ...
GPU: | wait    | Frame 1 | Frame 2 | ...

If you ever try to read GPU data back to the CPU during a frame -- e.g. you split your frame into two parts (A/B) with a read-back operation in between them, you end up with a timeline like this:

CPU: | Frame1.A | wait      |Copy| Frame1.B | Frame2.A | wait      |Copy| Frame2.B | Frame3.A | ...
GPU: | wait     | Frame 1.A |Copy| wait     | Frame1.B | Frame 2.A |Copy| wait     | Frame2.B | ...

Now, both the CPU and GPU spend roughly half of the time idle, waiting on the other processor.
If you're going to read back GPU data, you need to wait at least one frame before requesting the results, to avoid causing a pipeline bubble :(
That means that reading back GPU data to use in CPU-driven occlusion culling is a dead-end for performance.
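A deliberately simplified cost model of the two timelines above, with times in milliseconds. It assumes the worst case, where a same-frame readback makes the CPU wait until the GPU has drained its queue, so the two processors stop overlapping entirely; real frames fall somewhere between the two numbers. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Pipelined: the CPU records frame N+1 while the GPU executes frame N,
// so throughput is limited only by the slower of the two processors.
double pipelinedFrameMs(double cpuMs, double gpuMs) {
    return std::max(cpuMs, gpuMs);
}

// Stalled (worst case): the readback serializes CPU work, GPU work and the
// copy, so each frame pays for all of them one after another.
double stalledFrameMs(double cpuMs, double gpuMs, double copyMs) {
    return cpuMs + gpuMs + copyMs;
}

double framesPerSecond(double frameMs) {
    assert(frameMs > 0.0);
    return 1000.0 / frameMs;
}
```

With equal CPU and GPU cost (say 10 ms each), the pipelined frame takes 10 ms and the stalled one 20 ms, which is the "both processors idle half the time" case from the timeline.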

Edited by Hodgman


Writing your own rasterizer isn't really going to solve your problem, since you won't be utilizing your GPU at all (or if you use compute, not as efficiently as you could be). Just leave that stuff to the GPU guys, they know what they're doing. :)


Anyway, do you really need such precise culling? I mean, are you absolutely sure you're GPU bound? Going into such detail just to cull a few triangles might not be worth it, and it could hurt your performance rather than help if you're actually CPU bound, since modern GPUs much prefer to eat big chunks of data rather than process a separate draw call for each individual triangle. If you have bounding-box culling on your objects plus frustum culling, then I think that's all you'll really need unless you're writing a big AAA title with very high scene complexity.
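A sketch of the usual bounding-box versus frustum test, assuming the six frustum planes have already been extracted (not shown) with normals pointing into the frustum. `Plane`, `AABB` and `aabbIntersectsFrustum` are hypothetical names.

```cpp
#include <array>

// Points p with nx*p.x + ny*p.y + nz*p.z + d >= 0 are on the inner side.
struct Plane { float nx, ny, nz, d; };
struct AABB { float minX, minY, minZ, maxX, maxY, maxZ; };

// Conservative test: rejects a box only if it lies entirely outside at least
// one plane. For each plane it checks the box corner furthest along the plane
// normal (the "positive vertex"); if even that corner is outside, the whole
// box is. A few boxes near frustum corners are kept although they are
// outside, which only costs some extra drawing.
bool aabbIntersectsFrustum(const AABB& b, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        float x = p.nx >= 0.0f ? b.maxX : b.minX;
        float y = p.ny >= 0.0f ? b.maxY : b.minY;
        float z = p.nz >= 0.0f ? b.maxZ : b.minZ;
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f) return false;
    }
    return true;
}
```

Testing one box per partition or object this way is far cheaper than the per-triangle clipping described in the first post.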


Just bear in mind that Quake levels were built for different hardware constraints, so you should probably break the .obj model you have into small sections instead of processing the entire mesh as one chunk; that way you can make better use of those two culling systems.


That said, if you really want proper occlusion culling for triangles, you can either check out the Frostbite approach (it's quite complicated iirc), or try implementing a simple Hi-Z culling system using geometry shaders (build a simple quad-tree out of your z-buffer and do quad-based culling on each triangle in the geometry shader). The latter is simpler to implement and I've had pretty good results with it.
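A CPU sketch of the Hi-Z idea, only to show the data structure and the test; the version described above runs on the GPU, with the pyramid stored as a mip-mapped depth texture and the test done in a geometry shader. It assumes a square, power-of-two depth buffer and the default GL convention that larger depth means farther. `DepthLevel`, `buildHiZ` and `isOccluded` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One level of the pyramid: a size x size grid of depths.
struct DepthLevel {
    int size;
    std::vector<float> depth;
    float at(int x, int y) const { return depth[y * size + x]; }
};

// Each texel of the next level keeps the MAXIMUM (farthest) of the 2x2 texels
// below it: a conservative bound on how far the visible surface can be.
std::vector<DepthLevel> buildHiZ(const std::vector<float>& depth, int size) {
    assert(size > 0 && (size & (size - 1)) == 0);
    assert(depth.size() == static_cast<std::size_t>(size) * size);
    std::vector<DepthLevel> levels;
    levels.push_back({size, depth});
    while (levels.back().size > 1) {
        const DepthLevel& src = levels.back();
        int n = src.size / 2;
        DepthLevel dst{n, std::vector<float>(static_cast<std::size_t>(n) * n)};
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                dst.depth[y * n + x] = std::max({src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y),
                                                 src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1)});
        levels.push_back(std::move(dst));  // src is not used after this point
    }
    return levels;
}

// True if an object covering the inclusive pixel rectangle [x0, x1] x [y0, y1]
// (level-0 coordinates), whose NEAREST depth is nearestDepth, lies behind
// everything already stored in the depth buffer over that rectangle.
bool isOccluded(const std::vector<DepthLevel>& hiz, int x0, int y0, int x1, int y1,
                float nearestDepth) {
    assert(!hiz.empty() && 0 <= x0 && x0 <= x1 && x1 < hiz[0].size);
    assert(0 <= y0 && y0 <= y1 && y1 < hiz[0].size);
    // Use the finest level at which the rectangle spans at most 2x2 texels.
    int level = 0;
    while (level + 1 < static_cast<int>(hiz.size()) &&
           ((x1 >> level) - (x0 >> level) > 1 || (y1 >> level) - (y0 >> level) > 1))
        ++level;
    const DepthLevel& l = hiz[level];
    for (int y = y0 >> level; y <= (y1 >> level); ++y)
        for (int x = x0 >> level; x <= (x1 >> level); ++x)
            if (nearestDepth <= l.at(x, y)) return false;  // may be visible here
    return true;
}
```

Keeping the maximum rather than the minimum is what makes the test safe: a region is only reported as occluded when even its farthest stored depth is nearer than the object.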

Edited by Styves


So I tried occlusion culling. In principle it works, but guess what. :D


It made it slower. Initially I got 42 fps. Testing the samples-passed count every 20 partitions gives 38 fps; every 10 partitions, 32 fps :(

Keeping a list of the triangles displayed in the previous frame helped when I don't move (I get 41 fps), but when I move it drops to 32.

And yes, I think it is worth it, because I see about 5k triangles at most. Passing 15k more triangles than that to OpenGL is a very noobish performance leak.

Anyway, thank you guys.


"Thousands of triangles" should not be something that alters the framerate so dramatically. A modern game can draw hundreds of thousands of triangles at 1000fps, which is one of the reasons that VBOs replaced begin/end (the same number of GL function calls is required for any number of triangles).


You need to profile your game to find out where the time is being spent. Set up a class that records the high-frequency timer at two points in time, computes the difference, and logs the result, and then put instances of this class in any function that you think might be a performance hog.

It's common to do this with a constructor/destructor:

// PushProfileScope/PopProfileScope are your own timer functions
struct ProfileLogger {
  ProfileLogger(const char* name) { PushProfileScope(name); }
  ~ProfileLogger() { PopProfileScope(); }
};
#define Profile(name) ProfileLogger _profile_(name)
void Test() {
  Profile("Test"); // calls PushProfileScope("Test") here
  // ... code being measured ...
} // ~ProfileLogger() calls PopProfileScope() here

From this data you can get a hierarchical breakdown of where all your CPU time is spent per frame. Trying to optimize without this data is just shooting in the dark.
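One possible implementation of the `PushProfileScope`/`PopProfileScope` pair, assuming a single thread and `std::chrono::steady_clock` as the high-frequency timer. The record layout (`ProfileRecord`, `g_records`) is hypothetical; a real profiler would print or graph the records once per frame and then clear them.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

struct ProfileRecord {
    std::string name;
    int depth;            // nesting level, 0 = outermost scope
    double milliseconds;  // wall-clock time spent inside the scope
};

namespace {
using Clock = std::chrono::steady_clock;
struct OpenScope { const char* name; Clock::time_point start; };
std::vector<OpenScope> g_open;         // scopes that have not closed yet
std::vector<ProfileRecord> g_records;  // finished scopes, in closing order
}

void PushProfileScope(const char* name) {
    g_open.push_back({name, Clock::now()});
}

void PopProfileScope() {
    assert(!g_open.empty());
    OpenScope scope = g_open.back();
    g_open.pop_back();
    std::chrono::duration<double, std::milli> elapsed = Clock::now() - scope.start;
    // After popping, the stack size equals this scope's nesting depth.
    g_records.push_back({scope.name, static_cast<int>(g_open.size()), elapsed.count()});
}

struct ProfileLogger {
    explicit ProfileLogger(const char* name) { PushProfileScope(name); }
    ~ProfileLogger() { PopProfileScope(); }
};
#define Profile(name) ProfileLogger _profile_(name)
```

Because child scopes close before their parent, each record's `depth` is enough to rebuild the hierarchy when printing.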


From the sounds of it, your game is almost certainly CPU-bound, so you can start here. Later on though, you can use gl timer queries to do the same thing on the GPU side -- wrapping parts of the scene in two timer queries to find out how long it took the GPU to process those commands.


Drawing is the actual bottleneck; I have checked that before. What I didn't know was that the Nvidia settings on the test computer were set to the best antialiasing and texture filtering, so I turned them off and now it runs at a stable 60 fps without any visible loss in quality :)


...anyway, if occlusion queries are so slow, what is their point? Would GL_ANY_SAMPLES_PASSED_CONSERVATIVE, for example, speed it up? Another idea is lowering the viewport resolution while doing the queries and then setting it back to normal...


The problem with this use of occlusion queries is the pipeline bubble caused by reading back results on the same frame. Even if the query itself is free, this bubble will halve your framerate.


Reading back an occlusion query is fine if you wait one frame before requesting the result, as this won't disrupt the pipeline. This is useful for things like lens flares or special effects where you don't care about the data being one frame old, but is dangerous for deciding what parts of the scene to draw :(
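A sketch of the "wait before requesting the result" pattern: a small ring of per-frame slots, so a query issued in frame N is only read back in frame N + kFrames - 1. `kFrames = 2` gives the one-frame delay described above. The class name is hypothetical, and `uint32_t` stands in for the `GLuint` ids from `glGenQueries`; in a renderer, `issue` would follow the `glBeginQuery`/`glEndQuery` pair and the returned id would be passed to `glGetQueryObjectuiv(id, GL_QUERY_RESULT, &result)`. Requires C++17 for `std::optional`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

template <std::size_t kFrames>
class DelayedQueries {
    static_assert(kFrames >= 2, "a delay of at least one frame is the point");

public:
    // Records the query issued during `frame`, overwriting the oldest slot.
    void issue(uint64_t frame, uint32_t queryId) {
        slots_[frame % kFrames] = Slot{frame, queryId, true};
    }

    // Returns the query issued kFrames - 1 frames before `frame`, if any.
    // By then the GPU has normally finished it, so reading it should not stall.
    std::optional<uint32_t> readyQuery(uint64_t frame) const {
        if (frame < kFrames - 1) return std::nullopt;
        uint64_t oldFrame = frame - (kFrames - 1);
        const Slot& s = slots_[oldFrame % kFrames];
        if (!s.valid || s.frame != oldFrame) return std::nullopt;
        return s.queryId;
    }

private:
    struct Slot {
        uint64_t frame = 0;
        uint32_t queryId = 0;
        bool valid = false;
    };
    std::array<Slot, kFrames> slots_{};
};
```

The result you get is kFrames - 1 frames old, which is fine for lens flares but, as noted above, risky for deciding what to draw.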

Generally they're a pretty useless API feature...


In a modern engine, though, you can move all your culling and "what to draw" code off the CPU and into a compute shader. You can then issue draw-indirect commands that say "you will be drawing something, but I don't know which triangles yet; the number and offset will be present in this buffer later" (the buffer is filled in by the compute shader).
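A CPU stand-in for the compute pass, only to show what ends up in the indirect buffer. The command struct mirrors `DrawArraysIndirectCommand` from the OpenGL specification (the layout `glMultiDrawArraysIndirect` reads from `GL_DRAW_INDIRECT_BUFFER`). `Batch`, `buildIndirectCommands` and the visibility predicate are hypothetical, and storing the batch index in `baseInstance` is just one common choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawArraysIndirectCommand {
    uint32_t count;          // vertices to draw
    uint32_t instanceCount;  // 1 for ordinary, non-instanced geometry
    uint32_t first;          // first vertex in the bound vertex buffer
    uint32_t baseInstance;   // here: index of the batch, for per-batch lookups
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "must match the GPU layout");

struct Batch { uint32_t firstVertex; uint32_t vertexCount; };

// Every batch that passes the visibility test appends one command. In a
// compute shader, each thread would test one batch and reserve its output
// slot with an atomic counter, so the output order would not be fixed.
template <typename IsVisible>
std::vector<DrawArraysIndirectCommand> buildIndirectCommands(const std::vector<Batch>& batches,
                                                             IsVisible isVisible) {
    std::vector<DrawArraysIndirectCommand> commands;
    for (std::size_t i = 0; i < batches.size(); ++i) {
        if (!isVisible(batches[i])) continue;
        commands.push_back({batches[i].vertexCount, 1u, batches[i].firstVertex,
                            static_cast<uint32_t>(i)});
    }
    return commands;
}
```

The CPU then issues a single multi-draw whose draw count comes from the buffer, without ever learning which batches survived.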


For something like a quake3 level though, you should be able to have a constant draw cost regardless of how many triangles are visible. The level triangle data is static, so put it in a VBO once and never update it again.
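A sketch of building that static VBO once at load time, assuming each triangle carries a texture id. Triangles are sorted by texture and packed into one vertex array, and each texture gets one contiguous range, so the per-frame cost is one `glDrawArrays(GL_TRIANGLES, first, count)` per texture rather than per triangle. The types and `buildStaticMesh` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z, u, v; };
struct Tri { uint32_t textureId; Vertex v[3]; };

// Arguments for one glDrawArrays(GL_TRIANGLES, first, count) call.
struct DrawRange { uint32_t textureId; int32_t first; int32_t count; };

struct StaticLevelMesh {
    std::vector<Vertex> vertices;   // uploaded once, e.g. with GL_STATIC_DRAW
    std::vector<DrawRange> ranges;  // one entry per texture
};

StaticLevelMesh buildStaticMesh(std::vector<Tri> tris) {
    std::stable_sort(tris.begin(), tris.end(),
                     [](const Tri& a, const Tri& b) { return a.textureId < b.textureId; });
    StaticLevelMesh mesh;
    mesh.vertices.reserve(tris.size() * 3);
    for (const Tri& t : tris) {
        // Start a new range whenever the texture changes.
        if (mesh.ranges.empty() || mesh.ranges.back().textureId != t.textureId)
            mesh.ranges.push_back({t.textureId, static_cast<int32_t>(mesh.vertices.size()), 0});
        mesh.vertices.insert(mesh.vertices.end(), t.v, t.v + 3);
        mesh.ranges.back().count += 3;
    }
    return mesh;
}
```

Per-partition culling can still be layered on top by keeping one set of ranges per partition, but the vertex data itself never changes after upload.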


That Hi-Z culling looks promising; I will keep it as a backup idea for optimization. I don't doubt that the lighting model will fuck up the framerate significantly once I implement it, so there will definitely be a need for any optimization that works. But since I already have 60 fps and the people around me mostly want basic functionality ("How's the game going?" "I made it faster." "Can I play it?" "Not yet."), I must move on to networking and multiplayer ASAP.


By the way, I don't think a constant draw cost is good. I could measure the time of the loop and, if some time remains, do filler work like preloading a chunk of the next map into a second map buffer or something :)
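A sketch of that "use the leftover frame time" idea, assuming the caller measures how long the frame has taken so far and each background task comes with a rough cost estimate. `FillerTask` and `runFillerWork` are hypothetical; trusting estimates instead of re-measuring after each task is a simplification.

```cpp
#include <deque>
#include <functional>
#include <utility>

struct FillerTask {
    double estimatedMs;          // rough cost guess, e.g. from earlier runs
    std::function<void()> run;   // e.g. load one chunk of the next map
};

// Runs queued tasks in order while their estimated cost still fits into the
// remaining frame budget. Returns how many tasks were run this frame.
int runFillerWork(std::deque<FillerTask>& queue, double frameBudgetMs, double elapsedMs) {
    double remaining = frameBudgetMs - elapsedMs;
    int ran = 0;
    while (!queue.empty() && queue.front().estimatedMs <= remaining) {
        FillerTask task = std::move(queue.front());
        queue.pop_front();
        task.run();
        remaining -= task.estimatedMs;
        ++ran;
    }
    return ran;
}
```

At 60 fps the budget would be about 16.6 ms; with vsync enabled, the swap waits for the display anyway, so this mainly turns that waiting time into useful work.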


Since you're using a Quake 3 map, the classic Quake method of solving this was to precompute a potentially visible set (or PVS) using an offline pre-processor, then do checks against that PVS at runtime to determine what should (or should not) be drawn.


In (very basic) outline, you divide your map into what I'll call "areas"; these could be nodes/leafs in a BSP tree (which was what Quake used), rooms, cubes in a grid, whatever.  Then for each such area, you use some brute-force method to determine what other areas are potentially visible from it (I believe that the "potentially" part is on account of some coarseness in the algorithm, as well as the fact that this stage ignores frustum culling which is still done at runtime).  Store out the result in some fast and compact data format (Quake used a bitfield array).  Then at runtime you're just looking up those stored results, draw calls, overdraw, etc all go down, the map runs faster, and everybody is happy.
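A sketch of the compact storage and the runtime lookup only, assuming one bit per (from-area, to-area) pair packed into 64-bit words; the expensive offline visibility computation is not shown. `PotentiallyVisibleSet` and its methods are hypothetical names, not Quake's actual format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class PotentiallyVisibleSet {
public:
    explicit PotentiallyVisibleSet(int areaCount)
        : n_(areaCount), wordsPerRow_((areaCount + 63) / 64),
          bits_(static_cast<std::size_t>(areaCount) * wordsPerRow_, 0) {}

    // Offline step: mark `to` as potentially visible from `from`.
    void setVisible(int from, int to) { bits_[index(from, to)] |= mask(to); }

    // Runtime step: a single bit test per area pair.
    bool isVisible(int from, int to) const { return (bits_[index(from, to)] & mask(to)) != 0; }

    // Areas worth handing to frustum culling when the camera is in `from`.
    std::vector<int> visibleFrom(int from) const {
        std::vector<int> result;
        for (int to = 0; to < n_; ++to)
            if (isVisible(from, to)) result.push_back(to);
        return result;
    }

private:
    std::size_t index(int from, int to) const {
        assert(from >= 0 && from < n_ && to >= 0 && to < n_);
        return static_cast<std::size_t>(from) * wordsPerRow_ + to / 64;
    }
    static uint64_t mask(int to) { return uint64_t{1} << (to % 64); }

    int n_;
    int wordsPerRow_;
    std::vector<uint64_t> bits_;  // row `from` holds wordsPerRow_ words
};
```

Visibility is stored per direction here, so `setVisible(a, b)` does not imply `isVisible(b, a)`; an offline tool that computes symmetric results can simply set both bits.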


The downside is that the pre-processing can take time, needs to be re-run even if you make trivial changes to your map, and needs a custom map format to store the data.  And while we're on the subject of formats, .obj is a horrible, horrible, horrible, horrible, horrible format to use for game maps.  The only reason to use it is if you really love writing text parsers.  The ideal format is where you memory-map a file, read some headers to set up some sizes, then glBufferData the rest.  Simple, quick to load, no faffing about.  And while we're on the subject of glBufferData, if your observation is that VBOs are slower than glBegin/glEnd, then you're using them wrong: probably by writing a glBegin/glEnd-alike wrapper around the VBO API.
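A sketch of the "read some headers, then glBufferData the rest" style of format, assuming a hypothetical 16-byte header followed by raw vertex and 32-bit index data, written and read on machines with the same endianness. The "file" here is an in-memory byte vector; memory-mapping the real file (e.g. `mmap` or `MapViewOfFile`) is omitted. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct MeshHeader {
    char magic[4];          // "MESH"
    uint32_t vertexCount;
    uint32_t indexCount;
    uint32_t vertexStride;  // bytes per vertex
};

std::vector<unsigned char> writeMesh(const void* vertices, uint32_t vertexCount, uint32_t stride,
                                     const uint32_t* indices, uint32_t indexCount) {
    MeshHeader h{{'M', 'E', 'S', 'H'}, vertexCount, indexCount, stride};
    std::size_t vbytes = std::size_t(vertexCount) * stride;
    std::size_t ibytes = std::size_t(indexCount) * sizeof(uint32_t);
    std::vector<unsigned char> file(sizeof h + vbytes + ibytes);
    std::memcpy(file.data(), &h, sizeof h);
    std::memcpy(file.data() + sizeof h, vertices, vbytes);
    std::memcpy(file.data() + sizeof h + vbytes, indices, ibytes);
    return file;
}

// Pointers into the loaded bytes: no parsing and no per-vertex copies.
struct MeshView {
    MeshHeader header;
    const unsigned char* vertexData;  // for glBufferData(GL_ARRAY_BUFFER, vertexBytes, ...)
    std::size_t vertexBytes;
    const unsigned char* indexData;   // for glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexBytes, ...)
    std::size_t indexBytes;
};

bool readMesh(const unsigned char* data, std::size_t size, MeshView& out) {
    if (size < sizeof(MeshHeader)) return false;
    std::memcpy(&out.header, data, sizeof(MeshHeader));
    if (std::memcmp(out.header.magic, "MESH", 4) != 0) return false;
    out.vertexBytes = std::size_t(out.header.vertexCount) * out.header.vertexStride;
    out.indexBytes = std::size_t(out.header.indexCount) * sizeof(uint32_t);
    if (size != sizeof(MeshHeader) + out.vertexBytes + out.indexBytes) return false;
    out.vertexData = data + sizeof(MeshHeader);
    out.indexData = out.vertexData + out.vertexBytes;
    return true;
}
```

Compared with parsing .obj text, loading becomes a size check and two buffer uploads.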
