  • Similar Content

    • By Achivai
      Hey, I am semi-new to 3D programming and I've hit a snag. I have one object, let's call it Object A. This object has a long int array of 3D xyz-positions stored in its VBO as an instanced attribute. I am using these numbers to instance Object A a couple of thousand times. So far so good.
      Now I've hit a point where I want to remove one of these instances of Object A while the game is running, but I'm not quite sure how to go about it. My first thought was to update the instanced attribute of Object A, changing the removed positions to some dummy value that the vertex shader could catch and use to decide whether to draw that instance. But I think that would be expensive to do while the game is running, considering that it might have to be done several times every frame in some cases.
      I'm not sure how to proceed, anyone have any tips?
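One common way to handle this (a hedged sketch, not from the original post; `removeInstance`, `Vec3`, and the re-upload comment are illustrative names) is to keep the instance data in a CPU-side array, remove an instance by swapping the last element into its slot, and draw one fewer instance. The buffer stays densely packed, so no per-instance branch is needed in the vertex shader, and only one small region has to be re-uploaded per removal:

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Swap-remove: O(1) per removal, keeps the instance buffer densely packed
// so the draw call can simply use positions.size() as the instance count.
void removeInstance(std::vector<Vec3>& positions, std::size_t index) {
    positions[index] = positions.back();   // overwrite slot with last instance
    positions.pop_back();                  // shrink by one
    // With OpenGL you would then re-upload just the touched slot
    // (skip the upload if the removed slot was the last one), e.g.:
    // glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    // glBufferSubData(GL_ARRAY_BUFFER, index * sizeof(Vec3),
    //                 sizeof(Vec3), &positions[index]);
}
```

You would then pass positions.size() as the instance count to glDrawArraysInstanced; the trade-off is that instance order is not preserved, which usually doesn't matter for scattered objects.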
    • By fleissi
      Hey guys!

      I'm new here and I recently started developing my own rendering engine. It's open source, based on OpenGL/DirectX and C++.
      The full source code is hosted on github:

      I would appreciate it if people with experience in game development / engine design could take a look at my source code. I'm looking for honest, constructive criticism on how to improve the engine.
      I'm currently writing my master's thesis in computer science, and over the past year I've gone through all the basics of graphics programming, learned DirectX and OpenGL, read some articles in Nvidia's GPU Gems, read books, and integrated some of this step by step into the engine.

      I know about the basics, but I feel like there is some missing link that I didn't get yet to merge all those little pieces together.

      Features I have so far:
      - Dynamic shader generation based on material properties
      - Dynamic sorting of meshes to be rendered based on shader and material
      - Rendering large amounts of static meshes
      - Hierarchical culling (detail + view frustum)
      - Limited support for dynamic (i.e. moving) meshes
      - Normal, Parallax and Relief Mapping implementations
      - Wind animations based on vertex displacement
      - A very basic integration of the Bullet physics engine
      - Procedural Grass generation
      - Some post processing effects (Depth of Field, Light Volumes, Screen Space Reflections, God Rays)
      - Caching mechanisms for textures, shaders, materials and meshes

      Features I would like to have:
      - Global illumination methods
      - Scalable physics
      - Occlusion culling
      - A nice procedural terrain generator
      - Scripting
      - Level Editing
      - Sound system
      - Optimization techniques

      Books I have so far:
      - Real-Time Rendering Third Edition
      - 3D Game Programming with DirectX 11
      - Vulkan Cookbook (not started yet)

      I hope you guys can take a look at my source code and if you're really motivated, feel free to contribute :-)
      There are some videos on YouTube that demonstrate some of the features:
      Procedural grass on the GPU
      Procedural Terrain Engine
      Quadtree detail and view frustum culling

      The long term goal is to turn this into a commercial game engine. I'm aware that this is a very ambitious goal, but I'm sure it's possible if you work hard for it.


    • By tj8146
      I have attached my project in a .zip file if you wish to run it for yourself.
      I am making a simple 2D top-down game, and I am trying to run my code to see if my window creation and my timer are working. Every time I run it, though, I get errors; when I fix those, more appear, and then the same errors keep coming back. I end up just going round in circles. Is there anyone who could help with this?
      Errors when I build my code:
      1>Renderer.cpp
      1>c:\users\documents\opengl\game\game\renderer.h(15): error C2039: 'string': is not a member of 'std'
      1>c:\program files (x86)\windows kits\10\include\10.0.16299.0\ucrt\stddef.h(18): note: see declaration of 'std'
      1>c:\users\documents\opengl\game\game\renderer.h(15): error C2061: syntax error: identifier 'string'
      1>c:\users\documents\opengl\game\game\renderer.cpp(28): error C2511: 'bool Game::Rendering::initialize(int,int,bool,std::string)': overloaded member function not found in 'Game::Rendering'
      1>c:\users\documents\opengl\game\game\renderer.h(9): note: see declaration of 'Game::Rendering'
      1>c:\users\documents\opengl\game\game\renderer.cpp(35): error C2597: illegal reference to non-static member 'Game::Rendering::window'
      1>c:\users\documents\opengl\game\game\renderer.cpp(36): error C2597: illegal reference to non-static member 'Game::Rendering::window'
      1>c:\users\documents\opengl\game\game\renderer.cpp(43): error C2597: illegal reference to non-static member 'Game::Rendering::window'
      1>Done building project "Game.vcxproj" -- FAILED.
      ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
      #include <GL/glew.h>
      #include <GLFW/glfw3.h>
      #include "Renderer.h"
      #include "Timer.h"
      #include <iostream>

      namespace Game
      {
        GLFWwindow* window;

        /* Initialize the library */
        Rendering::Rendering()
        {
          mClock = new Clock;
        }

        Rendering::~Rendering()
        {
          shutdown();
        }

        bool Rendering::initialize(uint width, uint height, bool fullscreen, std::string window_title)
        {
          if (!glfwInit())
          {
            return -1;
          }

          /* Create a windowed mode window and its OpenGL context */
          window = glfwCreateWindow(640, 480, "Hello World", NULL, NULL);
          if (!window)
          {
            glfwTerminate();
            return -1;
          }

          /* Make the window's context current */
          glfwMakeContextCurrent(window);
          glViewport(0, 0, (GLsizei)width, (GLsizei)height);
          glOrtho(0, (GLsizei)width, (GLsizei)height, 0, 1, -1);
          glMatrixMode(GL_PROJECTION);
          glLoadIdentity();
          glfwSwapInterval(1);
          glEnable(GL_SMOOTH);
          glEnable(GL_DEPTH_TEST);
          glEnable(GL_BLEND);
          glDepthFunc(GL_LEQUAL);
          glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);
          glEnable(GL_TEXTURE_2D);
          glLoadIdentity();
          return true;
        }

        bool Rendering::render()
        {
          /* Loop until the user closes the window */
          if (!glfwWindowShouldClose(window))
            return false;

          /* Render here */
          mClock->reset();
          glfwPollEvents();
          if (mClock->step())
          {
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
            glfwSwapBuffers(window);
            mClock->update();
          }
          return true;
        }

        void Rendering::shutdown()
        {
          glfwDestroyWindow(window);
          glfwTerminate();
        }

        GLFWwindow* Rendering::getCurrentWindow()
        {
          return window;
        }
      }

      Renderer.h
      #pragma once

      namespace Game
      {
        class Clock;

        class Rendering
        {
        public:
          Rendering();
          ~Rendering();
          bool initialize(uint width, uint height, bool fullscreen, std::string window_title = "Rendering window");
          void shutdown();
          bool render();
          GLFWwindow* getCurrentWindow();
        private:
          GLFWwindow* window;
          Clock* mClock;
        };
      }

      Timer.cpp
      #include <GL/glew.h>
      #include <GLFW/glfw3.h>
      #include <time.h>
      #include "Timer.h"

      namespace Game
      {
        Clock::Clock() :
          mTicksPerSecond(50),
          mSkipTics(1000 / mTicksPerSecond),
          mMaxFrameSkip(10),
          mLoops(0)
        {
          mLastTick = tick();
        }

        Clock::~Clock()
        {
        }

        bool Clock::step()
        {
          if (tick() > mLastTick && mLoops < mMaxFrameSkip)
            return true;
          return false;
        }

        void Clock::reset()
        {
          mLoops = 0;
        }

        void Clock::update()
        {
          mLastTick += mSkipTics;
          mLoops++;
        }

        clock_t Clock::tick()
        {
          return clock();
        }
      }

      Timer.h
      #pragma once
      #include "Common.h"

      namespace Game
      {
        class Clock
        {
        public:
          Clock();
          ~Clock();
          void update();
          bool step();
          void reset();
          clock_t tick();
        private:
          uint mTicksPerSecond;
          ufloat mSkipTics;
          uint mMaxFrameSkip;
          uint mLoops;
          uint mLastTick;
        };
      }

      Common.h
      #pragma once
      #include <cstdio>
      #include <cstdlib>
      #include <ctime>
      #include <cstring>
      #include <cmath>
      #include <iostream>

      namespace Game
      {
        typedef unsigned char uchar;
        typedef unsigned short ushort;
        typedef unsigned int uint;
        typedef unsigned long ulong;
        typedef float ufloat;
      }
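For what it's worth, the first two errors (C2039/C2061 at renderer.h line 15) are the classic symptom of naming std::string in a header without including &lt;string&gt;; the later C2511/C2597 errors then cascade, because once the declaration is broken, the definitions in Renderer.cpp no longer match it (Renderer.h also uses uint and GLFWwindow without pulling in Common.h or the GLFW header). Below is a minimal, hedged sketch of the include fix: the class is trimmed down and given a stand-in body so it is self-contained, so this is not the real Renderer.h.

```cpp
#include <string>  // the missing include: std::string is declared here

// Trimmed-down sketch of the declaration in Renderer.h; GLFW and Clock
// members are omitted because only the std::string fix is being shown,
// and the body is a stand-in so the sketch compiles on its own.
namespace Game {
    class Rendering {
    public:
        bool initialize(unsigned width, unsigned height, bool fullscreen,
                        std::string window_title = "Rendering window") {
            (void)fullscreen;
            return width > 0 && height > 0 && !window_title.empty();
        }
    };
}
```

Adding #include &lt;string&gt; (and making sure Renderer.h also sees Common.h and the GLFW types it uses) should clear the whole chain of errors at once rather than one at a time.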
    • By lxjk
      Hi guys,
      There are many ways to do light culling in tile-based shading. I've been playing with this idea for a while, and just want to throw it out there.
      Because tile frustums are generally small compared to the light radius, I tried using a cone test to reduce the false positives introduced by the commonly used sphere-frustum test.
      On top of that, I use distance to the camera rather than depth for the near/far test (i.e. sliced by spheres).
      This method can be naturally extended to clustered light culling as well.
      The following image shows the general idea:

      Performance-wise I get around a 15% improvement over the sphere-frustum test. You can also see how a single light performs below, from left to right: (1) standard rendering of a point light; then the tiles that pass (2) the sphere-frustum test, (3) the cone test, and (4) the spherical-sliced cone test.

      I put the details in my blog post (https://lxjk.github.io/2018/03/25/Improve-Tile-based-Light-Culling-with-Spherical-sliced-Cone.html), GLSL source code included!
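To make the two tests concrete, here is a hedged CPU-side sketch (the Cone struct, function names, and constants are illustrative, not code from the linked post). The tile cone has its apex at the camera, i.e. the view-space origin; a point light passes if its bounding sphere overlaps the cone laterally and its distance interval to the camera overlaps the slice's [near, far] distances:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static float length(Vec3 a)      { return std::sqrt(dot(a, a)); }

// Tile cone: apex at the camera (view-space origin), unit axis through the
// tile center, half-angle wide enough to contain the tile corners.
struct Cone { Vec3 axis; float cosAngle; float sinAngle; };

// Lateral sphere-vs-cone test. Decompose the sphere center into a distance
// along the axis (x) and perpendicular to it (y); y*cos - x*sin then
// approximates the signed distance from the center to the cone surface
// (a conservative test; exact handling near the apex is omitted).
bool sphereIntersectsCone(Vec3 center, float radius, const Cone& cone) {
    float x = dot(center, cone.axis);
    float y = length({center.x - x*cone.axis.x,
                      center.y - x*cone.axis.y,
                      center.z - x*cone.axis.z});
    float distToSurface = y*cone.cosAngle - x*cone.sinAngle;
    return distToSurface <= radius;
}

// "Sliced by spheres": near/far are distances from the camera, not depths,
// so the slice boundaries are spheres around the eye.
bool sphereInDistanceSlice(Vec3 center, float radius, float nearDist, float farDist) {
    float d = length(center);
    return d + radius >= nearDist && d - radius <= farDist;
}
```

Note that a light well behind the apex gets a large positive y*cos - x*sin and is rejected automatically, which is exactly the false-positive class the sphere-frustum test lets through on small tiles.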
    • By Fadey Duh
      Good evening everyone!

      I was wondering if there is an equivalent of GL_NV_blend_equation_advanced for AMD?
      Basically I'm trying to find a more compatible version of it.

      Thank you!

OpenGL: Extremely long compile times and bad performance on Nvidia


Recommended Posts

...continuing from this older thread: https://www.gamedev.net/topic/686395-simple-shader-causes-nvidia-driver-to-hang/
After days of debugging I found out: the driver does not hang, it just takes a very long time to return from calls to vkCreateComputePipelines()
(about 5 minutes for a simple shader like the one in the code snippet :( ).

I'm using a GTX 670.

It takes 3 hours to compile all shaders of my project.
A few shaders compile in seconds, like they should, but almost all take minutes.

It is strange that changing a number in the code snippet makes the problem go away. Do you think there could be some limit on practical buffer sizes?
I use 12 buffers for a total of 170 MB, but I could split them into more, smaller buffers. (All storage buffers, all compute shaders.)

Also, the performance doesn't seem right:

FuryX: 1.4 ms
7950: 2.3 ms

GTX 670: 30ms (!)

I did expect NV to perform much worse than AMD here, but a factor of more than 10 seems at least twice too much.
However, no bad spot shows up in the profiler; relative runtimes match those I see on AMD, there's just a constant scale factor.

Anyone had similar issues?

I'll try my OpenCL implementation to see if it runs faster...


OpenCL performance:

FuryX: 2.2 ms
GTX 670: 12.5 ms

This looks familiar to me and makes sense.
Reminds me of OpenGL compute shaders, where OpenCL was two times faster than OpenGL on Nvidia.

Why do I always need to learn this the hard way? Argh!
Seems I'll have to go to DX12, hoping they care a bit more there.


#version 450

layout (local_size_x = 64) in;

layout (std430, binding = 0) buffer bS_V4 { vec4 _G_sample_V4[]; };
layout (std430, binding = 1) buffer bS_SI { uint _G_sample_indices[]; };
layout (std430, binding = 2) buffer bLL { uint _G_lists[]; };
layout (std430, binding = 3) buffer bDBG { uint _G_dbg[]; };

void main ()
{
	uint lID = gl_LocalInvocationID.x;

	uint listIndex = gl_WorkGroupID.x * 64 + lID;
	if (listIndex < 100)
		_G_dbg[8] = _G_lists[786522]; // changing to a smaller number like 78652 would compile fast
}
Edited by JoeJ


Sounds like you ought to make a test case and send it to NVidia. I'm sure this is the sort of thing that someone on their team would want to know about and to fix. Whether you can get the information to the relevant person is another matter, posting here would be my first port of call: https://devtalk.nvidia.com/default/board/166/vulkan/.


I don't have much experience with the compute side of things, but I wonder whether the SPIR-V of the shader looks as trivial as the GLSL. Maybe there's some clues in there.

Thanks. I'll do that, but first I'll test with a 1070 and run my shader in another person's project to make sure it's nothing on my side.
I've already wasted so much time on this; a few more hours won't matter :)

I loaded my shader into one of Sascha Willems' examples.
The first time, it took 72 seconds to compile the simple shader.
The second time it took only one second, because Sascha uses the pipeline cache and I'm not.

Going back to my own project: the shader has a different filename and date, but the NV driver recognized it as the same shader and took it from cache, so it also took only one second.
(Notice they do this even though I'm still not using the pipeline cache.)

Then, after changing a number in the shader, it takes 72 seconds again in my project.

I don't know if it's possible to ship pipeline cache results with a game so the user doesn't need to wait 3 hours, but at least it's a great improvement :)
Unfortunately, lots of my shaders have a mutation per tree level, so even with the pipeline cache I have to wait up to half an hour to see the effect of a single changed shader.
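Regarding shipping cache results: Vulkan does let an application serialize a pipeline cache to disk via vkGetPipelineCacheData and feed the blob back at the next startup through VkPipelineCacheCreateInfo::pInitialData; the driver validates the blob's header (vendor/device UUID) and silently falls back to an empty cache if it doesn't match. Here is a hedged sketch of just the disk round-trip, with the Vulkan calls left as comments since no device exists in this snippet; the file name and function names are illustrative:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write an opaque pipeline-cache blob to disk. With Vulkan you would fill
// `blob` first via vkGetPipelineCacheData(device, cache, &size, data).
bool saveCacheBlob(const std::string& path, const std::vector<std::uint8_t>& blob) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(blob.data()),
              static_cast<std::streamsize>(blob.size()));
    return static_cast<bool>(out);
}

// Read the blob back; on the next run it would be passed as
// VkPipelineCacheCreateInfo::pInitialData / initialDataSize.
// The driver rejects blobs from a different GPU or driver version,
// so an empty or stale file simply means "start with a cold cache".
std::vector<std::uint8_t> loadCacheBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in), {});
}
```

Since the blob is per-GPU and per-driver, shipping one with a game only helps users with the exact same configuration; writing the cache out after first launch is the more robust pattern.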

Very interesting: for the first 10 frames I get good VK performance of 10 ms with the GTX 670! I did not notice that yesterday.

I assume the 670 goes into some kind of power-saving mode after 10 frames (it's an abrupt change from 10 to 30 ms).
Most of my frame time is spent on GPU<->CPU data transfer, causing 1 FPS, so the 670 might think it's time to rest.
I'll see when I'm done and the transfer is no longer necessary.

10 ms seems right. Some years back the AMD 280X was twice as fast as a Kepler Titan and, IIRC, 4-5 times faster than a GTX 670 in compute.
I still wonder why there's such a huge difference (the 670 and 5970 have similar specs), but it's probably a hardware limit.
Still waiting and hoping for the 1070 to perform better...

So finally, there seems to be nothing wrong at all.
It's good to see NV is faster with VK than with OpenCL 1.2 too,
and the wait on the compiler is OK with the pipeline cache (but maybe I can still improve this by telling the driver I want my current GPU only, not all existing NV generations).

Forgot to mention: I can confirm it works to have both AMD and NV in one machine. You can use one for rendering and the other for compute, optimize for both, etc. Awesome! :)


Forgot to mention: I can confirm it works to have both AMD and NV in one machine. You can use one for rendering and the other for compute, optimize for both, etc. Awesome! :)

You mean it's possible for a game to simultaneously use my integrated Intel graphics together with my GTX 1070? Cool :-)


You mean it's possible for a game to simultaneously use my integrated Intel graphics together with my GTX 1070?

I think so, but I don't have an iGPU, and some people say it gets turned off if no display is plugged into it and a dedicated GPU is detected.
But I guess this is not true for modern APIs or even OpenCL; it should be available for compute.

Although an iGPU throttles the CPU cores due to heat and shared bandwidth, I think it's perfect for things like physics.

(Just checked: NV allows using the GTX 670 for PhysX even though I'm using an AMD GPU as the main card; some years back they prevented this :) )


I assume the 670 goes into some kind of power-saving mode after 10 frames (it's an abrupt change from 10 to 30 ms).

Can be fixed in driver settings (prefer maximum performance).

the wait on the compiler is OK with the pipeline cache (but maybe I can still improve this by telling the driver I want my current GPU only, not all existing NV generations)

This is already the case; after switching to the GTX 1070, all shaders had to be recompiled.

Compute performance with Pascal is much better than Kepler:

AMD FuryX: 1.37ms
NV 1070: 2.01 ms
AMD 7950: 2.3 ms
NV 670: 9.5 ms

Currently I have optimized my shaders for the FuryX and GTX 670, but the 1070 runs better with the settings from Fury.
At the moment I don't see a reason to do much vendor-specific optimization at all; it would be nice if this holds true for future chips as well.
