• Advertisement
  • Popular Tags

  • Popular Now

  • Advertisement
  • Similar Content

    • By Achivai
      Hey, I am semi-new to 3d-programming and I've hit a snag. I have one object, let's call it Object A. This object has a long int array of 3d xyz-positions stored in it's vbo as an instanced attribute. I am using these numbers to instance object A a couple of thousand times. So far so good. 
      Now I've hit a point where I want to remove one of these instances of object A while the game is running, but I'm not quite sure how to go about it. At first my thought was to update the instanced attribute of Object A and change the positions to some dummy number that I could catch in the vertex shader and then decide there whether to draw the instance of Object A or not, but I think that would be expensive to do while the game is running, considering that it might have to be done several times every frame in some cases. 
      I'm not sure how to proceed, anyone have any tips?
    • By fleissi
      Hey guys!

      I'm new here and I recently started developing my own rendering engine. It's open source, based on OpenGL/DirectX and C++.
      The full source code is hosted on github:
      https://github.com/fleissna/flyEngine

      I would appreciate if people with experience in game development / engine desgin could take a look at my source code. I'm looking for honest, constructive criticism on how to improve the engine.
      I'm currently writing my master's thesis in computer science and in the recent year I've gone through all the basics about graphics programming, learned DirectX and OpenGL, read some articles on Nvidia GPU Gems, read books and integrated some of this stuff step by step into the engine.

      I know about the basics, but I feel like there is some missing link that I didn't get yet to merge all those little pieces together.

      Features I have so far:
      - Dynamic shader generation based on material properties
      - Dynamic sorting of meshes to be renderd based on shader and material
      - Rendering large amounts of static meshes
      - Hierarchical culling (detail + view frustum)
      - Limited support for dynamic (i.e. moving) meshes
      - Normal, Parallax and Relief Mapping implementations
      - Wind animations based on vertex displacement
      - A very basic integration of the Bullet physics engine
      - Procedural Grass generation
      - Some post processing effects (Depth of Field, Light Volumes, Screen Space Reflections, God Rays)
      - Caching mechanisms for textures, shaders, materials and meshes

      Features I would like to have:
      - Global illumination methods
      - Scalable physics
      - Occlusion culling
      - A nice procedural terrain generator
      - Scripting
      - Level Editing
      - Sound system
      - Optimization techniques

      Books I have so far:
      - Real-Time Rendering Third Edition
      - 3D Game Programming with DirectX 11
      - Vulkan Cookbook (not started yet)

      I hope you guys can take a look at my source code and if you're really motivated, feel free to contribute :-)
      There are some videos on youtube that demonstrate some of the features:
      Procedural grass on the GPU
      Procedural Terrain Engine
      Quadtree detail and view frustum culling

      The long term goal is to turn this into a commercial game engine. I'm aware that this is a very ambitious goal, but I'm sure it's possible if you work hard for it.

      Bye,

      Phil
    • By tj8146
      I have attached my project in a .zip file if you wish to run it for yourself.
      I am making a simple 2d top-down game and I am trying to run my code to see if my window creation is working and to see if my timer is also working with it. Every time I run it though I get errors. And when I fix those errors, more come, then the same errors keep appearing. I end up just going round in circles.  Is there anyone who could help with this? 
       
      Errors when I build my code:
      1>Renderer.cpp 1>c:\users\documents\opengl\game\game\renderer.h(15): error C2039: 'string': is not a member of 'std' 1>c:\program files (x86)\windows kits\10\include\10.0.16299.0\ucrt\stddef.h(18): note: see declaration of 'std' 1>c:\users\documents\opengl\game\game\renderer.h(15): error C2061: syntax error: identifier 'string' 1>c:\users\documents\opengl\game\game\renderer.cpp(28): error C2511: 'bool Game::Rendering::initialize(int,int,bool,std::string)': overloaded member function not found in 'Game::Rendering' 1>c:\users\documents\opengl\game\game\renderer.h(9): note: see declaration of 'Game::Rendering' 1>c:\users\documents\opengl\game\game\renderer.cpp(35): error C2597: illegal reference to non-static member 'Game::Rendering::window' 1>c:\users\documents\opengl\game\game\renderer.cpp(36): error C2597: illegal reference to non-static member 'Game::Rendering::window' 1>c:\users\documents\opengl\game\game\renderer.cpp(43): error C2597: illegal reference to non-static member 'Game::Rendering::window' 1>Done building project "Game.vcxproj" -- FAILED. ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========  
       
      Renderer.cpp
      #include <GL/glew.h> #include <GLFW/glfw3.h> #include "Renderer.h" #include "Timer.h" #include <iostream> namespace Game { GLFWwindow* window; /* Initialize the library */ Rendering::Rendering() { mClock = new Clock; } Rendering::~Rendering() { shutdown(); } bool Rendering::initialize(uint width, uint height, bool fullscreen, std::string window_title) { if (!glfwInit()) { return -1; } /* Create a windowed mode window and its OpenGL context */ window = glfwCreateWindow(640, 480, "Hello World", NULL, NULL); if (!window) { glfwTerminate(); return -1; } /* Make the window's context current */ glfwMakeContextCurrent(window); glViewport(0, 0, (GLsizei)width, (GLsizei)height); glOrtho(0, (GLsizei)width, (GLsizei)height, 0, 1, -1); glMatrixMode(GL_PROJECTION); glLoadIdentity(); glfwSwapInterval(1); glEnable(GL_SMOOTH); glEnable(GL_DEPTH_TEST); glEnable(GL_BLEND); glDepthFunc(GL_LEQUAL); glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST); glEnable(GL_TEXTURE_2D); glLoadIdentity(); return true; } bool Rendering::render() { /* Loop until the user closes the window */ if (!glfwWindowShouldClose(window)) return false; /* Render here */ mClock->reset(); glfwPollEvents(); if (mClock->step()) { glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); glfwSwapBuffers(window); mClock->update(); } return true; } void Rendering::shutdown() { glfwDestroyWindow(window); glfwTerminate(); } GLFWwindow* Rendering::getCurrentWindow() { return window; } } Renderer.h
      #pragma once namespace Game { class Clock; class Rendering { public: Rendering(); ~Rendering(); bool initialize(uint width, uint height, bool fullscreen, std::string window_title = "Rendering window"); void shutdown(); bool render(); GLFWwindow* getCurrentWindow(); private: GLFWwindow * window; Clock* mClock; }; } Timer.cpp
      #include <GL/glew.h> #include <GLFW/glfw3.h> #include <time.h> #include "Timer.h" namespace Game { Clock::Clock() : mTicksPerSecond(50), mSkipTics(1000 / mTicksPerSecond), mMaxFrameSkip(10), mLoops(0) { mLastTick = tick(); } Clock::~Clock() { } bool Clock::step() { if (tick() > mLastTick && mLoops < mMaxFrameSkip) return true; return false; } void Clock::reset() { mLoops = 0; } void Clock::update() { mLastTick += mSkipTics; mLoops++; } clock_t Clock::tick() { return clock(); } } TImer.h
      #pragma once #include "Common.h" namespace Game { class Clock { public: Clock(); ~Clock(); void update(); bool step(); void reset(); clock_t tick(); private: uint mTicksPerSecond; ufloat mSkipTics; uint mMaxFrameSkip; uint mLoops; uint mLastTick; }; } Common.h
      #pragma once #include <cstdio> #include <cstdlib> #include <ctime> #include <cstring> #include <cmath> #include <iostream> namespace Game { typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned int uint; typedef unsigned long ulong; typedef float ufloat; }  
      Game.zip
    • By lxjk
      Hi guys,
      There are many ways to do light culling in tile-based shading. I've been playing with this idea for a while, and just want to throw it out there.
      Because tile frustums are general small compared to light radius, I tried using cone test to reduce false positives introduced by commonly used sphere-frustum test.
      On top of that, I use distance to camera rather than depth for near/far test (aka. sliced by spheres).
      This method can be naturally extended to clustered light culling as well.
      The following image shows the general ideas

       
      Performance-wise I get around 15% improvement over sphere-frustum test. You can also see how a single light performs as the following: from left to right (1) standard rendering of a point light; then tiles passed the test of (2) sphere-frustum test; (3) cone test; (4) spherical-sliced cone test
       

       
      I put the details in my blog post (https://lxjk.github.io/2018/03/25/Improve-Tile-based-Light-Culling-with-Spherical-sliced-Cone.html), GLSL source code included!
       
      Eric
    • By Fadey Duh
      Good evening everyone!

      I was wondering if there is something equivalent of  GL_NV_blend_equation_advanced for AMD?
      Basically I'm trying to find more compatible version of it.

      Thank you!
  • Advertisement
  • Advertisement
Sign in to follow this  

OpenGL GL_ARB_separate_shader_objects Performance ?

This topic is 1475 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi,

As soon as I learned about GL_ARB_separate_shader_objects I imagined that I would have huge performance benefits because with this one OpenGL would look and feel more like DirectX9+ and my cross-API code could remain similar while maintaining high performance.

To my surprise, I implemented GL_ARB_separate_shader_objects only to find out that my performance is halved, and my GPU usage dropped from ~95% to 45%. So basically, having them as a monolithic program is twice as fast as being separate. This is on an AMD HD7850 under Windows 8 and OpenGL 4.2 Core.

I originally imagined that this extension was created to boost performance by separating constant buffers and shader stages, but it seems it might have been created for people wanting to port DirectX shaders more directly, with disregard to any performance hits.

So my question, is if you have implemented this feature in a reasonable scene, what is your performance difference compared to monolitic programs ?

Share this post


Link to post
Share on other sites
Advertisement

Updating my post cause I was wondering for a long time how is Nvidia doing with separate shader objects.

Turns out, not that great. Got a GTX 650 running on latest 335 drivers, got 290 FPS with SSO On, and 308 without them, so still a heavy performance penatly. And I'm doing a pipeline object per shader pair.

 

And I'm also seeing some weird behavior with objects that don't use textures.

Share this post


Link to post
Share on other sites

That's stunning. I've always abstained from using that extension (without even trying once!) because I had the weird "feeling" that it would probably run much slower, since the shader compiler is necessarily not able to optimize as agressively as it could otherwise.

 

I remember a statement from a nVidia guy years ago that basically said "Yeah, but not much of that is happening anyway, and the hardware works in a mix-and-match way in the sense of ARB_separate_shader_objects, it has been working like that for many years".

 

Now you've done back-to-back comparisons on two major vendors and see that while nVidia is at least in the same ballpark, both major IHVs are noticeably slower with separate shader objects. Bummer.

 

One should not forget that getting 50% of 300fps is still 150fps, which is way good enough (and many people will argue that practically, with VSync, there is absolutely no difference).

But of course only utilizing the GPU 50% means the customer who needs to meet your minimum hardware requirements could as well only have spent half as much money on hardware... or they could run your game with 2x or 4x supersampling at the same frame rate instead, with the hardware they have.

Share this post


Link to post
Share on other sites

When using this, make absolutely sure that your vertex shader output and fragment shader inputs match perfectly. The same variables, same semantics, same order, etc, etc. Otherwise the driver will have to patch the fragment program to fetch the correct interpolators.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement