OpenGL Most efficient way to batch drawings


Hi, I'm interested in a bit of theory about the best optimization methods for OpenGL 3.0 (where a lot of functions became deprecated).

In my current 2D framework, every sprite has its own program with its own uniform values. Every sprite is drawn separately and, now that I have switched from 2.1 to 3.0, every sprite has its own projection and view matrices. My goal now is to batch as many vertices as possible, and these are some ideas:

1) Use only one program for everything. There is a single projection matrix, so I can group the vertices, send the shader's values via glVertexAttribPointer, and draw everything with one call. The problem is the model-view matrix: it would be the same for every vertex, and that isn't what I want because every sprite has its own matrix.

2) Continue to use several shaders. The projection matrix is shared between programs (how can I do that?), and every sprite has its own shader with its own model-view matrix and uniform values. The problem here is that I need to switch programs between sprite draws.

Neither of these ideas works as I expected, so I'm here to ask: what is the most efficient way to batch draw calls in OpenGL 3.0?

Share on other sites
If only one or two transform matrices are unique for every sprite, I see no reason why you can't draw them all in one single call. You can index into a uniform array or into a buffer texture to read these, using e.g. gl_InstanceID in the vertex shader if you use instancing (or gl_VertexID divided by 4 otherwise).

Or, you can generate quads from points in the geometry shader and use either gl_VertexID or gl_PrimitiveID (which are the same in that particular case) as an index (in that case, transform is done in the GS too). A sprite likely does not have a dozen output attributes, so the geometry shader should be reasonably efficient, too.

Either solution is far more efficient than binding different uniforms (or even shaders!) for every sprite, or for some subset of sprites that you have determined with some clever batching algorithm. Edited by samoth
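
As a rough illustration of the gl_VertexID approach above, here is a minimal C++ sketch, assuming each sprite contributes exactly 4 unique vertices to one shared batch. The names kBatchVertexShader, uProjection, uModelView, aPosition, kMaxSpritesPerBatch, buildQuadIndices and spriteIndexForVertex are hypothetical, and the array size of 48 is an arbitrary choice that would have to fit the driver's uniform limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical batch size; must match the array size in the shader below.
constexpr std::size_t kMaxSpritesPerBatch = 48;

// GLSL 1.30 vertex shader (sketch): every sprite owns 4 consecutive vertices,
// so gl_VertexID / 4 selects that sprite's model-view matrix.
const char* const kBatchVertexShader = R"(#version 130
in vec3 aPosition;
uniform mat4 uProjection;
uniform mat4 uModelView[48];
void main() {
    gl_Position = uProjection * uModelView[gl_VertexID / 4] * vec4(aPosition, 1.0);
}
)";

// CPU-side mirror of the shader's lookup: which sprite a vertex belongs to.
inline std::size_t spriteIndexForVertex(std::size_t vertexId) {
    return vertexId / 4;
}

// Index buffer for `spriteCount` quads drawn as GL_TRIANGLES.
// Each quad's 4 vertices are assumed to be stored in strip order,
// giving the triangles (0, 1, 2) and (2, 1, 3).
inline std::vector<std::uint16_t> buildQuadIndices(std::size_t spriteCount) {
    assert(spriteCount * 4 <= 65536 && "16-bit indices would overflow");
    std::vector<std::uint16_t> indices;
    indices.reserve(spriteCount * 6);
    const std::uint16_t quad[6] = {0, 1, 2, 2, 1, 3};
    for (std::size_t s = 0; s < spriteCount; ++s) {
        const std::size_t base = s * 4;
        for (std::uint16_t offset : quad) {
            indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return indices;
}
```

With indexed drawing, gl_VertexID is the index read from the index buffer, so the division by 4 still lands on the right sprite as long as every sprite keeps its own 4 vertices.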

Share on other sites
1) Use only one program for everything. There is a single projection matrix, so I can group the vertices, send the shader's values via glVertexAttribPointer, and draw everything with one call. The problem is the model-view matrix: it would be the same for every vertex, and that isn't what I want because every sprite has its own matrix.
I'm not sure what you mean by this, but maybe what you want is instancing.

2) Continue to use several shaders. The projection matrix is shared between programs (how can I do that?), and every sprite has its own shader with its own model-view matrix and uniform values. The problem here is that I need to switch programs between sprite draws.
Uniform buffers make sharing easy. Edited by max343

Share on other sites

So if I have 100 sprites, I should send 100 model-view matrices with glUniformMatrix4fv and select them with gl_VertexID / 4?


Yes, I mean instancing (I only just learned what instancing is). Do you recommend sending the matrices in a uniform array or in a texture?

Share on other sites
I always prefer using uniform buffers. Initially the piping is a bit tricky to understand, but once you grasp that part, their advantages over textures are apparent.

BTW, OpenGL 3 supports instancing (core since 3.1; on 3.0 through the ARB_draw_instanced extension). Edited by max343

Share on other sites
So if I have 100 sprites, I should send 100 model-view matrices with glUniformMatrix4fv and select them with gl_VertexID / 4?

I didn't read it closely the first time, but the answer is no. A big no. It's much better to use uniform buffers for something this big (or for something that you're going to share). In fact, it's better to limit plain uniforms to the cases where the overhead of using a buffer would be greater.
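
To get a feel for how many per-sprite matrices a uniform buffer can hold, here is a small sizing sketch. It assumes the std140 layout (a mat4 is 4 vec4 columns, 64 bytes) and uses 16384 bytes, the smallest GL_MAX_UNIFORM_BLOCK_SIZE a driver may report. The names kStd140Mat4Bytes, kMinGuaranteedBlockBytes, matricesPerBlock and batchesNeeded are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Under std140 a mat4 is stored as 4 columns of vec4, 16 bytes each,
// so one matrix occupies 64 bytes in the block.
constexpr std::size_t kStd140Mat4Bytes = 4 * 16;

// Smallest GL_MAX_UNIFORM_BLOCK_SIZE the specification allows.
// Real code would query the actual limit with glGetIntegerv.
constexpr std::size_t kMinGuaranteedBlockBytes = 16384;

// How many per-sprite matrices fit in one uniform block of the given size.
constexpr std::size_t matricesPerBlock(std::size_t blockBytes) {
    return blockBytes / kStd140Mat4Bytes;
}

// How many draw calls (one block fill each) a given sprite count needs.
inline std::size_t batchesNeeded(std::size_t spriteCount, std::size_t blockBytes) {
    const std::size_t perBlock = matricesPerBlock(blockBytes);
    assert(perBlock > 0 && "uniform block too small for a single mat4");
    return (spriteCount + perBlock - 1) / perBlock;  // round up
}
```

Under these assumptions the 100 sprites from the question fit in a single block, so they could be drawn with one call after one buffer update.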

Share on other sites

Okay, I reduced the number of shaders to just one and implemented a VBO. I'm unpacking each triangle strip into a triangle list, stored in a structure of 512 * sizeof(Vertex) bytes. When the structure is full, I build and draw the VBO with this:

glBufferData(GL_ARRAY_BUFFER, m_vertexcacheIndex * sizeof(SuperVertex), m_vertexcache, GL_DYNAMIC_DRAW);
glVertexAttribPointer(vert_position, 3, GL_FLOAT, GL_FALSE, sizeof(SuperVertex), BUFFER_OFFSET(0 * sizeof(float)));
glVertexAttribPointer(vert_texture, 3, GL_FLOAT, GL_FALSE, sizeof(SuperVertex), BUFFER_OFFSET(3 * sizeof(float)));
glVertexAttribPointer(vert_color, 4, GL_FLOAT, GL_FALSE, sizeof(SuperVertex), BUFFER_OFFSET(6 * sizeof(float)));
glDrawArrays(GL_TRIANGLES, 0, m_vertexcacheIndex);
m_vertexcacheIndex = 0;

where m_vertexcacheIndex is the number of vertices in the structure, m_vertexcache is the structure itself, and SuperVertex is the vertex type. I profiled the program with gDEBugger: before the VBO I was making 12k GL calls per frame, now only 120, but performance is worse. Before it ran at 720 fps, now 350...
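
For reference, a vertex type consistent with the three glVertexAttribPointer calls above could look like the sketch below. The post does not show the real SuperVertex, so the field names and the offset constants here are assumptions. Only the float counts (3, 3, 4) and the byte offsets (0, 3 and 6 floats) come from the calls.

```cpp
#include <cstddef>

// Hypothetical layout: 3 floats of position, 3 of texture coordinates,
// 4 of color, interleaved in one structure.
struct SuperVertex {
    float position[3];
    float texture[3];
    float color[4];
};

// Byte offsets that the BUFFER_OFFSET(...) arguments are expected to match.
constexpr std::size_t kPositionOffset = offsetof(SuperVertex, position);
constexpr std::size_t kTextureOffset  = offsetof(SuperVertex, texture);
constexpr std::size_t kColorOffset    = offsetof(SuperVertex, color);

// Compile-time checks: if the compiler added padding, the stride or the
// offsets passed to glVertexAttribPointer would silently be wrong.
static_assert(sizeof(SuperVertex) == 10 * sizeof(float), "unexpected padding");
static_assert(kPositionOffset == 0 * sizeof(float), "position offset mismatch");
static_assert(kTextureOffset == 3 * sizeof(float), "texture offset mismatch");
static_assert(kColorOffset == 6 * sizeof(float), "color offset mismatch");
```

Using offsetof in place of hand-written 3 * sizeof(float) expressions keeps the attribute setup in step with the structure if a field is added later.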

Edited by Retsu90

Share on other sites

I did some tests:

1) Call glVertexAttribPointer and glDrawArrays with GL_TRIANGLE_STRIP for every sprite (the original approach, before I created this thread): 498 fps. The stride here is 0, meaning that vertex position, texture coordinates, and color are in separate arrays.

2) Cache the vertices in an array of 1024 entries, copying each sprite's vertices into the cache with memcpy. When the array is full, its content is drawn with glVertexAttribPointer and glDrawElements with GL_TRIANGLE_STRIP. The vertices are indexed here. The stride is 0. 589 fps!

3) Same as above, but vertex position, texture coordinates, and color are in the same structure, which means I only need to call memcpy once to copy the sprite model. I was expecting an improvement. 399 fps.

4) Same as 3, but this time I'm unpacking the vertices from GL_TRIANGLE_STRIP to GL_TRIANGLES. I pass the 4 vertices and a function unpacks them into 6 vertices, so I don't need indexed vertices. This takes more memory, but it reaches 562 fps!

5) Same as 3, but this time using a VBO: only 270 fps.

6) Same as 4, but with a VBO: 278 fps.
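
The strip-to-list unpacking used in tests 4 and 6 can be sketched as follows. The original function is not shown in the thread, so Vertex2D and unpackStripQuad are hypothetical names and the real vertex type surely carries more attributes.

```cpp
#include <array>

// Minimal stand-in for the real vertex type (assumption).
struct Vertex2D {
    float x;
    float y;
};

// A 4-vertex strip v0 v1 v2 v3 describes the triangles (v0, v1, v2)
// and (v1, v2, v3). The second one is emitted as (v2, v1, v3) so that
// both triangles keep the same winding order in the list.
inline std::array<Vertex2D, 6> unpackStripQuad(const std::array<Vertex2D, 4>& strip) {
    return {{strip[0], strip[1], strip[2],
             strip[2], strip[1], strip[3]}};
}
```

The list form stores 6 vertices per sprite where the strip stores 4, which is the extra memory mentioned in test 4.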

Assuming I'm not doing anything wrong, the second approach is the best. It doesn't take much memory and the indexing is easy to do. With it I can hardcode some basic models and index them. Unpacking vertices from strip to list can take a lot of resources and doesn't improve things much. I should avoid the all-in-one structures (I read in the OpenGL documentation that they were implemented for D3D compatibility) and store every attribute in a separate array. For some reason, the VBO decreases performance, and with it SwapBuffers takes a lot of CPU. However, all these methods are CPU-limited, because the GPU isn't fully used. Much of the CPU time is drained by memcpy and SwapBuffers.

EDIT: I ran the same tests with the same unmodified software on another computer with an Intel HD 3000 (the first tests ran on a Radeon HD 4870): 62, 178, 124, 163, 97, and 207 fps. The VBO with a triangle list is much faster this time. I'm starting to get confused...

Edited by Retsu90

Share on other sites
For some reason, the VBO decreases performance, and with it SwapBuffers takes a lot of CPU. However, all these methods are CPU-limited, because the GPU isn't fully used. Much of the CPU time is drained by memcpy and SwapBuffers.

EDIT: I ran the same tests with the same unmodified software on another computer with an Intel HD 3000 (the first tests ran on a Radeon HD 4870): 62, 178, 124, 163, 97, and 207 fps. The VBO with a triangle list is much faster this time. I'm starting to get confused...

If you gave us more details about how you measured the time, maybe we could find the cause. SwapBuffers is not a time-consuming call in itself. It takes time because it waits for drawing to finish, which suggests that the time is being attributed to the wrong place. How did you measure it?
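
When comparing the figures in this thread, it may help to convert frames per second into milliseconds per frame, since fps differences are not linear in cost. A minimal sketch, with the hypothetical helper names frameTimeMs and addedCostMs:

```cpp
#include <cassert>

// Frame time in milliseconds for a given frames-per-second figure.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0 && "fps must be positive");
    return 1000.0 / fps;
}

// Extra milliseconds spent per frame when going from beforeFps to afterFps.
inline double addedCostMs(double beforeFps, double afterFps) {
    return frameTimeMs(afterFps) - frameTimeMs(beforeFps);
}
```

For example, the drop from 720 fps to 350 fps reported earlier corresponds to roughly 1.5 ms of extra time per frame.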

Share on other sites

I'm measuring it with gDEBugger, setting SwapBuffers as the end-of-frame marker. With the Visual Studio profiler, I can clearly see that SwapBuffers takes 50% of the CPU time in a single frame.