• 11
• 9
• 11
• 9
• 11
• ### Similar Content

• By tj8146
I am using immediate mode for OpenGL and I am creating a 2D top down car game. I am trying to configure my game loop in order to get my car-like physics working on a square shape. I have working code but it is not doing as I want it to. I am not sure as to whether it is my game loop that is incorrect or my code for the square is incorrect, or maybe both! Could someone help because I have been trying to work this out for over a day now
I have attached my .cpp file if you wish to run it for yourself..
WinMain code:
/******************* WIN32 FUNCTIONS ***************************/ int WINAPI WinMain( HINSTANCE hInstance, // Instance HINSTANCE hPrevInstance, // Previous Instance LPSTR lpCmdLine, // Command Line Parameters int nCmdShow) // Window Show State { MSG msg; // Windows Message Structure bool done=false; // Bool Variable To Exit Loop Car car; car.x = 220; car.y = 140; car.dx = 0; car.dy = 0; car.ang = 0; AllocConsole(); FILE *stream; freopen_s(&stream, "CONOUT$", "w", stdout); // Create Our OpenGL Window if (!CreateGLWindow("OpenGL Win32 Example",screenWidth,screenHeight)) { return 0; // Quit If Window Was Not Created } while(!done) // Loop That Runs While done=FALSE { if (PeekMessage(&msg,NULL,0,0,PM_REMOVE)) // Is There A Message Waiting? { if (msg.message==WM_QUIT) // Have We Received A Quit Message? { done=true; // If So done=TRUE break; } else // If Not, Deal With Window Messages { TranslateMessage(&msg); // Translate The Message DispatchMessage(&msg); // Dispatch The Message } } else // If There Are No Messages { if(keys[VK_ESCAPE]) done = true; void processKeys(Car& car); //process keyboard while (game_is_running) { loops = 0; while (GetTickCount() > next_game_tick && loops < MAX_FRAMESKIP) { update(car); // update variables next_game_tick += SKIP_TICKS; loops++; } display(car); // Draw The Scene SwapBuffers(hDC); // Swap Buffers (Double Buffering) } } } // Shutdown KillGLWindow(); // Kill The Window return (int)(msg.wParam); // Exit The Program } //WIN32 Processes function - useful for responding to user inputs or other events. LRESULT CALLBACK WndProc( HWND hWnd, // Handle For This Window UINT uMsg, // Message For This Window WPARAM wParam, // Additional Message Information LPARAM lParam) // Additional Message Information { switch (uMsg) // Check For Windows Messages { case WM_CLOSE: // Did We Receive A Close Message? { PostQuitMessage(0); // Send A Quit Message return 0; // Jump Back } break; case WM_SIZE: // Resize The OpenGL Window { reshape(LOWORD(lParam),HIWORD(lParam)); // LoWord=Width, HiWord=Height return 0; // Jump Back } break; case WM_LBUTTONDOWN: { mouse_x = LOWORD(lParam); mouse_y = screenHeight - HIWORD(lParam); LeftPressed = true; } break; case WM_LBUTTONUP: { LeftPressed = false; } break; case WM_MOUSEMOVE: { mouse_x = LOWORD(lParam); mouse_y = screenHeight - HIWORD(lParam); } break; case WM_KEYDOWN: // Is A Key Being Held Down? { keys[wParam] = true; // If So, Mark It As TRUE return 0; // Jump Back } break; case WM_KEYUP: // Has A Key Been Released? { keys[wParam] = false; // If So, Mark It As FALSE return 0; // Jump Back } break; } // Pass All Unhandled Messages To DefWindowProc return DefWindowProc(hWnd,uMsg,wParam,lParam); } C++ and OpenGL code: int mouse_x=0, mouse_y=0; bool LeftPressed = false; int screenWidth=1080, screenHeight=960; bool keys[256]; float radiansFromDegrees(float deg) { return deg * (M_PI / 180.0f); } float degreesFromRadians(float rad) { return rad / (M_PI / 180.0f); } bool game_is_running = true; const int TICKS_PER_SECOND = 50; const int SKIP_TICKS = 1000 / TICKS_PER_SECOND; const int MAX_FRAMESKIP = 10; DWORD next_game_tick = GetTickCount(); int loops; typedef struct { float x, y; float dx, dy; float ang; }Car; //OPENGL FUNCTION PROTOTYPES void display(const Car& car); //called in winmain to draw everything to the screen void reshape(int width, int height); //called when the window is resized void init(); //called in winmain when the program starts. void processKeys(Car& car); //called in winmain to process keyboard input void update(Car& car); //called in winmain to update variables /************* START OF OPENGL FUNCTIONS ****************/ void display(const Car& car) { const float w = 50.0f; const float h = 50.0f; glClear(GL_COLOR_BUFFER_BIT); glLoadIdentity(); glTranslatef(100, 100, 0); glBegin(GL_POLYGON); glVertex2f(car.x, car.y); glVertex2f(car.x + w, car.y); glVertex2f(car.x + w, car.y + h); glVertex2f(car.x, car.y + h); glEnd(); glFlush(); } void reshape(int width, int height) // Resize the OpenGL window { screenWidth = width; screenHeight = height; // to ensure the mouse coordinates match // we will use these values to set the coordinate system glViewport(0, 0, width, height); // Reset the current viewport glMatrixMode(GL_PROJECTION); // select the projection matrix stack glLoadIdentity(); // reset the top of the projection matrix to an identity matrix gluOrtho2D(0, screenWidth, 0, screenHeight); // set the coordinate system for the window glMatrixMode(GL_MODELVIEW); // Select the modelview matrix stack glLoadIdentity(); // Reset the top of the modelview matrix to an identity matrix } void init() { glClearColor(1.0, 1.0, 0.0, 0.0); //sets the clear colour to yellow //glClear(GL_COLOR_BUFFER_BIT) in the display function //will clear the buffer to this colour. } void processKeys(Car& car) { if (keys[VK_UP]) { float cdx = sinf(radiansFromDegrees(car.ang)); float cdy = -cosf(radiansFromDegrees(car.ang)); car.dx += cdx; car.dy += cdy; } if (keys[VK_DOWN]) { float cdx = sinf(radiansFromDegrees(car.ang)); float cdy = -cosf(radiansFromDegrees(car.ang)); car.dx += -cdx; car.dy += -cdy; } if (keys[VK_LEFT]) { car.ang -= 2; } if (keys[VK_RIGHT]) { car.ang += 2; } } void update(Car& car) { car.x += car.dx*next_game_tick; } game.cpp • By tj8146 I am using immediate mode for OpenGL and I am creating a 2D top down car game. I am trying to configure my game loop in order to get my car-like physics working on a square shape. I have working code but it is not doing as I want it to. I am not sure as to whether it is my game loop that is incorrect or my code for the square is incorrect, or maybe both! Could someone help because I have been trying to work this out for over a day now I have attached my .cpp file if you wish to run it for yourself.. This is my C++ and OpenGL code: int mouse_x=0, mouse_y=0; bool LeftPressed = false; int screenWidth=1080, screenHeight=960; bool keys[256]; float radiansFromDegrees(float deg) { return deg * (M_PI / 180.0f); } float degreesFromRadians(float rad) { return rad / (M_PI / 180.0f); } bool game_is_running = true; const int TICKS_PER_SECOND = 50; const int SKIP_TICKS = 1000 / TICKS_PER_SECOND; const int MAX_FRAMESKIP = 10; DWORD next_game_tick = GetTickCount(); int loops; typedef struct { float x, y; float dx, dy; float ang; }Car; //OPENGL FUNCTION PROTOTYPES void display(const Car& car); //called in winmain to draw everything to the screen void reshape(int width, int height); //called when the window is resized void init(); //called in winmain when the program starts. void processKeys(Car& car); //called in winmain to process keyboard input void update(Car& car); //called in winmain to update variables /************* START OF OPENGL FUNCTIONS ****************/ void display(const Car& car) { const float w = 50.0f; const float h = 50.0f; glClear(GL_COLOR_BUFFER_BIT); glLoadIdentity(); glTranslatef(100, 100, 0); glBegin(GL_POLYGON); glVertex2f(car.x, car.y); glVertex2f(car.x + w, car.y); glVertex2f(car.x + w, car.y + h); glVertex2f(car.x, car.y + h); glEnd(); glFlush(); } void reshape(int width, int height) // Resize the OpenGL window { screenWidth = width; screenHeight = height; // to ensure the mouse coordinates match // we will use these values to set the coordinate system glViewport(0, 0, width, height); // Reset the current viewport glMatrixMode(GL_PROJECTION); // select the projection matrix stack glLoadIdentity(); // reset the top of the projection matrix to an identity matrix gluOrtho2D(0, screenWidth, 0, screenHeight); // set the coordinate system for the window glMatrixMode(GL_MODELVIEW); // Select the modelview matrix stack glLoadIdentity(); // Reset the top of the modelview matrix to an identity matrix } void init() { glClearColor(1.0, 1.0, 0.0, 0.0); //sets the clear colour to yellow //glClear(GL_COLOR_BUFFER_BIT) in the display function //will clear the buffer to this colour. } void processKeys(Car& car) { if (keys[VK_UP]) { float cdx = sinf(radiansFromDegrees(car.ang)); float cdy = -cosf(radiansFromDegrees(car.ang)); car.dx += cdx; car.dy += cdy; } if (keys[VK_DOWN]) { float cdx = sinf(radiansFromDegrees(car.ang)); float cdy = -cosf(radiansFromDegrees(car.ang)); car.dx += -cdx; car.dy += -cdy; } if (keys[VK_LEFT]) { car.ang -= 2; } if (keys[VK_RIGHT]) { car.ang += 2; } } void update(Car& car) { car.x += car.dx*next_game_tick; } My WinMain code: /******************* WIN32 FUNCTIONS ***************************/ int WINAPI WinMain( HINSTANCE hInstance, // Instance HINSTANCE hPrevInstance, // Previous Instance LPSTR lpCmdLine, // Command Line Parameters int nCmdShow) // Window Show State { MSG msg; // Windows Message Structure bool done=false; // Bool Variable To Exit Loop Car car; car.x = 220; car.y = 140; car.dx = 0; car.dy = 0; car.ang = 0; AllocConsole(); FILE *stream; freopen_s(&stream, "CONOUT$", "w", stdout); // Create Our OpenGL Window if (!CreateGLWindow("OpenGL Win32 Example",screenWidth,screenHeight)) { return 0; // Quit If Window Was Not Created } while(!done) // Loop That Runs While done=FALSE { if (PeekMessage(&msg,NULL,0,0,PM_REMOVE)) // Is There A Message Waiting? { if (msg.message==WM_QUIT) // Have We Received A Quit Message? { done=true; // If So done=TRUE break; } else // If Not, Deal With Window Messages { TranslateMessage(&msg); // Translate The Message DispatchMessage(&msg); // Dispatch The Message } } else // If There Are No Messages { if(keys[VK_ESCAPE]) done = true; void processKeys(Car& car); //process keyboard while (game_is_running) { loops = 0; while (GetTickCount() > next_game_tick && loops < MAX_FRAMESKIP) { update(car); // update variables next_game_tick += SKIP_TICKS; loops++; } display(car); // Draw The Scene SwapBuffers(hDC); // Swap Buffers (Double Buffering) } } } // Shutdown KillGLWindow(); // Kill The Window return (int)(msg.wParam); // Exit The Program } //WIN32 Processes function - useful for responding to user inputs or other events. LRESULT CALLBACK WndProc( HWND hWnd, // Handle For This Window UINT uMsg, // Message For This Window WPARAM wParam, // Additional Message Information LPARAM lParam) // Additional Message Information { switch (uMsg) // Check For Windows Messages { case WM_CLOSE: // Did We Receive A Close Message? { PostQuitMessage(0); // Send A Quit Message return 0; // Jump Back } break; case WM_SIZE: // Resize The OpenGL Window { reshape(LOWORD(lParam),HIWORD(lParam)); // LoWord=Width, HiWord=Height return 0; // Jump Back } break; case WM_LBUTTONDOWN: { mouse_x = LOWORD(lParam); mouse_y = screenHeight - HIWORD(lParam); LeftPressed = true; } break; case WM_LBUTTONUP: { LeftPressed = false; } break; case WM_MOUSEMOVE: { mouse_x = LOWORD(lParam); mouse_y = screenHeight - HIWORD(lParam); } break; case WM_KEYDOWN: // Is A Key Being Held Down? { keys[wParam] = true; // If So, Mark It As TRUE return 0; // Jump Back } break; case WM_KEYUP: // Has A Key Been Released? { keys[wParam] = false; // If So, Mark It As FALSE return 0; // Jump Back } break; } // Pass All Unhandled Messages To DefWindowProc return DefWindowProc(hWnd,uMsg,wParam,lParam); }
game.cpp
• By lxjk
Hi guys,
There are many ways to do light culling in tile-based shading. I've been playing with this idea for a while, and just want to throw it out there.
Because tile frustums are general small compared to light radius, I tried using cone test to reduce false positives introduced by commonly used sphere-frustum test.
On top of that, I use distance to camera rather than depth for near/far test (aka. sliced by spheres).
This method can be naturally extended to clustered light culling as well.
The following image shows the general ideas

Performance-wise I get around 15% improvement over sphere-frustum test. You can also see how a single light performs as the following: from left to right (1) standard rendering of a point light; then tiles passed the test of (2) sphere-frustum test; (3) cone test; (4) spherical-sliced cone test

I put the details in my blog post (https://lxjk.github.io/2018/03/25/Improve-Tile-based-Light-Culling-with-Spherical-sliced-Cone.html), GLSL source code included!

Eric

• Good evening everyone!

I was wondering if there is something equivalent of  GL_NV_blend_equation_advanced for AMD?
Basically I'm trying to find more compatible version of it.

Thank you!

• Hello guys,

How do I know? Why does wavefront not show for me?
I already checked I have non errors yet.

And my download (mega.nz) should it is original but I tried no success...
- Add blend source and png file here I have tried tried,.....

PS: Why is our community not active? I wait very longer. Stop to lie me!
Thanks !

# OpenGL Efficient instancing in OpenGL

This topic is 1391 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

The game I'm working on should be able to render dense forests with many trees and detailed foliage. I have been using instancing for drawing pretty much everything, but even so, I have lately hit some performance issues.

My implementation is based on storing instance data in uniforms. I restrict the object transformations so that only translation, uniform scale and rotation along one axis are allowed. For the rotation part, I pass sin(angle) and cos(angle) as uniforms. Thus 6 floats are passed per instance. This way, I can easily draw 256 instances at once by invoking glUniform4fv, glUniform2fv and glDrawElementsInstancedBaseVertex per batch. The particular draw command is used, because I use large VBO:s that store multiple meshes.

Lately I have noticed, that the performance is too low for my purposes. I used gDebugger in an attempt to finding the bottleneck. The FPS count was initially roughly 40. Lowering texture resolution had no effect. Disabling raster commands had negligible effect. Disabling draw commands boosted FPS to over 100. Thus I guess the conclusion is that the excecution is not CPU nor raster operation bound, but has to do with vertex processing.

I'm also using impostors for the trees, and level of detail for the meshes, but I have the feeling that I should be able to draw more instances of the meshed trees than what I'm currently able to. I actually had quite ok FPS of 80 with just the trees in place, but adding the foliage (a lot of instances of small low poly meshes) dropped the FPS to 40. Disabling either the trees or the foliage increases the FPS significantly. Disabling the terrain, which uses a lot of polygons, has no effect, so I think the issue is not being just bound by polygon count.

Could it be that uploading the uniform data is the limiting factor?

For some of the instanced object types, such as the trees, the transformation data is static and is stored in the leaf nodes of a bounding volume hierarchy (BVH) in proper arrays, so that glUniform* can be called without further assembly of data. It would then make sense to actually store these arrays in video memory. What is the best way to do this these days? I think that VBO:s are used in conjuction with glVertexAttribDivisor. To me this does not seem very neat approach, as "vertex attributes" are used for something that are clearly of "uniform" nature. But anyway, I could then make a huge VBO for the entire BVH and store a base instance value and number of instances for each leaf node. To render a leaf node, I would then use glDrawElementsInstancedBaseVertexBaseInstance. This is GL 4.2 core spec. which might be a bit too high. Are there better options? I also have objects (the foliage), for which the transformation data is dynamic (updated occasionally), as they are only placed around the camera. What would be the best way to store/transfer the transformation data in this case?

##### Share on other sites

I actually had quite ok FPS of 80 with just the trees in place, but adding the foliage (a lot of instances of small low poly meshes) dropped the FPS to 40.

Don't use FPS to measure performance. Always use millisecond per frame. 80fps is 12.5ms per frame. 40fps is 25ms per frame.
This probably means that your foliate taking 25-12.5 = 12.5ms on the GPU.
Instancing ONLY helps you out on the CPU-side of things - it does nothing to lighten the GPU-side workload. What kind of pixel shader is used on the foliage poly's? How much overdraw is there? Is blending / alpha testing enabled? How many pixels are covered (counting overdrawn pixels multiple times)? Does changing the frame-buffer resolution impact performance? What kind of GPU are you using?

You can use the high performance timer to measure the CPU duration of different operations per frame and EXT_timer_query to measure the GPU duration of different parts per frame. Measure the CPU and GPU costs so you're sure which one is actually the bottleneck.

It's also a good idea to time how long glSwapBuffers takes to execute -- if significant CPU time is spent in that call, it usually indicates that the CPU is waiting on the GPU (or is waiting for a vblank with vsync enabled -- disable vsync when profiling to avoid this  )

Could it be that uploading the uniform data is the limiting factor?

If so, you'd probably notice a significant amount of CPU time spent inside the functions that map/update those buffers.

##### Share on other sites

Don't use FPS to measure performance. Always use millisecond per frame. 80fps is 12.5ms per frame. 40fps is 25ms per frame.
This probably means that your foliate taking 25-12.5 = 12.5ms on the GPU.

Good remark. I'm actually aware of this, but old habits die hard. I know that drop from 80 to 40 fps is significant in terms of frame time. If I was experiencing a drop from 200 to 160, I would not be writing this.

Instancing ONLY helps you out on the CPU-side of things - it does nothing to lighten the GPU-side workload. What kind of pixel shader is used on the foliage poly's? How much overdraw is there? Is blending / alpha testing enabled? How many pixels are covered (counting overdrawn pixels multiple times)? Does changing the frame-buffer resolution impact performance?

The shaders for trees and foliage are mostly simple, although the vertices are animated in vertex shader, where a sin function is evaluated. I'm using alpha testing, not blending. Objects are roughly (render batches) ordered front to back, but there is probably still significant overdraw. However, I though that these factors were ruled out by the fact that disabling raster commands in gDebugger had practically no effect. I would expect that this also rules out being limited by frame-buffer bandwidth, although I will try to lower frame-buffer resolution when I get a chance.

You can use the high performance timer to measure the CPU duration of different operations per frame and EXT_timer_query to measure the GPU duration of different parts per frame.

It's also a good idea to time how long glSwapBuffers takes to execute -- if significant CPU time is spent in that call, it usually indicates that the CPU is waiting on the GPU.

Didn't know about EXT_timer_query. This should prove useful in general. In this case, I could the time spent by glUniform* to upload the instance data. I will try to time glSwapBuffers.

##### Share on other sites

I tried measuring the time that my "drawFoliage" takes by averaging over 10 frames. This function basically keeps the instance data up to date and calls glUniform*, does shader and texture binds and issues the instantiated draw call. I get 0 ms or 1 ms. I tried this with both SDL_GetTicks and clock_gettime. I think both yield 1 ms accuracy on my machine so the result is not that accurate. The timer library Hodgman linked also seems to promise at least 1 ms accuracy. Maybe I'll give it a try to see if I'm lucky and get more precision.

I also measured the time taken by SDL_GL_SwapBuffers(). It turned out to be between 0 ms and 36 ms! It seems also to be the case that zero and a larger value alternate.

I changed the shaders for the foliage to simpler ones with no normal mapping nor animation, but it had no effect.

Wouldn't the most accurate way to measure CPU time spent in functions be the use of a profiler? Often the large amount of time used in the initialization routines makes the results a bit difficult to read, but the call graph is pretty instructive. Of course per frame timing is not possible this way.

##### Share on other sites

It looks a lot like you're seeing the effects of buffering here.  I suggest adding a glFinish call before you get your start time, and another before you get your end time, which will give you the actual time that the driver and your hardware spend doing work; otherwise all that you're really measuring is the time taken to add everything to a command buffer: effectively the equivalent of a handful of memcpy calls.

The glFinish before getting your start time ensures that any pending work is completed before it returns, so that doesn't skew your results.

The glFinish before getting your end time ensures that the work you've just submitted is completed before it returns, otherwise the GL calls you're profiling may not actually issue on the GPU until up to 3 frames later.

##### Share on other sites

It looks a lot like you're seeing the effects of buffering here.  I suggest adding a glFinish call before you get your start time, and another before you get your end time, which will give you the actual time that the driver and your hardware spend doing work; otherwise all that you're really measuring is the time taken to add everything to a command buffer: effectively the equivalent of a handful of memcpy calls.

The glFinish before getting your start time ensures that any pending work is completed before it returns, so that doesn't skew your results.

The glFinish before getting your end time ensures that the work you've just submitted is completed before it returns, otherwise the GL calls you're profiling may not actually issue on the GPU until up to 3 frames later.

Thanks for the tip.

I tried this:

glFinish();
long int time = getTimeMilliSec();
SDL_GL_SwapBuffers();
glFinish();
cout << getTimeMilliSec()-time << endl;


Now I get 0 or 1 ms.

##### Share on other sites

It once again turned out that something completely different was going on. I first removed the call to the "drawFoliage" function, but surprisingly the framerate was still low. After removing the piece of code that pushes the foliage data into the BVH the framerate increased back to a good level. I then enabled my BVH visualizer code to see what was going on. I made a little illustration of this: http://www.perilouspenguin.com/pics/illustration.jpg .

By BVH works so that it starts off as a quadtree, but each object belongs to a single leaf only. This is achieved by refitting the bounding box of a node each time the objects have been divided into four subnodes. This makes it possible to have ready-to-use render lists in the leaves. Of course it is assumed that no single object spans a significant portion of the map. This has worked well thus far, but for some reason adding the foliage groups (which are small area collections of low poly foliage objects) totally screwed the BVH node distribution. Thus much more objects were considered as visible at any location than before. I managed to mitigate this issue somewhat by increasing the BVH subdivision depth, but I feel that this is not the proper fix.