# Renderer too slow



I have a renderer which, with no game logic and no shaders, just blitting, only lets me draw 1200 sprites per frame while retaining 60 FPS. Is this too slow?

How can I improve it?

The main idea stems from a previous thread about renderers and decoupling the rendering logic from the game logic. To that end, I created a renderInfo struct to contain the important sprite drawing information.

Each game loop, every object that wants to get drawn adds a renderInfo (or set of renderInfos if it is composed of multiple sprites) to the renderList.

As renderInfos are added, they are sorted first by layer, then by y position. (Originally I also sorted by sprite sheet, to draw all sprites from a particular sheet at the same time, but that could end up with a sprite at a lower y position being drawn before one above it because it shares a sprite sheet with something even higher. I suppose I could sort by sprite sheet after sorting by y, but I don't know how useful that would be.)
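One way to keep some sprite-sheet grouping without breaking draw order is to make the sheet the last sort key, so it only reorders sprites that already tie on layer and y. The following is a minimal sketch under stated assumptions: `SortKey`, `drawsBefore`, and `sortForDrawing` are hypothetical names, and a `const void*` stands in for the `ALLEGRO_BITMAP*`.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical stand-in for renderInfo: only the fields used for ordering.
struct SortKey {
    int layer;
    float y;
    const void* sheet;  // stands in for ALLEGRO_BITMAP*
};

// Orders by layer, then y, then sprite sheet. Because the sheet is the last
// key, it only groups sprites that already share a layer and y position.
bool drawsBefore(const SortKey& a, const SortKey& b) {
    if (a.layer != b.layer) return a.layer < b.layer;
    if (a.y != b.y) return a.y < b.y;
    // std::less gives a well-defined order for unrelated pointers,
    // unlike casting the pointer to int (which can truncate on 64-bit).
    return std::less<const void*>()(a.sheet, b.sheet);
}

void sortForDrawing(std::vector<SortKey>& items) {
    std::sort(items.begin(), items.end(), drawsBefore);
}
```

How much batching this buys depends on how often sprites share an exact y value, so it is worth measuring before relying on it.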

The weird thing is that if I scale down the sprite (in the original image or at runtime), or if I change the size of the display, I still get the exact same performance.

Out of curiosity, approximately how many sprites are generally drawn onto a screen? I realize that simpler games may have fewer, but what about, say, a bullet hell game?

```cpp
#pragma once
#include "allegro5\allegro.h"
#include "allegro5\allegro_primitives.h"
#include "boost\property_map\vector_property_map.hpp"
#include "boost\foreach.hpp"
#include <vector>
#include <map>
#include <cstdlib>
#include "AssocVector.h"

struct BOX
{
    float X1;
    float Y1;
    float X2(){return X1+W;}
    float Y2(){return Y1+H;}
    float W;
    float H;
};

struct renderInfo
{
    ALLEGRO_BITMAP* SpriteSheet;
    BOX SpriteBox;
    BOX DrawBox;
    ALLEGRO_COLOR Tint;
    float Rotation;
    int Layer;
};

class Renderer
{
private:
    std::vector<renderInfo> renderList;

    renderInfo ri;
    ALLEGRO_COLOR bgClearClr;
public:
    ALLEGRO_DISPLAY* display;
    Renderer::Renderer(void){
        display=al_get_current_display();
        bgClearClr=al_map_rgb(0,0,0);
    }

    Renderer::~Renderer(void){}

    void Renderer::Add(ALLEGRO_BITMAP* spriteSheet, BOX spriteBox, BOX drawBox, float rot, ALLEGRO_COLOR tint, int Layer)
    {
        renderInfo r;
        r.SpriteSheet=spriteSheet;
        r.SpriteBox=spriteBox;
        r.DrawBox=drawBox;
        r.Rotation=rot;
        r.Tint=tint;
        r.Layer=Layer;
        binaryInsert(r,0,renderList.size());
    }

    bool renderInfoCompare(renderInfo ri1,renderInfo ri2)
    {
        if (ri1.Layer!=ri2.Layer) return ri1.Layer<ri2.Layer;
        //if((int)(ri1.SpriteSheet)!=(int)(ri2.SpriteSheet)) return (int)(ri1.SpriteSheet)<(int)(ri2.SpriteSheet);
        if(ri1.DrawBox.Y1!=ri2.DrawBox.Y1) return ri1.DrawBox.Y1<ri2.DrawBox.Y1;
        return false;
    }

    void binaryInsert(renderInfo ri,int low, int hi)
    {
        if (low==hi)
        {
            renderList.insert(renderList.begin()+low,ri);
            return;
        }
        int indx=floor((low+hi)/2.0);
        if (renderInfoCompare(ri,renderList[indx]))
            return binaryInsert(ri,low,indx);
        return binaryInsert(ri,indx+1,hi);
    }

    void Render()
    {
        al_clear_to_color(bgClearClr);
        int dCount=0;
        al_hold_bitmap_drawing(true);
        for(std::vector<renderInfo>::iterator ri=renderList.begin();ri!=renderList.end();ri++)
        {
            dCount++;
            renderInfo r= *ri;
            al_draw_tinted_scaled_rotated_bitmap_region(r.SpriteSheet,
                r.SpriteBox.X1,r.SpriteBox.Y1,r.SpriteBox.W,r.SpriteBox.H,
                r.Tint,
                r.SpriteBox.X1+r.SpriteBox.W/2.0,r.SpriteBox.Y1+r.SpriteBox.H/2.0,
                r.DrawBox.X1,r.DrawBox.Y1,
                r.DrawBox.W/r.SpriteBox.W, r.DrawBox.H/r.SpriteBox.H,
                r.Rotation,0);
        }
        al_hold_bitmap_drawing(false);
        renderList.clear();
        printf("Draws: %i",dCount);
        /*
        for(int j=1;j<al_get_display_height(display);j+=10)
        {
            al_draw_line(0,j,al_get_display_width(display),j,al_map_rgb(255,0,0),1);
        }
        for(int i=1;i<al_get_display_width(display);i+=10)
        {
            al_draw_line(i,0,i,al_get_display_height(display),al_map_rgb(255,0,0),1);
        }
        */
        al_flip_display();
    }

};
```



##### Share on other sites

Dunno, have you profiled it?

I see you are passing renderInfo by value a lot, though; you should use a reference. And you copy renderInfo in your render function for each sprite. Why don't you just use ri->stuff instead of making a copy?
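To illustrate the point about copies, here is a small sketch. `Item` and its copy counter are hypothetical, added only to make copies visible; they are not part of the renderer above. Whether the copies are a measurable cost for a struct as small as renderInfo is something the profiler should confirm.

```cpp
#include <vector>

// Hypothetical payload that counts how often it gets copied.
struct Item {
    static int copies;
    float y;
    explicit Item(float y_) : y(y_) {}
    Item(const Item& other) : y(other.y) { ++copies; }
    Item& operator=(const Item& other) { y = other.y; ++copies; return *this; }
};
int Item::copies = 0;

// Copies every element once per iteration, like `renderInfo r = *ri;`.
float sumByValue(const std::vector<Item>& items) {
    float total = 0.0f;
    for (std::vector<Item>::const_iterator it = items.begin(); it != items.end(); ++it) {
        Item r = *it;  // copy
        total += r.y;
    }
    return total;
}

// Reads each element in place through a const reference: no copies.
float sumByReference(const std::vector<Item>& items) {
    float total = 0.0f;
    for (std::vector<Item>::const_iterator it = items.begin(); it != items.end(); ++it) {
        const Item& r = *it;  // no copy
        total += r.y;
    }
    return total;
}
```

The same idea applies to function parameters: a signature such as `bool renderInfoCompare(const renderInfo& a, const renderInfo& b)` avoids copying both arguments on every comparison.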

But a profiler is going to give you more useful information than me.

EDIT: And you are testing in release mode with optimisations enabled I hope...

##### Share on other sites

Have you checked whether your card maintains vertical synchronization with your display's refresh rate? (That could, and likely would, explain your 60 FPS.)

Edited by SuperVGA

##### Share on other sites

I'm not too sure on this, but isn't Allegro using software rendering? (Like GDI or something.)

To use hardware-accelerated rendering, you should use AllegroGL.

##### Share on other sites

The first thing to do is to profile your code. Humans are notoriously bad at guessing where their performance issues are in code. Don't guess, let hard data be your guide!

Build the executable in optimized mode, and make sure you still have debugging info (in Visual Studio these are both under Project -> Properties). I use this to profile. It's very easy to use and gives you pretty clear call graphs:

http://www.codersnotes.com/sleepy

Run your program, then select it for profiling in Very Sleepy. Wait a few minutes while Very Sleepy samples what your program is doing, and then you can look at the results. It will tell you exactly where in your program you are spending CPU time, and how much. This should allow you to pinpoint the functions that are causing your performance loss. If you need help beyond that, post the call graph sorted by total time and I will help.

Just browsing your code, though, one thing that you absolutely should change is doing a binary-search insertion every time you add a new request. This is extremely expensive, as you have to move (which is a memory copy) on average half of the existing vector elements every time you insert in the middle. It's generally much, much, MUCH faster to just keep appending the requests to the end of the list, and then do a single sort with std::sort just before you render them.

To answer your other question, on modern hardware sprites are really just flat 3D objects. You can render *millions* of them if you can feed the card efficiently. As previously noted, though, Allegro is software rendered, so your CPU will have to pick up that slack and your throughput will be much lower. You should still be able to do more than 1200 unless the sprites are absolutely massive.
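The append-then-sort approach could look roughly like the sketch below. `DrawRequest`, `drawOrder`, and `RequestQueue` are hypothetical names standing in for renderInfo and the Renderer's list, reduced to the two ordering fields so the example compiles without Allegro.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for renderInfo with only the ordering fields.
struct DrawRequest {
    int layer;
    float y;
};

// Strict weak ordering: layer first, then y. Takes const references
// so the comparator itself does not copy the requests.
bool drawOrder(const DrawRequest& a, const DrawRequest& b) {
    if (a.layer != b.layer) return a.layer < b.layer;
    return a.y < b.y;
}

class RequestQueue {
public:
    // Amortized O(1): append at the end, no shifting of existing elements.
    void add(const DrawRequest& r) { requests_.push_back(r); }

    // Sorts once per frame (O(n log n) overall) and returns the sorted list.
    const std::vector<DrawRequest>& sorted() {
        std::sort(requests_.begin(), requests_.end(), drawOrder);
        return requests_;
    }

    void clear() { requests_.clear(); }

private:
    std::vector<DrawRequest> requests_;
};
```

By contrast, inserting each request at its sorted position shifts O(n) elements per insert, which adds up to O(n²) element moves per frame. Note that `std::sort` does not preserve the relative order of requests that compare equal; if submission order should break ties, `std::stable_sort` is the usual substitute.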

##### Share on other sites

I made some changes, including appending all renderInfos and sorting the list afterwards

I also recompiled in Release mode with optimizations (I am fairly certain I did it correctly).

I looked into allegro and I am now attempting to not use VSync

Even so, the performance is still identical...

I ran sleepy on the app... but I am not sure how to use the data it collected...

Attached profiler information for one minute (From point of rendering so some setup is not included)

Attached is captureCSV.txt, which is the csv file from export to CSV

and also captureSleepy.txt which is the .sleepy file from save as

I had to rename the ext to be able to attach them.

Edited by Paragon123

##### Share on other sites

*EDIT*

Allegro 5 is the latest major revision of the library, designed to take advantage of modern hardware (e.g. hardware acceleration using 3D cards) and operating systems. Although it is not backwards compatible with earlier versions, it still occupies the same niche and retains a familiar style.

So it seems to be saying that it uses hardware acceleration... although they go on to say:

Allegro only supports 2D graphics primitives natively, but it is perfectly reasonable to use Allegro alongside a 3D API (e.g. OpenGL, Direct3D, and higher level libraries), while Allegro handles the other tasks. Allegro is also designed to be modular; e.g. if you prefer, you can substitute another audio library.

But that doesn't sound to me like it does 2D with software rendering, rather that it was designed for 2D use rather than 3D...

##### Share on other sites

To read the call graph, open your .sleepy file in Very Sleepy and sort by %inclusive or %exclusive. %exclusive is the total amount of running time each function uses, not counting other functions it calls. %inclusive is the amount of time spent in the function and all child calls. I like to sort by %inclusive, normally, as scanning down the list will give you a pretty good top-down idea of where you are slow.

WaitForSingleObject is a thread block, so you appear to be limited by something in the video driver.

The first suspicious entry you can do something about is al_d3d_create_bitmap, at 17%. How are you creating your bitmaps? You aren't recreating them every frame, are you? That alone would probably cause your perf issues. The other thing to check is whether you are using ALLEGRO_MEMORY_BITMAP. I think you want ALLEGRO_VIDEO_BITMAP instead, which will put the images in video RAM so they can be accelerated. Also, how big are the images, and what kind of video card do you have?

##### Share on other sites

Thanks everyone,

Turns out that in Allegro, an ALLEGRO_MEMORY_BITMAP is rendered using software rendering, while an ALLEGRO_VIDEO_BITMAP uses hardware acceleration. I had been using a memory bitmap; fixing that gave me a decent boost in speed. Using std::sort rather than sorting on insertion also gave a boost in speed. But it turns out that the majority of time was being spent adding elements to the vector. As it turns out, using vector.push_back along with vector.reserve is much faster than vector.insert(vector.end()), so that also gave a pretty good boost in speed. Compiling in release mode with optimization also gave a decent boost in speed.
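The reserve-plus-push_back pattern can be sketched as follows. This is a minimal illustration with assumptions: `FrameQueue` is a hypothetical class, and `int` stands in for renderInfo so the example builds without Allegro. The idea is to reserve once up front, then rely on `clear()` keeping the allocated storage so later frames do not reallocate.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-frame queue; int stands in for renderInfo.
class FrameQueue {
public:
    // Reserve room for the expected number of sprites once, up front.
    explicit FrameQueue(std::size_t expected) { items_.reserve(expected); }

    // No reallocation happens while size() stays within capacity().
    void add(int item) { items_.push_back(item); }

    // clear() destroys the elements but keeps the allocated storage,
    // so the next frame's push_back calls reuse the same buffer.
    void endFrame() { items_.clear(); }

    std::size_t size() const { return items_.size(); }
    std::size_t capacity() const { return items_.capacity(); }
    const int* data() const { return items_.data(); }

private:
    std::vector<int> items_;
};
```

If a frame ever exceeds the reserved count, the vector still grows correctly; it just pays for one reallocation and keeps the larger buffer afterwards.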

Now, using OpenGL, it is able to add, sort, render and remove up to 8000 sprites per frame while retaining 60 frames per second, and using D3D it can do 9000.

There still seems to be an extremely large amount of time being spent in al_d3d_create_bitmap, though, and I don't know what could be causing that. I initialize all my bitmaps, displays, etc. before the main loop is even entered, so I don't even know where it would be called from...

Edited by Paragon123

##### Share on other sites

Can you get the callstack from the profiler when al_d3d_create_bitmap is called? Assuming you aren't calling it yourself. If you have the source and symbols for the library you could put a breakpoint there and see what the callstack is as well.

Are you sure you are not inadvertently copying the bitmap? You seem to call a lot of functions passing parameters by value and by copying and passing the address of a copy in your original code.
