
Divide a scene to increase FPS.

General and Gameplay Programming Programming Unity

Started by babaliaris June 14, 2017 02:46 PM

27 comments, last by Wyrframe 6 years, 10 months ago

Scouting Ninja

4,573

June 14, 2017 10:50 PM

As you can see only six images are rendered to the screen (the other 14996 are off the screen).

Since you count these off-screen ones, does that mean you don't cull them?

rendering 15000 sprite objects.

15000 * 2 = 30000 triangles, which is only about half a mesh buffer's worth of triangles. Batched, that would fit in a single mesh, so the slowdown suggests you don't batch your meshes.

Learn batching: it is a process where you merge many sprites into one as a way of saving performance.

If you did use batching, you could get 64 000 triangles per mesh = 32 000 sprites per mesh, and 32 000 * 300 (mid-range graphics card) = 9 600 000 static sprites. On a brand-new graphics card you will get millions of static sprites.
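The arithmetic above can be checked in a few lines. This sketch simply reuses the post's own figures as assumptions (two triangles per sprite, 64 000 triangles per batched mesh, and 300 as the mid-range multiplier, interpreted here as a per-frame draw-call budget); they are rough estimates, not hardware limits.

```python
import math

# Figures taken from the post above; treat them as rough assumptions, not hardware limits.
TRIANGLES_PER_SPRITE = 2        # each sprite is a quad made of two triangles
TRIANGLES_PER_MESH = 64_000     # assumed triangle budget for one batched mesh
DRAW_CALLS_PER_FRAME = 300      # assumed draw-call budget on a mid-range card


def sprites_per_batch() -> int:
    """How many sprites fit into one batched mesh."""
    return TRIANGLES_PER_MESH // TRIANGLES_PER_SPRITE


def max_static_sprites() -> int:
    """Sprites per frame if every draw call renders one full batch."""
    return sprites_per_batch() * DRAW_CALLS_PER_FRAME


def draw_calls_needed(sprite_count: int, batched: bool) -> int:
    """Draw calls needed for sprite_count sprites, with or without batching."""
    if not batched:
        return sprite_count                     # one draw call per sprite object
    return math.ceil(sprite_count / sprites_per_batch())


print(sprites_per_batch())               # 32000
print(max_static_sprites())              # 9600000
print(draw_calls_needed(15_000, False))  # 15000
print(draw_calls_needed(15_000, True))   # 1
```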

You can start here: https://www.gamedev.net/resources/_/technical/opengl/opengl-batch-rendering-r3900

You could also search the web; someone has probably done it in Python.

babaliaris

136

Author

June 14, 2017 10:51 PM

No, it does not, but I'm trying to make my engine handle as many game objects per scene as it can without lagging. That is my objective.

Well, if you want to render as many sprites as possible, you could look into rendering with OpenGL or Direct3D. For fast collision detection you could try a Python port of Box2D, or maybe somebody has made a different physics package for Python.

But I'm wondering: you are going to use this engine to create a game, right? So wouldn't you rather focus on what that game needs?

Actually, I'm trying to make something like Unity 3D, but for 2D games with Python :p Of course not too powerful, just a start.


void life()
{
  while (!succeed())
    try_again();

  die_happily();
}

babaliaris

136

Author

June 14, 2017 11:02 PM


So the basic idea is: instead of saying "OK, draw me 15000 rectangles", group pictures together into one large image and draw that on the screen? For example, 15000 / 50 = 300, so create 300 "big" images and draw those instead?

Also, pygame's blit method does not use graphics accelerator hardware.

But still, I don't think this is the problem, because in this example ONLY six images are rendered on the screen; the other 14996 are skipped by the if statement:


def render(self, screen):
    # Blit only sprites that overlap the visible screen area and have an image.
    if self.onScreen(screen) and self.image is not None:
        screen.blit(self.image, self.pos)
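Even with the on-screen check, a `render` pass like this still visits every object once per frame; only the blit is skipped. The sketch below makes that visible without pygame. `Sprite`, `on_screen` (a stand-in for the post's `onScreen`, whose body is not shown, assumed here to be a rectangle-overlap test) and the 800x600 screen size are all assumptions for illustration.

```python
from dataclasses import dataclass

SCREEN_W, SCREEN_H = 800, 600  # assumed screen size


@dataclass
class Sprite:
    # Hypothetical stand-in for the engine's game object.
    x: float
    y: float
    w: float = 32
    h: float = 32

    def on_screen(self) -> bool:
        # Assumed visibility test: axis-aligned overlap with the screen rectangle.
        return (self.x < SCREEN_W and self.x + self.w > 0
                and self.y < SCREEN_H and self.y + self.h > 0)


def render_all(sprites):
    """Return (visibility checks performed, sprites that would be blitted)."""
    checks, drawn = 0, 0
    for s in sprites:          # every object is visited, visible or not
        checks += 1
        if s.on_screen():
            drawn += 1         # a real engine would call screen.blit(...) here
    return checks, drawn


# Six sprites on screen, the rest placed far to the right of it.
world = [Sprite(100 * i, 100) for i in range(6)]
world += [Sprite(10_000 + 40 * i, 100) for i in range(14_994)]
print(render_all(world))  # (15000, 6)
```

The cost of the loop itself grows with the total object count, which is why the later replies recommend a spatial structure instead of checking every object.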



babaliaris

136

Author

June 15, 2017 12:00 AM

OK! So I asked the pygame developers about the blitting method, and they answered:

(1) PyGame is essentially CPU-bound rasterization. The fastest way to do this is with the Sprite module, or so I'm told.

(2) Using PyOpenGL instead of PyGame (for blitting, anyway), you use the GPU instead, which is designed for doing exactly this. You have to know what you're doing, but it will probably be faster.

In your case, most of your objects are off the screen, so you're probably CPU-bound anyway. If you really need more performance, and you really need to process every object every frame, you'll need to switch to a language with less overhead, like C++, or to a JIT-compiled implementation like PyPy.



Scouting Ninja

4,573

June 15, 2017 01:36 AM

So the basic idea is: instead of saying "OK, draw me 15000 rectangles", group pictures together into one large image and draw that on the screen? For example, 15000 / 50 = 300, so create 300 "big" images and draw those instead?

No, that is atlasing, which is part of how batching works.
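Atlasing means packing many small images into one texture and letting each sprite refer to a sub-rectangle of it through texture coordinates (UVs), so sprites that share the atlas can go into the same batch. A minimal sketch, assuming a hypothetical atlas with a fixed grid of equal-sized cells and a top-left UV origin; real atlases are usually built by a packing tool and store arbitrary rectangles, and OpenGL conventionally puts v = 0 at the bottom.

```python
from typing import Dict, Sequence, Tuple

UV = Tuple[float, float, float, float]  # (u0, v0, u1, v1), normalised to 0..1


def grid_atlas_uvs(names: Sequence[str], atlas_w: int, atlas_h: int, cell: int) -> Dict[str, UV]:
    """Place each image in the next grid cell and return its UV rectangle.

    Assumes a top-left origin: (0, 0) is the atlas's top-left corner.
    """
    cols, rows = atlas_w // cell, atlas_h // cell
    if len(names) > cols * rows:
        raise ValueError("atlas too small for this many images")
    uvs = {}
    for i, name in enumerate(names):
        col, row = i % cols, i // cols
        x, y = col * cell, row * cell          # pixel position inside the atlas
        uvs[name] = (x / atlas_w, y / atlas_h,
                     (x + cell) / atlas_w, (y + cell) / atlas_h)
    return uvs


atlas = grid_atlas_uvs(["grass", "stone", "water"], 256, 256, 32)
print(atlas["stone"])  # (0.125, 0.0, 0.25, 0.125)
```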

In short, every sprite object has a one-off cost, known as a draw call, that costs the same no matter how many polygons the sprite is made from.

So you can merge many sprites into one object that keeps the same image and pay for only one draw call; this is how Unity gets its performance.

Here I used your image to give a better idea of how it works.

The idea is that you merge all sprites into one batch, which lets them all sneak in while paying for only one draw call.

To understand this, you should know that each sprite is drawn on a quad, and each quad is made of two triangles.

If you make an object for each quad, then each quad will have its own draw call. If you make only one object for many quads, they share a single draw call.
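To make the quad and draw-call point concrete, here is a minimal sketch of how a batcher might merge many sprites into one mesh: each sprite contributes four vertices and six indices (two triangles), all appended to one shared vertex list and one index list that a renderer could then submit in a single draw call. The interleaved (x, y, u, v) layout and the name `build_batch` are assumptions for illustration; uploading the buffers to the GPU (for example through PyOpenGL) is left out.

```python
from typing import List, Sequence, Tuple

# (x, y, width, height, (u0, v0, u1, v1))
Quad = Tuple[float, float, float, float, Tuple[float, float, float, float]]


def build_batch(quads: Sequence[Quad]) -> Tuple[List[float], List[int]]:
    """Merge quads into one interleaved vertex list (x, y, u, v) and one index list."""
    vertices: List[float] = []
    indices: List[int] = []
    for x, y, w, h, (u0, v0, u1, v1) in quads:
        base = len(vertices) // 4          # 4 floats per vertex -> vertices so far
        vertices += [
            x,     y,     u0, v0,          # top-left
            x + w, y,     u1, v0,          # top-right
            x + w, y + h, u1, v1,          # bottom-right
            x,     y + h, u0, v1,          # bottom-left
        ]
        # Two triangles per quad, sharing the top-left/bottom-right diagonal.
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


sprites = [(i * 32.0, 0.0, 32.0, 32.0, (0.0, 0.0, 0.125, 0.125)) for i in range(3)]
verts, idx = build_batch(sprites)
print(len(verts) // 4, len(idx) // 3)  # 12 6  (vertices, triangles)
```

With 15 000 sprites this would produce 60 000 vertices in one list; if the graphics API uses 16-bit indices, a batch can address at most 65 536 vertices, so a real batcher splits into several batches when that limit is reached.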

Using PyOpenGL instead of PyGame

I advise this too; I don't know if it is possible to batch using PyGame. With an engine built on PyGame it would be near impossible to match Unity.

You could still use Python; however, if speed is your focus, you will need to switch to another language.

__Toz__

320

June 15, 2017 06:34 AM

Also, pygame's blit method does not use graphics accelerator hardware.

Yup, which means it'll never be as fast as using a sprite renderer that uses OpenGL or Direct3D.

But still, I don't think this is the problem, because in this example ONLY six images are rendered on the screen; the other 14996 are skipped by the if statement:

Why don't you profile your code using the script I provided? Then you'll know for sure if it's the sprite blitting that's slow or the onScreen check or something else.
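The profiling script referred to here is not included in the thread. As a stand-in, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules; `update_and_render` and `on_screen` are hypothetical placeholders for one frame of the engine loop, not the poster's code.

```python
import cProfile
import io
import pstats


def on_screen(i: int) -> bool:
    # Hypothetical visibility test: pretend only the first six objects are visible.
    return i < 6


def update_and_render(n: int = 15_000) -> int:
    # Hypothetical stand-in for one frame: loop over every object, "draw" the visible ones.
    drawn = 0
    for i in range(n):
        if on_screen(i):
            drawn += 1
    return drawn


profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):                  # profile ten "frames"
    update_and_render()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("tottime").print_stats(5)   # the five functions with the most own time
report = out.getvalue()
print(report)
```

In this toy case `on_screen` dominates simply because it is called 150 000 times, which is the kind of answer the profile gives about where frame time actually goes.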

babaliaris

136

Author

June 15, 2017 08:55 AM


I tried, but it shows me so many other methods that Python runs, and it's a mess. Isn't there any way to tell the profiler "OK, show me only my methods"?



__Toz__

320

June 15, 2017 06:28 PM

I tried, but it shows me so many other methods that Python runs, and it's a mess. Isn't there any way to tell the profiler "OK, show me only my methods"?

You could modify the script to take the lines of the profile results and delete every line that doesn't contain the path of your project. The messy output shouldn't really be a problem, though: you only need to look at the top results to figure out what is slowest.

The column of interest is 'tottime', which shows the time spent in a function itself, excluding its sub-functions. That way the program's main() function doesn't show up on top; we already know that everything else is called, directly or indirectly, from it.

So what functions does it show on top? Is it something like {pygame.sprite.draw}?
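Both suggestions above can be done with the standard library's `pstats` module: `sort_stats("tottime")` orders entries by the time spent inside each function itself, and `print_stats` accepts regular-expression restrictions that are matched against each entry's `file:line(function)` name, so passing your project's folder hides standard-library and pygame entries. The times in the report are in seconds. A minimal sketch; `own_functions_report`, `slow_part` and `frame` are made-up names, and the demo filters on its own file where a real project would pass its folder path.

```python
import cProfile
import io
import pstats
import re


def own_functions_report(profiler: cProfile.Profile, project_path: str, limit: int = 10) -> str:
    """Return the profile report restricted to entries whose file path matches project_path."""
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime")       # own time only, sub-calls excluded
    # Restrictions apply in order: first the regex on "file:line(function)",
    # then the limit on how many remaining rows to print.
    stats.print_stats(re.escape(project_path), limit)
    return out.getvalue()


def slow_part() -> int:
    return sum(i * i for i in range(200_000))


def frame() -> int:
    return slow_part() + len(sorted(range(1000)))


profiler = cProfile.Profile()
profiler.enable()
frame()
profiler.disable()

this_file = frame.__code__.co_filename   # a real project would pass its folder instead
report = own_functions_report(profiler, this_file)
print(report)
```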

SeanMiddleditch

17,596

June 15, 2017 08:07 PM

Someone else mentioned the word "culling", which is a key part of the equation here. It relates directly to those for loops.

If 15,000 objects aren't on your screen, there's no need to loop over them when drawing. You already have a quad tree for physics, so you've got the basics of the solution in place already: use a scene partitioning system to efficiently find the subset of objects that are actually visible. If 6 sprites are visible and 15,000 are not, your rendering code should only even try to draw those 6 sprites and entirely ignore the other 15,000. A good scene graph (or even just a quad-tree) helps here a massive amount. You can hypothetically reduce your O(N) rendering to O(log(N)) or less, which is a huge win.
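A minimal point-quadtree sketch of this idea: sprites are inserted once by position, and each frame the renderer asks only for the objects inside the camera rectangle, so whole subtrees that do not overlap the view are skipped. This assumes each sprite is stored as a single point (its top-left corner); `Rect`, `QuadTree`, the node capacity of 8 and the depth cap of 10 are illustrative choices, and a real engine would also account for sprite size, for example by querying with an enlarged view rectangle.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, object]  # (x, y, payload), e.g. a sprite's top-left corner


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def intersects(self, o: "Rect") -> bool:
        return (self.x < o.x + o.w and o.x < self.x + self.w
                and self.y < o.y + o.h and o.y < self.y + self.h)


@dataclass
class QuadTree:
    bounds: Rect
    capacity: int = 8          # points stored before a node splits
    depth: int = 0
    max_depth: int = 10        # stop splitting here (guards against stacked points)
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def insert(self, p: Point) -> bool:
        if not self.bounds.contains(p[0], p[1]):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        if not any(c.insert(p) for c in self.children):
            self.points.append(p)   # floating-point edge case: keep it at this level
        return True

    def _split(self) -> None:
        b, hw, hh = self.bounds, self.bounds.w / 2, self.bounds.h / 2
        self.children = [
            QuadTree(Rect(b.x + dx, b.y + dy, hw, hh), self.capacity,
                     self.depth + 1, self.max_depth)
            for dx in (0, hw) for dy in (0, hh)
        ]
        old, self.points = self.points, []
        for p in old:               # push existing points down one level
            self.insert(p)

    def query(self, area: Rect, found: Optional[list] = None) -> list:
        if found is None:
            found = []
        if not self.bounds.intersects(area):
            return found            # whole subtree lies outside the view: skipped
        found.extend(obj for x, y, obj in self.points if area.contains(x, y))
        for c in self.children or []:
            c.query(area, found)
        return found


world = QuadTree(Rect(0, 0, 2_500_000, 1_000))
for i in range(15_000):
    world.insert((i * 150.0, 100.0, f"sprite{i}"))   # 15,000 sprites in a long row

camera = Rect(0, 0, 800, 600)
visible = world.query(camera)
print(len(visible))  # 6
```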

One of your for loops then basically goes away. You never need to iterate over the static objects. Never. Both graphics and physics should be using an efficient graph of some kind to find relevant objects. (Note that physics and graphics might use _different_ graphs, because what makes the most sense for physics isn't necessarily true for graphics.)

You can then also heavily reduce the load on your dynamic objects for loop. After all, collision detection is already using an efficient data structure and you'll be changing your graphics to also use a tree of some kind, so the only work for dynamic objects that remains is to apply velocity. Luckily, that work only needs to be done on objects that _have_ a non-zero velocity. You can thus split your dynamic objects into two sets: active and inactive objects. The active objects are the ones that are currently moving.

After that split, you might notice that there's no longer any practical difference between static and dynamic objects, so you can perhaps remove that entire concept from the code. Objects with velocity (or some other forces) are put into the active set and are removed from the active set when they stop moving.

The physics code need only loop over the active objects to apply velocity and forces. The collision system uses an efficient graph. Rendering uses an efficient graph. Unless you (foolishly) make all 15,000 objects active at the same time, you never need to loop over all your objects even once, much less twice.
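The active/inactive split described above can be sketched as a set of moving objects that the physics step iterates over: objects join the set when they get a velocity and leave it when they come to rest. `Body`, `World`, the friction factor and the rest-speed threshold are illustrative assumptions, not part of any engine mentioned in the thread.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(eq=False)           # identity-based hashing so bodies can live in a set
class Body:
    x: float = 0.0
    y: float = 0.0
    vx: float = 0.0
    vy: float = 0.0


class World:
    REST_SPEED = 0.01          # below this speed a body counts as stopped (assumed)

    def __init__(self) -> None:
        self.bodies: List[Body] = []
        self.active: Set[Body] = set()

    def add(self, body: Body) -> None:
        self.bodies.append(body)
        if body.vx or body.vy:
            self.active.add(body)

    def push(self, body: Body, vx: float, vy: float) -> None:
        """Give a body a velocity and wake it up."""
        body.vx, body.vy = vx, vy
        self.active.add(body)

    def step(self, dt: float, friction: float = 0.9) -> int:
        """Integrate only the active bodies; return how many were updated."""
        updated = 0
        for body in list(self.active):     # copy, since bodies may be removed below
            body.x += body.vx * dt
            body.y += body.vy * dt
            body.vx *= friction
            body.vy *= friction
            updated += 1
            if abs(body.vx) < self.REST_SPEED and abs(body.vy) < self.REST_SPEED:
                body.vx = body.vy = 0.0
                self.active.discard(body)  # came to rest: stop iterating over it
        return updated


world = World()
for i in range(15_000):
    world.add(Body(x=float(i)))
world.push(world.bodies[0], 5.0, 0.0)
print(world.step(1 / 60))  # 1: only the moving body is touched
print(len(world.active))   # 1
```

The 14 999 resting bodies cost nothing per step; the loop length tracks the number of moving objects, not the size of the scene.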

The next big step then is the batching improvements others have recommended.

The third big step I'd recommend is to use an accelerator like PyPy or whatever's in vogue today. Remember that even the simplest operation like adding two small integers should just be a single ADD instruction on the CPU, but in plain ol' CPython that same operation can take hundreds of CPU instructions, branches, memory accesses, and so on. Python does not translate to CPU code very well and requires a sophisticated JIT engine to do even marginally well in terms of performance, and Python does not include such a JIT engine out of the box like JavaScript engines do, so you have to use an add-on/replacement like PyPy.

Sean Middleditch – Game Systems Engineer – Join my team!

babaliaris

136

Author

June 15, 2017 08:15 PM

Is the time in seconds?




This topic is closed to new replies.
