Search the Community

Showing results for tags 'Optimization'.



Found 32 results

  1. Hello all, My question is a bit hard to describe, but hopefully it will do... I just wonder what you think is the 'best' way of getting info about the model into your view(s). To clarify (I hope): if the model is updating itself every game cycle and the (deeply) nested objects are all doing their jobs, how do you get at the info that only the view is interested in? So my question is not how to do it, but rather: what do you think is the best way to do it? Regards, Alex
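One common pattern for this is observer-style change notification: the model raises an event when something the view cares about changes, and the view subscribes. A minimal C++ sketch, with all names hypothetical (none of this is from the original post):

#include <functional>
#include <vector>

// A tiny signal: views register callbacks; the model fires them on change.
template <typename... Args>
class Signal {
public:
    void connect(std::function<void(Args...)> fn) { slots_.push_back(std::move(fn)); }
    void emit(Args... args) const { for (const auto& s : slots_) s(args...); }
private:
    std::vector<std::function<void(Args...)>> slots_;
};

// A deeply nested model object: it knows nothing about any view.
class Health {
public:
    Signal<int> onChanged;
    void damage(int amount) { hp_ -= amount; onChanged.emit(hp_); }
private:
    int hp_ = 100;
};

// Somewhere in view setup (healthBar is a hypothetical view widget):
// health.onChanged.connect([&](int hp) { healthBar.setValue(hp); });

This way the per-cycle update loop stays untouched, and the view only hears about the changes it subscribed to.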
  2. How I halved apk size

    Originally posted on Medium. You coded your game hard for several months (or even years), your artist made a lot of high-quality assets, and the game is finally ready to be launched. Congratulations! You did a great job. Now take a look at the APK size and be prepared to be scared. What is the size — 60, 70 or even 80 megabytes? As strange as it may sound in the era of 128GB smartphones, I have some bad news: that size is too big. That's exactly what happened to me after I finished the game Totem Spirits. In this article I want to share several tips on how to reduce the size of a release APK file without losing quality. Please note that for development I used the quite popular game development engine Libgdx, but the tips below should be applicable to other frameworks as well. Also, my case is a rather simple 2D game with a lot of sprites (i.e. images), so it might not be that useful for large 3D products. To keep you motivated to read this article further, I want to share the final result: I managed to halve the APK size — from 64MB to 32.36MB.

Memory management

The very first thing that needs to be done properly is memory management. You should only have the necessary objects loaded into memory, and release resources once they are not in use. This topic requires a lot of detail, so I'd rather cover it in a separate article. Next, I want to analyze the size of the current APK file. My game has four different types of resources:

1. Intro — the resources for the intro screen (intro background). Loaded before the game starts, disposed immediately after loading is done. (~0.5MB)
2. In-menu resources — used in the menu only (location backgrounds, buttons, etc.). Loaded during the intro stage and when a player exits a game level. Disposed during "in-game resources" loading. (~7.5MB images + ~5.4MB music)
3. In-game resources — used on game levels only (objects, game backgrounds, etc.). Loaded during game level loading, disposed when a player exits the game level. Note that these resources are not disposed when a player navigates between levels. (~4.5MB images + ~10MB music)
4. Common — used in all three of the above. Loaded during the intro stage, disposed only once the game is closed. This one also includes fonts. (~1.5MB)

The summed size of all resources is ~30MB, so we can conclude that the size of the APK is basically the size of all its assets; the code base is only ~3MB. That's why I want to focus on the assets in the first place (still, the code will be discussed too).

Images optimization

The first thing to do is to make the images smaller while not harming the quality. Fortunately, there are plenty of services that offer exactly this; I used this one. This resulted in an 18MB reduction already! Comparing the two images (not optimized vs. optimized), the sizes are 312KB and 76KB respectively, so the optimized image is 4 times smaller, but the human eye can't notice the difference.

Images combination

You should combine near-identical images programmatically rather than shipping several almost-identical images (especially if they are quite big). Consider the example of the God portraits (God of Fire, God of Water, and so on): rather than having four full-size images with different Gods but the same background, I have only one big background image and four smaller images of the Gods that are then combined programmatically into one image. Although the reduction is not so big (~2MB), in some cases it can make a difference.

Images format

I consider this my biggest mistake so far. I had several images without transparency saved in PNG format. The JPG version of those images is 6 times lighter! Once I converted all images without transparency to JPG, the APK became 5MB smaller.

Music optimization

At first the music quality was 256 kbps; I reduced it to 128 kbps and saved 5MB more. I still think the tracks could be compressed even further. Please share in the comments if you have ever used 64 kbps in your games.

Texture packs

This item might be a bit Libgdx-specific, although similar functionality should exist in other engines as well. A texture pack is a way to organize a bunch of images into one big pack. In code you then treat each pack as one unit, so it's quite handy for memory management. But you should combine images wisely. At first I had my resources packed quite badly; once I separated all transparent and non-transparent images, I gained about 5MB more.

Dependencies and an optimal code base

Now let's look at the other side of the development process: the code. I will not dive into too many details about code-writing here (since it deserves a separate article as well), but I do want to share some general rules that I believe can be applied to any project. The most important thing is to reduce the number of third-party dependencies in the project. Do you really need to add Apache Commons if you use only one method from StringUtils? Or Gson if you just don't like the built-in JSON functionality? Well, you do not. I used Libgdx as the game development engine and am quite happy with it; I'm quite sure I'll use this engine again for my next game. And do I need to say that your code should be written in the most optimal way? :) Well, now I've mentioned it. Although most of the tips I've shared here can be applied at a late development stage, some of them (especially the optimization of memory management) should be designed in right from the very beginning of a project. Stay tuned for more programming articles!
  3. Sorry for making a new thread about this, but I have a specific question which I couldn't find an answer to in any of the other threads I've looked at. I've been trying to get the method shown here to work for several days now, and I've run out of things to try. I've more or less resorted to using the barebones example shown there (with some very minor modifications, as it wouldn't run otherwise), but I still can't get it to work. Either I have misunderstood something completely, or there's a mistake somewhere. My shader code looks like this:

Vertex shader:

#version 330 core
//Vertex shader

//Half the size of the near plane {tan(fovy/2.0) * aspect, tan(fovy/2.0)}
uniform vec2 halfSizeNearPlane;

layout (location = 0) in vec3 clipPos;
//UV for the depth buffer/screen access.
//(0,0) in bottom left corner, (1,1) in top right corner
layout (location = 1) in vec2 texCoord;

out vec3 eyeDirection;
out vec2 uv;

void main()
{
    uv = texCoord;
    eyeDirection = vec3((2.0 * halfSizeNearPlane * texCoord) - halfSizeNearPlane, -1.0);
    gl_Position = vec4(clipPos.xy, 0, 1);
}

Fragment shader:

#version 330 core
//Fragment shader
layout (location = 0) out vec3 fragColor;

in vec3 eyeDirection;
in vec2 uv;

uniform mat4 persMatrix;
uniform vec2 depthrange;
uniform sampler2D depth;

vec4 CalcEyeFromWindow(in float windowZ, in vec3 eyeDirection, in vec2 depthrange)
{
    float ndcZ = (2.0 * windowZ - depthrange.x - depthrange.y) / (depthrange.y - depthrange.x);
    float eyeZ = persMatrix[3][2] / ((persMatrix[2][3] * ndcZ) - persMatrix[2][2]);
    return vec4(eyeDirection * eyeZ, 1);
}

void main()
{
    vec4 eyeSpace = CalcEyeFromWindow(texture(depth, uv).x, eyeDirection, depthrange);
    fragColor = eyeSpace.rbg;
}

My camera settings are:

float fov = glm::radians(60.0f);
float aspect = 800.0f / 600.0f;

And my uniforms equal:

uniform mat4 persMatrix = glm::perspective(fov, aspect, 0.1f, 100.0f)
uniform vec2 halfSizeNearPlane = glm::vec2(glm::tan(fov/2.0) * aspect, glm::tan(fov/2.0))
uniform vec2 depthrange = glm::vec2(0.0f, 1.0f)

uniform sampler2D depth is a GL_DEPTH24_STENCIL8 texture which has depth values from an earlier pass (if I linearize it and set fragColor = vec3(linearizedZ), it shows up like it should, so nothing seems wrong there). I can confirm that it's wrong because it doesn't give me results similar to what saving the position in the G-buffer or reconstructing using inverse matrices does. Is there something obvious I'm missing? To me the logic seems sound, and from the description on the Khronos wiki I can't see where I go wrong. Thanks!
  4. Hello everyone! Right now I am writing my own physics engine in Java for an LWJGL3 3D game, and I would like to run my ideas past you guys. It's not about writing the actual code, but about asking whether my solution is good and/or can be better. I would also like to make it easy to refactor for other render engines and "game-loop engines". So let's get started! The base game architecture looks like this:

The Core holds just the information about the game itself, so whenever I decide to write a new game I only have to edit this module. The render engine holds just the information about rendering the models; it only gets the material and mesh data from the model. The Model module holds four basic pieces of information about a model:

- Models — a basic Model that holds only information about the ModelView, position, rotation and scale. Other types of models inherit from it and add unique parameters (AnimatedModel adds animation mesh data). A ModelView is built of ModelParts, which are built from TexturedMeshes (explained later).
- Loaders — classes to load a specific model type (i.e. an Assimp loader for *.obj files) and processing classes to create the data necessary to render a model (i.e. create a Mesh which holds the vboID, vertices/textures/normals arrays, etc.).
- Components — every model can have components, i.e. moveable, which allows the object to be moved around the world.
- Materials — used together with a Mesh to create a TexturedMesh. A Material holds information about diffuse, ambient, etc. colors, and diffuse and normal textures.

The PhysicsEngine module has the core (initialization of the physics world), collision detection, CollidableComponent (inheriting from BaseComponent) and Shapes (i.e. AABB, Sphere, Cylinder, MeshCollider). This is the part I would like to discuss with you guys (however, if you have something to say about the other parts, please go for it!).

- Core: PhysicState — initialization of the physics world and update methods; holds default data (i.e. the default narrow collision shape).
- Collision: Broad Phase Collision Detection (BPCD) and Narrow Phase Collision Detection (NPCD).
- CollidableComponent — a component that can be added to a model to make it collidable (in the future I was planning to add other components, such as a WindComponent for grass models that adds a reaction to wind). Only models with a CollidableComponent are checked in BPCD and NPCD; the rest are ignored. CollidableComponent also has a boolean isMoveable — i.e. a rock is collidable, but it's never, ever going to move, so it doesn't have to be checked against other non-moveable components in BPCD and NPCD.
- Shapes — basic shapes and info about them (AABB — min/max points; Sphere — center, radius; etc.).

More info is shown in the diagram below. Right now it works like this: I create a model and add a CollidableComponent to it like this:

public CollidableComponent(Model model, TypeOfShape typeOfShape, boolean isMoveable)

TypeOfShape declares the basic broad-phase collision shape (AABB, Sphere, Cylinder, Mesh). The shape is created from the raw data of the model and transformed to actual data (position, rotation*, scale). If I want to, I can add the narrow-phase collision shape map, which declares the CollisionShape for each ModelPart inside the ModelView. In most cases for me it's going to be a MeshCollider (since the game I'm planning to create is in a low-poly style).

IDEA 1: When the CollidableComponent is created, it is automatically added to the BPCD map to check its collisions. Of course that's just temporary; later on I would have to limit the map size (i.e. to 500), or split the world into smaller parts and add only the entities which are in a given part of the world to its BPCD. So this is a part where you guys could give me some advice.

IDEA 2: Collision detection update. Right now the update works like this:

public void update() {
    narrowPhase.narrowPhaseCollisionMap.clear();
    if (!broadPhaseCollisionMap.isEmpty()) {
        for (Model model : broadPhaseCollisionMap.keySet()) {
            if ((model.getComponent(CollidableComponent.class)).isMoveable()) {
                for (Model model2 : broadPhaseCollisionMap.keySet()) {
                    if (!model.equals(model2)) {
                        CollisionShape cs1 = getCollisionShape(model);
                        CollisionShape cs2 = getCollisionShape(model2);
                        if (checkCollision(cs1, cs2)) {
                            narrowPhase.narrowPhaseCollisionMap.add(model);
                            narrowPhase.narrowPhaseCollisionMap.add(model2);
                        }
                    }
                }
            }
        }
    }
    if (!narrowPhase.narrowPhaseCollisionMap.isEmpty()) {
        narrowPhase.update();
    }
}

So:
1. It checks whether the BPC map is not empty; if it isn't, it proceeds, otherwise nothing happens.
2. It loops through all the models in the map and checks isMoveable — as I said, I ignore collision detection for objects that don't move.
3. A second loop through the models checks that the model from the first loop isn't the model from the second loop; if they are the same, it is skipped.
4. If they are two different models, it retrieves the BPC shapes from the models, and for the moveable model it updates its CollisionShape data (from the current position, rotation*, scale*).
5. It checks the intersection between these two shapes, and if it is positive the pair is added to the NPC list.
6. After the BPCD loops, if the NPC list is not empty, the narrow phase update runs.

The NPCD update is pretty similar to BPCD, with just two exceptions:
1. It uses a List<Model> instead of a Map<Model, CollidableComponent> (from a model I can retrieve the info about its CollidableComponent, so I might use a List in BPCD as well instead of a Map**).
2. It checks collision intersections the same way as BPCD, but for each ModelPart of Model_1 against each ModelPart of Model_2; it returns true and/or the exact collision point, etc., and then breaks out of that model-model check (so it doesn't check whether other parts of the two models also collide with each other).

With my measurements for 50 objects — 27 static and 23 movable, at random positions (but some colliding): it took 0.0ms for 1224 collision checks and 24 positive collisions for BPCD; 10ms for 55776 collision checks and 576 positive collisions for NPCD; 11.0ms in total for BPCD and NPCD.

I can see huge room for improvement in the collision detection update methods, but I can't find it, so I hope you guys can help me out. Maybe not all models have to be checked in NPCD — i.e. check how far from the camera they are, and past some distance just skip NP since the difference won't be visible anyway? Well, that's all! Sorry for the somewhat long post, but I hope you at least enjoyed reading it.

*Actually, I just forgot about adding this to the calculation.
**This just came to my mind while I was writing this topic.
  5. Hey everyone! Currently I am making my engine, and there is one thing I am worried about. I am using a text data format to store my assets (e.g. JSON objects or similar "human-readable" formats), which requires me to register every field I want to serialize and read it manually from the file data:

void Resource::Save(IFile* file)
{
    file->Serialize("myField", myFieldValue);
}

void Resource::Load(IFile* file)
{
    file->Deserialize("myField", &myFieldValue);
    // ... and so on, manually, nothing else!
}

But I can't breathe calmly since I saw UE4's serialization/asset storage system: it uses RTTI, and it's MUCH easier to serialize objects with it. Now I am unsure which method I should use: should I give all the responsibility to an RTTI system (with lots of code hidden), or load everything I need manually, just like in the first code example? I know I can code the RTTI in such a way that it outputs "human-readable" files, but is it a good thing to use?
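For what it's worth, there is a middle ground between the two extremes that some codebases use: register each field exactly once, and drive both Save and Load from that single list. A rough C++14 sketch, assuming the post's IFile interface (with one Serialize/Deserialize overload per supported field type):

// Assumed shape of the post's IFile; only the int overloads are shown.
struct IFile {
    virtual void Serialize(const char* name, int& value) = 0;
    virtual void Deserialize(const char* name, int* value) = 0;
    virtual ~IFile() = default;
};

class Resource {
public:
    void Save(IFile* file)
    {
        // The same field list drives both directions.
        reflect([&](const char* name, auto& value) { file->Serialize(name, value); });
    }

    void Load(IFile* file)
    {
        reflect([&](const char* name, auto& value) { file->Deserialize(name, &value); });
    }

private:
    // The single place where every serialized field is registered.
    template <typename Visitor>
    void reflect(Visitor&& visit)
    {
        visit("myField", myFieldValue);
        // visit("otherField", otherFieldValue); ...
    }

    int myFieldValue = 0;
};

You keep the explicitness and readable output of the manual approach, but each field is named only once instead of once per direction.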
  6. Hey devs! Want to get rid of all the bugs in your game? I co-own a video game QA company called Level_0 Studios; we perform professional QA testing for game developers. Our goal is to help you create bug-free games your players will enjoy. Partnering with Level_0 allows you to focus more time on game development while we find those pesky bugs. If you're interested and in need of professional game testers, contact us at contact@level0studios.com and check out our website at https://level0studios.com for more information.
  7. Hi all, More than a decade ago, a problem came up on this forum about computing a fast transpose of a 3x3 matrix using SSE. The most sensible implementation stores the matrix internally as a 3x4 matrix (so one row stores 4 elements, aligned in a vector). A version, which I believe to be the fastest previously known, was presented in that thread. I am pleased to report that I have been able to come up with a version which should be faster:

inline void transpose(__m128& A, __m128& B, __m128& C)
{
    //Input rows in __m128& A, B, and C. Output in same.
    __m128 T0 = _mm_unpacklo_ps(A, B);
    __m128 T1 = _mm_unpackhi_ps(A, B);
    A = _mm_movelh_ps(T0, C);
    B = _mm_shuffle_ps(T0, C, _MM_SHUFFLE(3, 1, 3, 2));
    C = _mm_shuffle_ps(T1, C, _MM_SHUFFLE(3, 2, 1, 0));
}

This should be 5 instructions instead of ajas95's 8 instructions. Of course, to get that level of performance with either version, you need to inline everything; otherwise you spend tons of time moving floating-point arguments to/from input registers. The other thing that is crucial is that the instruction set be VEX-encoded. This allows generating instructions that take three arguments, like `vunpcklps`, instead of instructions like `unpcklps` that take only two. VEX is only available with AVX and higher (usually passing e.g. `-mavx` is sufficient to get the compiler to generate VEX instructions). -G
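A quick way to sanity-check the routine (not from the original post; it assumes the transpose() above is in scope) is to push a known 3x3 matrix through it, padded into the fourth lane:

#include <cstdio>
#include <xmmintrin.h>

int main()
{
    // Rows of the input matrix; the 4th lane is unused padding.
    __m128 A = _mm_setr_ps(1, 2, 3, 0);
    __m128 B = _mm_setr_ps(4, 5, 6, 0);
    __m128 C = _mm_setr_ps(7, 8, 9, 0);

    transpose(A, B, C);

    float r[12];
    _mm_storeu_ps(r + 0, A);
    _mm_storeu_ps(r + 4, B);
    _mm_storeu_ps(r + 8, C);

    // Expect the columns of the input: 1 4 7 / 2 5 8 / 3 6 9
    for (int i = 0; i < 3; ++i)
        std::printf("%g %g %g\n", r[i * 4 + 0], r[i * 4 + 1], r[i * 4 + 2]);
    return 0;
}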
  8. After what feels like a million hours of coding in the past 16 years, there have been many ways to write code, in many different languages. Some seemed correct at the time they were used; some seemed too strict or too chaotic; and I also evolved my coding style with each new line written. Now, considering the results of over 5 years in professional game development, tools and engine code as a hobbyist, and small and large commercial projects up to AAA titles, there are still many ways one could write code: in different languages, but also in the same language on different projects, in one and the same company but also in different companies. I mostly agree with some trends I see in C# and C++ coding guidelines that are well worth following, but the major differences are in the naming conventions. Because I currently have to write my own coding guidelines (not for a specific project, but primarily as a personal convention to refer to when coding) and am seeking an approach I'm happy with, I did some research on different guidelines and came up with the following references.

When Epic Games write about Unreal, their scheme seems a bit confusing when searching for a type like Animation or Skin (both are prefixed differently, with A and F), but it prevents various naming conflicts between types and variables, for example when writing a function that accepts FSkin Skin as a parameter.

Google's C++ guidelines point in a completely different direction. They make heavy use of underscores and lower-case prefixes to identify the different kinds of member, static, non-static and function names, and in the same breath make exceptions to their rules for various special cases.

Some other examples from different projects I was involved in do and do not use type prefixes or underscores:

class Class
{
    const <type> cConstant;
    const <type> Constant;
    const <type> __Constant;
    <type> _myClassMember;
    <type> _MyClassMember;
    <type> myClassmember;
    <type> mMyClassMember;
    <type> function(<type> parameter);
    <type> Function(<type> parameter);
    <type> Function(<type> aParameter);
}

class NAClass //avoid using namespaces, instead prefix anything with a 2-letter namespace-like identifier
{
    ...
}

I don't need to mention that Visual Studio will raise a warning when a parameter is named the same as a member, as in:

class Container
{
    private int size; //current size
    public Resize(int size) //compiler warns that the parameter matches a member
    {
        //do resize here
    }
}

So in the end, everyone does what he or she thinks is worth doing, and so do I. I would like to hear your opinions on why, and on what coding style you prefer or are involved with, in whatever way. What do you think makes a good standard, especially for the most common point, naming conventions? I will be curious to read your opinions.
  9. Hi, I am releasing my 2D game on Steam, so I sent it in for review, and they said it has some black and white rectangles covering parts of the screen. I don't have this issue on my old PC of course (AMD 6670 graphics card), and I tested on a laptop too (also an AMD GPU) without any problem. The game uses Direct3D 11 with C++, nothing fancy: 2 layers of tiles, some sprites for decoration, and some post-process effects. I have no idea what to do. I released it a while ago on itch.io and had some 20+ downloads; nobody said anything about it not working - or anything at all, sadly. So, does anyone have any tips on how to figure out a graphics bug that is not reproducible on your end, when you don't even have a screenshot?
  10. I've been starting to optimize my code in any way possible. I saw that some CPUs actually have 256-bit SIMD, but I was wondering if there is a way to detect this at runtime and fall back to the 128-bit path on an unsupported CPU, or how else to deal with this.
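One hedged sketch of runtime detection: GCC and Clang expose __builtin_cpu_supports, which checks both the CPUID feature bit and (for AVX) that the OS actually saves the 256-bit registers; on MSVC you would use __cpuid plus _xgetbv instead. The usual pattern is to compile the 128-bit and 256-bit kernels in separate translation units (built with the matching -msse2 / -mavx flags) and pick one through a function pointer at startup:

#include <cstdio>

// Hypothetical kernels, normally defined in separate files compiled
// with the matching flags and selected once at startup:
// void saxpy_sse(float* y, const float* x, float a, int n);  // built with -msse2
// void saxpy_avx(float* y, const float* x, float a, int n);  // built with -mavx
// auto saxpy = __builtin_cpu_supports("avx") ? saxpy_avx : saxpy_sse;

int main()
{
    if (__builtin_cpu_supports("avx"))
        std::printf("256-bit AVX available: use the wide path\n");
    else
        std::printf("No AVX: fall back to 128-bit SSE\n");
    return 0;
}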
  11. Hi guys, I'm writing my math library and implemented a matrix inverse function I would like to share. The SIMD version I got is more than twice as fast as the non-SIMD version (which is what Unreal is using). It is also faster than some other math libraries like Eigen or DirectX Math (in my test results, the first 3 columns are my methods). If you are interested in either the theory or the implementation, I put together my math derivation and source code in this post: https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html I would appreciate any feedback.
  12. When Apple announced a redesign of its decade-old App Store just a few months ago, app makers for the platform all over the world felt expectant and apprehensive about the kind of impact it would have on their apps. Apps with a star-studded presence in the App Store were the most apprehensive, quite naturally. For a vast majority of mobile app development companies it was rather good news, as it could make their app rating and positioning better. In any case, such a big update received a lot of buzz from the developer community. Apart from the general awe, apprehension and expectation, what does an update of this nature mean for apps and their prospects? That is precisely what we would like to explain here.

Finding apps will be easier than ever

If you take a deeper look at how apps generate revenue and get discovered, you are bound to recognise that apps that deserve the most buzz for their long-term usefulness often remain undiscovered, while apps that become popular for a shorter span get the most limelight. Many users simply never learn of certain useful apps because they remain unknown and undiscovered. The latest update of the App Store will help deal with this issue of discoverability. From now on, Apple's editorial team will choose apps for the featured list and various chosen categories according to the quality of the app. With the new focus on quality, the App Store will showcase the best apps of each category through a card-based display system. The featured cards that will help showcase the best apps include Sneak Peek, App of the Day, Major Update, Now Trending, etc. Obviously, this new system will make finding apps easier than ever before.

Optimised product page

One of the best things in the new update is the optimised product page, which allows offering more detailed information about an app. A good preview of an app is always impressive and encourages users to download it. The new App Store update will have value-added previews, localisation details of the app, and new text fields. App previews in the new App Store have only become better and more detailed, with an array of attributes. The product page also allows showcasing in-app purchases, and users can make purchases before ever downloading the app.

A far better search function

Another impressive way the new App Store adds value to the user experience and app discoverability is the new and better search function. Users can find apps, and related content about the apps, more quickly with the enhanced search. Search results will now consist of detailed layers of information, including in-app purchases, app developers, ratings, collections of apps from the same publisher, categories, editorial remarks and stories, tips, etc. A search function giving users so much information about an app right from the App Store will obviously have a positive impact on downloads.

Editors have a lot to say, and for the better

If you look at the new and updated App Store, you are bound to recognise that instead of depending on so-called machine algorithms, Apple this time is bent on improving quality through its editorial team. The App Store is all set to deliver an editorial experience to users, just for the sake of making the user experience better. Based on the quality of the apps in each category, Apple introduced a card-based selection system to feature quality apps across categories. From regularly updated content through selections like Meet the Developer and Behind the Scenes or What's on My iPhone, to more need-focused content through selections like Pro Tip, Life Hack and The Basics, the curated and edited content of the new App Store will help us access apps better, according to preference and need.

Ratings revamped for the better

The new App Store lets apps ship their updates without needing to worry about messing up the app rating. Unlike earlier times, when ratings were kept separately for each different update, app ratings are now considered across all subsequent updates together. This will help developers shipping a freshly updated app to come out clean and get a rating based on the latest update. This will obviously encourage developers to ship more frequent updates, as doing so is not likely to bring down the rating of an app.

The focus is on user experience and nothing else

The focus of the new App Store rests primarily on user experience. Apple is bent on helping users find the apps they need while giving quality app producers more exposure to the users their apps are meant for. Apple has realised that the App Store has come of age and is a densely crowded place with a multitude of apps. To give more exposure to quality apps for specific user contexts and needs, Apple had to devise a redesign to clear the clutter, with a consistent focus on quality. For mobile app developers, the new App Store opens up a never-before-seen opportunity to reach their target audience more easily and garner more traction and downloads from users. In the long run, the new App Store will only push the qualitative focus further and make the platform a better place for the users as well as the developers of iOS.
  13. In DirectX 11 we have a 24-bit integer depth + 8-bit stencil format for depth-stencil resources (DXGI_FORMAT_D24_UNORM_S8_UINT). However, in AMD GPU documentation for consoles I have seen it mentioned that internally this format is implemented as a 64-bit resource, with 32 bits for depth (but truncated to 24 bits) and 32 bits for stencil (truncated to 8 bits). AMD recommends instead using a 32-bit floating-point depth buffer with 8-bit stencil, which is this format: DXGI_FORMAT_D32_FLOAT_S8X24_UINT. Does anyone know why this is? What is the usual way of doing this - just follow the recommendation and use the 64-bit depth-stencil? Are there performance considerations, or is the recommendation just about not wasting memory? What about Nvidia and Intel - is using a 24-bit depth buffer relevant on their hardware? Cheers!
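For reference, creating the recommended 32-bit float depth + 8-bit stencil target in D3D11 looks roughly like this (a minimal sketch; the function name, sizes, and the lack of a shader-resource bind flag are assumptions):

#include <d3d11.h>

// Add D3D11_BIND_SHADER_RESOURCE (with a typeless format) if the
// depth buffer must also be sampled in a later pass.
ID3D11Texture2D* createDepthTarget(ID3D11Device* device, UINT width, UINT height)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = width;
    desc.Height           = height;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_D32_FLOAT_S8X24_UINT; // 32F depth + 8 stencil (24 bits unused)
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_DEPTH_STENCIL;

    ID3D11Texture2D* tex = nullptr;
    if (FAILED(device->CreateTexture2D(&desc, nullptr, &tex)))
        return nullptr;
    return tex;
}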
  14. MultiPacker is an Unreal Engine 4 editor plugin for manipulating atlas textures and channels (and, experimentally, various masks inside the same channel) inside Unreal Engine. It is greatly helpful for mobile projects, allowing big savings in texture memory. The plugin is intended to be simple and at the same time powerful. The plugin is on Gumroad now, and there is a discount offer of 7.50 with the code initialfeedback: https://gumroad.com/l/cYyEo It will be updated ASAP with new features. More info: https://drive.google.com/drive/folders/0B63pISMLaAAgcHc2Y1BBcXV1c0k?usp=sharing Daily information on my progress is on my Twitter: https://twitter.com/turbocheke

What's done now, in version 0.2:
- Get a number of opacity masks from a texture atlas
- Set 1 opacity mask on an RGB/RGBA channel
- Set 3 opacity masks on an RGB/RGBA channel (allowing 9 opacity masks on RGB, and 12 on RGBA)
- One or more texture inputs
- Input by specific channel (RGB, Red, Green, Blue, Alpha, RGBA)

What will be in future releases:
0.25:
- Save the texture atlas
0.3:
- Save a texture database for faster icon management
- Blueprint functions to manipulate the texture with the database
- A base Blueprint to generate button icons (pressed, normal) and different types of procedural usage of the icons
0.4:
- SDF from a texture mask
- Save SDFs in atlases and RGBA channels
0.5:
- Hot-reload textures based on the auto-import functionality of the Unreal Engine editor
  15. For example, I have 6000 entities on a 2D map and I want to query the roughly 180 which are on my player's screen. When my player moves, new entities may appear on the map; at the same time some entities or enemies die, so they disappear. I used to clear the entire spatial hash and insert everything again a few times a second. But I am thinking maybe it is better to only update those entities that change on the map. The number of changes may be large, but compared to the total number of entities on the map it is still small. I am just not sure whether this is worth it or not.
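A sketch of the incremental approach (not from the original post; the cell size and Entity layout are assumptions): cache each entity's current cell key, and on movement reinsert it only if the key changed. Dead entities just call remove() once.

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Entity { float x, y; uint64_t cell; };   // 'cell' caches the current bucket key

constexpr float CELL = 64.0f;                    // cell size in world units (assumed)

static uint64_t keyOf(float x, float y)
{
    // Pack the two cell coordinates into one 64-bit bucket key.
    int32_t cx = (int32_t)std::floor(x / CELL);
    int32_t cy = (int32_t)std::floor(y / CELL);
    return ((uint64_t)(uint32_t)cx << 32) | (uint32_t)cy;
}

struct SpatialHash {
    std::unordered_map<uint64_t, std::vector<Entity*>> buckets;

    void insert(Entity* e)
    {
        e->cell = keyOf(e->x, e->y);
        buckets[e->cell].push_back(e);
    }

    void remove(Entity* e)    // call once when an entity dies
    {
        std::vector<Entity*>& v = buckets[e->cell];
        for (size_t i = 0; i < v.size(); ++i)
            if (v[i] == e) { v[i] = v.back(); v.pop_back(); break; }
    }

    void update(Entity* e)    // call after moving; cheap if the cell didn't change
    {
        uint64_t k = keyOf(e->x, e->y);
        if (k != e->cell) { remove(e); e->cell = k; buckets[k].push_back(e); }
    }
};

Since most entities stay inside their cell from one frame to the next, update() usually costs just the key computation, which is far cheaper than rebuilding the whole hash a few times a second.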
  16. Hi, I have finished my multiplayer game's interpolation code. It works fine, but I am looking for ways to optimize it and for what could be done to improve performance. As of now I have too many temporary variables and objects inside the interpolation function, which runs 60 times per second on the client side. I've read some articles saying that creating those temp variables and objects too many times may slow my game down. To keep things simple, I create an object literal like this in the class that defines the interpolation function:

{
    timestamp: 0,
    allStates: {},
    allOtherStuff: {}
}

On each interpolation I clear the object, which really means I just create a new empty object. Then in the interpolation function I use a for loop to push the interpolated values and the other unchanged properties from the previous state into object.allStates and object.allOtherStuff. This works perfectly, but the interpolation function looks bloated. Is there anything I can do to improve it?
  17. I'm a college game design major, but I have also picked up some basic programming. I've done some stuff in JavaScript, AS3, Visual Basic, and Stencyl, and I'm now moving into using C# in Unity. Just as a general programming question: what are some tricks/techniques that any programmer can use, regardless of coding language, to make sure their game runs as smoothly as it possibly can, in terms of frame rate?
  18. Hello everyone! I'm using an MVC framework as a wrapper, so I have a UIController and a UIView for this problem. The UIView object is a display object and works as a container (like in ActionScript). It holds some static buttons which control sound, fullscreen, etc., and other dynamic controls which drive the state and current rules of the game. When the game is in different states, the buttons can be active or inactive for pressing and can show different texts, which can cause different functionality to be performed in the controller. I have around 10-13 game states in which the UI buttons are in different states, so my options are: (1) Control the state of the buttons in the controller. That is not OK, because the controller is already more than 300 lines and is ugly to read and understand; it shouldn't be. (2) Create a custom object in the controller, pass the UI buttons to it, and have some function called changeState set the appropriate appearance on the buttons. How should I structure and name that custom object so it is descriptive enough? Is there some well-known pattern? Or do you have a better idea?
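One well-known way to get this out of the controller is a table-driven take on the State pattern: describe every button's appearance per game state in one data table, and have a small presenter apply it. A C++ sketch with hypothetical names (the view API is assumed):

#include <map>
#include <string>
#include <utility>

enum class GameState { Idle, Betting, Playing };          // hypothetical states

struct ButtonLook { bool enabled; std::string label; };    // per-state appearance
using UiState = std::map<std::string, ButtonLook>;         // button id -> look

class UiPresenter {
public:
    explicit UiPresenter(std::map<GameState, UiState> table) : table_(std::move(table)) {}

    // The controller calls only this; all per-state details live in the table.
    void changeState(GameState s)
    {
        for (const auto& [id, look] : table_.at(s))
            applyToButton(id, look);
    }

private:
    void applyToButton(const std::string& id, const ButtonLook& look)
    {
        // view.getButton(id).setEnabled(look.enabled); ... (view API assumed)
        (void)id; (void)look;
    }

    std::map<GameState, UiState> table_;
};

// Usage sketch:
// UiPresenter ui({ { GameState::Idle,    { { "start", { true,  "Start" } } } },
//                  { GameState::Playing, { { "start", { false, "Playing..." } } } } });
// ui.changeState(GameState::Idle);

The controller shrinks to picking a state, and adding a new state becomes a data change rather than new branching code.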
  19. Didn't do much today; mainly messed with the player hub and optimized the game code to the point where I can hold 60 AIs at once so far! I will be working on the AI code more tomorrow, and will show a video of how many AIs I can fit on one battlefield! Also, for those of y'all who are interested in keeping up to date more consistently, I have a new Twitter account; follow me here: https://twitter.com/ACTNS_Ent
  20. Hi, probably a very stupid question, but anyway: my lecturer told me that "++i" is faster than "i++" when incrementing in a for loop. Can anybody argue this point? And if it's true, could you possibly explain why it's faster? Cheers
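The short version, for what it's worth: for a built-in int loop counter, modern compilers emit identical code for both forms when the result is discarded. The advice comes from class-type iterators, where the postfix form must return a copy of the old value:

// Sketch of why i++ can cost more than ++i for class types.
struct Iter {
    int* p;

    Iter& operator++()        // pre-increment: advance and return *this
    {
        ++p;
        return *this;
    }

    Iter operator++(int)      // post-increment: must copy the old state first
    {
        Iter old = *this;
        ++p;
        return old;           // this copy is the potential extra cost of i++
    }
};

When the returned value is never used, compilers can usually optimize the copy away for a trivial type like this one; the habit of writing ++i mostly pays off for heavyweight iterators where the copy is not free.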
  21. Performance is everybody's responsibility, no matter what your role. When it comes to the GPU, 3D programmers have a lot of control over performance; we can optimize shaders, trade image quality for performance, use smarter rendering techniques... we have plenty of tricks up our sleeves. But there's one thing we don't have direct control over, and that's the game's art. We rely on artists to produce assets that not only look good but are also efficient to render. For artists, a little knowledge of what goes on under the hood can make a big impact on a game's framerate. If you're an artist and want to understand why things like draw calls, LODs, and mipmaps are important for performance, read on!

To appreciate the impact that your art has on the game's performance, you need to know how a mesh makes its way from your modelling package onto the screen in the game. That means having an understanding of the GPU - the chip that powers your graphics card and makes real-time 3D rendering possible in the first place. Armed with that knowledge, we'll look at some common art-related performance issues, why they're a problem, and what you can do about it. Things are quickly going to get pretty technical, but if anything is unclear I'll be more than happy to answer questions in the comments section. Before we start, I should point out that I am going to deliberately simplify a lot of things for the sake of brevity and clarity. In many cases I'm generalizing, describing only the typical case, or just straight up leaving things out. In particular, for the sake of simplicity the idealized version of the GPU I describe below more closely matches that of the previous (DX9-era) generation. However when it comes to performance, all of the considerations below still apply to the latest PC & console hardware (although not necessarily all mobile GPUs). Once you understand everything described here, it will be much easier to get to grips with the variations and complexities you'll encounter later, if and when you start to dig deeper.

Part 1: The rendering pipeline from 10,000 feet

For a mesh to be displayed on the screen, it must pass through the GPU to be processed and rendered. Conceptually, this path is very simple: the mesh is loaded, vertices are grouped together as triangles, the triangles are converted into pixels, each pixel is given a colour, and that's the final image. Let's look a little closer at what happens at each stage.
After you export a mesh from your DCC tool of choice (Digital Content Creation - Maya, Max, etc.), the geometry is typically loaded into the game engine in two pieces: a Vertex Buffer (VB) that contains a list of the mesh's vertices and their associated properties (position, UV coordinates, normal, color etc.), and an Index Buffer (IB) that lists which vertices in the VB are connected to form triangles. Along with these geometry buffers, the mesh will also have been assigned a material to determine what it looks like and how it behaves under different lighting conditions. To the GPU this material takes the form of custom-written shaders - programs that determine how the vertices are processed, and what colour the resulting pixels will be. When choosing the material for the mesh, you will have set various material parameters (eg. setting a base color value or picking a texture for various maps like albedo, roughness, normal etc.) - these are passed to the shader programs as inputs. The mesh and material data get processed by various stages of the GPU pipeline in order to produce pixels in the final render target (an image to which the GPU writes). That render target can then be used as a texture in subsequent shader programs and/or displayed on screen as the final image for the frame. For the purposes of this article, here are the important parts of the GPU pipeline from top to bottom:

  • Input Assembly. The GPU reads the vertex and index buffers from memory, determines how the vertices are connected to form triangles, and feeds the rest of the pipeline.
  • Vertex Shading. The vertex shader gets executed once for every vertex in the mesh, running on a single vertex at a time. Its main purpose is to transform the vertex, taking its position and using the current camera and viewport settings to calculate where it will end up on the screen.
  • Rasterization. Once the vertex shader has been run on each vertex of a triangle and the GPU knows where it will appear on screen, the triangle is rasterized - converted into a collection of individual pixels. Per-vertex values - UV coordinates, vertex color, normal, etc. - are interpolated across the triangle's pixels. So if one vertex of a triangle has a black vertex color and another has white, a pixel rasterized in the middle of the two will get the interpolated vertex color grey.
  • Pixel Shading. Each rasterized pixel is then run through the pixel shader (although technically at this stage it's not yet a pixel but a 'fragment', which is why you'll see the pixel shader sometimes called a fragment shader). This gives the pixel a color by combining material properties, textures, lights, and other parameters in the programmed way to get a particular look. Since there are so many pixels (a 1080p render target has over two million) and each one needs to be shaded at least once, the pixel shader is usually where the GPU spends a lot of its time.
  • Render Target Output. Finally the pixel is written to the render target - but not before undergoing some tests to make sure it's valid. For example in normal rendering you want closer objects to appear in front of farther objects; the depth test can reject pixels that are further away than the pixel already in the render target. But if the pixel passes all the tests (depth, alpha, stencil etc.), it gets written to the render target in memory.
There's much more to it, but that's the basic flow: the vertex shader is executed on each vertex in the mesh, each 3-vertex triangle is rasterized into pixels, the pixel shader is executed on each rasterized pixel, and the resulting colors are written to a render target. Under the hood, the shader programs that represent the material are written in a shader programming language such as HLSL. These shaders run on the GPU in much the same way that regular programs run on the CPU - taking in data, running a bunch of simple instructions to change the data, and outputting the result. But while CPU programs are generalized to work on any type of data, shader programs are specifically designed to work on vertices and pixels. These programs are written to give the rendered object the look of the desired material - plastic, metal, velvet, leather, etc. To give you a concrete example, here's a simple pixel shader that does Lambertian lighting (ie. simple diffuse-only, no specular highlights) with a material color and a texture. As shaders go it's one of the most basic, but you don't need to understand it - it just helps to see what shaders can look like in general.

float3 MaterialColor;
Texture2D MaterialTexture;
SamplerState TexSampler;

float3 LightDirection;
float3 LightColor;

float4 MyPixelShader( float2 vUV : TEXCOORD0, float3 vNorm : NORMAL0 ) : SV_Target
{
    float3 vertexNormal = normalize(vNorm);
    float3 lighting = LightColor * dot( vertexNormal, LightDirection );
    float3 material = MaterialColor * MaterialTexture.Sample( TexSampler, vUV ).rgb;

    float3 color = material * lighting;
    float alpha = 1;
    return float4(color, alpha);
}

A simple pixel shader that does basic lighting. The inputs at the top like MaterialTexture and LightColor are filled in by the CPU, while vUV and vNorm are both vertex properties that were interpolated across the triangle during rasterization. And the generated shader instructions:

dp3 r0.x, v1.xyzx, v1.xyzx
rsq r0.x, r0.x
mul r0.xyz, r0.xxxx, v1.xyzx
dp3 r0.x, r0.xyzx, cb0[1].xyzx
mul r0.xyz, r0.xxxx, cb0[2].xyzx
sample_indexable(texture2d)(float,float,float,float) r1.xyz, v0.xyxx, t0.xyzw, s0
mul r1.xyz, r1.xyzx, cb0[0].xyzx
mul o0.xyz, r0.xyzx, r1.xyzx
mov o0.w, l(1.000000)
ret

The shader compiler takes the above program and generates these instructions which are run on the GPU; a longer program produces more instructions which means more work for the GPU to do. As an aside, you might notice how isolated the shader steps are - each shader works on a single vertex or pixel without needing to know anything about the surrounding vertices/pixels. This is intentional and allows the GPU to process huge numbers of independent vertices and pixels in parallel, which is part of what makes GPUs so fast at doing graphics work compared to CPUs. We'll return to the pipeline shortly to see where things might slow down, but first we need to back up a bit and look at how the mesh and material got to the GPU in the first place. This is also where we meet our first performance hurdle - the draw call.

The CPU and Draw Calls

The GPU cannot work alone; it relies on the game code running on the machine's main processor - the CPU - to tell it what to render and how. The CPU and GPU are (usually) separate chips, running independently and in parallel. To hit our target frame rate - most commonly 30 frames per second - both the CPU and GPU have to do all the work to produce a single frame within the time allowed (at 30fps that's just 33 milliseconds per frame).
To achieve this, frames are often pipelined; the CPU will take the whole frame to do its work (process AI, physics, input, animation etc.) and then send instructions to the GPU at the end of the frame so it can get to work on the next frame. This gives each processor a full 33ms to do its work at the expense of introducing a frame's worth of latency (delay). This may be an issue for extremely time-sensitive twitchy games like first person shooters - the Call of Duty series for example runs at 60fps to reduce the latency between player input and rendering - but in general the extra frame is not noticeable to the player. Every 33ms the final render target is copied and displayed on the screen at VSync - the interval during which the monitor looks for a new frame to display. But if the GPU takes longer than 33ms to finish rendering the frame, it will miss this window of opportunity and the monitor won't have any new frame to display. That results in either screen tearing or stuttering and an uneven framerate that we really want to avoid. We also get the same result if the CPU takes too long - it has a knock-on effect since the GPU doesn't get commands quickly enough to do its job in the time allowed. In short, a solid framerate relies on both the CPU and GPU performing well. Here the CPU takes too long to produce rendering commands for the second frame, so the GPU starts rendering late and thus misses VSync. To display a mesh, the CPU issues a draw call which is simply a series of commands that tells the GPU what to draw and how to draw it. As the draw call goes through the GPU pipeline, it uses the various configurable settings specified in the draw call - mostly determined by the mesh's material and its parameters - to decide how the mesh is rendered. These settings, called GPU state, affect all aspects of rendering, and consist of everything the GPU needs to know in order to render an object. Most significantly for us, GPU state includes the current vertex/index buffers, the current vertex/pixel shader programs, and all the shader inputs (eg. MaterialTexture or LightColor in the above shader code example). This means that to change a piece of GPU state (for example changing a texture or switching shaders), a new draw call must be issued. This matters because these draw calls are not free for the CPU. It costs a certain amount of time to set up the desired GPU state changes and then issue the draw call. Beyond whatever work the game engine needs to do for each call, extra error checking and bookkeeping cost is introduced by the graphics driver, an intermediate layer of code written by the GPU vendor (NVIDIA, AMD etc.) that translates the draw call into low-level hardware instructions. Too many draw calls can put too much of a burden on the CPU and cause serious performance problems. Due to this overhead, we generally set an upper limit to the number of draw calls that are acceptable per frame. If this limit is exceeded during gameplay testing, steps must be taken such as reducing the number of objects, reducing draw distance, etc. Console games will typically try to keep draw calls in the 2000-3000 range (eg. on Far Cry Primal we tried to keep it below 2500 per frame). That might sound like a lot, but it also includes any special rendering techniques that might be employed - cascaded shadows for example can easily double the number of draw calls in a frame. As mentioned above, GPU state can only be changed by issuing a new draw call. 
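To make the draw call concrete, here is roughly what submitting one object can look like from the CPU side in a D3D11-style renderer (a sketch, not from the article; the Mesh and Material structs are hypothetical):

#include <d3d11.h>

struct Mesh     { ID3D11Buffer* vb; ID3D11Buffer* ib; UINT stride; UINT indexCount; };
struct Material { ID3D11VertexShader* vs; ID3D11PixelShader* ps; ID3D11ShaderResourceView* albedo; };

void drawMesh(ID3D11DeviceContext* ctx, const Mesh& m, const Material& mat)
{
    UINT offset = 0;
    // Every one of these Set calls is a GPU state change the driver must validate.
    ctx->IASetVertexBuffers(0, 1, &m.vb, &m.stride, &offset);
    ctx->IASetIndexBuffer(m.ib, DXGI_FORMAT_R32_UINT, 0);
    ctx->VSSetShader(mat.vs, nullptr, 0);
    ctx->PSSetShader(mat.ps, nullptr, 0);
    ctx->PSSetShaderResources(0, 1, &mat.albedo);  // switching textures means new state
    ctx->DrawIndexed(m.indexCount, 0, 0);          // the draw call itself
}

Switching any of that state (a different texture, a different shader) for another batch of triangles means going through this again; that per-call CPU cost is exactly the overhead described above, and GPU state can only be changed by issuing a new draw call.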
This means that although you may have created a single mesh in your modelling package, if one half of the mesh uses one texture for the albedo map and the other half uses a different texture, it will be rendered as two separate draw calls. The same goes if the mesh is made up of multiple materials; different shaders need to be set, so multiple draw calls must be issued. In practice, a very common source of state change - and therefore extra draw calls - is switching texture maps. Typically the whole mesh will use the same material (and therefore the same shaders), but different parts of the mesh will use different sets of albedo/normal/roughness maps. With a scene of hundreds or even thousands of objects, using many draw calls for each object will cost a considerable amount of CPU time and so will have a noticeable impact on the framerate of the game. To avoid this, a common solution is to combine all the different texture maps used on a mesh into a single big texture, often called an atlas. The UVs of the mesh are then adjusted to look up the right part of the atlas, and the entire mesh (or even multiple meshes) can be rendered in a single draw call. Care must be taken when constructing the atlas so that adjacent textures don't bleed into each other at lower mips, but these problems are relatively minor compared to the gains that can be had in terms of performance. A texture atlas from Unreal Engine's Infiltrator demo Many engines also support instancing, also known as batching or clustering. This is the ability to use a single draw call to render multiple objects that are mostly identical in terms of shaders and state, and only differ in a restricted set of ways (typically their position and rotation in the world). The engine will usually recognize when multiple identical objects can be rendered using instancing, so it's always preferable to use the same object multiple times in a scene when possible, instead of multiple different objects that will need to be rendered with separate draw calls. Another common technique for reducing draw calls is manually merging many different objects that share the same material into a single mesh. This can be effective, but care must be taken to avoid excessive merging which can actually worsen performance by increasing the amount of work for the GPU. Before any draw call gets issued, the engine's visibility system will determine whether or not the object will even appear on screen. If not, it's very cheap to just ignore the object at this early stage and not pay for any draw call or GPU work (also known as visibility culling). This is usually done by checking if the object's bounding volume is visible from the camera's point of view, and that it is not completely blocked from view (occluded) by any other objects. However, when multiple meshes are merged into a single object, their individual bounding volumes must be combined into a single large volume that is big enough to enclose every mesh. This increases the likelihood that the visibility system will be able to see some part of the volume, and so will consider the entire collection visible. That means that it becomes a draw call, and so the vertex shader must be executed on every vertex in the object - even if very few of those vertices actually appear on the screen. This can lead to a lot of GPU time being wasted because the vertices end up not contributing anything to the final image. 
For these reasons, mesh merging is most effective when it is done on groups of small objects that are close to each other, as they will probably be on-screen at the same time anyway.

A frame from XCOM 2 as captured with RenderDoc. The wireframe (bottom) shows in grey all the extra geometry submitted to the GPU that is outside the view of the in-game camera.

As an illustrative example take the above capture of XCOM 2, one of my favourite games of the last couple of years. The wireframe shows the entire scene as submitted to the GPU by the engine, with the black area in the middle being the geometry that's actually visible by the game camera. All the surrounding geometry in grey is not visible and will be culled after the vertex shader is executed, which is all wasted GPU time. In particular, note the highlighted red geometry which is a series of bush meshes, combined and rendered in just a few draw calls. Since the visibility system determined that at least some of the bushes are visible on the screen, they are all rendered and so must all have their vertex shader executed before determining which can be culled... which turns out to be most of them. Please note this isn't an indictment of XCOM 2 in particular, I just happened to be playing it while writing this article! Every game has this problem, and it's a constant battle to balance the CPU cost of doing more accurate visibility tests, the GPU cost of culling the invisible geometry, and the CPU cost of having more draw calls.

Things are changing when it comes to the cost of draw calls however. As mentioned above, a significant reason for their expense is the overhead of the driver doing translation and error checking. This has long been the case, but the most modern graphics APIs (eg. Direct3D 12 and Vulkan) have been restructured in order to avoid most of this overhead. While this does introduce extra complexity to the game's rendering engine, it can also result in cheaper draw calls, allowing us to render many more objects than was previously possible. Some engines (most notably the latest version used by Assassin's Creed) have even gone in a radically different direction, using the capabilities of the latest GPUs to drive rendering and effectively doing away with draw calls altogether.

The performance impact of having too many draw calls is mostly on the CPU; pretty much all other performance issues related to art assets are on the GPU. We'll now look at what a bottleneck is, where they can happen, and what we can do about them.

Part 2: Common GPU bottlenecks

The very first step in optimization is to identify the current bottleneck so you can take steps to reduce or eliminate it. A bottleneck refers to the section of the pipeline that is slowing everything else down. In the above case where too many draw calls are costing too much, the CPU is the bottleneck. Even if we performed other optimizations that made the GPU faster, it wouldn't matter to the framerate because the CPU is still running too slowly to produce a frame in the required amount of time.

4 draw calls going through the pipeline, each being the rendering of a full mesh containing many triangles. The stages overlap because as soon as one piece of work is finished it can be immediately passed to the next stage (eg. when three vertices are processed by the vertex shader then the triangle can proceed to be rasterized).

You can think of the GPU pipeline as an assembly line.
As each stage finishes with its data, it forwards the results to the following stage and proceeds with the next piece of work. Ideally every stage is busy working all the time, and the hardware is being utilized fully and efficiently as represented in the above image - the vertex shader is constantly processing vertices, the rasterizer is constantly rasterizing pixels, and so on. But consider what happens if one stage takes much longer than the others. What happens here is that an expensive vertex shader can't feed the following stages fast enough, and so becomes the bottleneck. If you had a draw call that behaved like this, making the pixel shader faster is not going to make much of a difference to the time it takes for the entire draw call to be rendered. The only way to make things faster is to reduce the time spent in the vertex shader. How we do that depends on what in the vertex shader stage is actually causing the bottleneck. You should keep in mind that there will almost always be a bottleneck of some kind - if you eliminate one, another will just take its place. The trick is knowing when you can do something about it, and when you have to live with it because that's just what it costs to render what you want to render. When you optimize, you're really trying to get rid of unnecessary bottlenecks. But how do you identify what the bottleneck is?

Profiling

Profiling tools are absolutely essential for figuring out where all the GPU's time is being spent, and good ones will point you at exactly what you need to change in order for things to go faster. They do this in a variety of ways - some explicitly show a list of bottlenecks, others let you run 'experiments' to see what happens (eg. "how does my draw time change if all the textures are tiny", which can tell you if you're bound by memory bandwidth or cache usage). Unfortunately this is where things get a bit hand-wavy, because some of the best performance tools available are only available for the consoles and therefore under NDA. If you're developing for Xbox or Playstation, bug your friendly neighbourhood graphics programmer to show you these tools. We love it when artists get involved in performance, and will be happy to answer questions and even host tutorials on how to use the tools effectively.

Unity's basic built-in GPU profiler

The PC already has some pretty good (albeit hardware-specific) profiling tools which you can get directly from the GPU vendors, such as NVIDIA's Nsight, AMD's GPU PerfStudio, and Intel's GPA. Then there's RenderDoc which is currently the best tool for graphics debugging on PC, but doesn't have any advanced profiling features. Microsoft is also starting to release its awesome Xbox profiling tool PIX for Windows too, albeit only for D3D12 applications. Assuming they also plan to provide the same bottleneck analysis tools as the Xbox version (tricky with the wide variety of hardware out there), it should be a huge asset to PC developers going forward. These tools can give you more information about the performance of your art than you will ever need. They can also give you a lot of insight into how a frame is put together in your engine, as well as being awesome debugging tools for when things don't look how they should. Being able to use them is important, as artists need to be responsible for the performance of their art.
But you shouldn't be expected to figure it all out on your own - any good engine should provide its own custom tools for analyzing performance, ideally providing metrics and guidelines to help determine if your art assets are within budget. If you want to be more involved with performance but feel you don't have the necessary tools, talk to your programming team. Chances are the tools already exist - and if they don't, they should be created!

Now that you know how GPUs work and what a bottleneck is, we can finally get to the good stuff. Let's dig into the most common real-world bottlenecks that can show up in the pipeline, how they happen, and what can be done about them.

Shader instructions

Since most of the GPU's work is done with shaders, they're often the source of many of the bottlenecks you'll see. When a bottleneck is identified as shader instructions (sometimes referred to as ALUs, from the Arithmetic Logic Units that actually do the calculations), it's simply a way of saying the vertex or pixel shader is doing a lot of work and the rest of the pipeline is waiting for that work to finish.

Often the vertex or pixel shader program itself is just too complex, containing many instructions and taking a long time to execute. Or maybe the vertex shader is reasonable, but the mesh you're rendering has too many vertices, which adds up to a lot of time spent executing the vertex shader. Or the draw call covers a large area of the screen touching many pixels, and so spends a lot of time in the pixel shader.

Unsurprisingly, the best way to optimize a shader instruction bottleneck is to execute fewer instructions! For pixel shaders that means choosing a simpler material with fewer features, to reduce the number of instructions executed per pixel. For vertex shaders it means simplifying your mesh to reduce the number of vertices that need to be processed, as well as being sure to use LODs (Levels Of Detail - simplified versions of your mesh for use when the object is far away and small on the screen).

Sometimes, however, shader instruction bottlenecks are instead just an indication of problems in some other area. Issues such as too much overdraw, a misbehaving LOD system, and many others can cause the GPU to do a lot more work than necessary. These problems can be either on the engine side or the content side; careful profiling, examination, and experience will help you figure out what's really going on.

One of the most common of these issues - overdraw - is when the same pixel on the screen needs to be shaded multiple times, because it's touched by multiple draw calls. Overdraw is a problem because it eats into the limited time the GPU has to spend on rendering each frame. If every pixel on the screen has to be shaded twice, the GPU can only spend half the amount of time on each pixel and still maintain the same framerate.

A frame capture from PIX with the corresponding overdraw visualization mode

Sometimes overdraw is unavoidable, such as when rendering translucent objects like particles or glass-like materials; the background object is visible through the foreground, so both need to be rendered. But for opaque objects, overdraw is completely unnecessary, because the pixel shown in the buffer at the end of rendering is the only one that actually needs to be processed. In this case, every overdrawn pixel is just wasted GPU time. The GPU takes steps to reduce overdraw in opaque objects.
The early depth test (which happens before the pixel shader - see the initial pipeline diagram) will skip pixel shading if it determines that the pixel will be hidden by another object. It does that by comparing the depth of the pixel being shaded against the depth buffer - a render target where the GPU stores the entire frame's depth, so that objects occlude each other properly. But for the early depth test to be effective, the other object must have already been rendered so that it is present in the depth buffer. That means the rendering order of objects is very important.

Ideally every scene would be rendered front-to-back (ie. objects closest to the camera first), so that only the foreground pixels get shaded and the rest get killed by the early depth test, eliminating overdraw entirely. But in the real world that's not always possible, because you can't reorder the triangles inside a draw call during rendering. Complex meshes can occlude themselves multiple times, and mesh merging can result in many overlapping objects being rendered in the "wrong" order, causing overdraw. There's no easy answer for avoiding these cases, and in the latter case it's just another thing to take into consideration when deciding whether or not to merge meshes.

To help early depth testing, some games do a partial depth prepass. This is a preliminary pass where certain large objects that are known to be effective occluders (large buildings, terrain, the main character, etc.) are rendered with a simple shader that only outputs to the depth buffer, which is relatively fast as it avoids doing any pixel shader work such as lighting or texturing. This 'primes' the depth buffer and increases the amount of pixel shader work that can be skipped during the full rendering pass later in the frame. The drawback is that rendering the occluding objects twice (once in the depth-only pass and once in the main pass) increases the number of draw calls, plus there's always a chance that the time it takes to render the depth pass itself is more than the time it saves through increased early depth test efficiency. Only profiling in a variety of cases can determine whether or not it's worth it for any given scene.

Particle overdraw visualization of an explosion in Prototype 2

One place where overdraw is a particular concern is particle rendering, given that particles are transparent and often overlap a lot. Artists working on particle effects should always have overdraw in mind when producing effects. A dense cloud effect can be produced by emitting lots of small faint overlapping particles, but that's going to drive up the rendering cost of the effect; a better-performing alternative is to emit fewer large particles, and instead rely more on the texture and texture animation to convey the density of the effect. The overall result is often more visually effective anyway, because offline software like FumeFX and Houdini can usually produce much more interesting effects through texture animation than real-time simulated behaviour of individual particles can.

The engine can also take steps to avoid doing more GPU work than necessary for particles. Every rendered pixel that ends up completely transparent is just wasted time, so a common optimization is to perform particle trimming: instead of rendering the particle with two triangles, a custom-fitted polygon is generated that minimizes the amount of empty texture space that gets rendered.
Particle 'cutout' tool in Unreal Engine 4

The same can be done for other partially transparent objects such as vegetation. In fact for vegetation it's even more important to use custom geometry to eliminate the large amount of empty texture space, as vegetation often uses alpha testing. This is when the alpha channel of the texture is used to decide whether or not to discard the pixel during the pixel shader stage, effectively making it transparent. This is a problem because alpha testing can also have the side effect of disabling the early depth test completely (because it invalidates certain assumptions that the GPU can make about the pixel), leading to much more unnecessary pixel shader work. Combine this with the fact that vegetation often contains a lot of overdraw anyway - think of all the overlapping leaves on a tree - and it can quickly become very expensive to render if you're not careful.

A close relative of overdraw is overshading, which is caused by tiny or thin triangles and can really hurt performance by wasting a significant portion of the GPU's time. Overshading is a consequence of how GPUs process pixels during pixel shading: not one at a time, but in 'quads' - blocks of four pixels arranged in a 2x2 pattern. It's done like this so the hardware can do things like comparing UVs between pixels to calculate appropriate mipmap levels. This means that if a triangle only touches a single pixel of a quad (because the triangle is tiny or very thin), the GPU still processes the whole quad and just throws away the other three pixels, wasting 75% of the work. That wasted time can really add up, and is particularly painful for forward (ie. not deferred) renderers that do all lighting and shading in a single pass in the pixel shader. This penalty can be reduced by using properly-tuned LODs; besides saving on vertex shader processing, they can also greatly reduce overshading by having triangles cover more of each quad on average.

A 10x8 pixel buffer with 5x4 quads. The two triangles have poor quad utilization - the left one is too small, the right one is too thin. The 10 red quads touched by the triangles need to be completely shaded, even though the 12 green pixels are the only ones that are actually needed. Overall, 70% of the GPU's work is wasted.

(Random trivia: quad overshading is also the reason you'll sometimes see fullscreen post effects use a single large triangle to cover the screen instead of two back-to-back triangles. With two triangles, quads that straddle the shared edge would waste some of their work, so avoiding that saves a minor amount of GPU time.)

Beyond overshading, tiny triangles are also a problem because GPUs can only process and rasterize triangles at a certain rate, which is usually relatively low compared to how many pixels they can process in the same amount of time. With too many small triangles, the GPU can't produce pixels fast enough to keep the shader units busy, resulting in stalls and idle time - the real enemy of GPU performance.

Similarly, long thin triangles are bad for performance for another reason beyond quad usage: GPUs rasterize pixels in square or rectangular blocks, not in long strips. Compared to a more regular-shaped triangle with even sides, a long thin triangle makes the GPU do a lot of extra unnecessary work to rasterize it into pixels, potentially causing a bottleneck at the rasterization stage.
This is why it's usually recommended that meshes are tessellated into evenly-shaped triangles, even if it increases the polygon count a bit. As with everything else, experimentation and profiling will show the best balance.

Memory Bandwidth and Textures

As illustrated in the above diagram of the GPU pipeline, meshes and textures are stored in memory that is physically separate from the GPU's shader processors. That means that whenever the GPU needs to access some piece of data, like a texture being fetched by a pixel shader, it needs to retrieve it from memory before it can actually use it as part of its calculations.

Memory accesses are analogous to downloading files from the internet. File downloads take a certain amount of time due to the internet connection's bandwidth - the speed at which data can be transferred. That bandwidth is also shared between all downloads - if you can download one file at 6MB/s, two files only download at 3MB/s each. The same is true of memory accesses; index/vertex buffers and textures being accessed by the GPU take time to transfer, and must share memory bandwidth. The speeds are obviously much higher than internet connections - on paper the PS4's GPU memory bandwidth is 176GB/s - but the idea is the same. A shader that accesses many textures will rely heavily on having enough bandwidth to transfer all the data it needs in the time it needs it.

Shader programs are executed by the GPU with these restrictions in mind. A shader that needs to access a texture will try to start the transfer as early as possible, then do other unrelated work (for example lighting calculations) and hope that the texture data has arrived from memory by the time it gets to the part of the program that needs it. If the data hasn't arrived in time - because the transfer is slowed down by lots of other transfers, or because the shader runs out of other work to do (especially likely for dependent texture fetches) - execution will stop and it will just sit there and wait. This is a memory bandwidth bottleneck; making the rest of the shader faster will not matter if it still needs to stop and wait for data to arrive from memory. The only way to optimize this is to reduce the amount of bandwidth being used, the amount of data being transferred, or both.

Memory bandwidth might even have to be shared with the CPU, or with async compute work that the GPU is doing at the same time. It's a very precious resource. The majority of memory bandwidth is usually taken up by texture transfers, since textures contain so much data. As a result, there are a few different mechanisms in place to reduce the amount of texture data that needs to be shuffled around.

First and foremost is the cache. This is a small piece of high-speed memory that the GPU has very fast access to, and is used to keep chunks of memory that have been accessed recently in case the GPU needs them again. In the internet connection analogy, the cache is your computer's hard drive that stores the downloaded files for faster access in the future.
When a piece of memory is accessed, like a single texel in a texture, the surrounding texels are also pulled into the cache in the same memory transfer. The next time the GPU looks for one of those texels, it doesn't need to go all the way to memory and can instead fetch it from the cache extremely quickly. This is actually the common case - when a texel is displayed on the screen in one pixel, it's very likely that the pixel beside it will need to show the same texel, or the texel right beside it in the texture. When that happens, nothing needs to be transferred from memory, no bandwidth is used, and the GPU can access the cached data almost instantly. Caches are therefore vitally important for avoiding memory-related bottlenecks, especially when you take filtering into account - bilinear, trilinear, and anisotropic filtering all require multiple texels to be accessed for each lookup, putting an extra burden on bandwidth usage. High-quality anisotropic filtering is particularly bandwidth-intensive.

Now think about what happens in the cache if you try to display a large texture (eg. 2048x2048) on an object that's very far away and only takes up a few pixels on the screen. Each pixel will need to fetch from a very different part of the texture, and the cache will be completely ineffective, since it only keeps texels that were close to previous accesses. Every texture access will try to find its result in the cache and fail (called a 'cache miss'), so the data must be fetched from memory, incurring the dual costs of bandwidth usage and the time it takes for the data to be transferred. A stall may occur, slowing the whole shader down. It will also cause other (potentially useful) data to be 'evicted' from the cache in order to make room for the surrounding texels that will never even be used, reducing the overall efficiency of the cache. It's bad news all around, and that's not even to mention the visual quality issues - tiny movements of the camera will cause completely different texels to be sampled, causing aliasing and sparkling.

This is where mipmapping comes to the rescue. When a texture fetch is issued, the GPU can analyze the texture coordinates being used at each pixel, determining when there is a large gap between texture accesses. Instead of incurring the costs of a cache miss for every texel, it instead accesses a lower mip of the texture that matches the resolution it's looking for. This greatly increases the effectiveness of the cache, reducing memory bandwidth usage and the potential for a bandwidth-related bottleneck. Lower mips are also smaller and need less data to be transferred from memory, further reducing bandwidth usage. And finally, since mips are pre-filtered, their use also vastly reduces aliasing and sparkling. For all of these reasons, it's almost always a good idea to use mipmaps - the advantages are definitely worth the extra memory usage.

A texture on two quads, one close to the camera and one much further away

The same texture with a corresponding mipmap chain, each mip being half the size of the previous one

Lastly, texture compression is an important way of reducing bandwidth and cache usage (in addition to the obvious memory savings from storing less texture data). Using BC (Block Compression, previously known as DXT compression), textures can be reduced to a quarter or even a sixth of their original size in exchange for a minor hit in quality.
This is a significant reduction in the amount of data that needs to be transferred and processed, and most GPUs even keep the textures compressed in the cache, leaving more room to store other texture data and increasing overall cache efficiency.

All of the above information should lead to some obvious steps for reducing or eliminating bandwidth bottlenecks when it comes to texture optimization on the art side. Make sure the textures have mips and are compressed. Don't use heavy 8x or 16x anisotropic filtering if 2x is enough, or even trilinear or bilinear if possible. Reduce texture resolution, particularly if the top-level mip is often displayed. Don't use material features that cause texture accesses unless the feature is really needed. And make sure all the data being fetched is actually used - don't sample four RGBA textures when you only need the data in the red channel of each; merge those four red channels into a single texture and you've removed 75% of the bandwidth usage.

While textures are the primary users of memory bandwidth, they're by no means the only ones. Mesh data (vertex and index buffers) also needs to be loaded from memory. You'll also notice in the first GPU pipeline diagram that the final render target output is a write to memory. All these transfers usually share the same memory bandwidth. In normal rendering these costs typically aren't noticeable, as the amount of data is relatively small compared to the texture data, but this isn't always the case. Compared to regular draw calls, shadow passes behave quite differently and are much more likely to be bandwidth bound.

A frame from GTA V with shadow maps, courtesy of Adrian Courreges' great frame analysis

This is because shadow maps are simply depth buffers that represent the distance from the light to the closest mesh, so most of the work that needs to be done for shadow rendering consists of transferring data to and from memory: fetch the vertex/index buffers, do some simple calculations to determine position, and then write the depth of the mesh to the shadow map. Most of the time, a pixel shader isn't even executed, because all the necessary depth information comes from just the vertex data. This leaves very little work to hide the overhead of all the memory transfers, and the likely bottleneck is that the shader just ends up waiting for memory transfers to complete. As a result, shadow passes are particularly sensitive to both vertex/triangle counts and shadow map resolution, as they directly affect the amount of bandwidth that is needed.

The last thing worth mentioning with regard to memory bandwidth is a special case - the Xbox. Both the Xbox 360 and Xbox One have a particular piece of memory embedded close to the GPU, called EDRAM on the 360 and ESRAM on the XB1. It's a relatively small amount of memory (10MB on the 360 and 32MB on the XB1), but big enough to store a few render targets and maybe some frequently-used textures, and with a much higher bandwidth than regular system memory (aka DRAM). Just as important as the speed is the fact that this bandwidth uses a dedicated path, so it doesn't have to be shared with DRAM transfers. It adds complexity to the engine, but when used efficiently it can give some extra headroom in bandwidth-limited situations. As an artist you generally won't have control over what goes into EDRAM/ESRAM, but it's worth knowing of its existence when it comes to profiling. The 3D programming team can give you more details on its use in your particular engine.

And there's more...
As you've probably gathered by now, GPUs are complex pieces of hardware. When fed properly, they are capable of processing an enormous amount of data and performing billions of calculations every second. On the other hand, bad data and poor usage can slow them down to a crawl, having a devastating effect on the game's framerate.

There are many more things that could be discussed or expanded upon, but what's above is a good place to start for any technically-minded artist. Having an understanding of how the GPU works can help you produce art that not only looks great but also performs well... and better performance can let you improve your art even more, making the game look better too.

There's a lot to take in here, but remember that your 3D programming team is always happy to sit down with you and discuss anything that needs more explanation - as am I in the comments section below!

Further Technical Reading

Render Hell - Simon Trumpler
Texture filtering: mipmaps - Shawn Hargreaves
Graphics Gems for Games - Findings from Avalanche Studios - Emil Persson
Triangulation - Emil Persson
How bad are small triangles on GPU and why? - Christophe Riccio
Game Art Tricks - Simon Trumpler
Optimizing the rendering of a particle system - Christer Ericson
Practical Texture Atlases - Ivan-Assen Ivanov
How GPUs Work - David Luebke & Greg Humphreys
Casual Introduction to Low-Level Graphics Programming - Stephanie Hurlburt
Counting Quads - Stephen Hill
Overdraw in Overdrive - Stephen Hill
Life of a triangle - NVIDIA's logical pipeline - NVIDIA
From Shader Code to a Teraflop: How Shader Cores Work - Kayvon Fatahalian
A Trip Through the Graphics Pipeline (2011) - Fabian Giesen

Note: This article was originally published on fragmentbuffer.com, and is republished here with kind permission from the author Keith O'Conor. You can read more of Keith's writing on Twitter (@keithoconor).
  22. Advice for LOD switching

I'm currently working on an open-world, dense urban project using the dev tools a design studio released for one of its games. I can run the base game (which is also set in dense urban areas) at 1080p with ultra settings at a solid 60FPS, but if I go to 4K then I get about 30FPS.

The minimum specs of the game: CPU: 3.4GHz, GPU: 1GB VRAM
The recommended specs of the game: CPU: 4.0GHz, GPU: 2GB VRAM

My question is about LOD switching. Using the dev tools, you can create your own unique buildings, but I'm worried about how I should create the LODs. All the different buildings I create will use many similar objects, such as windows and detailed objects like air conditioners, chimneys etc. It seems more convenient to me to create many smaller LODs rather than a new single LOD for every building, to save time; also, if I edit the values of a smaller decoration, the change would take effect across all the other smaller LODs already created. I hope this makes sense to you.

If I create a LOD for each of the multiple parts of a single building, then I can keep creating new buildings easily, since all the LODs already exist. But of course, that could affect performance. However, with minimum specs like a 3.4GHz CPU, could I get away with more LOD switches?

I'm new to generating and creating LODs and could use a bit of advice and guidance. Unfortunately, I cannot disclose too much about the project or show screenshots, as it is currently under wraps. Any help would be appreciated - thanks!
  23. I've read in several places at this point that to get the most out of your CPU's cache, it's important to pack relevant data together and access it in a linear fashion. That way the first access to that region of memory loads it into a cache line, and the remaining accesses will be much cheaper. One thing that isn't clear to me, however, is how many cache lines you can have "active" at once.

So for example, if you have a series of 3D vectors, and you lay them out like this:

[xxxx...] [yyyy...] [zzzz...]

And then you access your data as:

for (std::size_t i = 0; i < len; ++i)
{
    auto x_i = x[i];
    auto y_i = y[i];
    auto z_i = z[i];
    // Do something with x, y, and z
}

Does each array get its own cache line? Or does accessing the 'y' element push 'x' out of the cache, and then accessing the 'z' element push 'y' out of the cache? If you were to iterate backwards, would that cause more cache misses than iterating forwards?

On another note, while I try to follow best practices for this stuff where possible, I literally have zero idea how effective (or ineffective) it is, since I have no tools for profiling it, and I don't have time to write one version of something and then test it against another. Are there any free (or free-for-students) tools for cache profiling on Windows? I'd love to use Valgrind, but I don't have anything that can run Linux that is also powerful enough to run my game.

Thanks!
  24. MMOG Optimizations

Hi community, here is Emanuele from the Crimson Games development department. A user asked me how I'm dealing with the main problems of MMOGs in Heroes of Asgard, so I prepared an article about the topic. Today we will discuss the main optimization problems you can run into when developing an MMO game. I'll be happy if anyone adds their own contribution so we can learn together: I'll add it to the opening post!

DEFINITION

By definition, an MMOG should allow you to play with a huge number of people at once and interact with them as if you were in a normal multiplayer game, all in a persistent world. Now, if we dissect this statement a little more, we will see that this is impossible without applying various "tricks" behind the scenes.

WE ARE COMPLAINING ABOUT PERFORMANCE

You can easily understand that as the number of connected players grows, server performance degrades. Many operations on the server have to run over all connected players or a subset of them, over all objects in the world, over all monsters and their AI, etc. All these calculations are executed several times per second: imagine, then, having to iterate over 200 players, over 2,000 players, or over 20,000 players, every single frame of your server simulation. For each iteration, the server has to send packets, make calculations, change positions, etc. The computational load therefore grows steeply - often faster than linearly, since players interact with each other - with every new connected player.

As you can well imagine, this is a very large amount of work for a single machine, due to obvious hardware limitations. Usually, therefore, there is a maximum threshold of concurrent players that can be processed simultaneously, beyond which the server (the physical machine) cannot keep up, creating a negative game experience (lag, unresponsive commands, etc.). You cannot accept new connections beyond this threshold until a seat becomes available, so as not to ruin the experience for those who are already connected and playing.

You could then start multiple servers on different machines, so you can host more players, but of course those players cannot interact with players on other servers. The division into various "server instances" definitely does not fall within the definition of an MMOG, as it does not allow you to interact with all players in one persistent world, but instead creates different instances of the same world. It is acceptable, of course: but it isn't what we want to achieve.

That said, what can we do to work around this problem a little? And what have I already done for Heroes of Asgard? What I describe is the result of my experience and, therefore, it is also what I built for Heroes of Asgard, obviously trying to get the best out of it.

WHAT CAN WE DO?

There are several measures that can be applied to raise the maximum threshold. Yes, raise it: there will always be a maximum threshold beyond which it is difficult to go (while keeping the same hardware, of course).

YOU ARE THE CODE THAT YOU WRITE

First of all: write good code, with your brain fully engaged in the task and without unnecessary waste of resources. It may seem obvious and trite, but it is not. Wasting resources is equivalent to shrinking the server's available resources. Wasting bandwidth means exhausting it in no time; every single piece of data that is transmitted has to be carefully selected. If I send one extra byte for each user, then when my server hosts 20,000 players it means sending about 20KB of additional data every frame.
Wasting CPU cycles is like shooting myself in the foot: the work performed must be kept to a bare minimum. Adding a single extra function call per user may mean adding N additional CPU cycles, which for 20,000 users becomes N x 20,000 additional CPU cycles. Wasting memory (and therefore allocating unnecessary resources) is harmful: every allocation requires both additional CPU cycles and memory, and system memory runs out. In managed environments, leaving resources allocated also causes garbage collection, which may mean spending huge numbers of CPU cycles freeing resources instead of serving the players and simulating the world.

Ultimately, wasting resources in your code will ensure that you spend more money, more frequently, on upgrading your servers (as your userbase increases) in order to maintain acceptable performance.

FIX YOUR SIMULATION

As you certainly know, the simulation of a virtual world is executed a certain number of times per second by the server. This means that every second, all entities and systems in the world are "simulated" a certain number of times. The simulation can include AI routines, position/rotation updates, etc. It is what infuses "life" into your virtual world. The number of times your simulation is performed per second is called FPS, or Frames Per Second.

It is obvious that if the simulation is cumbersome and takes time, our hardware will tend to simulate the world fewer times in one second. This can lead to a degradation of the simulation. But consider: do we need the server to perform that many simulation steps? Do we need to strain our hardware in this manner? Can we improve this? Yes. For most games with few players on the same map and a high game speed (an FPS, say, with a high rate of player commands), the world might be simulated 60 times per second (or less; obviously it depends on the game type). For an MMOG a smaller number can be enough, depending on the genre. There is no need to simulate the world as many times per second as possible, since the extra steps change the simulation only minimally while wasting more resources than necessary. In Heroes of Asgard, for example, the world is simulated 20 times per second (at the moment).

DO WE NEED TO KNOW ABOUT THE ENTIRE WORLD?

We said that in an MMOG we must be able to interact with other players and with the surrounding environment, and we should be able to do so with anyone in the world at that time. Quite right, of course. But, from the point of view of a player, do you really need to know what a player is doing on the other side of the map? No, not always. Indeed, in the majority of cases a player isn't interested in knowing whether another player is, for example, walking around in some far-away area. Sending information that cannot be displayed on the user's screen is a waste of resources.

This observation is important, because it allows us to implement a big optimization. How can I inform a particular player only about the entities that may interest him? Why not break the map (or maps) into zones? A simple subdivision is a grid: divide the map into N x M zones, where N and M are greater than or equal to 1. This technique is also known as space partitioning or zone partitioning. In this way, a player receives information only about the entities contained in his zone, without needing any knowledge of distant entities. If 8,000 entities are uniformly distributed across my map and it is divided into a 4 x 4 grid, the player who is in the [1, 1] zone only has the burden of receiving information about 500 entities. A minimal sketch of such a grid is shown below.
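To make the grid idea concrete, here is a minimal sketch in C++ of how a map could be divided into zones and how an entity's position maps onto a zone. All the names here (Entity, Zone, GridWorld, zoneAt) are invented for illustration - this is an assumption of how such a structure could look, not the actual Heroes of Asgard code - and it assumes map coordinates start at (0, 0).

#include <cstddef>
#include <vector>

struct Entity { float x, y; /* ...game state... */ };

struct Zone { std::vector<Entity*> entities; };

struct GridWorld
{
    float mapWidth, mapHeight;
    std::size_t cols, rows;       // the N x M grid
    std::vector<Zone> zones;      // cols * rows zones, stored row-major

    GridWorld(float w, float h, std::size_t n, std::size_t m)
        : mapWidth(w), mapHeight(h), cols(n), rows(m), zones(n * m) {}

    // Map a world position to the zone that contains it.
    Zone& zoneAt(float x, float y)
    {
        std::size_t cx = static_cast<std::size_t>(x / (mapWidth / cols));
        std::size_t cy = static_cast<std::size_t>(y / (mapHeight / rows));
        if (cx >= cols) cx = cols - 1;   // clamp positions on the far edge
        if (cy >= rows) cy = rows - 1;
        return zones[cy * cols + cx];
    }
};

When an entity crosses a zone boundary, it would be removed from its old zone's list and added to the new one; the server then only needs to consider a handful of zones per player instead of the whole map.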
A great advantage, isn't it? But consider: what if the player is on a zone border? He would not see the players in the nearby zones, even though they should be visible. We can therefore understand that the player has to be informed about the entities contained in his own zone and in the immediately contiguous zones. The size of the zones lets you tune this method considerably, so depending on the size of a map, the size of the grid can vary in order to obtain the best effect. The shape of the zones can also vary, to better fit the composition of the map.

LOOK AS FAR AS THE EYE CAN SEE

As mentioned, zone division already offers a decent level of optimization, allowing us to send information about an entity only to the players who can really benefit from it. But let us ask ourselves a question: can we identify useless information even within our zone division (remember that it also includes the contiguous zones, so in a regular grid we have to deal with 9 zones in the worst case)? Of course we can. Most likely, a player is not affected by entities outside his field of view. If I cannot see an entity, I do not care to track what it is doing, even though it may be in my own zone. Sending information about that entity is therefore a waste of resources.

How can you determine what your server needs to send to a specific player? The easiest way is to track exactly that: the field of view. Everything within that radius is what matters to the specific player; entities outside it are not necessary to that player's world simulation. And since we already have a zone subdivision, we can simply iterate over the entities in the player's zones of interest (instead of all entities in the map) to determine which are within his field of view. This concept is also called the area of interest, or AoI.

So, continuing the earlier example, we iterate over 500 entities instead of 8,000 to extract the hypothetical 25 which fall within visual range, and exchange information over the network only with them. From 8,000 to 25 - a good result, isn't it? And the user doesn't suffer from the missing information, since he cannot see those entities anyway. Indeed, he will only notice lower resource usage. A sketch of such an AoI query follows this section.

You can further enhance the area of interest by applying various measures:

- organize several levels of view radius; the most distant entities receive updates less frequently
- filter the interesting entities depending on the morphology of the map; if an entity is within my view radius but behind a mountain, I can possibly ignore it. This measure, however, (in my opinion) only makes sense if you already use culling for other things, so you don't introduce additional calculations just to filter out a few more entities

DISTRIBUTE YOUR COMPUTATION LOAD

We already said that a single machine will still have a certain threshold beyond which, despite all the optimizations made, you will experience performance degradation (and thus a bad gaming experience). Fine, but then why not take advantage of multiple computers simultaneously? There are obviously different ways to do it. For example, in Heroes of Asgard each map that composes the world is hosted in a separate process. This means each map can be hosted on a different physical machine. You can go even further and host sets of zones in separate processes (so a single map may be divided into several parts and hosted by different servers).
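Tying the zone grid and the view radius together, here is a minimal sketch of an area-of-interest query that scans only the player's zone and its contiguous zones (the 9-zone worst case mentioned above). It builds on the hypothetical GridWorld and Entity types from the previous sketch, and assumes the player's position lies inside the map and the view radius never exceeds the size of one zone - again an illustrative assumption, not the actual implementation.

// Collect the entities within 'radius' of (px, py), scanning only the
// player's zone and its neighbours instead of the whole map.
std::vector<Entity*> queryAreaOfInterest(GridWorld& world,
                                         float px, float py, float radius)
{
    std::vector<Entity*> visible;
    const float cellW = world.mapWidth / world.cols;
    const float cellH = world.mapHeight / world.rows;
    const std::size_t cx = static_cast<std::size_t>(px / cellW);
    const std::size_t cy = static_cast<std::size_t>(py / cellH);

    // Visit the player's zone and the contiguous ones (up to 9 zones).
    for (std::size_t y = (cy > 0 ? cy - 1 : 0); y <= cy + 1 && y < world.rows; ++y)
    {
        for (std::size_t x = (cx > 0 ? cx - 1 : 0); x <= cx + 1 && x < world.cols; ++x)
        {
            for (Entity* e : world.zones[y * world.cols + x].entities)
            {
                const float dx = e->x - px;
                const float dy = e->y - py;
                // Compare squared distances: no square root needed.
                if (dx * dx + dy * dy <= radius * radius)
                    visible.push_back(e);
            }
        }
    }
    return visible;
}

Comparing squared distances avoids a square root per entity, which matters when this check runs every frame for every connected player.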
SLICE YOUR PIE

You can also split global services (such as chat) into different server processes, to give your players the impression that, even while connected to different maps (and therefore different servers), they can interact with distant players. Furthermore, breaking those services out of the main world process gains you additional performance.

RECYCLE YOUR TOYS

As mentioned, allocating memory costs a good amount of resources. So why not reuse what we have already allocated? The use of object pools is of great importance in multiplayer development. It lets you shift the cost of allocation to a moment when it can be absorbed without problems, for example during the bootstrap of your app server. A monster is defeated and dies? Fine, I put it aside. I can use it again when another monster must be spawned, simply retrieving it from my pool.

Of course, you have to use certain criteria to choose which objects to keep in memory and which not to. Should I keep in memory a pool of monsters that spawn once a month? No, that would probably be useless. Should I keep in memory a pool of objects representing currency drops? Yes, that makes much more sense.

USEFUL LINKS

Of course, an important part of this thread is resources. Articles, papers: anything you think can be useful on this topic.

Spatial Partitioning: http://gameprogrammingpatterns.com/spatial-partition.html
Object Pooling: http://gameprogrammingpatterns.com/object-pool.html
Game loop: http://gafferongames.com/game-physics/fix-your-timestep/

Feel free to add your questions or your contribution!

Best regards, Emanuele
  25. When checking whether some object is within a certain area, which approach would have better performance when using the glm library?

vec3 pos;

// Using a bounding box to represent that area
1) if ( pos.x > box.MinX && pos.x < box.MaxX
     && pos.y > box.MinY && pos.y < box.MaxY
     && pos.z > box.MinZ && pos.z < box.MaxZ ) { // do something.. }

// Using a sphere to represent that area
2) if ( glm::length ( pos - sphere.centerPosition ) <= sphere.radius ) { // do something.. }