

#5307418 if party strength changes, quest encounters can become imbalanced.

Posted by on Yesterday, 09:44 AM

Why not reset the monsters to the level of the players? Many games I've played use dynamic monster levels. When you start out you're fighting wolves with 50 to 100 hit points. As the game progresses you're much higher level, but the wolves you fight may have 300 or 400 hit points: still easy to kill, but different from the ones you began with.


Another option may be to not let stats change very much. In systems like D&D there is an initial spread of 3d6 (3-18) for the base stats, with a cap of +5 bonus maximum. Characters may get better equipment over time, and gain a handful of hit points or abilities with each new level, but a freshly rerolled character replacing one that has died will have some impact on the party, generally not so much that the game is destabilized.




Consider it in both directions.  It isn't that fun to have leveled up and suddenly have the old monsters fall down dead at the sight of you. It also isn't that fun to have someone leave and suddenly the world is destabilized because your party is too weak.


I know many games love to make leveling up and adding stats into an enormous event: increase the stats by 20%, 40%, or even more! If adding 6 hit points per level is good, then adding 60 hit points should be better, right? The problem is you develop god-like characters far too quickly. While it can serve as a gate and encourage level-grinding, it doesn't really serve stories and narratives well.

#5307316 Errors that effect a computer's system.

Posted by on 22 August 2016 - 09:16 PM

It can still happen, primarily as damage to graphics cards or CPUs that have had safeguards removed. 



There are a small number of games that disable several good features on graphics cards (notably vsync and rate limiting) on top of the player manually overclocking and overvolting their graphics cards.  The game does a ton of heavy processing work for an extended time and with the safeguards removed, it overheats and takes damage.


Similarly, there are games that heavily tax the CPU. Extreme overclocking is less popular than it used to be, but a game already known to severely tax a CPU, coupled with a user who intentionally disables the safeguards and pushes the CPU's voltage and temperature, can overheat and damage the processor.




As a beginner you shouldn't worry about it.


Even as a professional it is something you shouldn't worry about, unless perhaps you are developing the graphics code. In that case you can implement a hard limit of something like 300 FPS as a rate limiter, just in case someone manually overrides the vsync settings and hits your menu screens, which would otherwise run at an unlimited framerate.
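A rate limiter like that can be a few lines in the main loop. Here is a minimal sketch; the class name and structure are illustrative, not from any particular engine:

```cpp
#include <chrono>
#include <thread>

// Hypothetical hard cap: never run the loop faster than maxFps, even when
// vsync has been forced off in the driver (e.g. on unlimited menu screens).
class FrameLimiter {
public:
    explicit FrameLimiter(int maxFps)
        : frameBudget_(std::chrono::nanoseconds(1'000'000'000 / maxFps)),
          nextFrame_(std::chrono::steady_clock::now()) {}

    // Call once per frame, after rendering; sleeps away any leftover budget.
    void wait() {
        nextFrame_ += frameBudget_;
        std::this_thread::sleep_until(nextFrame_);
        // If we fell behind (a long frame), don't try to "catch up".
        auto now = std::chrono::steady_clock::now();
        if (now > nextFrame_) nextFrame_ = now;
    }

    std::chrono::nanoseconds budget() const { return frameBudget_; }

private:
    std::chrono::nanoseconds frameBudget_;
    std::chrono::steady_clock::time_point nextFrame_;
};
```

At 300 FPS the budget works out to about 3.3 milliseconds per frame, which is more than enough headroom for any reasonable hardware while still preventing a menu from spinning at thousands of frames per second.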


And even then, it would be mostly the consumer's fault for disabling the safeguards on the system.

#5307313 Abstract Classes and returning a varying Type

Posted by on 22 August 2016 - 08:38 PM

Could you maybe elaborate what a "texture factory"-class would do?


Search for the term "factory method" or "abstract factory".  You might pass in a string, and the method builds something and returns it.


In this kind of example, you might pass in a string such as "someimage.dds". The method would crack open the file, figure out what kind of file it is, and do whatever creation work is necessary. Then it gives you back a pointer to a texture that represents whatever the contents of the image happen to be.
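In code, the factory method shape looks something like the following sketch. The class names are hypothetical, and a real factory would inspect the file contents rather than just the extension:

```cpp
#include <memory>
#include <string>

// Hypothetical texture interface; a real engine would wrap a GPU handle.
struct Texture {
    virtual ~Texture() = default;
    virtual std::string format() const = 0;
};

struct DdsTexture : Texture {
    std::string format() const override { return "dds"; }
};
struct PngTexture : Texture {
    std::string format() const override { return "png"; }
};

// Factory method: figure out what kind of file it is, do the creation
// work, and hand back a pointer to the base interface.
std::unique_ptr<Texture> createTexture(const std::string& filename) {
    auto dot = filename.rfind('.');
    std::string ext = (dot == std::string::npos) ? "" : filename.substr(dot + 1);
    if (ext == "dds") return std::make_unique<DdsTexture>();
    if (ext == "png") return std::make_unique<PngTexture>();
    return nullptr;  // unknown format
}
```

Callers only ever see `Texture*`; they neither know nor care which concrete type came back.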


You mentioned DXT3 and 5, compression types, is that right? Would this part make the textures usable? Decompress them?


That would defeat the purpose. Those texture types are designed to go on to the card directly so you don't need to spend the time, compute power, or memory to decompress them.


It would be wonderful if you could give me an example for how these classes (store, loader, cache, and proxy) interact with each other, as this sounds really interesting to me!


The store is the thing that controls all the access. You say "give me a resource" and the store returns something. The thing it returns is not actually the final object at all, but a proxy for the final object. A proxy allows the actual data to be loaded as needed. For example, a proxy for a texture may not be the actual loaded texture, because there might be a limited amount of memory. The proxy contains enough information that other systems can use the object and intelligently swap it out, which is why it needs to be custom for each resource type. The proxy object can be a placeholder initially, with no other resources attached, so that other systems don't need to wait around.

The loader is the thing that loads the data into the proxy. It might load from a file, a network stream, a database, wherever; the loader is able to get the resource into memory.

The cache is the thing that actually contains the data as long as it is loaded. When resources are tight, something identifies the pressure (maybe another resource needs to be loaded and there isn't room, or maybe the system is facing other memory pressure), the cache unloads a resource that isn't used any more, and the proxy points back to the placeholder.



An example:


Let's say you are loading a level. Your rendering system needs models and textures. The code looks up the names of the models and textures from your art asset information. It calls the model store and requests a Model* for every one of the file names. The model store immediately returns a proxy object that implements the Model interface. (The alternative would be to wait several milliseconds for the data to load from disk.) Since the actual model data isn't available yet, it gives you a default placeholder object, maybe a small gray box. Similar calls are made for all the textures, and the texture store returns texture proxies in the form of a Texture* for each one rather than waiting for the data to load from disk. Perhaps the default proxy gives you a placeholder texture of solid pink. You begin to add the items to your scene, and in the background the engine starts loading resources, probably from disk. As the loader completes, it internally provides the actual models and textures and replaces the pointers inside the proxy objects. The loader puts each object inside the cache, possibly evicting (unloading) other objects from the cache. The renderer that draws the Model* and Texture* items is written to be friendly with these other systems so they don't accidentally break anything. Later, when you request another texture from the store, the store can check whether the texture is already loaded or in use. If it is, the store can immediately return a proxy that points to the existing texture rather than to the solid pink placeholder.


Repeat the process for any other kind of resource.  Textures and models work with the graphics cards and need to be coordinated one way.  Audio clips need to be coordinated another way, and audio streams may need yet another way.  You may have other resource types, such as script files, shader files, level map files, or whatever else you need.
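Here is a small sketch of the store/proxy half of that interaction. All names are illustrative, and a boolean flag stands in for the real GPU data:

```cpp
#include <map>
#include <memory>
#include <string>

// The proxy starts out as the "pink placeholder" and is filled in later.
struct TextureProxy {
    std::string name;
    bool loaded = false;  // a real proxy would hold a GPU handle instead
};

class TextureStore {
public:
    // Returns immediately; never blocks on disk.
    std::shared_ptr<TextureProxy> get(const std::string& name) {
        auto it = proxies_.find(name);
        if (it != proxies_.end()) return it->second;  // already requested
        auto proxy = std::make_shared<TextureProxy>();
        proxy->name = name;
        proxies_[name] = proxy;
        // A background loader would be kicked off here.
        return proxy;
    }

    // Called by the loader when the data arrives; swaps the placeholder out.
    void finishLoad(const std::string& name) {
        auto it = proxies_.find(name);
        if (it != proxies_.end()) it->second->loaded = true;
    }

private:
    std::map<std::string, std::shared_ptr<TextureProxy>> proxies_;
};
```

Note that a second request for the same name returns the very same proxy, which is how the "already loaded or in use" case above falls out for free.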

#5307270 Abstract Classes and returning a varying Type

Posted by on 22 August 2016 - 03:26 PM

I hope my case is clearer now : ) Nonetheless, I think this pretty much would suit the template-variant?


Usually that's a good indication that you're doing something wrong.


Usually different types of resources are generated by different types of factory methods.  If you use a factory method for game models and are passing in a texture resource, that's generally a problem.


In larger engines, games tend to work in the pattern of a store, loader, cache, and proxy for whatever resource types you need. Each set is unique to its interface (a texture type, a shader type, a model type) and each can be specialized. A texture factory may be able to provide many concrete texture types (DXT3, DXT5, etc.) that all implement the same base class and are interchangeable as far as users of the class are concerned.



If you really believe in using a unified resource loader, the model suggested by noizex above can work. You're still going to implement all the behavior for each resource type; it will be specialized via template specialization, so all that work still needs to happen. The difference is that instead of TextureStore.Get and ModelStore.Get returning pointers to their corresponding data types, you'll have resources.Get<Texture> and resources.Get<Model> that do exactly the same thing.
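The unified front end can be sketched like this. The `Resources` class and its caching scheme are hypothetical; in real code the `make_shared` line would dispatch to a type-specific loader:

```cpp
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <utility>

struct Texture { std::string name; };
struct Model   { std::string name; };

// One resources.get<T>(name) entry point; each T still needs its own
// loading behavior underneath, keyed here by (type, name).
class Resources {
public:
    template <typename T>
    std::shared_ptr<T> get(const std::string& name) {
        auto key = std::make_pair(std::type_index(typeid(T)), name);
        auto it = cache_.find(key);
        if (it != cache_.end())
            return std::static_pointer_cast<T>(it->second);
        auto r = std::make_shared<T>();  // real code: type-specific loader
        r->name = name;
        cache_[key] = r;
        return r;
    }

private:
    std::map<std::pair<std::type_index, std::string>,
             std::shared_ptr<void>> cache_;
};
```

A `Texture` and a `Model` loaded from the same file name are distinct entries, because the type is part of the key.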

#5307259 C++ class properties initialization order on constructor?

Posted by on 22 August 2016 - 01:44 PM

The language standard states that they must be in increasing order, which means they match the order they appear in the file.


Specifically, "When an aggregate is initialized by an initializer list, as specified in 8.5.4, the elements of the initializer list are taken as initializers for the members of the aggregate, in increasing subscript or member order."   


Some compilers handle it in a different order, but that's a compiler variance that is usually documented. Visual C++ does not provide a warning for this, and its team has said multiple times (such as here) that they are considering it, but the warning has never actually been implemented.
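A quick demonstration of the rule for constructors: members are constructed in declaration order, no matter how the mem-initializer list is written. The names below are made up for the example:

```cpp
#include <vector>

// Records the order in which members are actually constructed.
std::vector<int> g_order;

struct Recorder {
    explicit Recorder(int id) { g_order.push_back(id); }
};

struct Widget {
    Recorder first;   // declared first  -> constructed first
    Recorder second;  // declared second -> constructed second
    // The initializer list below is written "backwards"; conforming
    // compilers follow declaration order anyway (gcc/clang emit
    // -Wreorder for exactly this situation).
    Widget() : second(2), first(1) {}
};
```

If `second` depended on `first` being initialized, writing the list backwards would silently read an uninitialized member, which is why the missing warning matters.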

#5307080 Friends living abroad have really laggy connection to me, why?(using raknet,...

Posted by on 21 August 2016 - 04:26 PM

A bit more on that...


You seem to already know some of this, but the nature of the questions means it bears repeating:


TCP is stream based.  The protocol makes sure that you get what was sent, in the exact order it was sent.  This is wonderful if you prefer a guarantee that everything arrive in the order it was sent.  If something was dropped on the network the underlying protocol will tell you it isn't ready yet, and it will do its job behind the scenes to request the missing data and ensure that your stream is given to you in the order it was sent.  This isn't necessarily wrong or bad, and if you need a stream based system it can be exactly what you need.


UDP is packet based.  The protocol gives you items as fast as they are received, and does not take steps to deal with network hiccups. This is wonderful if you prefer data is available as soon as it arrives.  However, if something was dropped on the network the protocol won't automatically request it for you; if a network hiccup gives you a packet multiple times, twice or three times or twenty times, you'll get copy after copy as they arrive.  This isn't necessarily wrong or bad, and if you want data as soon as it arrives it can be exactly what you need.




RakNet, like many game networking libraries, uses UDP for its communication and then implements an optional stream-based protocol on top of it. This way you can specify how you want the data. If you want the data as fast as it arrives, you can flag it to use that version of the protocol, but in exchange your code needs to deal with missing packets, duplicate packets, and the rest. If you want your data stream based, you can flag it to use that variation of the protocol, but in exchange your code needs to deal with the fact that the stream will sometimes stall after a network hiccup, which may cause a delay.
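To make the "stream on top of UDP" idea concrete, here is a toy sketch of an ordered channel of the kind such libraries build internally. Every packet carries a sequence number; duplicates are dropped, and early arrivals are held back until the gap fills in (this is the stall you observe as a delay). The class is illustrative, not RakNet's actual implementation:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class OrderedChannel {
public:
    // Feed a received packet; returns every payload now deliverable in order.
    std::vector<std::string> receive(uint32_t seq, const std::string& payload) {
        std::vector<std::string> out;
        if (seq < next_ || pending_.count(seq)) return out;  // duplicate
        pending_[seq] = payload;
        while (true) {
            auto it = pending_.find(next_);
            if (it == pending_.end()) break;  // gap: the stream stalls here
            out.push_back(it->second);
            pending_.erase(it);
            ++next_;
        }
        return out;
    }

private:
    uint32_t next_ = 0;                        // next sequence to deliver
    std::map<uint32_t, std::string> pending_;  // packets that arrived early
};
```

A real implementation also has to request retransmission of the missing packet and handle sequence-number wraparound, but the hold-and-release behavior is the heart of it.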





Whichever version you end up taking, you will need to design in ways to deal with latency and network hiccups. There are many tips and tricks out there for dealing with them. Some can be masked with simple tricks: an audio cue ("Yes, sir!") or an animation (slight gun lift and trigger pull) can buy you 100-200 milliseconds. For bullets, introducing flight time on the shot can gain you another few hundred milliseconds. A bullet may travel about 4000 feet per second, so there's another 10 milliseconds of network time. Dead reckoning and prediction with rewinding and injecting events are more difficult to implement, but they can further hide the network latency. You can implement any or all of those things using either stream-based or packet-based networking protocols.

#5307074 Preventing overengeneering

Posted by on 21 August 2016 - 03:07 PM

1- How do you handle complex task and avoid laziness and lack of motivation?

Keep it simple.  
Most critically, remember the scale of work is radically different:
At work you are part of a team. Hopefully each team member is working on areas they are confident and experienced with.  If you are used to working on a team of 20, they may produce in one week what would take you 30 or 40 weeks full time at home.
Second, remember you are not working on it full time. What might take you one week full time will probably take two or three weeks at home.
Combine the two, and realize that it may take you three or four years as an individual to do what your professional group does in a single week.
Your development speed at home is radically different from your development speed at work. I keep things simple by tracking my own speed at home and remembering it is completely unrelated to my speed at work.

2 - How do you deal with over engineering?

As above, I remember that I do not have time to get everything working polished and to my work standards. I can accomplish something minimal that gets the job done, and that is all the time I have for a hobby project. Always think small, then think smaller again.

3 - Do you think that an average programmer would do more for those 3 years?

Everyone's pace is unique. Doing something you are comfortable with should be much faster than doing something that requires learning.

I think in the grand scheme of things, people's skill at programming is a similar normal curve to any other skill. Most people who do the job day to day form a giant bell curve. There will be some who are faster, some who are slower. There are some who will be terrible and struggle to work professionally. And there will be the outliers at the top, the Michael Jordans, Magic Johnsons, Larry Birds, and Kobe Bryants out there who produce more than two or five normal people combined.

I've always thought it was funny how much emphasis many headhunters place on trying to land those rare superstars. They exist, but the effort (and high salaries) of trying to put together a team of actual superstars is off the charts. It is great to aim as far to the right of the bell curve as you can, the best people you can reasonably find, but expecting everyone to be in the top 1% is unrealistic. Most people are near the center of the bell curve. Some tasks they may perform better, some worse, but it all averages out in the end.

What you worked on is what you worked on. Don't compare yourself to superstars, don't compare yourself to domain experts working in their expert domains.

4 - Any thing you wanna add? personal experience and suggestions.

Scope smaller. Always think smaller. As an individual your time is exceedingly scarce. Think about not doing the thing at all, and if you decide to do it, figure out the minimum that meets the standards.

If you're still not sure what you can do in a month, work on a challenge of doing exactly one month of development. Build something in exactly one month, then put it away. Build something else the next month, then put it away.

The first few months you will produce things that are incomplete and maybe unusable. That's okay. After a few months you'll get better at figuring out how to scope your own personal projects into something that can be done in a reasonable time.

#5306937 Handling Mouse Movement

Posted by on 20 August 2016 - 07:21 PM

To prevent a mouse from leaving the window I was resetting the mouse to the centre of the window and just collecting the delta.


If you need to prevent the mouse from leaving any area, ClipCursor() is probably the function you are looking for.


For example, if you've got a click-and-drag inside your main window and you don't want the user flinging their units out of the play window, call ClipCursor() when they click, call it again with ClipCursor(0) when they've released and are allowed to exit the clipping area again.

#5306895 What's a room?

Posted by on 20 August 2016 - 11:03 AM

It is not as bad as you're making it out to be.




As shown in the picture, a room is a set of walls, which are impenetrable navigation objects or physics objects. Most physics engines make this easy: walls are just rectangles or boxes added to the level/room.


Moving between rooms is a trigger area in the doorway. It does not have a visual component, just a collision area. Collision with the trigger causes the next room event.


As for them needing identifiers, EVERYTHING needs identifiers. As for them needing to be created/destroyed or loaded/unloaded, EVERYTHING needs to be loaded at some point. You will need to load the room, but that includes everything: the walls, the monsters, the keys/items, and whatever else you've got in your room. Level loading is bog-standard functionality you'll need in everything with a level, from what bricks to display in breakout, to the blocks and platforms in classic Mario games, to all the rocks and obstacles in an MMO. 
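The doorway trigger described above is just an invisible box plus an overlap test. A minimal 2D sketch, with made-up names:

```cpp
// An invisible axis-aligned box placed in the door opening. It has no
// visual component; overlap alone fires the next-room event.
struct Aabb {
    float minX, minY, maxX, maxY;
    bool contains(float x, float y) const {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }
};

struct DoorTrigger {
    Aabb area;
    int nextRoomId;  // identifier of the room to load

    // Returns the room to transition to, or -1 for "no transition".
    int check(float playerX, float playerY) const {
        return area.contains(playerX, playerY) ? nextRoomId : -1;
    }
};
```

In a real engine the physics system would deliver the overlap as a collision callback rather than you polling `check` each frame, but the data involved is exactly this small.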

#5306835 Multiplayer Web Game - Is SQL fast enough?

Posted by on 19 August 2016 - 08:37 PM

"Fast enough" depends on your needs.  A SQL database works well for persistent data, but is terrible if you need things more than once. Load it, use it for a long time, store periodically as you modify the data.


Simple SQL database operations that hit the disk are usually on the order of about 10ms per call. It can be much faster if it doesn't need to hit the disk, and much slower if the query is more complex or touches significant data. For example, if you only need simple values from a single table based on an indexed key, on a fast machine with SSD storage, you'll likely see single-digit milliseconds. But if you are searching for aggregate data or filtering on a non-indexed value and need a full table scan of a 200 gigabyte table on a slower spindle disk, you'll be waiting quite a while.


For an interactive game, that is about equivalent to a full graphics frame: 60 frames per second = 16ms, a bit less once you account for the overhead of other tasks. However, if your game involves a whole HTML page being requested, loaded, and processed, that task is usually on the order of 200ms-500ms, so a few database calls are just fine. If you need something intermediate between the two, such as server-side processing, there are systems out there that cache data access so you only incur the full cost of a database read the first time a value is encountered and not already in the cache.


Without knowing quite a lot more about your game needs and your architecture, an "it depends" answer is about the best you will get.

#5306834 Handling multiple "levels" or "scenes" within a world

Posted by on 19 August 2016 - 08:27 PM

It depends quite a lot on how your world system operates and what your engine supports.



Most projects I've worked on have used a scene or world hierarchy of sorts. 


You've got nodes in your hierarchy.  The basic world leaves them mostly empty or filled with proxy or placeholder objects for the true content.  This might be individual zones or lots or coordinate regions in a large world. This might be an arbitrary root world node that can have child nodes attached. When the time comes to load content into that area, you create a new node hierarchy, load all the data into it, and then attach the hierarchy by swapping out the proxy/placeholder and inserting the full bundle.


Regarding your confusion about dealing with copies and proxies of unloaded objects, that is resolved readily enough by using persistent ids and decoupling the representation of the data in the simulator from all the other data like models and textures and audio and animation and effects and world information. 


For your example of a tree, the actual tree data is a tiny bit of data about the tree's health, possibly a flyweight piece of data. That tiny piece of data indicates that it is a tree and that it has a health level. The choice of model to display would invoke your tree renderer. Perhaps every tree is little more than world coordinates, an index of the type of tree, and a value for the tree health. These 64 bits of data per tree (or maybe even less) are small enough that you can have hundreds of trees visible in your world with only a few kilobytes of data.
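The "64 bits per tree" claim is easy to verify. Here is one possible packing; the field widths are illustrative, not a prescription:

```cpp
#include <cstdint>

// Everything the simulation needs for one tree, packed into one word.
struct TreeInstance {
    uint16_t x;          // world coordinate, in grid units
    uint16_t y;
    uint8_t  typeIndex;  // which tree model/renderer entry to use
    uint8_t  health;     // 0-255 health level
    uint16_t spare;      // leftover bits for future flags
};

static_assert(sizeof(TreeInstance) == 8, "one tree fits in 64 bits");
```

A thousand trees is 8 KB, small enough to keep entirely in cache, while the heavy model and texture data lives once in the renderer and is shared by every instance.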


You've put out a few options for ways you might save them, and they each have their own pros and cons. They might work well with the engine and tools you are using, or they might not. Your description of the scene being little more than a list of entity IDs is essentially how many games do it. When it is stored to disk there is often one set of data that contains the scene hierarchy as a collection of IDs, and another set of data that contains whatever each ID refers to. Such a system lets all the items be persisted and replaced with their IDs as they are written out; it ensures that if an item is a clone with multiple instances in the hierarchy, the contents are only written out once.

#5306688 Some problems with ECS design

Posted by on 19 August 2016 - 04:38 AM

Personally I'm in favor of having a virtual update(ComponentStorage& components, float dt) method for every system type, where ComponentStorage holds the components in arrays organized by component type. The systems can be viewed as operators on the state of the components in the engine, where on each frame the internal/external state is updated according to the elapsed time. Relationships between systems can be implemented by giving a system pointers to the other systems it depends on during initialization (where the concrete types of the systems are known). There's no need to shoehorn everything into the update method, systems are free to have other methods that perform other actions (like being notified of added/removed components).

Beware the pattern of virtual functions for all the things. Especially beware of actually calling all the virtual functions on all the things.

That is a pattern that will quickly destroy all performance. You pay a cost for every object you create even if you never implement the behavior.

You can provide virtual functions if you'd like in your interface, but if you do so take care that you don't actually call them if you don't need them. Otherwise you'll be calling hundreds, maybe even thousands, of unnecessary virtual functions. It starts with just one, but soon Update() isn't enough; then you'll add both PreUpdate() and PostUpdate(). You'll end up with PreRender() and PostRender(), PrePhysics() and PostPhysics(), and probably more besides. Before long every object has a collection of virtual functions that are called all the time but do absolutely nothing.

I've seen it before on engines I've been brought in to help repair. It is a nasty pattern because it is deceptively appealing. It is easy, right? Just make a virtual method that everybody can implement or ignore with the base functionality. But when it is done the result tends to be that you are burning all your cycles on virtual dispatch to empty functions.

The best pattern tends to be to only call the functions on objects that actually want and need updating. That tends to be through registration. Alternatively it can be done through introspective, reflective, or dynamic patterns in languages that support them efficiently, but as this is tagged C++, those latter options don't really exist.

A less good pattern, but still far better than a ton of useless virtual function calls, is to provide a way to skip the virtual call by using a non-virtual test before the call is ever invoked. For example, use a bool in the base class that gets cleared by the base implementation, disabling the call in the future. In the non-virtual, inline base class call you have: if (hasFeatureCall) { MyVirtualFeatureCall(); } Then in your base class: MyVirtualFeatureCall() { hasFeatureCall = false; } You're still paying a penalty for every call on every object, but that penalty is far less than jumping out to a virtual function, finding it empty, and returning.
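Spelled out as a complete class, the guard looks like this. The names are placeholders for whatever your engine calls its update hooks:

```cpp
// The non-virtual Update() is what gets called everywhere; the virtual
// OnUpdate() is only reached while hasUpdate_ is set, and the default
// implementation switches itself off on the first call.
class GameObject {
public:
    virtual ~GameObject() = default;

    void Update(float dt) {            // non-virtual, inlineable
        if (hasUpdate_) OnUpdate(dt);  // cheap bool test, no vtable hop
    }

    bool wantsUpdate() const { return hasUpdate_; }

protected:
    // Base implementation means "I don't actually want updates":
    // disable the virtual call for all future frames.
    virtual void OnUpdate(float) { hasUpdate_ = false; }

private:
    bool hasUpdate_ = true;
};

// A class that really does update simply overrides the hook.
class Spinner : public GameObject {
public:
    float angle = 0.0f;
protected:
    void OnUpdate(float dt) override { angle += dt; }
};
```

Objects that never override the hook pay the virtual dispatch exactly once, then fall back to a predictable, branch-predicted bool test every frame after that.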

#5306580 Some problems with ECS design

Posted by on 18 August 2016 - 11:52 AM

It looks like these are all premature attempts at optimizations.  The things you are discussing are not problems in the real world.

1)fast (no vtables and random memory access)

If you need virtual dispatch then vtables are currently the fastest available way to implement that. You're going to need some way to call the function.  
On x86 processors since about 1995, virtual dispatch has approximately zero cost: the very first time a vtable entry is accessed the CPU caches it, and assuming you touch it occasionally it will stay on the CPU. Since you should be touching them around 60+ times per second, the CPU will happily keep them around for zero cost to you.
In other words, you say you don't want vtables as you think they are not fast; but vtables are the fastest known solution to the task.  Use them.


As for random memory access, unless you can somehow organize your world and scene graphs so data traversal of components is linear, you'll need to live with some of that.  Be smart about it so the jumping around lives in L2 cache.


Random memory access can be amazingly fast, or it can be tortuously slow. The only way to know is to run it on a computer, profile it with cache analysis tools, and determine how your real-world memory patterns are working.  


While you have clearly minimized the size of your data (good), you have decreased cache friendliness by making the individual fields non-contiguous. You have a contiguous array of structures; the cache is generally happiest with a structure of arrays, because parallel instructions are best when operating on a batch of elements at once rather than on a single item alone.


There is far more to cache friendliness than size of the data. The only way to know for certain how your program interacts with the cache is to run it with cache analysis tools to determine how your real-world memory patterns are working.
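The array-of-structures versus structure-of-arrays distinction in code, with hypothetical particle fields:

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures: each particle's fields sit together, so a loop
// that only touches x drags y, z, and health through the cache as well.
struct ParticleAoS { float x, y, z, health; };

// Structure-of-arrays: each field is contiguous, which is what the cache
// and SIMD units prefer when processing one field across many particles.
struct ParticlesSoA {
    std::vector<float> x, y, z, health;

    void resize(std::size_t n) {
        x.resize(n); y.resize(n); z.resize(n); health.resize(n);
    }

    // Example batch operation: touches only the x array, so every cache
    // line fetched is 100% useful data.
    void moveAllX(float dx) {
        for (float& v : x) v += dx;
    }
};
```

Which layout wins still depends on your actual access patterns, which is exactly why the profiler and cache analysis tools get the final word.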

1 Component = 1 System

I need a new component + system

You keep using the word "system" in a way I'm not familiar with.  A system is any group of connected things. Any time you take any series of actions with any object you have created a system.  The interfaces you create define how the system is used.
Did you create a process or class or structure and give it a name "system"?

Intersystem interaction occurs via messaging

This can work reliably, but the thing most developers think of with messaging tends to add performance overhead, not remove it.
If you need to make a function call on an object then do so, that is the fastest way.  Going through a communications messaging service adds a lot of work. 


There are many excellent reasons to use messaging services: allow extension by adding message listeners, resolving threading issues and non-reentrant code, and processing load balancing are a few of them.  Faster execution time is not one of those reasons. 


1. How can I ensure the safety of pointers? std::vector can break my pointers while resizing. (P.S. I don't want have array with a static size)


If you use any type of dynamic array and you add or remove items, you cannot avoid it.  If you want the addresses to remain constant you cannot use a dynamic array.  I suggest learning more about fundamental data structures.
Some other options are to use a different structure (perhaps a linked-list style) or to store references to the objects (such as a container of pointers). This will break your contiguous-access data pattern, but give you stable addresses. You need to decide which is more important.
Alternatively you can design your system to not store addresses of items, to work on clusters of items at once, and to not hold references to data they don't own.
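The container-of-pointers option from above, in a short sketch with made-up names. The vector may reallocate as it grows, but each object it owns stays at a fixed address:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Enemy { int hp = 100; };

// Stable addresses at the cost of one extra indirection and scattered
// storage: the unique_ptr handles move during reallocation, the Enemy
// objects themselves never do.
class EnemyPool {
public:
    Enemy* create() {
        owned_.push_back(std::make_unique<Enemy>());
        return owned_.back().get();  // stays valid as the vector grows
    }

    std::size_t size() const { return owned_.size(); }

private:
    std::vector<std::unique_ptr<Enemy>> owned_;
};
```

With a plain `std::vector<Enemy>` the pointer returned by `create` would dangle after the next reallocation; here it remains valid for the object's lifetime.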






If you really are concerned about performance you need to start up a profiler and look at the actual performance. You need to measure and compare with what you expect, you need to find the actual performance concerns. Performance is not something you can observe by just looking at the code alone. You need to actually see how it is moving through the computer.  Generally the best performing code looks complex and is larger than you first expect; the solutions that are simple and small tend to perform poorly as data grows because they don't fit the CPU's best features.


The things you are mentioning are tiny performance gains by themselves, and if you do have any performance concerns on your project they're not coming from these choices mentioned in the thread.

#5306474 Ecs Architecture Efficiency

Posted by on 17 August 2016 - 11:04 PM

Based on all these updates, I'm taking the broader question to be "How many items can I stuff in an ECS game system before it slows down?"



I've worked on systems with well over five thousand articulated models on screen with fully simulated game objects at once before the processing power started to bog down. I've been brought in on contract to help a project with under 200 static models on screen that could barely maintain 30 frames per second on mainstream hardware. And I've worked on about 20 projects that have ranged far between.


The choice to use an ECS game system has absolutely nothing to do with those performance numbers. 






The biggest determining factor in performance is how you use your time.


Steam Hardware Survey says about half of gamers today (46.91%) still have 2 physical cores, and they're about 2.4 GHz.   So if you're targeting mainstream hardware, you get about five billion cycles per second if you use them all.   Each cycle takes about 0.41 nanoseconds, but we'll call it a half nanosecond for easier math.


You lose a big chunk of that to the operating system and other programs. Let's call your share about 4 billion per second, or about 66 million processor cycles per frame. What you do with those cycles is up to you and your game.


Some tasks are extremely efficient, others are terribly inefficient. Some tasks are fast and others are slow. Some tasks can block processing until they are done, while others can be "fire-and-forget", scheduled for whenever is convenient for the processor. Sometimes even doing what appears to be exactly the same thing can in fact involve radically different work under the hood, giving very different performance numbers than you expected.






The most frequent performance factor, and usually the easiest to address, is the algorithm chosen to do a job.


There are algorithms that are extremely efficient and algorithms that are inefficient. As an example, when sorting a random collection of values the bubblesort algorithm is very easy to understand but will be slow.  The quicksort algorithm is harder to understand but will typically be fast.  And there are some more sorting routines out there like introsort that are quite a bit more difficult to implement correctly but can be faster still.


You can choose to use a compute-heavy algorithm when the program is run, or you can change the algorithm to use some data processing at build time in exchange for near-instant processing or precomputed values at runtime. Swap the algorithm to bring the time to nothing, or nearly so.  For example, rather than computing all the lighting and shadowing for a scene continuously, an engine may "bake" all or most of the lighting and shadowing directly into the world.


You can often choose to switch between compute time and compute space, similar to that above. Precomputed values and lookup tables are quite common. In graphics systems it is fairly common to encode all the computing information into a single texture, then replace the compute algorithm with a texture coordinate for lookup. Textures for spherical harmonics are commonplace these days; even if artists don't know the math behind them many can tell you how "SH Maps" work and that they improve performance.


Sometimes the problems are easy to spot: multiply nested loops, places with exponential computational requirements, code using known-slow algorithms that have known-fast alternatives. And of course, the fastest work is the work that is never done.


So you may have an algorithm in place that has n^3 growth. With 5 items it may take 60 nanoseconds, and that's great. With 10 items it may take 500 nanoseconds; that's fine. With 100 items it takes 500,000 nanoseconds, and that is not fine. Swap out the algorithm for one that takes a bit more time per value but has linear growth, and those times may become 180ns, 375ns, and 3750ns, all of which are great.
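As an illustrative sketch of that kind of swap (a simpler quadratic-to-linear version of the same idea, not the exact n^3 case above): checking a list of entity ids for duplicates with nested loops versus a hash set.

```cpp
#include <unordered_set>
#include <vector>

// Quadratic: compare every pair of ids, O(n^2) comparisons.
bool has_duplicate_slow(const std::vector<int>& ids) {
    for (std::size_t i = 0; i < ids.size(); ++i)
        for (std::size_t j = i + 1; j < ids.size(); ++j)
            if (ids[i] == ids[j]) return true;
    return false;
}

// Expected-linear: each id is hashed once. More cost per element,
// but growth is O(n) instead of O(n^2).
bool has_duplicate_fast(const std::vector<int>& ids) {
    std::unordered_set<int> seen;
    for (int id : ids)
        if (!seen.insert(id).second)  // insert() reports "already present"
            return true;
    return false;
}
```

At 10 elements the two are indistinguishable; at 100,000 elements the nested-loop version has fallen off a cliff while the hash-set version has barely noticed.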



Algorithm performance can sometimes be reviewed in the source, but other times it requires analysis tools and profiling.






After algorithm selection, one of the biggest performance factors in games is data locality.  It has very little to do with ECS, although some ECS decisions can have a major impact on it.


Basic arithmetic on data already available to the CPU can be done quickly. Processor design allows multiple operations to take place at the same time on internal parallel execution ports, so a single basic arithmetic operation can complete in about one-third of a CPU cycle, or about 0.15ns per operation. If you are using SIMD operations and the CPU can schedule them on ports in parallel, it can take one-sixteenth of a CPU cycle per value, or about 0.03ns. Those are amazingly fast, and that is why so many programmers talk about ways to leverage SIMD operations, which you might have heard of under names like MMX, SSE, or AVX.
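A small illustration of the SIMD idea (x86-specific, assuming a compiler with SSE intrinsics available): one `_mm_add_ps` instruction replaces four scalar additions.

```cpp
#include <immintrin.h>  // x86 SSE intrinsics

// Add two arrays of 4 floats with a single SSE instruction instead of
// four scalar additions. _mm_loadu_ps tolerates unaligned pointers.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

In practice you would let the compiler auto-vectorize simple loops like this, but the intrinsic form makes the four-values-per-instruction idea visible.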


But there aren't many registers and L1 cache lines on the processor, and reading from memory is slow. If the data is in the L2 cache there is an overhead of about 7ns, or about 20 CPU cycles. If the value is in main memory it takes about 100ns, or about 240 CPU cycles.


Cache misses (needing to get something from farther away in memory) and cache eviction (not using what is already in the cache) can completely destroy a game's performance. Jumping around all over memory is not necessarily a bad thing; what matters is cache behavior. If you are jumping around data that all fits in the chip's L1 cache it is amazingly fast. Jump around on data in the L2 cache and performance drops by a factor of about 100. Jump around on data requiring loads from main memory and performance drops by a factor of about 10,000.


ECS systems tend to jump around frequently, but the design of the systems determines whether that jumping stays inside the L2 cache or spills out into main memory. It is a fairly minor design change, but it makes about a 10x performance difference.


Two systems that look exactly the same can differ by an order of magnitude in performance based on data locality.  Even the same system can suddenly seem to switch gears from fast to slow when data locality changes.  You cannot spot the differences in data locality performance by reading the source code alone.  
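For instance, the two functions below compute exactly the same sum over a row-major matrix. The first walks memory sequentially; the second strides across it and starts missing cache once the matrix outgrows the caches. Nothing in the source tells you which one is 10x slower on large data; only a profiler will.

```cpp
#include <vector>

// Matrix stored row-major in one flat vector: element (r, c) is
// at index r * cols + c.

float sum_row_major(const std::vector<float>& m, int rows, int cols) {
    float total = 0.0f;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            total += m[r * cols + c];   // consecutive addresses: cache-friendly
    return total;
}

float sum_col_major(const std::vector<float>& m, int rows, int cols) {
    float total = 0.0f;
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            total += m[r * cols + c];   // strides `cols` floats per step
    return total;
}
```

Both return identical results on small inputs in identical time; the divergence appears only as the data grows, which is exactly why data-locality problems hide so well.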





Another major performance factor in games is how you move data around.  


You need to move data between main memory and your CPU, and between both of them and your graphics card, your sound card, your network card, and other systems. System bus performance depends quite a lot on the hardware. Cheap motherboards and bad chipsets can move very little data at a time and have slow transfer rates. Quality motherboards and good chipsets can move tremendous amounts of data at a time with rapid transfer rates.


While you probably cannot control the hardware, if you know what you are doing you can coordinate how data moves around.


You can send data around from system to system all the time with no thought or regard for size or system effect. This is much like the highway system: sometimes you have near-vacant roads and can travel quickly, other times you'll have tons of cars saturating the road with all the vehicles sitting at a standstill. Your data will eventually get there, but the performance time will be unpredictable and sometimes terrible.


You can take steps to bundle transfers together and take simple steps to ensure systems don't block each other.  This is much like freight trains: huge bundles with cars extending for one or two miles. There is some overhead, but they are efficient.


Or you can take more extreme methods to highly coordinate all your systems and ensure that every system is both properly bundled and carefully scheduled.  This is like mixing the capacity of long freight trains with the speed of bullet trains: enormous throughput, low latency, and everything gets moved directly to the destination with maximum efficiency.
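A rough sketch of the "freight train" option, with the flush callback standing in for whatever hypothetical call actually moves the bytes (a socket send, a GPU buffer upload, and so on):

```cpp
#include <cstddef>
#include <vector>

// Batch many small messages into one buffer and hand the whole bundle
// to the transport at once, instead of one transfer per message.
// FlushFn is a stand-in for the real transfer call (hypothetical here).
class TransferBatcher {
public:
    using FlushFn = void (*)(const std::byte* data, std::size_t size);

    TransferBatcher(std::size_t capacity, FlushFn flush_fn)
        : capacity_(capacity), flush_fn_(flush_fn) {
        buffer_.reserve(capacity);
    }

    // Queue a message; flush first if it would overflow the bundle.
    void queue(const void* msg, std::size_t size) {
        if (buffer_.size() + size > capacity_) flush();
        const std::byte* p = static_cast<const std::byte*>(msg);
        buffer_.insert(buffer_.end(), p, p + size);
    }

    // Hand the accumulated bundle to the transport in one transfer.
    void flush() {
        if (!buffer_.empty()) {
            flush_fn_(buffer_.data(), buffer_.size());
            buffer_.clear();
        }
    }

private:
    std::size_t capacity_;
    FlushFn flush_fn_;
    std::vector<std::byte> buffer_;
};
```

The "bullet train" version adds scheduling on top of this: flushing at fixed points in the frame so transfers from different systems never contend for the bus at the same time.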


Like memory performance, you cannot spot the differences in bus usage performance by reading the source code alone.







There are many more factors, but those normally have the biggest impact.


These factors by themselves account for the vast majority of an engine's performance characteristics. A few minor differences in each of them can mean the difference between a game running at 10 frames per second and running at 100+ frames per second.

#5306083 Initialize your goddamn variables, kiddies

Posted by on 15 August 2016 - 10:27 PM

the fun cousin of that demon, the denormal numbers, hiding in the infinitesimal gap between zero and the "smallest possible float value".


Yes, "Fun".


When dealing with numbers in games at a 1-meter scale, a good sanity test is: "Is this number less than the width of a hair or fingernail?"  Any distance smaller than around 0.0001 generally ought to become 0.
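That rule of thumb is easy to encode. A small sketch, with the 1e-4 threshold taken from the guideline above (a gameplay-scale choice, not a float-precision constant):

```cpp
#include <cmath>

// At 1 unit per meter, anything under a tenth of a millimeter is
// noise: snap it to zero before denormals creep into the math.
constexpr float kEpsilon = 1e-4f;

float snap_to_zero(float value) {
    return std::fabs(value) < kEpsilon ? 0.0f : value;
}
```

Running this over velocities and deltas each frame keeps values from decaying into the denormal range, where arithmetic on many CPUs takes a severe slow path.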