String handling in C

31 comments, last by cr88192 11 years, 1 month ago

Pretty much everything a game is bound to need can be converted to integers and such.

What if your game mostly revolves around text?


Though I do know of an engine that requires you to fetch all resources through strings. And you need to do this every time you want to use a resource, and consider that just about every object in the map is bound to use at least one or two of those resources every frame... (a similar issue arises if you use string indices instead of integer indices for arrays)

eh? that sounds really broken. can't you hold a pointer to the resource? or does the resource manager also control how the resource is used (which still sounds broken to me)?

Custom scripting language >.>' But even then, it could have had a retrieve ID function or something, but nope, the string is the ID, so e.g. if you want to play a sound effect you need to pass the name of the sound effect, if you want to switch to a specific sprite you need to pass the name of the sprite, etc. I suppose it's sorta mitigated by hashing, but integers/pointers/whatever as IDs would still have been a ton faster compared to strings.
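
For illustration, the "retrieve ID function" idea could look roughly like the sketch below: pay for the string lookup once, then use the integer afterwards. Everything here (sound_register, sound_find_id, sound_play, the fixed-size table) is made up for the example and is not the API of the engine being discussed.

```c
#include <assert.h>
#include <string.h>

#define MAX_SOUNDS 64

/* Hypothetical resource table: names are only consulted during lookup. */
static const char *sound_names[MAX_SOUNDS];
static int sound_plays[MAX_SOUNDS];   /* stand-in for real playback state */
static int sound_count = 0;

/* Register a sound under a name and return its integer ID. */
int sound_register(const char *name)
{
    assert(sound_count < MAX_SOUNDS);
    sound_names[sound_count] = name;
    return sound_count++;
}

/* Slow path: linear search by name. Returns -1 if the name is unknown. */
int sound_find_id(const char *name)
{
    for (int i = 0; i < sound_count; i++)
        if (strcmp(sound_names[i], name) == 0)
            return i;
    return -1;
}

/* Fast path: takes the ID, so no string work happens per call. */
void sound_play(int id)
{
    assert(id >= 0 && id < sound_count);
    sound_plays[id]++;   /* a real engine would start playback here */
}

/* Helper so the effect of sound_play() can be observed. */
int sound_play_count(int id)
{
    assert(id >= 0 && id < sound_count);
    return sound_plays[id];
}
```

A script would call sound_find_id("coin") once (say, at load time), keep the result in a variable, and pass that to sound_play() every frame, so the per-frame cost is an array index rather than a string lookup.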

As far as I know that engine has never been pushed to its limits, so maybe that's why nobody has been bothered by it so far. I presume that will eventually happen at some point, though. The only upside of its approach is that it may be slightly easier for beginners to get running.

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.


Since when were string operations a major performance bottleneck in games?


Unless one is doing a lot of text-parsing at runtime (in which case it may be an appropriate topic for a new post in this subforum...)


or, if major pieces of engine infrastructure are based on strings...

(good or not, it can sometimes end up happening this way...).


this is basically things like using strings to identify things, and using constructs like:
if(!strcmp(str, "_foo_t"))
{
...
}else if(!strcmp(str, "_bar_t"))
{
...
}else if ...

which, if not careful, can end up eating a lot of time, and then one is left to try to figure out why "strcmp()" has jumped to the top of the list in the profiler (*1).

but, at least, one can intern the strings, and in these cases using '==' and '!=' on the pointers can lead to slightly faster string comparisons (but has other drawbacks, like often the need to cache literals in variables, or resort to ugly hacks). (if both are already interned, it is basically just the cost of the pointer comparison).
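
A minimal sketch of what interning might look like, assuming a small chained hash table and an FNV-1a-style hash. The names intern, InternEntry and INTERN_BUCKETS are invented for this example, and interned strings are never freed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256

/* One interned string; entries in the same bucket are chained. */
typedef struct InternEntry {
    struct InternEntry *next;
    char text[];              /* flexible array member (C99) */
} InternEntry;

static InternEntry *intern_table[INTERN_BUCKETS];

/* FNV-1a-style hash; any reasonable string hash would do. */
static unsigned intern_hash(const char *s)
{
    unsigned h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Return the canonical copy of s. Equal strings always yield the
   same pointer, so interned strings can be compared with ==. */
const char *intern(const char *s)
{
    unsigned b = intern_hash(s) % INTERN_BUCKETS;

    for (InternEntry *e = intern_table[b]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;

    size_t len = strlen(s) + 1;
    InternEntry *e = malloc(sizeof *e + len);
    assert(e != NULL);        /* sketch only: no real error handling */
    memcpy(e->text, s, len);
    e->next = intern_table[b];
    intern_table[b] = e;
    return e->text;
}
```

The "cache literals in variables" drawback shows up at the call site: a literal like "_foo_t" is not itself interned, so one typically keeps something like a static const char * holding intern("_foo_t") and compares against that, rather than against the literal.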

some of this may result because strings are self-describing and easier to use as decentralized unique IDs than integers, and generally also easier to work with than GUIDs.

ADD: another past trick is to basically use a hash-table to quickly map a string to an integer based index (the position of the string within an array of strings), and then use this index with a "switch()", which can at least generally be faster than a long strcmp() based if/else chain, and generally comparing favorably to a big nested switch (less awful looking, and also faster in many cases).
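
As a rough sketch of that trick: the names live in an array, a small open-addressing hash table maps a string to its position in that array, and the position drives a switch(). The type names beyond "_foo_t" and "_bar_t", the table size, and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* The index of each name in type_names[] is its integer ID. */
enum { TY_FOO, TY_BAR, TY_BAZ, TY_COUNT };
static const char *type_names[TY_COUNT] = { "_foo_t", "_bar_t", "_baz_t" };

#define HASH_SIZE 16          /* power of two, larger than TY_COUNT */
static int hash_slots[HASH_SIZE];
static int hash_ready = 0;

static unsigned str_hash(const char *s)
{
    unsigned h = 2166136261u; /* FNV-1a-style hash */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Fill the table once, using linear probing on collisions. */
static void type_hash_init(void)
{
    assert(TY_COUNT < HASH_SIZE);   /* guarantees some empty slots */
    for (int i = 0; i < HASH_SIZE; i++)
        hash_slots[i] = -1;
    for (int i = 0; i < TY_COUNT; i++) {
        unsigned h = str_hash(type_names[i]) & (HASH_SIZE - 1);
        while (hash_slots[h] >= 0)
            h = (h + 1) & (HASH_SIZE - 1);
        hash_slots[h] = i;
    }
    hash_ready = 1;
}

/* Map a string to its index in type_names[], or -1 if unknown. */
int type_index(const char *s)
{
    if (!hash_ready)
        type_hash_init();
    unsigned h = str_hash(s) & (HASH_SIZE - 1);
    while (hash_slots[h] >= 0) {
        if (strcmp(type_names[hash_slots[h]], s) == 0)
            return hash_slots[h];
        h = (h + 1) & (HASH_SIZE - 1);
    }
    return -1;
}

/* Dispatch with a switch instead of a strcmp() if/else chain. */
const char *describe(const char *type_name)
{
    switch (type_index(type_name)) {
    case TY_FOO: return "foo logic";
    case TY_BAR: return "bar logic";
    case TY_BAZ: return "baz logic";
    default:     return "unknown type";
    }
}
```

With few collisions, a lookup costs one hash plus roughly one strcmp() to confirm the match, regardless of how many cases the switch has, whereas the if/else chain does a strcmp() per case until it hits.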


*1: like, one time in my renderer (earlier on), I ended up profiling things, and observing that "strcmp()" was at the top of the profiler list. I then looked into it and found that this was because an inner loop (related to querying objects) was falling back to one of these strcmp() if/else chains (dispatching to the logic for each specific model type) for each iteration of the loop (which at the time was also a linear search over every object in the world).

things have improved at least slightly since then (much of this logic has since been moved to vtables, ...).
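
To give an idea of the vtable approach in plain C (this is only a sketch, not the actual renderer code; Model, ModelVtable and the query functions are invented for the example): each object carries a pointer to a table of function pointers, so the inner loop makes an indirect call instead of comparing type-name strings.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Model Model;

/* Per-type table of operations; the name is kept only for debugging. */
typedef struct ModelVtable {
    const char *type_name;
    int (*query)(const Model *m);
} ModelVtable;

struct Model {
    const ModelVtable *vt;    /* set once, when the object is created */
    int value;
};

/* Hypothetical per-type logic that used to live in the if/else chain. */
static int foo_query(const Model *m) { return m->value + 1; }
static int bar_query(const Model *m) { return m->value * 2; }

static const ModelVtable foo_vtable = { "_foo_t", foo_query };
static const ModelVtable bar_vtable = { "_bar_t", bar_query };

/* Inner loop: one indirect call per object, no strcmp() at all. */
int query_all(const Model *models, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(models[i].vt != NULL);
        total += models[i].vt->query(&models[i]);
    }
    return total;
}
```

The string-based type name can still be used to pick the right vtable when an object is created; the point is that this lookup happens once per object rather than once per loop iteration.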

(actually, much of the engine runs on top of a dynamic type-system, itself based mostly around string-based type-ID names, which are used for pretty much every heap-allocated object in the engine, ...).

never mind cases where strings and while-loops directly drive program logic in a few places (typically "type signature strings", ...), ...

and also the frequent use of strings to identify things like entity field-names, the contents of a database-like structure, file paths, ...


also if one builds parts of their logic on top of working with DOM-like XML trees or similar (like, using XML trees as a data-structure for representing other data), this can also involve using a lot of strings. historically, some code had also worked largely by walking XML trees and dispatching to logic, but most of this code went away as the performance was often a bit lacking (the only major examples left have since largely been relegated to offline tools).

a lot of other code works by walking Lisp-like lists instead, which is a bit faster. (lists are basically a tree-structure composed of linked-lists of "cons-cells", with each list holding a string identifying its contents, ...).


so, depending on the code, strings can be a big deal.

though, it makes sense to avoid a lot of stuff like this in performance-critical areas or as part of the main execution path.

