Unique ID

21 comments, last by ultramailman 11 years, 1 month ago
Hello.
I've figured out the format of object UIDs in my 2D game.

It will be like this:
struct uid {
        int16_t n;      // the nth object of a chunk of the world
        int8_t cell_x;  // coordinates of the chunk the object belongs to
        int8_t cell_y;
};
It would be nice if I could also use it as a single number:
union obj_uid {
        int32_t id;
        struct uid parts;
};
However, if I remember correctly, that is undefined behaviour (writing to one member and reading from another).

Is there a portable way to treat "struct uid" as one integer, or should I just use "struct uid" as the sole type for the UID?
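A minimal compilable sketch of what I mean (the values here are made up):

#include <stdint.h>
#include <stdio.h>

struct uid { int16_t n; int8_t cell_x, cell_y; };
union obj_uid { int32_t id; struct uid parts; };

int main(void)
{
        union obj_uid u;
        u.parts.n = 7;        // example: 7th object of chunk (3, -2)
        u.parts.cell_x = 3;
        u.parts.cell_y = -2;
        printf("%d\n", (int)u.id);  // reading the member that wasn't written
        return 0;
}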
What happens when something moves? Does its ID suddenly change?


When generating unique IDs, it is often best to just start with number one and hand out unique integers as objects get created.
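A minimal sketch of that approach (hypothetical names; note the counter itself must be persisted if IDs have to survive a save/load):

#include <stdint.h>

static uint32_t next_uid = 1;  // 0 can be reserved to mean "no object"

uint32_t new_uid(void)
{
        return next_uid++;
}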

What happens when something moves? Does its ID suddenly change?


When generating unique IDs, it is often best to just start with number one and hand out unique integers as objects get created.

The UID will not change. An object can move outside the chunk it belongs to, but it will always keep its UID. That allows loading near chunks and unloading far chunks without duplicating an object after a load/unload/load sequence.

I could use an incrementing number like you said, but wouldn't that require the ID counter to be saved in a file as well?

What happens when something moves? Does its ID suddenly change?


When generating unique IDs, it is often best to just start with number one and hand out unique integers as objects get created.

yep.

another strategy (for longer-lived/persistent UIDs) is to generate a large (typically 128-bit) random number.

if the random number generator is good (generates true random numbers), the chance of a random collision is statistically very low.

for a variant of this, I have often used 96-bit numbers, sometimes represented in ASCII as base48 (A-X, a-x), which requires 18 digits.
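a rough sketch of generating one (make_uid is a hypothetical helper, and rand() is just a placeholder; a real version would pull from the OS entropy source):

#include <stdlib.h>

// fill out[] with an 18-digit base48 ID using the digits A-X, a-x.
// drawing each digit independently gives 18 * log2(48), roughly 100 bits,
// assuming the underlying generator is good.
void make_uid(char out[19])
{
        static const char digits[] =
                "ABCDEFGHIJKLMNOPQRSTUVWX"
                "abcdefghijklmnopqrstuvwx";
        int i;
        for (i = 0; i < 18; i++)
                out[i] = digits[rand() % 48];
        out[18] = '\0';
}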

though, often something like this is not needed:

if object lifetimes are limited, or some other way to ensure uniqueness exists, a simpler scheme may make more sense instead;

also, if the unique value is being generated by humans, it may be preferable to use a string value instead (humans are generally much better at generating unique strings, and more often these strings will have some sort of semantic meaning as well).

However, if I remember correctly, that is undefined behaviour (writing to one member and reading from another).

Yes, this is implementation-defined, and it breaks the strict aliasing rule... however, on every C++ compiler that I know of, it works as intended. If you're going to intentionally break the strict aliasing rule, the compiler vendors actually recommend doing so via a union like this, because they do support this behaviour.

So, despite the standard saying one thing, compiler vendors actually do the exact opposite in this case.

another strategy (for longer-lived/persistent UIDs) is to generate a large (typically 128-bit) random number.

Interesting. It sounds similar to the hashing of passwords. But if the number is generated randomly, does that mean I have to store it in a save file? It's preferable that I don't have to, which is why I thought of concatenating the cell coordinates and the object number instead.

Yes, this is implementation-defined, and it breaks the strict aliasing rule... however, on every C++ compiler that I know of, it works as intended. If you're going to intentionally break the strict aliasing rule, the compiler vendors actually recommend doing so via a union like this, because they do support this behaviour.

So, despite the standard saying one thing, compiler vendors actually do the exact opposite in this case.

Oh, I didn't know that was for C++. Does it apply to C99? It would be nice if it were defined behaviour in C99.

Oh, I didn't know that was for C++. Does it apply to C99? It would be nice if it were defined behaviour in C99.

Yes, sorry: the situation is the same for both C and C++. The spec says it doesn't have to work, but the compiler vendors say that if you want to willingly break the strict aliasing rule, you should do so via a union.

According to the spec, the only way to do this generally is:

#include <string.h>  // for memcpy

struct uid parts;
parts.n = 7;  // example value; fill in cell_x / cell_y similarly

int32_t id;
memcpy(&id, &parts, sizeof id);  // byte-wise copy is always well-defined

This is pretty silly, and the compiler vendors agree this is silly. So, they chose allow you to (ab)use unions in this way, even though the spec says they don't have to allow it.
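(For what it's worth, there is also a way to sidestep the question entirely: define the mapping yourself with shifts and masks instead of reinterpreting bytes, so the value no longer depends on struct layout or endianness. A sketch, with hypothetical helper names, assuming n sits in the low 16 bits:)

#include <stdint.h>

struct uid { int16_t n; int8_t cell_x, cell_y; };  // as in the first post

// pack: n in bits 0-15, cell_x in bits 16-23, cell_y in bits 24-31
int32_t uid_pack(struct uid p)
{
        uint32_t u = (uint16_t)p.n
                   | ((uint32_t)(uint8_t)p.cell_x << 16)
                   | ((uint32_t)(uint8_t)p.cell_y << 24);
        return (int32_t)u;  // implementation-defined if the top bit is set,
                            // but behaves as expected on two's-complement targets
}

struct uid uid_unpack(int32_t id)
{
        uint32_t u = (uint32_t)id;
        struct uid p;
        p.n      = (int16_t)(u & 0xFFFFu);
        p.cell_x = (int8_t)((u >> 16) & 0xFFu);
        p.cell_y = (int8_t)((u >> 24) & 0xFFu);
        return p;
}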

this does bring up the idle thought of whether there could be C and C++ standards that better reflected what typical compilers and architectures already do.

for example:

defining the sizes of various numeric types (for example: "char" and "unsigned char" are 8 bits, "short" and "unsigned short" are 16 bits, ...);

defining the use of things like two's complement or IEEE floats (or at least the implementation behaving as-if these were used);

better defining how structs and data are laid out in memory (for example, defining struct packing behavior);

defining that char is signed (as most common compilers define it), rather than leaving it implementation-defined;

...

and probably also making optional some features people probably haven't really used in decades, such as trigraphs, which some common compilers (such as GCC) complain about if/when they are ever seen;

likewise, probably formally drop old-style declarations ("int foo(a, b) int a, b; { ... }");

...

secondarily (more drastic / extensions):

placing some "sane" restrictions on the ordering of type qualifiers and specifiers (to ease more efficient parsing);

maybe defining some mechanism to indicate the endianness of struct members (*1);

...

*1: unlike a lot of the other factors, both big and little endian are in common use, mostly as relevant to file formats.

unlike most of the other changes listed, this would actually result in a visible effect.
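(the usual workaround today is to fix the byte order by hand when reading a file; a small sketch with a hypothetical helper name:)

#include <stdint.h>

// read a little-endian uint32_t from a byte buffer,
// regardless of the host's native byte order
static uint32_t get_le32(const unsigned char *p)
{
        return  (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
}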

all this doesn't even need to be "the standard", but maybe could be a "standardized profile".

I don't think you will get a standard that clearly requires those things. As nice as it would be to have a clear definition of how bits in a bit-field are assigned, or what the exact size/format of a type has to be, the result would be that somewhere someone starts complaining about why the standard requires things to be done in a way that is inefficient for their particular platform.

A "profile" might be an interesting compromise at first, but vendors would probably just ignore them and stick with their existing implementations (that or spend the next decade arguing about which way to do each and every of the million things that are "undefined" or "implementation dependent")..

f@dz (http://festini.device-zero.de)

defining the sizes of various numeric types (for example: "char" and "unsigned char" are 8 bits, "short" and "unsigned short" are 16 bits, ...);

C has addressed that with the sized types from <stdint.h>, like int32_t.
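For example (note the exact-width types are optional on unusual hardware; the int_leastN_t variants always exist):

#include <stdint.h>

int8_t   a;  // exactly 8 bits
int16_t  b;  // exactly 16 bits
int32_t  c;  // exactly 32 bits
uint64_t d;  // exactly 64 bits, unsigned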


The "old style" / K&R style function declarations have been marked as obsolete since at least 1999, and many current compilers don't accept them any more.
As always though, the C99 spec mentions that they didn't forbid them outright for fear of breaking people's code. C++11 has a similar amount of hoop-jumping, because an updated standard that breaks old code is basically just a new language, not an update.
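For comparison, the two declaration styles side by side (a trivial made-up function):

// old K&R style (obsolescent since C99):
int add(a, b)
int a, b;
{
        return a + b;
}

// modern prototype style:
int add(int a, int b)
{
        return a + b;
}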

all this doesn't even need to be "the standard", but maybe could be a "standardized profile".

Yeah, it would defeat the purpose of the whole C language to nail down all of those things, which are basically hardware details, as then the language couldn't be ported to other types of hardware. Having sub-standards/profiles for certain platforms, e.g. "C for x86", makes a bit more sense though ;)

