Unique ID



#1 ultramailman   Prime Members   -  Reputation: 1572


Posted 25 March 2013 - 10:48 PM

Hello.
I've figured out the format of object UID's in my 2d game.

It will be like this:
struct uid{
        int16_t n; //the nth object of a chunk of world
        int8_t cell_x; //coordinate of the chunk the object belongs to
        int8_t cell_y;
};
It would be nice if I can use it as a number:
union obj_uid{
        int32_t id;
        struct uid parts;
};
However, if I remember correctly, that is undefined behaviour (write to one field and read from another).

Is there a portable way to treat "struct uid" as one integer, or should I just use "struct uid" as the sole type for uid?

Edited by ultramailman, 25 March 2013 - 10:50 PM.


#2 frob   Moderators   -  Reputation: 21479


Posted 25 March 2013 - 11:41 PM

What happens when something moves? Does its ID suddenly change?


When generating unique IDs, it is often best to just start with the number one and hand out unique integers as objects get created.

#3 ultramailman   Prime Members   -  Reputation: 1572


Posted 25 March 2013 - 11:58 PM

What happens when something moves? Does its ID suddenly change?


When generating unique IDs, it is often best to just start with the number one and hand out unique integers as objects get created.

The UID will not change. An object can move outside of the chunk it belongs to, but it will always keep its UID. That will allow the loading of near chunks and unloading of far chunks, without duplicating the same object after a load, unload, load sequence.

I could use an incrementing number like you said, but wouldn't that require the UID to be saved in a file as well?

#4 BGB   Crossbones+   -  Reputation: 1554


Posted 26 March 2013 - 12:46 AM

What happens when something moves? Does its ID suddenly change?


When generating unique IDs, it is often best to just start with the number one and hand out unique integers as objects get created.

 

yep.

 

another strategy (for longer-lived/persistent UIDs) is to generate a large (typically 128-bit) random number.

if the random number generator is good (generates true random numbers), the chance of a random collision is statistically very low.
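
e.g., a minimal untested sketch of that (names made up), assuming a system with /dev/urandom available as the entropy source:

#include <stdint.h>
#include <stdio.h>

struct uid128 { uint64_t hi, lo; };   /* 128 random bits */

/* fill *out with 128 bits from the OS entropy pool; returns 0 on success */
static int uid128_generate(struct uid128 *out)
{
        FILE *f = fopen("/dev/urandom", "rb");
        if (!f)
                return -1;
        size_t got = fread(out, 1, sizeof *out, f);
        fclose(f);
        return got == sizeof *out ? 0 : -1;
}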

 

for a variant of this, I have often used 96-bit numbers, sometimes represented in ASCII as base48 (A-X, a-x), which requires 18 digits.

 

though, often something like this is not needed: if object lifetimes are limited, or some other way to ensure uniqueness already exists, a simpler scheme may make more sense instead.

also, if the unique value is being generated by humans, it may be preferable to use a string value instead (humans are generally much better at generating unique strings, and more often than not these strings will carry some semantic meaning as well).



#5 Hodgman   Moderators   -  Reputation: 30432


Posted 26 March 2013 - 01:43 AM

However, if I remember correctly, that is undefined behaviour (write to one field and read from another).

Yes, this is implementation-defined, and it breaks the strict aliasing rule... however, on every C++ compiler that I know of, it works as intended. If you're going to intentionally break the strict aliasing rule, the compiler vendors actually recommend doing so via a union like this, because they do support this behaviour.

So, despite the standard saying one thing, compiler vendors actually do the exact opposite in this case.
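
For example, with the types from the first post (this works on the mainstream compilers, though the standard doesn't promise it, and it assumes struct uid has no padding so it really is 4 bytes):

union obj_uid u;
u.parts.n = 7;           /* write through one member...              */
u.parts.cell_x = 3;
u.parts.cell_y = -2;
int32_t id = u.id;       /* ...read through the other (the type pun) */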



#6 ultramailman   Prime Members   -  Reputation: 1572


Posted 26 March 2013 - 01:58 AM

another strategy (for longer-lived/persistent UIDs) is to generate a large (typically 128-bit) random number.

Interesting. It sounds similar to the hashing of passwords. But by generating a number, does that mean I have to store the number in a save file? It's preferable that I not have to do this; that's why I thought of concatenating the cell coordinates and the object number instead.

Yes, this is implementation-defined, and it breaks the strict aliasing rule... however, on every C++ compiler that I know of, it works as intended. If you're going to intentionally break the strict aliasing rule, the compiler vendors actually recommend doing so via a union like this, because they do support this behaviour.

So, despite the standard saying one thing, compiler vendors actually do the exact opposite in this case.

Oh, I didn't know that was for C++. Does it apply to C99? It would be nice if it were defined behaviour for C99.

#7 Hodgman   Moderators   -  Reputation: 30432


Posted 26 March 2013 - 02:05 AM

Oh, I didn't know that was for C++. Does it apply to C99? It would be nice if it were defined behaviour for C99.

Yes, sorry, the situation is the same for both C and C++ -- the spec says it doesn't have to work, but the compiler vendors say that if you want to willingly break the strict aliasing rule, you should do so via a union.

 

According to the spec, the only way to do this generally is:

struct uid parts;
parts.blah = ...;
int32_t id;
memcpy( &id, &parts, 4 );

This is pretty silly, and the compiler vendors agree this is silly. So, they chose to allow you to (ab)use unions in this way, even though the spec says they don't have to allow it.
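
A fleshed-out (untested) version of that, using your struct uid and a made-up wrapper name; in practice most compilers optimise the memcpy away entirely:

#include <stdint.h>
#include <string.h>

/* assumes struct uid from the first post, and that it has no padding,
   i.e. sizeof(struct uid) == sizeof(int32_t) */
static int32_t uid_to_int(struct uid parts)
{
        int32_t id;
        memcpy(&id, &parts, sizeof id);   /* well-defined, unlike a pointer cast */
        return id;
}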



#8 BGB   Crossbones+   -  Reputation: 1554


Posted 26 March 2013 - 03:28 AM

This does bring up the idle thought of whether there could be C and C++ standards that better reflect what typical compilers and architectures already do.

 

for example:

defining the sizes of various numeric types (for example: "char" and "unsigned char" are 8 bits, "short" and "unsigned short" are 16-bits, ...);

defining the use of things like two's complement or IEEE floats (or at least the implementation behaving as-if these were used);

better defining how structs and data are laid out in memory (for example, defining struct packing behavior);

defining that char is signed (this is how most common compilers define it), vs. leaving it implementation-defined;

...

 

and probably also making optional some features people haven't really used in decades, such as trigraphs, which some common compilers (such as GCC) complain about if/when they are ever seen;

likewise, probably formally dropping old-style declarations ("int foo(a, b) int a, b; { ... }");

...

 

secondarily (more drastic / extensions):

place some "sane" restrictions of the ordering of type qualifiers and specifiers (to ease more efficient parsing);

maybe define some mechanism to indicate the endianness of struct members (*1);

...

 

*1: unlike a lot of the other factors, both big- and little-endian are in common use, mostly as relevant to file formats.

unlike most of the other changes listed, this would actually result in a visible effect.

 

 

all this doesn't even need to be "the standard", but maybe could be a "standardized profile".



#9 Trienco   Crossbones+   -  Reputation: 2195


Posted 26 March 2013 - 08:05 AM

I don't think you will get a standard that clearly requires those things. As nice as it would be to have a clear definition of how bits in a bit-field are assigned or what the exact size/format of a type has to be, the result would be that somewhere, someone would start complaining about the standard requiring things to be done in a way that is inefficient for their particular platform.

 

A "profile" might be an interesting compromise at first, but vendors would probably just ignore them and stick with their existing implementations (that or spend the next decade arguing about which way to do each and every of the million things that are "undefined" or "implementation dependent")..



#10 Hodgman   Moderators   -  Reputation: 30432


Posted 26 March 2013 - 08:36 AM

defining the sizes of various numeric types (for example: "char" and "unsigned char" are 8 bits, "short" and "unsigned short" are 16-bits, ...);

C has addressed that with the sized types, like int32_t.


The "old style" / K&R style function declarations have been marked as obsolete since at least 1999, and many current compilers don't accept them any more.
 
As always though, the C99 spec mentions that they didn't forbid them outright for fear of breaking people's code. C++11 has a similar amount of hoop-jumping, because an updated standard that breaks old code is basically just a new language, not an update.

 

all this doesn't even need to be "the standard", but maybe could be a "standardized profile".

Yeah, it would defeat the purpose of the whole C language to nail down all of those things, which are basically hardware details, as then the language couldn't be ported to other types of hardware. Having sub-standards/profiles for certain platforms, e.g. "C for x86", makes a bit more sense though ;)


Edited by Hodgman, 26 March 2013 - 08:37 AM.


#11 SiCrane   Moderators   -  Reputation: 9604


Posted 26 March 2013 - 09:34 AM

Some of those probably aren't as useful as you might think. Machines that use two's complement integer representation can still have different integer behavior. Ex: digital signal processors often saturate on overflow rather than wrap. IEEE 754 specifies bit representation, but not byte order, so two machines both using IEEE floats can still lay them out in memory differently, meaning you still need platform-dependent code if you want to serialize the bytes directly. Further, IEEE 754 doesn't fully define how operations are carried out, so you will still get differences in computations between platforms. Heck, you can get differences on a single platform just from changing a compiler setting (or at runtime by setting a floating point register). Nonetheless, you can still check if you're using IEEE floats by checking __STDC_IEC_559__, std::numeric_limits<double>::is_iec559, and so on.
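
For example, a quick C99 check (the macro is optional, so its absence only means the compiler makes no guarantee):

#include <stdio.h>

int main(void)
{
#ifdef __STDC_IEC_559__
        puts("compiler claims IEC 60559 (IEEE 754) floating point");
#else
        puts("no IEC 60559 guarantee from this compiler");
#endif
        return 0;
}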

#12 BGB   Crossbones+   -  Reputation: 1554


Posted 26 March 2013 - 02:22 PM

defining the sizes of various numeric types (for example: "char" and "unsigned char" are 8 bits, "short" and "unsigned short" are 16-bits, ...);

C has addressed that with the sized types, like int32_t.


The "old style" / K&R style function declarations have been marked as obsolete since at least 1999, and many current compilers don't accept them any more.
 
As always though, the C99 spec mentions that they didn't forbid them outright for fear of breaking people's code. C++11 has a similar amount of hoop-jumping, because an updated standard that breaks old code is basically just a new language, not an update.
 
Actually, AFAIK, they are still present in C11 as well, and are still required.
 
I would assume they could be demoted to "optional" features, since it makes little sense to require a feature that only some compilers bother to support (and, likewise, compilers that actually see reason to still support them can still do so).
 
Like, if VLAs were demoted to an optional feature, why not K&R-style declarations?
 
Part of the controversy seems to be that some people feel that if the old-style declaration syntax were dropped, support for "()"-style empty argument lists would also need to be dropped. I personally think it more sensible to drop the old-style declarations but keep "()" as it is.
 
 
The issue with integer sizes (for core integer types) is that a lot of code assumes their sizes and wouldn't likely work correctly on targets where the sizes differ.
Yes, this is more of a profile issue than a legacy issue.
 

all this doesn't even need to be "the standard", but maybe could be a "standardized profile".

Yeah, it would defeat the purpose of the whole C language to nail down all of those things, which are basically hardware details, as then the language couldn't be ported to other types of hardware. Having sub-standards/profiles for certain platforms, e.g. "C for x86", makes a bit more sense though ;)

 

If you don't nail down endianness, then the "profile" could apply equally to x86, PowerPC, and ARM.

 

If endianness is specified (say, as LE), then it mostly covers x86 and most ARM devices.

IIRC, both PPC and ARM are bi-endian, but typically people run ARM devices in little-endian mode, and PPC in big-endian mode.

 

If it gets more specific, like whether or not unaligned loads/stores are allowed, ... then it is probably target-specific.

 

 

As for endianness, maybe 95% of the time there is not much reason to care; in the few cases where there is, it may make sense to have some way to specify it and have the compiler emulate it if needed (say, if we declare "this value needs to be LE or BE", the code has presumably already agreed to pay for any requisite byte-swapping on loads/stores).

 

Per-structure or per-value indication could be more useful than specifying it globally, though; the global endianness could still be left unspecified.

 

Then one could have a structure and be able to say that, with a compiler implementing this profile, and with the structure following the relevant rules, its exact byte-for-byte layout is known.
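
(for comparison, the way one gets a known byte-for-byte layout today is explicit serialization; an untested sketch for the uid fields from the first post, always written little-endian regardless of the host's byte order, with a made-up name:)

#include <stdint.h>

/* writes 'parts' into buf[0..3] in a fixed little-endian layout,
   independent of host endianness and struct packing */
static void uid_write_le(const struct uid *parts, uint8_t buf[4])
{
        uint16_t n = (uint16_t)parts->n;
        buf[0] = (uint8_t)(n & 0xff);
        buf[1] = (uint8_t)(n >> 8);
        buf[2] = (uint8_t)parts->cell_x;
        buf[3] = (uint8_t)parts->cell_y;
}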

 

 

such a profile, if it existed, could still be "reasonably" portable within a range of targets, and possibly any points of disagreement could be emulated.

this would then make it a little more like Java or C# in these regards.

 

granted, this would be N/A for some targets, but these targets need not implement this profile.


Edited by cr88192, 26 March 2013 - 02:26 PM.


#13 Paradigm Shifter   Crossbones+   -  Reputation: 5380


Posted 26 March 2013 - 02:45 PM

Empty argument lists in C are a terrible idea! They're also different from C++, where "()" means the function takes no arguments. It just seems like an incredibly lazy way to prototype a function, and what's more, the compiler doesn't complain when you pass the wrong number or type of arguments!!! It's basically the same as using an ellipsis argument list, except without the requirement that you have at least one named argument before the ellipsis.

 

At least one bug has been incredibly hard to track down in our codebase because of this "feature" (i.e. someone wrote a local "()" prototype in a .c file for a function that originally took no arguments, and the call continued to compile without complaint (but of course behaved incorrectly) after someone added an argument to the function).
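
A tiny (untested) repro of the trap, with made-up names; the stale "()" prototype lets the call keep compiling even though the real definition changed:

/* caller.c -- elsewhere, do_work() has since been changed to take an int */

void do_work();          /* stale local prototype: "unspecified arguments" */

void caller(void)
{
        do_work();       /* still compiles silently; the new parameter gets garbage */
}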


"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

#14 ultramailman   Prime Members   -  Reputation: 1572


Posted 26 March 2013 - 03:40 PM


Oh, I didn't know that was for C++. Does it apply to C99? It would be nice if it were defined behaviour for C99.

Yes, sorry, the situation is the same for both C and C++ -- the spec says it doesn't have to work, but the compiler vendors say that if you want to willingly break the strict aliasing rule, you should do so via a union.
 
According to the spec, the only way to do this generally is:
 
struct uid parts;
parts.blah = ...;
int32_t id;
memcpy( &id, &parts, 4 );
This is pretty silly, and the compiler vendors agree this is silly. So, they chose to allow you to (ab)use unions in this way, even though the spec says they don't have to allow it.


Ah, thanks for clarifying. After some pondering, I realized I can just concatenate them with shift and OR normally. I guess I was trying too hard to use cool union tricks.
int32_t cell_x = uid.cell_x;
int32_t cell_y = uid.cell_y;
int32_t n = uid.n;
int32_t id = n | (cell_x << 16) | (cell_y << 24);
Will this work on systems with different endianness?

#15 Paradigm Shifter   Crossbones+   -  Reputation: 5380


Posted 26 March 2013 - 03:56 PM

It will work; the binary representation will be different, though.

 

Have a look at the functions ntohl and htonl, which you can call to do endian swaps before sending things across a network between machines of different endianness.
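
For example (untested sketch; on POSIX systems htonl lives in <arpa/inet.h>, on Windows it's in <winsock2.h>):

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>   /* htonl */

/* write id to f in big-endian (network) order; returns 0 on success */
static int write_id(FILE *f, int32_t id)
{
        uint32_t be = htonl((uint32_t)id);
        return fwrite(&be, sizeof be, 1, f) == 1 ? 0 : -1;
}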


"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

#16 SiCrane   Moderators   -  Reputation: 9604


Posted 26 March 2013 - 04:06 PM

Ah, thanks for clarifying. After some pondering, I realized I can just concatenate them with shift and OR normally. I guess I was trying too hard to use cool union tricks.

int32_t cell_x = uid.cell_x;
int32_t cell_y = uid.cell_y;
int32_t n = uid.n;
int32_t id = n | (cell_x << 16) | (cell_y << 24);


Be careful. This may not work how you expect if your values can be negative. You might want to cast to unsigned ints of the appropriate size before doing the bit manipulation.

#17 ultramailman   Prime Members   -  Reputation: 1572


Posted 26 March 2013 - 04:24 PM

Have a look at the functions ntohl and htonl which you can call to do endian swaps before sending things across a network between machines of different endianness.

Oh, thanks. I take it those two functions are what I need if I want to store my values in a file portably, yes?

Be careful. This may not work how you expect if your values can be negative. You might want to cast to unsigned ints of the appropriate size before doing the bit manipulation.

Thanks, what a gotcha. So if I shift a signed number, the sign bit will not change, yes? And if I cast to unsigned, I can cast that unsigned back to signed to get the same value?

#18 BGB   Crossbones+   -  Reputation: 1554


Posted 26 March 2013 - 05:03 PM

Empty argument lists in C are a terrible idea! They're also different from C++, where "()" means the function takes no arguments. It just seems like an incredibly lazy way to prototype a function, and what's more, the compiler doesn't complain when you pass the wrong number or type of arguments!!! It's basically the same as using an ellipsis argument list, except without the requirement that you have at least one named argument before the ellipsis.

 

At least one bug has been incredibly hard to track down in our codebase because of this "feature" (i.e. someone wrote a local "()" prototype in a .c file for a function that originally took no arguments, and the call continued to compile without complaint (but of course behaved incorrectly) after someone added an argument to the function).

 

Yes, but this is more about what would be defined by the standard, not about whether or not it is good practice.

The issue is, there is still a lot of code floating around which would break if you took away "()", but relatively little which would break if the rest of the old-style declarations were taken away.

 

(FWIW, my C compiler has effectively dropped both, making "()" behave like "(void)" and treating reliance on the old-style semantics as a warning.)

 

 

But, anyway, to clarify a few of the thoughts: the endianness specifiers would probably be special preprocessor defines, which would have undefined behavior (probably being a no-op) if the compiler doesn't support the feature. Preprocessor defines could exist to specify whether the feature is present and works.

 

The declaration ordering restrictions would be more subtle, probably placing a few restrictions like:

type qualifiers will precede specifiers (except in certain conditions);

other type specifiers and user-defined types (typedef-name) will be mutually exclusive;

only one (and exactly one) user-defined type may be referenced as part of a given declaration type;

...

 

Most existing code already does this, and making it an optional requirement can result in a parser speedup (though, as-is, a command-line option can achieve a similar effect). Basically, it allows eliminating most cases where it is necessary to check whether or not an identifier is a known typedef (IME, this is where a big chunk of the time goes when parsing declarations from headers, at least in my tools).


Edited by cr88192, 26 March 2013 - 05:06 PM.


#19 SiCrane   Moderators   -  Reputation: 9604


Posted 26 March 2013 - 07:06 PM

Thanks, what a gotcha. So if I shift a signed number, the sign bit will not change, yes?

Really the problem is sign extension when moving to a larger size. Let's say n is -5. As an int16_t your computer might represent that as 1111 1111 1111 1011. When you cast that to an int32_t it might become 1111 1111 1111 1111 1111 1111 1111 1011; all the upper bits are 1. Trying to do a bitwise OR with that will wipe out any information that you want to get from cell_x and cell_y. If you cast it to a uint16_t first, it will be zero-extended to 0000 0000 0000 0000 1111 1111 1111 1011 when widened, so you can bitwise-OR things into the upper bits.
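
E.g. (quick untested illustration, assuming the usual 16/32-bit types):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
        int16_t n = -5;
        printf("%08" PRIx32 "\n", (uint32_t)(int32_t)n);    /* fffffffb: sign-extended */
        printf("%08" PRIx32 "\n", (uint32_t)(uint16_t)n);   /* 0000fffb: zero-extended */
        return 0;
}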

And if I cast to unsigned, I can cast that unsigned back to signed to get the same value?

To be pedantic, whether or not a cast from signed to unsigned and back will give you the same number is implementation-defined. However, I've never worked on a platform where you wouldn't get the same value from a round trip.

#20 Paradigm Shifter   Crossbones+   -  Reputation: 5380


Posted 26 March 2013 - 07:13 PM

Well, the solution is to mask out the bits you don't want before you do the shift... looks like your x and y are 8 bits and n is 16 bits, so do:

 

int32_t id = (n & 0xffff) | ((cell_x & 0xff) << 16) | ((cell_y & 0xff) << 24);

 

EDIT: Missed an effing eff
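
And the matching unpack (untested sketch, not from anyone above); the final casts back to the narrow signed types rely on the usual two's-complement conversion, which is implementation-defined as noted earlier:

uint32_t u = (uint32_t)id;               /* avoid right-shifting a negative value */
int16_t n      = (int16_t)(u & 0xffffu);
int8_t  cell_x = (int8_t)((u >> 16) & 0xffu);
int8_t  cell_y = (int8_t)((u >> 24) & 0xffu);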


Edited by Paradigm Shifter, 26 March 2013 - 07:20 PM.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley



