Unique ID

Some of those probably aren't as useful as you might think. Machines that use two's complement integer representation can still have different integer behavior. For example, digital signal processors often saturate on overflow rather than wrap. IEEE 754 specifies bit representation, but not byte order, so two machines both using IEEE floats can still lay them out in memory differently, and you still need platform-dependent code if you want to serialize the bytes directly. Further, IEEE 754 doesn't fully define how operations are carried out, so you will still get differences in computations between platforms. Heck, you can get differences on a single platform just from changing a compiler setting (or at runtime by setting a floating point register). Nonetheless, you can still check whether you're using IEEE floats by checking __STDC_IEC_559__, std::numeric_limits<double>::is_iec559, and so on.
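For what it's worth, the C-side check is just a predefined-macro test; a minimal sketch:

#include <stdio.h>

int main(void)
{
#ifdef __STDC_IEC_559__
    /* the implementation claims IEC 60559 (IEEE 754) float/double arithmetic */
    puts("IEEE 754 floating point");
#else
    puts("no IEEE 754 guarantee");
#endif
    return 0;
}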

defining the sizes of various numeric types (for example: "char" and "unsigned char" are 8 bits, "short" and "unsigned short" are 16 bits, ...);

C has addressed that with the sized types, like int32_t.
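For reference, those live in <stdint.h> (C99); a quick sketch of what's available:

#include <stdint.h>

/* exact-width types: optional in the standard, but required to exist
   (and to be two's complement) wherever the hardware has matching types */
int8_t   a;
uint16_t b;
int32_t  c;
uint64_t d;

/* the "least" and "fast" variants are always required */
int_least16_t e;
uint_fast32_t f;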


The "old style" / K&R style function declarations have been marked as obsolete since at least 1999, and many current compilers don't accept them any more.
As always though, the C99 spec mentions that they didn't forbid them outright for fear of breaking people's code. C++11 has a similar amount of hoop-jumping, because an updated standard that breaks old code is basically just a new language, not an update.
actually, AFAIK, they are still present in C11 as well, and are still required.
I would otherwise assume they could be demoted to "optional" features, since it makes little sense to require a feature that only some compilers bother to support (and, likewise, compilers that still see reason to support them could continue to do so).
like, if VLAs were demoted to an optional feature, why not K&R style declarations?...
part of the controversy seems to be that some people see it as: if the old-style declaration syntax were dropped, support for "()"-style empty argument lists would also need to be dropped. I personally think it more sensible to drop the old-style declarations, but keep "()" as it is.
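For anyone who hasn't bumped into them, a quick sketch of the two things being argued over (function names made up):

int add(a, b)       /* old-style (K&R) definition: parameter types       */
    int a, b;       /* are declared between the ')' and the '{';         */
{                   /* obsolescent since C89, but still present in C11   */
    return a + b;
}

int add();          /* non-prototype declaration: "()" means "arguments
                       unspecified", so the compiler isn't required to
                       complain about a call like add(1, 2, 3) */

int nothing(void);  /* prototype: "(void)" really does mean "takes no arguments" */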
the issue with integer sizes (for core integer types) is that a lot of code assumes their sizes, and likely wouldn't work correctly on targets where the sizes differ.
yes, this is more of a profile issue than a legacy issue.

all this doesn't even need to be "the standard", but maybe could be a "standardized profile".

Yeah, it would defeat the purpose of the whole C language to nail down all of those things, which are basically hardware details, as then the language couldn't be ported to other types of hardware. Having sub-standards/profiles for certain platforms, e.g. "C for x86", makes a bit more sense though.

if you don't nail down endianness, then the "profile" could apply equally to x86, PowerPC, and ARM.

if endianness is specified (say, as LE), then it is mostly x86 and most ARM devices.

IIRC, both PPC and ARM are bi-endian, but typically people run ARM devices in little-endian mode, and PPC in big-endian mode.

if it gets more specific, like whether or not unaligned loads/stores are allowed, ... then it is probably target-specific.

as for endianness, maybe 95% of the time there is not much reason to care, and in the few cases where there is, it may make sense to have some way to specify it and have the compiler emulate it if needed (like, say, if we say "this value needs to be LE or BE", then presumably the code has already agreed to pay for any requisite byte-swapping on loads/stores).

per-structure or per-value indication could be more useful than specifying it globally, though; the global endianness could still be left unspecified.

then one could have a structure and be able to say: with a compiler implementing this profile, and with the structure following the relevant rules, the exact byte-for-byte layout of the structure is known.

such a profile, if it existed, could still be "reasonably" portable within a range of targets, and possibly any points of disagreement could be emulated.

this would then make it a little more like Java or C# in these regards.

granted, this would be N/A for some targets, but these targets need not implement this profile.
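fwiw, absent such a profile, the usual way to get a known byte-for-byte layout today is to serialize explicitly; a minimal sketch, with field widths guessed from later in the thread (16-bit n, 8-bit cell_x/cell_y) and a little-endian layout picked arbitrarily:

#include <stdint.h>

struct uid {            /* assumed layout, based on the fields used below */
    uint16_t n;
    uint8_t  cell_x;
    uint8_t  cell_y;
};

/* write the fields in a fixed (little-endian) byte order, regardless of
   the host's native endianness */
void uid_write_le(uint8_t out[4], const struct uid *u)
{
    out[0] = (uint8_t)(u->n & 0xff);
    out[1] = (uint8_t)(u->n >> 8);
    out[2] = u->cell_x;
    out[3] = u->cell_y;
}

void uid_read_le(struct uid *u, const uint8_t in[4])
{
    u->n      = (uint16_t)(in[0] | (in[1] << 8));
    u->cell_x = in[2];
    u->cell_y = in[3];
}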

Empty argument lists in C are a terrible idea! It's also different from C++, where an empty list means the function takes no arguments. It just seems like an incredibly lazy way to prototype a function, and what's more, the compiler doesn't complain when you pass the wrong number or type of arguments!!! It's basically the same as using an ellipsis argument list, except without the requirement that you have at least one argument before the ellipsis.

At least one bug has been incredibly hard to track down in our codebase because of this "feature" (i.e. someone writes a local prototype in a .c file for a function that once took no arguments, and the call continues to compile without complaint, but of course work incorrectly, after someone adds an argument to the function).
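The failure mode described above boils down to something like this (names made up for illustration):

/* thing.c -- the function grew an argument at some point */
void update_thing(int new_flags) { /* ... */ }

/* other.c -- a stale local declaration; "()" leaves the arguments
   unchecked, so the old call still compiles without complaint even
   though it no longer passes the argument (undefined behaviour) */
void update_thing();

void do_stuff(void)
{
    update_thing();
}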

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley


Oh, I didn't know it was for C++. Does it apply to C99? It would be nice if it were defined behaviour for C99.

Yes sorry, the situation is the same for both C/C++ -- the spec says it doesn't have to work, but the compiler vendors say that if you want to willingly break the strict aliasing rule, you should do so via a union.

According to the spec, the only way to do this generally is:
struct uid parts;
parts.blah = ...;                 /* fill in the individual fields */
int32_t id;
memcpy(&id, &parts, sizeof id);   /* copy the bytes rather than type-punning */
This is pretty silly, and the compiler vendors agree this is silly. So, they chose to allow you to (ab)use unions in this way, even though the spec says they don't have to allow it.
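i.e. something along these lines, assuming the struct really does occupy exactly 4 bytes with no padding (field widths guessed from later in the thread):

#include <stdint.h>

struct uid {            /* assumed layout */
    int16_t n;
    int8_t  cell_x;
    int8_t  cell_y;
};

union uid_pun {
    struct uid parts;
    int32_t    id;
};

int32_t make_id(struct uid parts)
{
    union uid_pun u;
    u.parts = parts;    /* write one member...                          */
    return u.id;        /* ...and read the other: the reinterpretation
                           the compiler vendors have agreed to honour   */
}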


Ah, thanks for clarifying. After some pondering, I realized I can just concatenate them with shift and OR normally. I guess I was trying too hard to use cool union tricks.

int32_t cell_x = uid.cell_x;
int32_t cell_y = uid.cell_y;
int32_t n = uid.n;
int32_t id = n | (cell_x << 16) | (cell_y << 24);
Will this work on systems with different endianness?

It will work, but the binary representation will be different.

Have a look at the functions ntohl and htonl which you can call to do endian swaps before sending things across a network between machines of different endianness.
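A minimal sketch of using them (they're declared in <arpa/inet.h> on POSIX systems, <winsock2.h> on Windows):

#include <stdint.h>
#include <arpa/inet.h>   /* POSIX; on Windows use <winsock2.h> */

uint32_t id_to_wire(int32_t id)
{
    /* host byte order -> network (big-endian) byte order */
    return htonl((uint32_t)id);
}

int32_t id_from_wire(uint32_t wire)
{
    /* network byte order -> host byte order */
    return (int32_t)ntohl(wire);
}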

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Ah, thanks for clarifying. After some pondering, I realized I can just concatenate them with shift and OR normally. I guess I was trying too hard to use cool union tricks.


int32_t cell_x = uid.cell_x;
int32_t cell_y = uid.cell_y;
int32_t n = uid.n;
int32_t id = n | (cell_x << 16) | (cell_y << 24);


Be careful. This may not work how you expect if your values can be negative. You might want to cast to unsigned ints of the appropriate size before doing the bit manipulation.

Have a look at the functions ntohl and htonl which you can call to do endian swaps before sending things across a network between machines of different endianness.

Oh, thanks. I take it those two functions are what I need if I want to store my values in a file portably, yes?

Be careful. This may not work how you expect if your values can be negative. You might want to cast to unsigned ints of the appropriate size before doing the bit manipulation.

Thanks, what a gotcha. So if I shift a signed number, the sign bit will not change, yes? And if I cast to unsigned, I can cast that unsigned back to signed to get the same value?

Empty argument lists in C are a terrible idea! It's also different from C++, where an empty list means the function takes no arguments. It just seems like an incredibly lazy way to prototype a function, and what's more, the compiler doesn't complain when you pass the wrong number or type of arguments!!! It's basically the same as using an ellipsis argument list, except without the requirement that you have at least one argument before the ellipsis.

At least one bug has been incredibly hard to track down in our codebase because of this "feature" (i.e. someone writes a local prototype in a .c file for a function that once took no arguments, and the call continues to compile without complaint, but of course work incorrectly, after someone adds an argument to the function).

yes, but this is more about what would be defined by the standard, not about whether or not it is good practice.

the issue is, there is still a lot of code floating around which would break if you took away "()", but relatively little which would break if the rest of old-style declarations were taken away.

(FWIW, my C compiler has effectively dropped both, making "()" behave like "(void)" and treating reliance on the old-style semantics as a warning).

but, anyways, to clarify a few of the thoughts: the endianness specifiers would probably take the form of special preprocessor defines, which would have undefined behavior (probably being a no-op) if the compiler doesn't support the feature. preprocessor defines could also exist to indicate whether the feature is present and works.
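something like the following, with purely made-up names (__HAS_EXPLICIT_ENDIAN and __le_struct are hypothetical here, not real compiler extensions), just to illustrate the shape of the idea:

#include <stdint.h>

#ifdef __HAS_EXPLICIT_ENDIAN      /* hypothetical feature-test define          */
#define LE_STRUCT __le_struct     /* hypothetical per-struct endianness marker */
#else
#define LE_STRUCT                 /* no-op fallback on compilers without it    */
#endif

LE_STRUCT struct uid {
    int16_t n;
    int8_t  cell_x, cell_y;
};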

the declaration ordering restrictions would be more subtle, probably placing a few restrictions like:

type qualifiers will precede specifiers (except in certain conditions);

other type specifiers and user-defined types (typedef-name) will be mutually exclusive;

only one (and exactly one) user-defined type may be referenced as part of a given declaration type;

...

most existing code already does this, and making this an optional requirement can result in a parser speedup (though, as-is, a command-line option can achieve a similar effect). basically, it allows eliminating most cases where it is necessary to check whether or not an identifier is a known typedef (IME: this is where a big chunk of the time goes when parsing declarations from headers, at least in my tools).
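concretely, the kind of declarations such a profile would accept or reject (myint_t is just a made-up typedef for illustration):

typedef unsigned int myint_t;

/* would conform to the proposed restrictions: qualifiers come first,
   and the type is either built-in specifiers or a single typedef name */
const unsigned int a = 0;
const myint_t      b = 0;

/* legal C, but would fall outside such a profile: the qualifier sits
   in the middle of the specifier list */
unsigned const int c = 0;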

Thanks, what a gotcha. So if I shift a signed number, the sign bit will not change, yes?

Really the problem is sign extension when moving to a larger size. Let's say n is -5. As an int16_t your computer might represent that as 1111 1111 1111 1011. When you cast that to an int32_t it might become 1111 1111 1111 1111 1111 1111 1111 1011; all the upper bits are 1. Doing a bitwise OR with that will wipe out any information that you want to get from cell_x and cell_y. If you cast it to a uint16_t first and then widen, you'll get 0000 0000 0000 0000 1111 1111 1111 1011, which leaves the upper bits free for ORing other values in.
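So the fix is to go through the matching unsigned types before widening; a minimal sketch (field widths assumed from the masks given below):

#include <stdint.h>

int32_t make_id(int16_t n, int8_t cell_x, int8_t cell_y)
{
    uint32_t un = (uint16_t)n;        /* zero-extends: -5 becomes 0x0000FFFB */
    uint32_t ux = (uint8_t)cell_x;
    uint32_t uy = (uint8_t)cell_y;
    /* the final conversion back to int32_t is the implementation-defined
       part mentioned below, but behaves as expected on common platforms */
    return (int32_t)(un | (ux << 16) | (uy << 24));
}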

And if I cast to unsigned, I can cast that unsigned back to signed to get the same value?

To be pedantic, whether or not a cast from signed to unsigned and back will give you the same value is implementation-defined. However, I've never worked on a platform where you wouldn't get the same value from a round trip.

Well, the solution is to mask out the bits you don't want before you do the shift... it looks like your x and y are 8 bits and n is 16 bits, so do

int32_t id = (n & 0xffff) | ((cell_x & 0xff) << 16) | ((cell_y & 0xff) << 24);

EDIT: Missed an effing eff

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

This topic is closed to new replies.
