
What are the pros/cons of using exact-width integer types in a portable C or C++ API?


Hi everyone. It's been a really long time since I've been part of this community, but I can think of no forums I've liked better than this one. Fair warning - I posted this on Stack Overflow and, frustratingly, people said it was too broad a topic and wouldn't explain how I could improve the question. So I'm posting this here with the hope that I can have a reasonable discussion, and maybe come to an answer. Anywho...

It seems fairly common for highly-portable APIs, such as OpenGL, Vulkan, or glib, to define types and parameters in terms of exact-width integer types. Those APIs often cite "portability reasons" without really providing detailed rationale. Many languages these days define their numeric types as exact-width types as well (e.g. Rust).

I have an API function, for example, where an input must be restricted to a 16-bit unsigned integer. However, I don't really care if the type of the parameter allows values greater than 65535. There are additional validation checks, as well as policy and documentation, in place that ensure the value is in the correct range, or the input is rejected. So I could get away with defining the type as unsigned int, uint_least16_t, or uint_fast16_t.
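
For concreteness, here is a minimal sketch of the two shapes I'm weighing. set_port_exact/set_port_least are made-up names, and the range check stands in for the validation/policy layer I described:

    #include <stdbool.h>
    #include <stdint.h>

    /* Option A: exact width -- the type itself limits callers to 0..65535. */
    bool set_port_exact(uint16_t port);

    /* Option B: "at least 16 bits" -- always available, even where uint16_t
     * isn't, but the 16-bit limit has to be enforced by hand. */
    bool set_port_least(uint_least16_t port)
    {
        if (port > 0xFFFFu) {
            return false;   /* out of range: rejected, as the documentation says */
        }
        /* ... accept and use the value ... */
        return true;
    }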

As another example, in code that does bit manipulation, I don't really care if the type has more than 16 bits, as long as it has at least that many. So I would use one of the aforementioned types and mask off just the bits I'm interested in.
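
For example, a 16-bit rotate written against a "fast" type only needs the masking to stay correct if the underlying type happens to be wider (hypothetical helper, purely to illustrate the masking):

    #include <stdint.h>

    /* A 16-bit rotate-left that only assumes the type has *at least* 16 bits.
     * Masking with 0xFFFF keeps the result correct even when uint_fast16_t is
     * actually 32 or 64 bits wide. */
    static uint_fast16_t rotl16(uint_fast16_t v, unsigned n)
    {
        v &= 0xFFFFu;                                   /* ignore any extra high bits */
        n &= 15u;                                       /* rotation count mod 16 */
        return ((v << n) | (v >> ((16u - n) & 15u))) & 0xFFFFu;
    }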

So what are the benefits and drawbacks of exact-width types in these cases? I'm particularly interested in the API case. Do exact-width types make it easier to write bindings for other languages? Are there other reasons? I feel like I'm missing a piece of the portability puzzle. I'm just looking for why some APIs cite portability and use exact-width types where inexact types would arguably be fine. But I'm not going to blindly trust and follow their example for my own code without some rationale.

At work I've never really had to focus much on different architectures. Pretty much everything we do is either x86, x86_64, or PPC64, or is code that is not designed to be portable anyways, like device drivers or embedded software. Over the lifetime of a typical project, the architecture does not change - we just pick something and run with it. So this question is mainly out of curiosity.


Binary data.

Binary data is read from files, sent across networks, submitted via API calls.  You need to know the precise size of types in order to correctly interpret binary data.
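
For instance, a hypothetical record header whose on-the-wire layout is a 16-bit version followed by a 32-bit little-endian payload length is most honestly described with exact-width types, because the widths are part of the format itself:

    #include <stdint.h>

    struct record_header {
        uint16_t version;
        uint32_t payload_len;
    };

    /* buf must point at the first 6 bytes of the record. */
    static struct record_header parse_header(const unsigned char *buf)
    {
        struct record_header h;
        h.version     = (uint16_t)(buf[0] | (buf[1] << 8));
        h.payload_len =  (uint32_t)buf[2]
                      | ((uint32_t)buf[3] << 8)
                      | ((uint32_t)buf[4] << 16)
                      | ((uint32_t)buf[5] << 24);
        return h;
    }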


sizeof( uint32_t ) always returns 4.

 

That is not entirely true. It's only true when char is 8 bits wide. There are architectures (that I had the questionable honor of working with) where char was 32 bits wide, and therefore sizeof(uint32_t) == 1.

But it's true on all architectures you may encounter in gamedev today. It's just not universally true.
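
A build-time check makes that assumption visible; something along these lines (C11) will refuse to compile on a platform like the one described above:

    #include <assert.h>
    #include <limits.h>
    #include <stdint.h>

    /* sizeof counts in units of char, not octets, so "sizeof(uint32_t) == 4"
     * really depends on CHAR_BIT == 8. */
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");
    static_assert(sizeof(uint32_t) == 4, "uint32_t is expected to occupy 4 chars");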


You need to know the precise size of types in order to correctly interpret binary data.

 
Yes, for reading or writing binary data I agree. However, what it is best stored as isn't necessarily the best type for the operations you have to perform on it. With some hypothetical binary format and calculation to perform on that data, it may make sense for the serialized data to be a 16-bit unsigned integer, but it may also be that the calculation performed on that data is more performant when the type is uint_fast16_t or some other larger size.

You also don't have to use exact-width types to read or write binary data. Casting a char buffer to a pointer to one of these types breaks strict aliasing rules anyway. One alternative is to memcpy the data into a type of at least the necessary size, and swap just those bytes for endianness if required. A quick test shows my compiler inlines the memcpy call, and it performs about the same as the casting version.
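
Roughly what I had in mind (a sketch that assumes GCC/Clang for the byte-order macros and the bswap builtin; other compilers have their own equivalents):

    #include <stdint.h>
    #include <string.h>

    /* Read a 32-bit big-endian field from a raw byte buffer without casting the
     * buffer to uint32_t* (which would break strict aliasing and could also be
     * misaligned). */
    static uint_fast32_t read_be32(const unsigned char *buf)
    {
        uint32_t raw;
        memcpy(&raw, buf, sizeof raw);   /* compilers typically inline this to one load */
    #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        raw = __builtin_bswap32(raw);    /* swap only when the host is little-endian */
    #endif
        return raw;                      /* widened to a "fast" type for later math */
    }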


Yes, for reading or writing binary data I agree. However, what it is best stored as isn't necessarily the best type for the operations you have to perform on it. With some hypothetical binary format and calculation to perform on that data, it may make sense for the serialized data to be a 16-bit unsigned integer, but it may also be that the calculation performed on that data is more performant when the type is uint_fast16_t or some other larger size.
You also don't have to use exact-width types to read or write binary data. Casting a char buffer to a pointer to one of these types breaks strict aliasing rules anyway. One alternative is to memcpy the data into a type of at least the necessary size, and swap just those bytes for endianness if required. A quick test shows my compiler inlines the memcpy call, and it performs about the same as the casting version.


Assuming you're even performing a calculation on it. Suppose you're reading a wave file and sending the data to a multimedia API. That's just one example; there are many more.


Yes, for reading or writing binary data I agree. However, what it is best stored as isn't necessarily the best type for the operations you have to perform on it.

But you're talking about APIs. They're interfaces, where you're basically writing binary data into them and reading it out, even if it is only a few bytes at a time. How it operates on the data internally is a different issue.

You give an example where you don't care as long as the number you pass has at least 16 bits, as the others can be masked off. But which ones should be used? Without additional context, there are arguments for using the most significant bits, and there are arguments for using the least significant bits. Which is correct? Better not to guess; specify that the interface requires 16 bits exactly, and everyone knows how things will work.
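
For example (hypothetical helpers, purely to illustrate the ambiguity), the same wider input can be reduced to 16 bits in two defensible ways, and nothing in "at least 16 bits" tells the caller which one the callee picked:

    #include <stdint.h>

    /* An audio API narrowing sample depth would keep the most significant bits;
     * an API treating the value as an ID or counter would keep the least
     * significant ones. */
    static uint16_t keep_high_bits(uint32_t v) { return (uint16_t)(v >> 16); }
    static uint16_t keep_low_bits (uint32_t v) { return (uint16_t)(v & 0xFFFFu); }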


Assuming you're even performing a calculation on it. Suppose you're reading a wave file and sending the data to a multimedia API. That's just one example; there are many more.

 In that case I would probably just read into whatever type is expected by the multimedia API.
 

 

Yes, for reading or writing binary data I agree. However, what it is best stored as isn't necessarily the best type for the operations you have to perform on it.

You give an example where you don't care as long as the number you pass has at least 16 bits, as the others can be masked off. But which ones should be used? Without additional context, there are arguments for using the most significant bits, and there are arguments for using the least significant bits. Which is correct? Better not to guess; specify that the interface requires 16 bits exactly, and everyone knows how things will work.

I do see your point, but to clarify, the documentation in this case would say something like "The number must be defined in the configuration file for the application", whereas the policy is really an XML file that gets validated against a schema. The schema limits the allowed numbers to a 16-bit unsigned integer, and if the number isn't in the configuration file, it's rejected as nonsense.

So no one gets the wrong impression, I'm not against using exact-width types in APIs. I'm just looking for tradeoffs between exact-width and inexact-width types in an API.

One possible reason to not use, for example again, uint16_t, is that if this code gets ported to a system without that type, the API would have to change. Not a likely scenario, I know, considering that those kinds of platforms probably wouldn't support other parts of the application anyways. That's why this is driven mainly out of curiosity, and does not stem from a real-world problem.  


I do see your point, but to clarify, the documentation in this case would say something like "The number must be defined in the configuration file for the application", whereas the policy is really an XML file that gets validated against a schema. The schema limits the allowed numbers to a 16-bit unsigned integer, and if the number isn't in the configuration file, it's rejected as nonsense.


Your explanation here is too abstract for me, so I don't understand.
But if you're trying to say that the 16-bit-ness of the value is documented - e.g. float function(int sixteenbitsorlessplease) - and that therefore users who pass a larger value should expect it to be 'rejected as nonsense', how would you like that rejection to happen? An exception? (Not all languages use them.) An error code? (You have to remember to check it.) A crash? (No fun for anyone.)
Easier to just force the user to pass exactly the right thing in.
Besides which, many coders prefer the API to be self-documenting as far as possible. That doesn't mean no documentation; but it does mean that every name should be meaningful and every type should carry useful information.
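
To illustrate with two hypothetical declarations: the wider signature has to pick a failure mode, while the exact-width one makes an out-of-range argument impossible to express in the first place:

    #include <stdint.h>

    /* The first needs an error code the caller may forget to check; in the
     * second, the signature documents the contract by itself. */
    int  configure_widget_checked(unsigned int id);  /* nonzero return if id > 65535 */
    void configure_widget(uint16_t id);              /* range enforced by the type */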
 

One possible reason to not use, for example again, uint16_t, is that if this code gets ported to a system without that type, the API would have to change.

 

If an equivalent type existed, it's five minutes' work to add a typedef and commit it. The absolute worst case would be if no similar type existed, but most people aren't interested in supporting platforms with non-conforming compilers. Those who are might write wrappers (e.g. packing and unpacking a 2-byte struct instead of a uint16_t).
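
A sketch of that fallback, with hypothetical api_u16 and api_set_channel names, using UINT16_MAX as the feature test:

    #include <stdint.h>

    /* UINT16_MAX is only defined when uint16_t exists, so it doubles as a
     * feature test; on the odd platform without it, the typedef changes in one
     * place and the rest of the API is untouched. */
    #if defined(UINT16_MAX)
    typedef uint16_t       api_u16;
    #else
    typedef uint_least16_t api_u16;   /* always present; callers must stay <= 65535 */
    #endif

    void api_set_channel(api_u16 channel);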


most people aren't interested in supporting platforms with non-conforming compilers

A compiler that leaves out uint16_t isn't non-conforming, though. The standard doesn't require types of every size to be available; it only defines minimum ranges and relationships between the built-in types, and it explicitly makes all the exact-width integer types optional for exactly that reason.
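
Concretely, the guarantees look like this (the least/fast typedefs are mandatory in C99 and later; the exact-width one is not):

    #include <stdint.h>

    /* The standard only promises minimum widths:
     *   char >= 8, short >= 16, int >= 16, long >= 32, long long >= 64 bits.
     * The least/fast typedefs below exist on every conforming C99+ compiler;
     * the exact-width one only exists where a matching type happens to exist
     * (e.g. not on a DSP whose smallest addressable unit is 32 bits). */
    uint_least16_t always_present_least;   /* narrowest type with at least 16 bits */
    uint_fast16_t  always_present_fast;    /* "fastest" type with at least 16 bits */
    #ifdef UINT16_MAX
    uint16_t       sometimes_present;      /* exactly 16 value bits, no padding */
    #endif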
