Fredericvo

Warning conversion from size_t to int in x64


Maybe a bit OT as it's not game related at all, but it's still C/programming/library related.

 

I've been compiling a lot of originally Linux/Unix libraries and programs on Win64 lately, and it works without too many problems except for one recurrent warning that shows up a massive number of times, e.g.:

 

hash_sha.c
c:\php-sdk\phpdev\vc12\x64\php56\zend\zend_execute.h(247) : warning C4267: 'function' : conversion from 'size_t' to 'int', possible loss of data
 
No matter what I compile (openssl, libcurl, engine-pkcs11, opensc-pkcs11, apr, apr-util, apr-iconv, Apache 2.4.18 and its modules, PHP 5.6.17, etc.), this warning keeps sprouting up. I know it's not without consequences (integer overflow exploit potential!), although I'm not compiling for a production environment, only for my own learning.
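For illustration, something as simple as this (a made-up snippet, not taken from any of these projects) is enough to trigger the warning on x64, because strlen returns a 64-bit size_t there:

#include <cstring>

int count_chars(const char* s)
{
    std::size_t len = std::strlen(s); // strlen returns size_t, which is 64-bit on Win64
    int n = len;                      // C4267: conversion from 'size_t' to 'int',
                                      // possible loss of data
    return n;
}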
 
I still wonder if there is a one-size-fits-all approach to tackle this, maybe defining size_t as something else, or should I in theory go through every occurrence one by one and decide whether a 64-bit or a 32-bit type is appropriate there? That would be a massive amount of work, and I kinda wonder how Win64 still seems to be only an afterthought even now in 2016.
 
AFAIK size_t is 8 bytes on Win64 and 4 bytes on x64 Linux, is that correct? So the one-size-fits-all fix could be to redefine size_t as a 4-byte type, but I feel queasy doing that. There could always be places where a size_t is cast to a pointer type, even though it's bad practice.
I guess if all occurrences have to be touched, it's the ints that need to become size_t's?
 
Still I wonder if existing x64 window binaries of say Apache and PHP were compiled with these warning left ignored and that's why they say those ports are "experimental" or did they correct it somehow?
 


size_t is defined as an unsigned type that can store the maximum size of a theoretically possible object of any type (including array).

 

You have a potential conversion issue between the two because one is signed (int) and one is unsigned (size_t), even when both are 32-bit. So there will never truly be a 1:1 correspondence between these two types. That being said, if you know the size_t values are always within the range an int can hold, then it is a safe conversion.
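If you do need the value as an int, a checked helper along these lines (just a sketch, not from any particular library) makes that assumption explicit instead of silent:

#include <cassert>
#include <climits>
#include <cstddef>

// Checked narrowing: asserts (in debug builds) that the value actually
// fits in an int before converting, so truncation can't slip by unnoticed.
inline int size_to_int(std::size_t n)
{
    assert(n <= static_cast<std::size_t>(INT_MAX));
    return static_cast<int>(n);
}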


AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct?


No, it's 8 bytes in 64-bit Linux as well.

If you can edit the code, you can use an explicit static_cast<int>(...) to tell the compiler that you know what you are doing (even if you don't ;) ).
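For example (with a hypothetical take_count function that expects an int, just to show the idea):

#include <vector>

void take_count(int count) { (void)count; } // hypothetical function expecting an int

void example(const std::vector<char>& v)
{
    // The explicit cast silences C4267 and documents that the narrowing is
    // intentional; it still truncates silently if v.size() ever exceeds INT_MAX.
    take_count(static_cast<int>(v.size()));
}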


I still wonder if there is a one-size-fits-all approach to tackle this, maybe defining size_t as something else or should I in theory replace all occurrences on a 1 by 1 basis to determine whether a 64-bit or 32-bit is appropriate there? That would be a massive amount of work and I kinda wonder how win64 seems to only be an afterthought even now in 2016.

 

size_t is a type defined in the standard library. How exactly would you redefine it?

The correct approach is to store size_t values in variables of type size_t. Everything else is based on assumptions that may or may not be valid.
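For example, something along these lines (just a sketch) keeps the index in the type the container actually uses:

#include <cstddef>
#include <vector>

double sum(const std::vector<double>& values)
{
    double total = 0.0;
    // The index is the same type as values.size(), so there is no
    // signed/unsigned conversion and no C4267 warning, no matter how
    // large the container gets.
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values[i];
    return total;
}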

 

That being said... I do not care much about that in personal projects. size_t is equivalent to unsigned long long (a 64-bit unsigned integer) in MSVC x64, and I have never needed more than INT_MAX entries within a single container, so casting to int just works. The only case where it may matter is file handling; there is certainly software that works on files larger than 4 GB.

 

By the way: C# uses signed types for container lengths, which is IMHO a much better approach (and the Long variants of Count and Length use a signed 64-bit integer, which is more than enough; they only ever return values in the range [0, 2^31) as a long anyway, see below).

Edited by duckflock



C# uses signed types for container lengths, which is IMHO a much better approach

I'm curious why you think this is a better approach?

 

You aren't alone in thinking that - Java doesn't even have unsigned integer types. Which leads to support libraries for when you need unsigned integers, and incredible oddities like limiting array length to 2^31 even on 64-bit systems...


 

AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct?


No, it's 8 bytes in 64-bit Linux as well.

If you can edit the code, you can use an explicit static_cast<int>(...) to tell the compiler that you know what you are doing (even if you don't ;) ).

 

Ah, I was not sure about that as I program on Windows only (I will try to do Linux later as well).

So this means that if you compile these libs/progs on Linux targeting x64, GCC should spit out the same warnings, right?

Does it mean that what I'm trying to compile was never meant to go x64 in the first place? I know PHP at least says only PHP 7 has x64 truly in mind and that before that, e.g. PHP 5.6.x, it was "experimental, use at own risk".


 


C# uses signed types for container lengths, which is IMHO a much better approach

I'm curious why you think this is a better approach?

 

You aren't alone in thinking that - Java doesn't even have unsigned integer types. Which leads to support libraries for when you need unsigned integers, and incredible oddities like limiting array length to 2^31 even on 64-bit systems...

 

C# has unsigned types and I recognize their usefulness, especially for low-level bit manipulation. I have looked up .NET array limitations - they are in fact limited to 2^31 elements. Apparently individual .NET objects are limited to 2 GB by default anyway, so this is a deliberate runtime limitation...

 

Regarding C++ I don't see the problem though. Consider that I'm by no means a professional programmer, so this comes more down to preference. Here are some things that strike me as odd:

  • numeric literals are signed by default
  • the default go-to type of the language is signed int
  • unsigned int vs int comparison forces conversion to unsigned, making accidental signed vs unsigned comparisons dangerous (who hasn't been bitten by this? see the sketch after this list)
  • if I need more than 2^31 elements right now, I would naively expect to need more than 2^32 at some point in the future.
  • You can possibly return error codes in the negative space. OK, you probably won't for methods like size() :D
  • I always have the odd feeling that unsigned behaves a bit parasitically, forcing the other things it interacts with to unsigned, too.
  • Most of the time a signed int is enough for my uses :D
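On the comparison point above, this is the classic trap (a minimal sketch):

#include <iostream>

int main()
{
    int i = -1;
    unsigned int u = 1u;

    // The usual arithmetic conversions turn -1 into UINT_MAX for the
    // comparison, so "i < u" is false even though it reads as -1 < 1.
    if (i < u)
        std::cout << "expected branch\n";
    else
        std::cout << "surprise: -1 is not less than 1u here\n";
    return 0;
}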

Well, I guess most of this unsigned vs signed behavior is there to make the language efficient for as many architectures as possible. I usually don't want to perform math on unsigned values; most of the time I want to operate on signed values. In most of my code I still simply cast size_t to int and call it a day. If I needed larger containers, I'd cast to long long and call it a day. I'm not writing security-critical code, I'm writing games :)


I'm writing games

 

I mean, technically what you are writing is bugs, but I won't quibble :)

 

I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use-cases thereof.

 

It could be argued that the compiler could be stricter, and treat all sign mismatches in comparisons as errors. But it could also be argued that if you aren't developing with -Wall -Werror, then you are inviting this sort of thing...
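For reference, the rough equivalents (the exact flags depend on the toolchain):

GCC/Clang:  cc -Wall -Wextra -Werror -c hash_sha.c
MSVC:       cl /W4 /WX /c hash_sha.c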

Seems this has turned into a signed vs unsigned debate. I'm quite curious about others' take on it. I've always leaned towards the 'unsigned as length/size' approach, but I'm curious as to others' choice/rationale.


I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use-cases thereof.


Someone calls List<T>.IndexOf(x), but x is not in the list. You're the IndexOf function. What do you return?

signed -> return -1.

unsigned -> return Length+1 (essentially vector::end or equivalent)? What if the length is UInt32.MaxValue? Then Length+1 wraps around to 0, and you don't want to return 0. You either need to limit lists to UInt32.MaxValue-1 in length so that you can use MaxValue (i.e. (uint)-1) as the invalid position, or have two return values - Index and IsValid.

iterator (or some other custom type) -> collection_type::end, or an (index, isvalid) tuple, but this means your API now burdens people with understanding the mechanics of iterators or unpacking return values from complex types.
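In C++ terms, the two conventions look roughly like this (hypothetical index_of helpers, just a sketch):

#include <cstddef>
#include <vector>

// Unsigned convention: an npos-style sentinel, like std::string uses.
constexpr std::size_t npos = static_cast<std::size_t>(-1);

std::size_t index_of(const std::vector<int>& v, int x)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == x)
            return i;
    return npos; // caller compares against npos, not "< 0"
}

// Signed convention: -1 means "not found", at the cost of capping
// the usable size at INT_MAX elements.
int signed_index_of(const std::vector<int>& v, int x)
{
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        if (v[i] == x)
            return i;
    return -1;
}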


In my opinion, the API simplicity of signed values outweighs the loss of array sizes over 2 billion elements. If you want collections larger than that, you can still make a custom type using ulong indices and a richer IndexOf return value when you need it.

Edited by Nypyren
