Warning conversion from size_t to int in x64

30 comments, last by swiftcoder 8 years, 2 months ago

Maybe a bit OT as not at all game related but still C/programming/library related.

I've been compiling a lot of originally linux/unix libraries and programs on Win64 lately and it works without too many problems except for one recurrent warning happening a massive amount of time, eg:

hash_sha.c
c:\php-sdk\phpdev\vc12\x64\php56\zend\zend_execute.h(247) : warning C4267: 'function' : conversion from 'size_t' to 'int', possible loss of data
No matter what I compile (openssl, libcurl, engine-pkcs11, opensc-pkcs11, apr, apr-util, apr-iconv, Apache 2.4.18 and its modules, PHP 5.6.17, and so on), this warning keeps sprouting up. I know it's not without consequences (integer overflow exploit potential!), although I'm not compiling for a production environment, only for my own learning.
I still wonder if there is a one-size-fits-all approach to tackle this: maybe defining size_t as something else? Or should I, in theory, go through all occurrences one by one to determine whether a 64-bit or a 32-bit type is appropriate there? That would be a massive amount of work, and I wonder how Win64 still seems to be only an afterthought even now in 2016.
AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct? If so, the one-size-fits-all fix could be to redefine size_t as a 4-byte type, but I feel queasy doing that. There could always be places where a size_t is cast to a pointer type, even though that's bad practice.
I guess if all occurrences have to be touched, it's the ints that need to become size_t's?
Still, I wonder whether the existing x64 Windows binaries of, say, Apache and PHP were compiled with these warnings left ignored, and that's why those ports are called "experimental", or whether they corrected it somehow?

size_t is defined as an unsigned type that can store the maximum size of a theoretically possible object of any type (including array).

You have a possible translation issue between the two because one is signed (int) and one is unsigned (size_t), so even when both are 32-bit there is no true 1:1 mapping between their value ranges. That being said, if you know the values always fit in the range the two types share (0 to INT_MAX), the conversion is safe.

"I can't believe I'm defending logic to a turing machine." - Kent Woolworth [Other Space]

AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct?


No, it's 8 bytes in 64-bit Linux as well.

If you can edit the code, you can use an explicit `static_cast<int>(...)` to tell the compiler that you know what you are doing (even if you don't ;) ).

I still wonder if there is a one-size-fits-all approach to tackle this, maybe defining size_t as something else or should I in theory replace all occurrences on a 1 by 1 basis to determine whether a 64-bit or 32-bit is appropriate there? That would be a massive amount of work and I kinda wonder how win64 seems to only be an afterthought even now in 2016.

size_t is a type defined in the standard library. How exactly would you redefine it?

The correct approach is to store size_t values in variables of type size_t. Everything else is based on assumptions that may or may not be valid.

That being said... I do not care much about this in personal projects. size_t is equivalent to unsigned long long (a 64-bit unsigned integer) in MSVC x64, and I have never needed more than INT_MAX entries in a single container, so casting to int just works. The only case where it may matter is file handling; there is certainly software working with files larger than 4 GB.

By the way: C# uses signed types for container lengths, which is IMHO a much better approach (and the long variants of Count and Length use a signed 64-bit integer, which is more than enough; in practice they only return values in the range [0, 2^31), see below).


C# uses signed types for container lengths, which is IMHO a much better approach

I'm curious why you think this is a better approach?

You aren't alone in thinking that - Java doesn't even have unsigned integer types. Which leads to support libraries for when you need unsigned integers, and incredible oddities like limiting array length to 2^31 even on 64-bit systems...

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct?


No, it's 8 bytes in 64-bit Linux as well.

If you can edit the code, you can use an explicit `static_cast<int>(...)` to tell the compiler that you know what you are doing (even if you don't ;) ).

Ah, I was not sure about that, as I program on Windows only (I will try Linux later as well).

So this means that if you compile these libs/programs on Linux targeting x64, GCC should spit out the same warnings, right?

Does it mean that what I'm trying to compile was never meant to go x64 in the first place? I know PHP at least says only PHP 7 truly has x64 in mind, and that before that, e.g. in PHP 5.6.x, it was "experimental, use at your own risk".


C# uses signed types for container lengths, which is IMHO a much better approach

I'm curious why you think this is a better approach?

You aren't alone in thinking that - Java doesn't even have unsigned integer types. Which leads to support libraries for when you need unsigned integers, and incredible oddities like limiting array length to 2^31 even on 64-bit systems...

C# has unsigned types and I recognize their usefulness, especially for low-level bit manipulation. I have looked up the .NET array limitations: arrays are in fact limited to 2^31 elements. Apparently individual .NET objects are limited to 2 GB by default anyway, so this is a deliberate runtime limitation...

Regarding C++, I don't see the problem, though. Consider that I'm by no means a professional programmer, so this comes down more to preference. Here are some things that strike me as odd:

  • numeric literals are signed by default
  • the default go-to type of the language is signed int
  • unsigned int vs int comparison forces conversion to unsigned, making accidental unsigned vs signed comparisons dangerous (who hasn't been bitten by this?)
  • if I need more than 2^31 elements right now, I would naively expect to need more than 2^32 at some point in the future.
  • You can return error codes in the negative space. OK, you probably won't for methods like size() :D
  • I always have the odd feeling that unsigned behaves a bit parasitically, forcing the things it interacts with to unsigned, too.
  • Most of the time a signed int is enough for my uses :)

Well, I guess most of this unsigned vs signed behavior is there to make the language efficient on as many architectures as possible. I usually don't want to perform math on unsigned values; most of the time I want to operate on signed values. In most of my code I still simply cast size_t to int and call it a day. If I needed larger containers, I'd cast to a 64-bit signed type and call it a day. I'm not writing security-critical code, I'm writing games :)

I'm writing games

I mean, technically what you are writing is bugs, but I won't quibble :)

I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use cases for that.

It could be argued that the compiler could be stricter, and treat all sign mismatches in comparisons as errors. But it could also be argued that if you aren't developing with -Wall -Werror, then you are inviting this sort of thing...

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Seems this has turned into a signed vs unsigned debate. I'm quite curious about others' take on it. I've always leaned towards the 'unsigned as length/size' approach, but I'm curious about others' choice/rationale.

I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use-cases thereof.


Someone calls List<T>.IndexOf(x), but x is not in the list. You're the IndexOf function. What do you return?

signed -> return -1.

unsigned -> return Length (essentially vector::end or equivalent), or a reserved sentinel value? You don't want the "not found" value to collide with a real position. You either need to limit lists to UInt32.MaxValue-1 elements so that you can use MaxValue (i.e. (uint)-1, the std::string::npos approach) as the invalid position, or have two return values: Index and IsValid.

iterator (or some other custom type) -> collection_type::end, or an (index, isValid) tuple; but this means your API now burdens people with understanding the mechanics of iterators, or with unpacking return values from complex types.


In my opinion, the API simplicity of signed values outweighs the loss of array sizes over 2 billion elements. If you want collections larger than that, you can still make a custom type with ulong indices and a more complex IndexOf return value when you need it.

This topic is closed to new replies.
