Warning conversion from size_t to int in x64

Maybe a bit OT, as this isn't game related at all, but it is still C/programming/library related.

 

I've been compiling a lot of originally Linux/Unix libraries and programs on Win64 lately, and it works without too many problems except for one recurring warning that shows up a massive number of times, e.g.:

 

hash_sha.c
c:\php-sdk\phpdev\vc12\x64\php56\zend\zend_execute.h(247) : warning C4267: 'function' : conversion from 'size_t' to 'int', possible loss of data
 
No matter what I compile (openssl, libcurl, engine-pkcs11, opensc-pkcs11, apr, apr-util, apr-iconv, Apache 2.4.18 and modules, PHP 5.6.17, etc.), this warning keeps sprouting up. I know it's not without consequences (integer overflow exploit potential!), although I'm not compiling for a production environment, only for my own learning.
 
I still wonder if there is a one-size-fits-all approach to tackle this, maybe defining size_t as something else, or should I in theory replace all occurrences one by one to determine whether a 64-bit or 32-bit type is appropriate there? That would be a massive amount of work, and I kind of wonder why Win64 still seems to be an afterthought even now in 2016.
 
AFAIK size_t is 8 bytes on Win64 and 4 bytes on x64 Linux, is that correct? So the one-size-fits-all fix could be to redefine size_t as a 4-byte type, but I feel queasy doing this. There could always be places where a size_t is cast to a pointer type, even though that's bad practice.
I guess if all occurrences have to be touched, it's the ints that need to become size_t's?
 
Still, I wonder whether the existing x64 Windows binaries of, say, Apache and PHP were compiled with these warnings left ignored, and that's why those ports are called "experimental", or did they correct it somehow?
 


size_t is defined as an unsigned type that can store the maximum size of a theoretically possible object of any type (including arrays).

 

You have a possible translation issue between the two because one is signed (int) and one is unsigned (size_t), even if both int and size_t are 32-bit. So there will never truly be a 1:1 correspondence between these two types. That being said, if you know the size_t values are always within the range an int can represent, then it is a safe conversion.


AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct?


No, it's 8 bytes in 64-bit Linux as well.

If you can edit the code, you can use an explicit `static_cast<int>(...)` to tell the compiler that you know what you are doing (even if you don't ;) ).
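
If you want that cast without silently losing data, a cheap range check first helps. A minimal sketch (the helper name checked_int is made up, not from any library):

#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: narrow size_t to int only after verifying that
// the value actually fits; assert() makes any violation loud in debug builds.
inline int checked_int(std::size_t n)
{
    assert(n <= static_cast<std::size_t>(INT_MAX));
    return static_cast<int>(n);
}

Then int len = checked_int(str.size()); compiles without C4267 and traps in debug builds if a container ever outgrows int.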


I still wonder if there is a one-size-fits-all approach to tackle this, maybe defining size_t as something else or should I in theory replace all occurrences on a 1 by 1 basis to determine whether a 64-bit or 32-bit is appropriate there? That would be a massive amount of work and I kinda wonder how win64 seems to only be an afterthought even now in 2016.

 

size_t is a type defined in the standard library. How exactly would you redefine it?

The correct approach is to store size_t values in variables of type size_t. Everything else is based on assumptions that may or may not be valid.
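
To illustrate what that looks like in practice (nothing assumed beyond the standard library):

#include <cstddef>
#include <vector>

// No conversions anywhere: the index, the bound, and the result all
// use the container's own size type, so there is nothing to warn about.
std::size_t count_nonzero(const std::vector<int>& v)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] != 0)
            ++count;
    return count;
}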

 

That being said... I don't care much about that in personal projects. size_t is equivalent to unsigned long long (a 64-bit unsigned integer) in MSVC x64, and I have never needed more than INT_MAX entries within a single container, so casting to int just works. The only case where it may matter is file handling; there is certainly software working on files larger than 4GB.

 

By the way: C# uses signed types for container lengths, which is IMHO a much better approach (and the long variants of Count and Length use a signed 64-bit integer, which is more than enough -- they only ever return values in the range [0, 2^31) as a long, see below).



C# uses signed types for container lengths, which is IMHO a much better approach

I'm curious why you think this is a better approach?

 

You aren't alone in thinking that - Java doesn't even have unsigned integer types. Which leads to support libraries for when you need unsigned integers, and incredible oddities like limiting array length to 2^31 even on 64-bit systems...


 

AFAIK size_t is 8 bytes on win64, 4 bytes on x64-linux, is that correct?


No, it's 8 bytes in 64-bit Linux as well.

If you can edit the code, you can use an explicit `static_cast<int>(...)` to tell the compiler that you know what you are doing (even if you don't ;) ).

 

Ah, I was not sure about that, as I program on Windows only (I will try to do Linux later as well).

So this means that if you compile these libs/progs on Linux targeting x64, GCC should spit out the same warnings, right?

Does it mean that what I'm trying to compile was never meant to go x64 in the first place? I know PHP at least says only PHP 7 has x64 truly in mind and that before that, e.g. PHP 5.6.x, it was "experimental, use at your own risk".


 


C# uses signed types for container lengths, which is IMHO a much better approach

I'm curious why you think this is a better approach?

 

You aren't alone in thinking that - Java doesn't even have unsigned integer types. Which leads to support libraries for when you need unsigned integers, and incredible oddities like limiting array length to 2^31 even on 64-bit systems...

 

C# has unsigned types and I recognize their usefulness, especially for low-level bit manipulation. I have looked up .NET array limitations - they are in fact limited to 2^31 elements. Apparently individual .NET objects are limited to 2GB by default anyway, so this is a deliberate runtime limitation...

 

Regarding C++, I don't see the problem though. Consider that I'm by no means a professional programmer, so this comes down more to preference. Here are some things that strike me as odd:

  • numeric literals are signed by default
  • the default go-to type of the language is signed int
  • unsigned int vs int comparison forces conversion to unsigned, making accidental unsigned vs signed comparisons dangerous (who hasn't been bitten by this?)
  • if I need more than 2^31 elements right now, I would naively expect to need more than 2^32 at some point in the future.
  • you can potentially return error codes in the negative space. OK, you probably won't for methods like size() :D
  • I always have the odd feeling that unsigned behaves a bit parasitically, forcing other things it interacts with to unsigned, too.
  • most of the time a signed int is enough for my uses :D

Well, I guess most of this unsigned vs signed behavior is there to make the language efficient for as many architectures as possible. I usually don't want to perform math on unsigned values; most of the time I want to operate on signed values. In most of my code I still simply cast size_t to int and call it a day. If I needed larger containers, I'd cast to a 64-bit signed type and call it a day. I'm not writing security-critical code, I'm writing games :)


I'm writing games

 

I mean, technically what you are writing is bugs, but I won't quibble :)

 

I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use-cases thereof.

 

It could be argued that the compiler could be stricter, and treat all sign mismatches in comparisons as errors. But it could also be argued that if you aren't developing with -Wall -Werror, then you are inviting this sort of thing...

Seems this has turned into a signed vs unsigned debate. I'm quite curious about others' takes on it. I've always leaned towards the 'unsigned as length/size' approach, but I'm curious about others' choices and rationale.


I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use-cases thereof.


Someone calls List<T>.IndexOf(x), but x is not in the list. You're the IndexOf function. What do you return?

signed -> return -1.

unsigned -> return Length+1 (just past the end, vector::end-style)? What if the length is UInt32.MaxValue? Then Length+1 wraps around to 0, and you don't want to return 0. You either need to limit lists to UInt32.MaxValue-1 in length so that you can use MaxValue (i.e. (uint)-1) as the invalid position, or have two return values - Index and IsValid.

iterator (or some other custom type) -> collection_type::end, or a (index, isvalid) tuple, but this means your API now burdens people with understanding the mechanics of iterators or unpacking return values from complex types.


In my opinion, the API simplicity of signed values outweighs the loss of array sizes over 2 billion elements. If you want collections larger than that, you can still make a custom type using ulong indices and complex IndexOf return values when you need it.
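
For comparison, C++ itself takes the "reserve the all-ones value" route: std::string::find returns npos, which is defined as size_type(-1). A rough sketch of that convention (index_of is an illustrative name, not a standard function):

#include <cstddef>
#include <vector>

// npos-style sentinel: (size_t)-1 marks "not found", leaving every
// other value of the unsigned type usable as a real index.
constexpr std::size_t npos = static_cast<std::size_t>(-1);

template <typename T>
std::size_t index_of(const std::vector<T>& v, const T& x)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == x)
            return i;
    return npos;
}

// Caller: if (index_of(v, x) != npos) { /* found */ }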


Someone calls List<T>.IndexOf(x), but x is not in the list. You're the IndexOf function. What do you return?

 

Representing error codes in the same space as normal answers doesn't strike me as a particularly good thing to do. You can return a pair of a bool and a size_t. You make the bool false if x is not in the list.

 

Limiting the size to UInt32.MaxValue-1 doesn't seem as bad as wasting an entire half of the space on error codes.

 

I sometimes use signed integers for things that are essentially indices, so perhaps size_t would be more natural. But then I don't get to do this:

  for (int n = 7; n>=0; --n) ...


 

for (int n = 7; n>=0; --n) ...

Just write it like this, lol:
for ( unsigned n = 7; n <= 7; --n )
Problem solved?

 

 

Yes, I've done that kind of thing before, but two things about it make me feel "dirty":

 (1) The condition `n <= 7` doesn't express intent very well.

 (2) The code wouldn't work if `unsigned` were a 3-bit type (i.e., it doesn't work if n is as large as its type allows).
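
For what it's worth, the usual idiom that avoids both complaints is to decrement inside the loop condition, before the body sees the value:

// Visits n = 7, 6, ..., 0. The test runs on the old value of n before
// the decrement, so the loop terminates cleanly after n == 0 and it
// still works when the starting value is the maximum of the type.
for (unsigned n = 8; n-- > 0; )
{
    // ... use n ...
}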


I mean, technically what you are writing is bugs, but I won't quibble :)

 

Fair enough!

 

 

I'm always confused why people find size_t being unsigned to be a problem. Why are you performing signed comparisons on array indices in the first place? There are very few valid use-cases thereof.

It could be argued that the compiler could be stricter, and treat all sign mismatches in comparisons as errors. But it could also be argued that if you aren't developing with -Wall -Werror, then you are inviting this sort of thing...

 

I'm not. I came across this while working with unsigned matrix indices, I think. That was quite some time ago, but I believe everyone has tripped over the unsigned < 0 condition in a loop at least once in their programming career.

 

I thought "casting size_t to int => warning" was the whole point of this threadlaugh.png

 

 

Representing error codes in the same space as normal answers doesn't strike me as a particularly good thing to do. You can return a pair of a bool and a size_t. You make the bool false if x is not in the list.

 

I don't think there is anything inherently wrong with returning an invalid index vs returning an after-the-end iterator or throwing an exception. Which brings us back to the exception vs error code discussion from last week :)

The warning is very likely coming from returns from size_t functions being caught in ints. The correct response is to hunt each one down and fix it, but in all honesty you're probably only going to get erroneous behaviour one time in a million from this.

This is a horrible thing to say, but... If it's not critical then just silence the warning in these builds.

If it is critical then fix it and send PRs to the authors, since this is indeed a bug.
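
For completeness: in MSVC you can disable C4267 for a whole build with the /wd4267 compiler switch, or scope the suppression in source with a (MSVC-specific) pragma so it stays visible and reviewable. The function below is only an illustration:

#include <cstring>

#pragma warning(push)
#pragma warning(disable : 4267) // size_t -> int truncation, reviewed and accepted here

int narrow_length(const char* s)
{
    int len = std::strlen(s); // would emit C4267 on x64 without the pragma
    return len;
}

#pragma warning(pop)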



This is a problem of poor interface design for the container, not a problem with unsigned types. I would prefer something like this, using an output parameter for the index:

bool IndexOf( const T& x, size_t& index ) const;

Then the usage of the method is:

size_t index;
if ( container.IndexOf( x, index ) )
{
    … use index
}

 

In general I agree with "don't conflate error codes with values", but when it comes to array indices I'm not as convinced (I mean, even the C++ standard conflates valid iterators with errors).

 

The pattern above becomes cumbersome in the common case when you know IndexOf will succeed.


This is a problem of poor interface design for the container, not a problem with unsigned types. I would prefer something like this, using an output parameter for the index:

bool IndexOf( const T& x, size_t& index ) const;

Then the usage of the method is:

size_t index;
if ( container.IndexOf( x, index ) )
{
    … use index
}

I like that approach as well. .NET uses that for "bool Dictionary<>.TryGetValue(key, out value)".

Last time I used C++ (before any of the C++11 or newer stuff), it didn't have the "out" semantics. Just "ref". I.e. the compiler can't tell whether the IndexOf function actually set the index to a value, or just left it uninitialized.

C# will prevent you from EVER using an uninitialized value, which I like a lot.

Do the modern versions of C++ have anything like this yet?
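
(Partial answer to my own question: C++17's std::optional covers the "was a value actually produced?" half at the type level, though nothing in C++ checks definite assignment of plain locals the way C# does. A sketch, with index_of as a made-up name:)

#include <cstddef>
#include <optional>
#include <vector>

// Returning std::optional instead of writing through an out parameter:
// the caller cannot use an index that was never produced without an
// explicit check for presence.
template <typename T>
std::optional<std::size_t> index_of(const std::vector<T>& v, const T& x)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == x)
            return i;
    return std::nullopt;
}

// Caller: if (auto idx = index_of(v, x)) { /* use *idx */ }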



The pattern above becomes cumbersome in the common case when you know IndexOf will succeed.

I mean, honestly, if you know it will succeed, then you don't need *either* method of propagating errors, and what you really want is an IndexOf(std::no_error_result) API?


It's not really a matter of "yet". It's just not in keeping with the C++ philosophy.

 

 


I mean, honestly, if you know it will succeed, then you don't need *either* method of propagating errors, and what you really want is an IndexOf(std::no_error_result) API?

Both of these so much.

 

That is the design of the language.

 

It also ties in a bit with another recent heated debate about the presence of exceptions in code, where the general view was that in games C++ exceptions generally should not happen. The conditions causing the errors should not happen; there should be no need to test for the condition, nor any need to propagate the errors, because the errors should not happen in the first place.

 

 

It reminds me of a Torvalds rant on the Linux kernel mailing list about 3 years ago. The Linux kernel -- like most libraries -- specifies a limited set of results from function calls. Calls are generally designed to succeed. If a call can fail, there is a specific list of failure conditions and error codes for every condition. One of the kernel maintainers let a bug through where the I/O controller returned a result that wasn't one of the documented return codes. Torvalds went ballistic, the discussion hit Slashdot and other media outlets, and most of the programming world quickly broke into two camps. The first camp held that the only options are success or the specified error results. The second (very small and quickly beaten-down) camp was more about propagation of errors, arguing it might be time for the kernel to go more of a Java route. While many people didn't like Torvalds's reaction, most could understand that the bug was a serious flaw in the design of the system: the kernel's libraries absolutely cannot return values outside the specification.

 

 

The design philosophy of the language and the tools is straightforward enough. First, design and build things so they cannot break: they cannot fail, they only succeed, when used as directed. If there is a way they can fail, give the programmer either a way to test in advance that the call will not fail, or a way to notify the caller of the failure along with the reason for it.

 

As swiftcoder mentioned, if done correctly you don't need to detect errors or propagate errors, because there are no errors. If there is a flaw, it is not because the library is broken, but because the programmer used it wrong. In other terms: if you shot yourself in the foot, it isn't because the gun was broken, but because the shooter didn't point the gun down range.

 

 

This actually makes sense if you look at the history of the language.

 

Warning: Old person boring rant about the bad old days, pet dinosaurs, and when dirt was young.

 

C was made back in 1972.  Personal computers did not hit the market until 1977.

 

Back in that era, computers were big machines taking up entire rooms, sometimes entire floors of buildings. They were expensive, and only large universities and large businesses owned them.

 

Usually computer time was rented, often both by the CPU cycle and by the bit of memory and storage.

 

Back in that era software was developed differently.  Usually there were a small number of programmers, generally mathematicians, who haggled back and forth over exactly what each function was supposed to do, exactly what it was supposed to return.

 

There was an army of secretaries, and every week they would collect the notes, type them up, and give copies (in the forms of typewritten books) to each one of the programming team.  The programmers would review the functions, ensure they were correct, haggle over details, make corrections, give their corrections to the army of secretaries, and the following week the secretaries produced another typewritten book with all the updates for each programmer.

 

That army of secretaries and weeks of development by high-priced experts, often commanding salaries the same as doctors and lawyers, was cheaper than renting the computer time.

 

Code was reviewed by these professionals based on the text descriptions and the CPU instructions. Reviewed time after time, week after week, until it was shown to be correct.

 

Given that model, there was no reason for anything to have an unexpected failure.  Every function call was meticulously designed. Every parameter value and every return result was reviewed by domain experts and argued over for weeks, sometimes for months. 

 

Given that model, the best option is for a function to succeed quickly and return no result; the second-best option is to quickly (to minimize cost) return an error code detailing exactly the reason for the failure.

 

The option to throw an exception, tearing everything down and arranging for some other thing to handle it, does not fit the model. It would mean additional defensive coding for rare situations, something compilers automate these days. It would mean additional compute time for the program, which was being paid for by the cycle. It would mean additional storage space, which was often paid for by the bit. Exception handling would have meant enormous costs for development, since the boilerplate wasn't automated, and even bigger costs for running the final software.

 

End history

 

Sorry about that, trying to bring this back on track...

 

 

 


So this means that if you compile these libs/progs on Linux targeting x64, GCC should spit out the same warnings, right?
Does it mean that what I'm trying to compile was never meant to go x64 in the first place? I know PHP at least says only PHP 7 has x64 truly in mind and that before that, e.g. PHP 5.6.x, it was "experimental, use at your own risk".

It should probably give those warnings, but it doesn't have to depending on the implementation.

 

The root of the problem is that they are using the types incorrectly for portable use. They are relying on implementation-defined behavior.

 

The size_t type has a specific meaning: in the C language standard it is the unsigned integer type of the result of the sizeof operator. The exact size is implementation defined.

 

The int type has a specific meaning: in the C language standard it is an implementation-defined signed integer type that has the natural size suggested by the target architecture, large enough to contain any value between INT_MIN and INT_MAX.

 

The mixed types (signed and unsigned) need to go through the "usual arithmetic conversions". One implementation-defined signed value is being compared to another implementation-defined unsigned value. It is entirely possible that, after the conversions, the compiler has a type available that makes the comparison safe. For example, if the signed variable is 64 bits and the unsigned variable is 32 bits, promoting the unsigned variable to the 64-bit signed type preserves every value, so there is nothing to warn about. For another example, if an unsigned 64-bit variable is compared to a signed 32-bit value and the compiler has a signed 128-bit type available, both values can be represented in that type and the comparison needs no warning.
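
A tiny demonstration of those conversions in action (this assumes the common widths where int is 32-bit and long long is 64-bit):

#include <cstdio>

int main()
{
    int s = -1;
    unsigned int u = 1;

    // Same width, mixed signs: s is converted to unsigned int, -1
    // becomes UINT_MAX, and the comparison comes out mathematically
    // wrong (most compilers warn here).
    if (s < u)
        std::printf("never printed\n");
    else
        std::printf("-1 < 1u is false\n");

    // A wider signed type can represent both operands exactly, so
    // converting both sides first gives the expected answer.
    if (static_cast<long long>(s) < static_cast<long long>(u))
        std::printf("-1 < 1 as long long is true\n");

    return 0;
}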

 

Additionally, because this is C, a signed vs unsigned comparison is not required to issue a warning. The implicit conversions in C can result in a narrower type and not generate a warning.

 

Most compilers are kind enough to give you a warning in that case, but it isn't required and it is up to the expert programmers --- the ones from the history lesson who are supposed to be carefully reviewing every line of their code for weeks on end --- to ensure it is correct.

 

C and its siblings C++ and Objective-C, which all branched from the same family tree, are extremely unforgiving languages. The model they are built on is that the programmer knows exactly what they are doing, and that the programmer is doing it correctly. The model assumes the programmer is right, and that the programmer does not make mistakes.


It's not really a matter of "yet". It's just not in keeping with the C++ philosophy.


C++ checks for tons of things at compile time, and performs a ridiculous number of optimizations. I don't see why making sure a local variable has been assigned before all uses is counter to that philosophy, given that the checks required are done entirely at compile time, do not alter the executable code, and use the same data flow analysis process the compiler is already using when performing optimizations and writing machine code.

You've got a control flow graph, and you have statements that assign or use variables. You traverse the control flow graph backwards from each use, and if you don't hit an assign along all paths before you reach the start of the function, you generate a compile error.

The three different kinds of passing forms in C#...

"in"
- The variable must be assigned by the caller before the call.
- The called function treats it as assigned from the beginning.
- The called function can assign it a different value, or leave it alone.
- Modifications to the value do not propagate back to the caller.

"ref"
- The variable must be assigned by the caller before the call.
- The called function treats it as assigned from the beginning.
- The called function can assign it a different value, or leave it alone.
- Modifications to the value DO propagate back to the caller.

"out"
- The variable does not have to be assigned before the call.
- The caller treats the call as an assignment to the variable.
- The called function treats the variable as unassigned from the beginning.
- The called function is required to assign the variable before returning.


The last time I used C++, you could use a reference parameter as an out, but there was no checking at all for use of unassigned locals. You do not NEED an 'out' modifier to support that, but requiring the caller to assign the variable before passing it as ref to a function wastes an assignment. This is what the out keyword is for: it allows you to leave the variable unassigned before the call and have the compiler understand that the function you're calling is 100% guaranteed to assign it, without having to propagate the entire data flow analysis across functions (which is not really feasible), and THAT allows the compiler to make it a compile error when you don't follow the rules.


It's also only relevant for basic types.

 

Anything else that you declare but don't assign anything to must, by definition, have a default constructor, and is therefore not uninitialized anyway, and perfectly valid to leave in that state.

