using signed int for values that should never be negative

Started by
28 comments, last by SiCrane 16 years, 7 months ago
Maybe this is a wild idea to the extreme, but what about something like:

#include <limits>
#include <stdexcept>

template<typename baseType>
class iter_t
{
public:
	iter_t(baseType lower, baseType upper) :
		lower_(lower), upper_(upper),
		min_limit_(std::numeric_limits<baseType>::min()),
		max_limit_(std::numeric_limits<baseType>::max())
	{
		checkLower(lower);
		checkUpper(upper);
	}

	void isFirst()
	{
		counter_ = lower_;
	}

	void isFirst(baseType lower)
	{
		checkLower(lower);
		counter_ = lower;
	}

	void isNext()
	{
		++counter_;
	}

	bool isDone()
	{
		return (counter_ == upper_);
	}

	// Could include another settable isDone(baseType) method but
	// checking each loop would slow things considerably.

private:
	baseType counter_;
	const baseType lower_;
	const baseType upper_;
	const baseType min_limit_;
	const baseType max_limit_;

	void checkLower(baseType lower)
	{
		if (lower < min_limit_)
		{
			// issue some error condition...
		}
	}

	void checkUpper(baseType upper)
	{
		if (upper > max_limit_)
		{
			// issue some error condition...
		}
	}
};


Used in source something like:

iter_t<int> i(0, 500);

for (i.isFirst(); !i.isDone(); i.isNext())
{
	// do something...
}

for (i.isFirst(10); !i.isDone(); i.isNext())
{
	// do something...
}


This would be for determinate loops; a condition could be added for indeterminate loops as well, accepting another form of computed baseType limit.

Edit - What is this weirdness? I've benchmarked the above code against a standard loop producing 100 million random numbers with my fastest generator, and it consistently outperforms the standard loop by about 15%. What gives?

Results are something like:

// For standard 'for' loop:
Elapsed time : 0.84s. // Sometimes as high as 0.98s.

// For 'iter_t' loop:
Elapsed time : 0.73s. // Pretty much stable at 0.73s.

Comparative source code follows:

int j = 0;
double time = 0;

timer.mark();	// Simple timer class, already initialized.
for (j = 0; j != 100000000; ++j)
{
	pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time   : " << time << "s." << std::endl;

iter_t<int> i(0, 100000000);

timer.mark();
for (i.isFirst(); !i.isDone(); i.isNext())
{
	pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time   : " << time << "s." << std::endl;


Is this because there is something run-time 'dynamic' going on with the use of 'int', and something compile-time 'static' going on with 'iter_t'? Thinking about doing some major refactoring here....

Edit again... Revised iter_t (static invocation is consistently 0.7s for 100 million random numbers) and tried some run-time trials for comparison purposes. The run-time code is:

unsigned lower = STUtility::promptRange("Select a lower bound within the range : ", -100, 100000000);
unsigned upper = STUtility::promptRange("Select a upper bound within the range : ", -100, 100000000);

iter_t<unsigned> test(lower, upper);

timer.mark();
for (test.isFirst(); !test.isDone(); test.isNext())
{
	pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time   : " << time << "s." << std::endl;

lower = STUtility::promptRange("Select a lower bound within the range : ", -100, 100000000);
upper = STUtility::promptRange("Select a upper bound within the range : ", -100, 100000000);

timer.mark();
for (j = lower; j != upper; ++j)
{
	pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time   : " << time << "s." << std::endl;


Interestingly, the iter_t is still faster. Results are:

Select a lower bound within the range : -100 to 100000000 : 0
Select a upper bound within the range : -100 to 100000000 : 100000000

Elapsed time : 1.57s.

Select a lower bound within the range : -100 to 100000000 : 0
Select a upper bound within the range : -100 to 100000000 : 100000000

Elapsed time : 1.68s.

Note: Type is unsigned because I'm in the process of testing the error conditions. I wouldn't necessarily use unsigned in this context.

--random

[Edited by - random_thinker on September 16, 2007 8:56:19 AM]
--random_thinker
As Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.
Wondering if this speed improvement with 'iter_t' is just a compiler-specific issue? All these benchmarks are on G++ (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5).

--random
Quote:Original post by Sneftel
No, that's the thing. An index of 4294967295 will access exactly the same thing as an index of -1. From the CPU's point of view, there literally is zero difference between the two situations.


Assuming a 32-bit twos complement signed integer representation. A sign magnitude integer implementation would probably do something different.
Fascinated by this problem. To solve some of the issues pointed out in this thread, and also some buffer overflow and possible security problems, what about using an int_t type, something like:

#include <iostream>
#include <limits>
#include "../Headers/Utilities/Error.h"

template<typename intType>
class int_t
{
public:
	int_t() : value_(0) { }
	int_t(const intType arg) : value_(arg) { }
	int_t(const int_t<intType>& arg) : value_(arg.value_) { }
	~int_t() { }

	int_t operator=(const int_t<intType> arg) { value_ = arg.value_; return *this; }
	int_t operator=(const intType arg) { value_ = arg; return *this; }

	int_t operator+(const intType arg) const { return int_t(value_ + arg); }
	int_t operator+(const int_t<intType> arg) const { return int_t(value_ + arg.value_); }
	int_t operator-(const intType arg) const { return int_t(value_ - arg); }
	int_t operator-(const int_t<intType> arg) const { return int_t(value_ - arg.value_); }
	int_t operator*(const intType arg) const { return int_t(value_ * arg); }
	int_t operator*(const int_t<intType> arg) const { return int_t(value_ * arg.value_); }

	int_t operator/(const intType arg) const
	{
		if (arg == 0)
		{
			STUtility::error error_cond("Divide by zero");
			error_cond.die();
		}
		return int_t(value_ / arg);
	}

	int_t operator/(const int_t<intType> arg) const
	{
		if (arg.value_ == 0)
		{
			STUtility::error error_cond("Divide by zero");
			error_cond.die();
		}
		return int_t(value_ / arg.value_);
	}

	int_t operator%(const intType arg) const { return int_t(value_ % arg); }
	int_t operator%(const int_t<intType> arg) const { return int_t(value_ % arg.value_); }
	int_t operator&(const intType arg) const { return int_t(value_ & arg); }
	int_t operator&(const int_t<intType> arg) const { return int_t(value_ & arg.value_); }
	int_t operator|(const intType arg) const { return int_t(value_ | arg); }
	int_t operator|(const int_t<intType> arg) const { return int_t(value_ | arg.value_); }
	int_t operator^(const intType arg) const { return int_t(value_ ^ arg); }
	int_t operator^(const int_t<intType> arg) const { return int_t(value_ ^ arg.value_); }
	int_t operator~() const { return int_t(~value_); }
	int_t operator<<(const intType arg) const { return int_t(value_ << arg); }
	int_t operator<<(const int_t<intType> arg) const { return int_t(value_ << arg.value_); }
	int_t operator>>(const intType arg) const { return int_t(value_ >> arg); }
	int_t operator>>(const int_t<intType> arg) const { return int_t(value_ >> arg.value_); }

	// Comparisons return bool directly; wrapping the result in an
	// int_t (as originally written) would not compile.
	bool operator==(const intType arg) const { return value_ == arg; }
	bool operator==(const int_t<intType> arg) const { return value_ == arg.value_; }
	bool operator!=(const intType arg) const { return value_ != arg; }
	bool operator!=(const int_t<intType> arg) const { return value_ != arg.value_; }
	bool operator>=(const intType arg) const { return value_ >= arg; }
	bool operator>=(const int_t<intType> arg) const { return value_ >= arg.value_; }
	bool operator>(const intType arg) const { return value_ > arg; }
	bool operator>(const int_t<intType> arg) const { return value_ > arg.value_; }
	bool operator<=(const intType arg) const { return value_ <= arg; }
	bool operator<=(const int_t<intType> arg) const { return value_ <= arg.value_; }
	bool operator<(const intType arg) const { return value_ < arg; }
	bool operator<(const int_t<intType> arg) const { return value_ < arg.value_; }

	int_t operator++() { ++value_; return *this; }
	int_t operator--() { --value_; return *this; }

	intType value() { return value_; }

private:
	intType value_;
};

template<typename intType>
std::ostream & operator<<(std::ostream & out, int_t<intType> arg)
{
	return out << arg.value();
}

// Takes a reference and reads into a temporary, so the extracted
// value actually reaches the argument.
template<typename intType>
std::istream & operator>>(std::istream & in, int_t<intType> & arg)
{
	intType tmp;
	in >> tmp;
	arg = tmp;
	return in;
}


--random
Quote:Original post by JohnBolton
I definitely think it is a mistake to use an unsigned variable just because the values are expected to never be negative. Two reasons:
  • It gives you a false sense of security. As pointed out earlier, it doesn't catch or prevent problems, and the mistaken assumptions in some of the posts above illustrate the point.

I only use unsigned integers to indicate an intention rather than to catch mistakes; I certainly don't feel any safer using them.

Quote:
  • Mixing signed and unsigned variables results in a lot of casting, which reduces readability and leads to bugs.

If used consistently and in appropriate circumstances, then there's little reason that casting for signedness within your own code need occur. If you're using a library then there is always the risk that it has a different convention on the use of primitive types. The library I use most often is the SC++L, which consistently defines size_type to be an unsigned integer, so I rarely need to cast for signedness at all.

Quote:I use unsigned values for bit manipulation, for variables that need an extra bit of range, and for compatibility.

I use signed values when I need negative numbers or for compatibility.

Quote:Does anyone use unsigned char if the value is expected to always fit in 8 bits? In other words, does anyone advocate doing this?
    for ( unsigned char i = 0; i < 10; ++i )
        ...

This is a very good point.
I wouldn't advocate this; I would wager most people associate char with characters, strings and bytes; as a result I'd only use a char when dealing with one of those three things.
Even though using a char may be more explicit about the range it would obfuscate the intended purpose of the variable if just used for natural numbers in this way.

Quote:For the same reasons described above, I don't use char or short just because the values are expected to be in their ranges. I just use int. In fact I only use char for text and can't think of any reason for ever using short (if I need specific sizes, types such as __int8 and __int16 are more appropriate).

Personally I think that using either a signed or unsigned integer is fine provided it's consistent. Relying on unsigned values to catch bugs for you is risky and as shown in the posts above often mistaken. As long as types are used appropriately I have no problem reading or even writing code using either convention (but if left to my own devices I would default to using unsigned integers for non-negative values).
I wasn't trying to trash the use of signed integers btw, I just want to provide a balanced argument :-)

As for short, there really is little reason to use it by itself; if you have a lot of integers packed together then using short could halve the memory requirement. The first example that springs to mind is an index buffer/array.
In theory I would say that using unsigned types for positive-only values is a good idea. But in practice, you often find yourself performing arithmetic on them and falling foul of bugs. We came across one in our GUI code recently, where text was aligned in its container by subtracting the text width from the container width to work out how much free space there was, and positioning it accordingly. Using unsigned values meant that text which was too wide for its container was being rendered 600 miles offscreen. :)

For reasons like this, the only time I use unsigned values is in array indices and loops over such indices, or when explicitly comparing against values that the standard library returns as unsigned (eg. size() and friends).
Quote:Original post by thedustbustr
Should I always be using unsigned types for types that are never negative, or am I overreacting and adding complexity where it is completely unnecessary?


I've dug into this a bit deeper today and have changed my views on unsigned/signed usage. That is, except for bit-level operations such as those demanded by number generators, etc, I think I'll stick to signed integers. Here's why:

1. Today I tried to implement a system that catches lower bounds on loops, and found that with unsigned integers this is nearly impossible, as these cannot go negative but instead tick over to the highest value. This was mentioned in the previous posts.

2. IMHO, the best way to control the boundaries of loops, etc is to use integers that can go negative. This means sticking with signed integers.

Today, after some experimentation, I've put together some code for type safety and a generic iterator type that can be used for general purpose looping, of the form:

for (obj.isFirst(); !obj.isDone(); obj.isNext())
{
    // do something...
}


On my system, this code will run from 15% to more than 50% faster than a straightforward for(;;) loop of the same magnitude. Additionally, I've developed a generic int_t that is based upon ideas presented at http://hissa.nist.gov/sw_develop/ir5769/node6.html for safe integers for mission critical code. This can be combined with iter_t. My benchmarks of these so far indicate that there is little or no difference for small loops and that the generic types are much faster where static implementation can be made, and are slightly faster for dynamic, runtime implementation.

The rough code follows:

#include <iostream>
#include <limits>
#include "../Headers/Utilities/Error.h"

// int_t class
template<typename intType>
class int_t
{
public:
	int_t() : value_(0) { }
	int_t(const intType arg) : value_(arg) { }
	int_t(const int_t<intType>& arg) : value_(arg.value_) { }
	~int_t() { }

	int_t operator=(const int_t<intType> arg) { value_ = arg.value_; return *this; }
	int_t operator=(const intType arg) { value_ = arg; return *this; }

	int_t operator+(const intType arg) const { return int_t<intType>(value_ + arg); }
	int_t operator+(const int_t<intType> arg) const { return int_t<intType>(value_ + arg.value_); }
	int_t operator-(const intType arg) const { return int_t<intType>(value_ - arg); }
	int_t operator-(const int_t<intType> arg) const { return int_t<intType>(value_ - arg.value_); }
	int_t operator*(const intType arg) const { return int_t<intType>(value_ * arg); }
	int_t operator*(const int_t<intType> arg) const { return int_t<intType>(value_ * arg.value_); }

	int_t operator/(const intType arg) const
	{
		if (arg == 0)
		{
			STUtility::error error_cond("Divide by zero");
			error_cond.die();
		}
		return int_t<intType>(value_ / arg);
	}

	int_t operator/(const int_t<intType> arg) const
	{
		if (arg.value_ == 0)
		{
			STUtility::error error_cond("Divide by zero");
			error_cond.die();
		}
		return int_t<intType>(value_ / arg.value_);
	}

	int_t operator%(const intType arg) const { return int_t<intType>(value_ % arg); }
	int_t operator%(const int_t<intType> arg) const { return int_t<intType>(value_ % arg.value_); }
	int_t operator&(const intType arg) const { return int_t<intType>(value_ & arg); }
	int_t operator&(const int_t<intType> arg) const { return int_t<intType>(value_ & arg.value_); }
	int_t operator|(const intType arg) const { return int_t<intType>(value_ | arg); }
	int_t operator|(const int_t<intType> arg) const { return int_t<intType>(value_ | arg.value_); }
	int_t operator^(const intType arg) const { return int_t<intType>(value_ ^ arg); }
	int_t operator^(const int_t<intType> arg) const { return int_t<intType>(value_ ^ arg.value_); }
	int_t operator~() const { return int_t<intType>(~value_); }
	int_t operator<<(const intType arg) const { return int_t<intType>(value_ << arg); }
	int_t operator<<(const int_t<intType> arg) const { return int_t<intType>(value_ << arg.value_); }
	int_t operator>>(const intType arg) const { return int_t<intType>(value_ >> arg); }
	int_t operator>>(const int_t<intType> arg) const { return int_t<intType>(value_ >> arg.value_); }

	int_t operator++() { ++value_; return *this; }
	int_t operator--() { --value_; return *this; }

	intType value() { return value_; }

	bool operator==(const intType arg) const { return value_ == arg; }
	bool operator==(const int_t<intType> arg) const { return value_ == arg.value_; }
	bool operator!=(const intType arg) const { return value_ != arg; }
	bool operator!=(const int_t<intType> arg) const { return value_ != arg.value_; }
	bool operator>=(const intType arg) const { return value_ >= arg; }
	bool operator>=(const int_t<intType> arg) const { return value_ >= arg.value_; }
	bool operator>(const intType arg) const { return value_ > arg; }
	bool operator>(const int_t<intType> arg) const { return value_ > arg.value_; }
	bool operator<=(const intType arg) const { return value_ <= arg; }
	bool operator<=(const int_t<intType> arg) const { return value_ <= arg.value_; }
	bool operator<(const intType arg) const { return value_ < arg; }
	bool operator<(const int_t<intType> arg) const { return value_ < arg.value_; }

	static intType min() { return std::numeric_limits<intType>::min(); }
	static intType max() { return std::numeric_limits<intType>::max(); }

private:
	intType value_;
};

// Templated ostream operator.
template<typename intType>
std::ostream & operator<<(std::ostream & out, int_t<intType> arg)
{
	return (out << arg.value());
}

// Templated istream operator; takes a reference and reads into a
// temporary, so the extracted value actually reaches the argument.
template<typename intType>
std::istream & operator>>(std::istream & in, int_t<intType> & arg)
{
	intType tmp;
	in >> tmp;
	arg = tmp;
	return in;
}

// iter_t, can be used with int_t types
template<typename intType>
class iter_t
{
public:
	iter_t(intType lower, intType upper) :
		counter_(0), min_(0), max_(intType::max()), lower_(lower), upper_(upper)
	{
		minErrorCheck(lower);
		maxErrorCheck(upper);
	}

	iter_t(intType lower, intType upper, intType minimum, intType maximum) :
		counter_(0), min_(minimum), max_(maximum), lower_(lower), upper_(upper)
	{
		minErrorCheck(lower);
		maxErrorCheck(upper);
	}

	void isFirst() { counter_ = lower_; }
	void isFirst(intType lower) { minErrorCheck(lower); counter_ = lower; }
	void isNext() { ++counter_; }
	bool isDone() { return (counter_ == upper_); }
	bool isDone(intType upper) { maxErrorCheck(upper); return (counter_ == upper); }

	intType min() { return min_; }
	intType max() { return max_; }

private:
	intType counter_;
	const intType min_;
	const intType max_;
	const intType lower_;
	const intType upper_;

	void minErrorCheck(intType lower)
	{
		if (lower < min_)
		{
			STUtility::error error_cond("Exceeded lower limit");
			error_cond.die();
		}
	}

	void maxErrorCheck(intType upper)
	{
		if (upper > max_)
		{
			STUtility::error error_cond("Exceeded upper limit");
			error_cond.die();
		}
	}
};


This code is specific to my system, but would be easy to clean up for general use. Usage would be something like:

// Introduce safe integer types here; notice all are signed. Bit width
// will depend upon your system...
typedef int_t<short>     int16;
typedef int_t<long>      int32;
typedef int_t<long long> int64;

// Introduce 'safe and fast' iterator types here, based upon signed types...
typedef iter_t<int16> iter16;
typedef iter_t<int32> iter32;
typedef iter_t<int64> iter64;

// Initialize iterators for loops two ways...

// 1. Pass lower, upper iteration limits; the defaults are min = 0,
//    max = numeric_limits<intType>::max().
iter32 i(0, 100000000);

// 2. Pass lower, upper iteration limits and min, max limits.
iter32 j(0, 100000000, -200, 10000000000);

// Now use as follows:
for (j.isFirst(); !j.isDone(); j.isNext())
{
    // do something...
}

// You can reset the iteration limits but not min/max.
for (j.isFirst(-250); !j.isDone(100000); j.isNext())
{
    // do something...
}
// The above loop will result in an error... as -250 is below the minimum for j.
// You cannot pick up this type of error with unsigned values, as the system
// will just wrap negative values around to another unsigned value. To me
// this is a potentially very serious and almost undetectable error condition!


Anyway, from a practical perspective, I'll be using signed ints for just about everything I can now. Hope this helps, and comments on the code would be welcomed...

--random

Edit--Although the above code has illustrated deterministic application, it is possible to plug in a non-deterministic iteration limit, whilst still having a check on the maximum range.


[Edited by - random_thinker on September 16, 2007 5:42:04 PM]
Quote:Original post by SiCrane
Quote:Original post by Sneftel
No, that's the thing. An index of 4294967295 will access exactly the same thing as an index of -1. From the CPU's point of view, there literally is zero difference between the two situations.

Assuming a 32-bit twos complement signed integer representation. A sign magnitude integer implementation would probably do something different.


Would you mind explaining why this is true? (that goes for both SiCrane's and Sneftel's posts)

And just to throw my experience in, at my school's robotics team (read: a bunch of unskilled high schoolers hacking up C code) we've had a couple of problems with unsigned numbers. One of which rammed a robot going at high speeds into a wall... However, it was also necessary that we used unsigned numbers due to hardware (at least I know of no way around it other than maybe some terrible hacks that are even more error prone).
Quote:Original post by Ezbez
Quote:Original post by SiCrane
Quote:Original post by Sneftel
No, that's the thing. An index of 4294967295 will access exactly the same thing as an index of -1. From the CPU's point of view, there literally is zero difference between the two situations.

Assuming a 32-bit twos complement signed integer representation. A sign magnitude integer implementation would probably do something different.


Would you mind explaining why this is true? (that goes for both SiCrane's and Sneftel's posts)


Moving 359 degrees forward on a circle is the same as moving 1 degree backward.

Processors that use sign-magnitude or one's complement for negative integers don't exhibit this kind of wraparound. The binary representation still wraps around when the processor calculates the effective address, but the bits have different meanings, so -1 isn't equivalent to 2^32 - 1 in either case, for example.
Two's complement arithmetic takes advantage of the property that an n-bit positive integer x is equivalent to 2^n + x, mod 2^n. So the negative of x is equivalent to 2^n - x. For a 32-bit integer size, -1 is equivalent to 0xFFFFFFFF mod 2^32. Therefore the binary representation is the same, and the process of adding -1 or 0xFFFFFFFF to a number involves exactly the same circuitry in the arithmetic logic unit, so in a two's complement system the two numbers are treated exactly the same by the processor. The difference between the two is in how the software interprets the data.

On the other hand a sign-magnitude representation uses (traditionally) 0x80000001 to represent -1. The most significant bit says the integer is negative, and the rest of the bits represent the magnitude of the number. The process of adding two signed numbers requires different circuitry on the ALU than the process of adding two unsigned numbers, so the numbers look different to the processor.
