Sign in to follow this  

Secure/Stable Programming

This topic is 4811 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Howdy folks, it's da Nurgster here

I've noticed that several threads have a few insecure/instable code segments in them, so I though I'd just give you all a few pointers on improving the dire trend in software development today to leave buggy pieces of elephant feces in apps.

Whenever copying from one area of memory to another, ALWAYS use the size of the destination as the amount of memory to copy, never the source. e.g.:

memcpy(dest, src, sizeof(dest)); // OR
strncmp(dest, src, sizeof(dest));

This simple, and maybe even obvious piece of advise is so foten ignored, and so often the cause of many of the problems involved in computing (especially when it comes to networking).

If you have any other tips, feel free to post them.

Note to felis: I've made this sticky so people will see it, but if you object, I'll make it unsticky.

Share this post


Link to post
Share on other sites
I guess i'll just mention the classic ones =)

---

Allways initialize variables when you declare them, even if you're sure you'll attribute a value to them before you use them, you should allways do it.
You can allways add extra code before, and forget to clear it, you can remove that code that initializes it, and forget to clear it in the next piece of code that will use it.

---

When alocating space for an array, particularly strings, allways allocate one more space, like so:

#define STRSIZE 256

char strbuffer[STRSIZE+1];

Even if you define STRSIZE as 256+1, it is allways a good practice to add an extra space, you might change the define later, and that way you allways have a safe guard against buffer overflows(provided you zero the buffer).

^^^

These two when coding relatively large and complex pieces of code, can give you real headache because they don't allways show themselfs.
You might run the program once and it's ok, and after that it crashes, gives out wrong results, or runs for ever. And while you're trying to figure out, what part of the algorithm you implemented wrongly, the mistake is else where, laughing at you =).

---

Sometimes you might use an output funtion to report on the status of a particular part of your code for debugging purposes, like printf, what you can't forget is that some of these funtions are buffered(like printf), meaning, they just output when the internal buffer is full, a speed optimization.
The problem is that, if you try to output a test string to signal that a particular part of your code was executed, again for debugging purposes, the program might crash before the actual string is printed, it remains in the buffer, and then you'll get the wrong impression, you'll think the problem is earlier in the code, because the string that signaled the right execution of the code, didn't show, when in fact was just waiting in the buffer when the program crashed.
To solve this, you need to flush the buffer of the particular output funtion, in the case of printf, a simples fflush(stdout) will do.

[Edited by - White Rabbit on August 21, 2004 2:18:17 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Nurgle
Whenever copying from one area of memory to another, ALWAYS use the size of the destination as the amount of memory to copy, never the source. e.g.:



memcpy(dest, src, sizeof(dest)); // OR



Not quite... you want to take the min value of them.

memcpy(dest, src, destsize > srcsize ? srcsize : destsize);


The same really goes for any case where you are copying memory.

Also, when using C++, use the standard containers. It will deal with memory allocations and deallocations for you. Using the bounds safe versions of the accessors can save you a lot of trouble.

Share this post


Link to post
Share on other sites
Assert everything. The less likely something will be bad, the more likely you should assert it. The more complex the assert, the more information it will give you. Don't hesitate to write your own assert macro that gives you more information than the standard one does.

You should read articles like: Generic<Programming>: Assertions

Also, refactor everything. The simpler a chunk of code is, the easier it is to debug. It is also easier to test.

Unit test as much as possible. Doing TDD can save you days of debugging, and days more of fixing. The more refined your tests, the better. Remember: A test that should fail but doesn't will often give a better idea of where a bug lies.

Profile last. Then once you've profiled, optimize. But optimize at the highest level possible. Changing the algorithm or data structure you use will generally lead to better results than taking the slow code and making it "faster", assembly should be a last resort. If you have to drop to assembly code, chances are you've picked the wrong algorithm or data structure.

Share this post


Link to post
Share on other sites
Quote:
Original post by Washu

Profile last. Then once you've profiled, optimize. But optimize at the highest level possible. Changing the algorithm or data structure you use will generally lead to better results than taking the slow code and making it "faster", assembly should be a last resort. If you have to drop to assembly code, chances are you've picked the wrong algorithm or data structure.



Not entirely true. Sometimes, using assembly is the best way to go. Consider, for example, prefetching. The compiler doesn't know what is good code/data to prefetch, and can't actually make use of this feature too well.



Other things, such as vectorization, would also benefit from assembly, but that requires some work at the initial implimentation to do well. It helps if you have an understanding of what the processor does and when, of course.

Share this post


Link to post
Share on other sites
If you use threads, you almost always should also use mutexes! (aka 'critical sections')

Mutexes, or critical sections, or locks, are systems offered by any decent operating system which allow a thread to request and hold sole control of things like global variables and structures. Learn them well, and use them well, curse it!

If you have a single boolean "bFinished", don't worry about it. That is done in a single memory write, and you have no worries.

But if you have a structure holding application properties, or a "to do" list that one thread fills, and another drains... before reading or writing that list, lock it, copy it to thread-specific memory (your heap or stack), and then unlock it.

Why? Because what if, in the draining thread, you've halfway through a memory copy when the threads switch? Now the filler thread writes over the probably unused entry, does its whole copy, and the threads switch again. Now the drainer continues his memory copy... with corrupt data, which is not part of the data he was originally copying! With use of a mutex, the drainer would lock that section. He would start his copy, and then the threads switch. The filler would look, and see the section was locked, and choose to wait until it becomes free. The threads switch again, and the drainer finishes copying the entry, and unlocks the mutex.

There are non-mutex fixes for this particular example, but trust me; you don't want to screw up on something like interthread communication! Go the safe way! Use these programming equivalents of an "Occupied" sign!

Share this post


Link to post
Share on other sites
My suggestion, _never_ hide bugs!
It's extremely bad.


Quote:
Original post by White Rabbit
I guess i'll just mention the classic ones =)

Allways initialize variables when you declare them, even if you're sure you'll attribute a value to them before you use them, you should allways do it.
You can allways add extra code before, and forget to clear it, you can remove that code that initializes it, and forget to clear it in the next piece of code that will use it.

---

When alocating space for an array, particularly strings, allways allocate one more space, like so:
#define STRSIZE 256

char strbuffer[STRSIZE+1];

Even if you define STRSIZE as 256+1, it is allways a good practice to add an extra space, you might change the define later, and that way you allways have a safe guard against buffer overflows(provided you zero the buffer).


It's just hiding bugs - usual practice for C/C++ programmers. Everything is done in that way, "just hide bug,and everything is OK now" :( .Adding extra space it's not avoiding buffer overflow but making buffer overflow to happen less frequently
edit:on normal usage. It does not protect against hacker.

Language that can't optionally check array bounds are actually extremely crappy and shouldn't be used in any critical or networking application. But it's used!
(extra time to check array bounds is essentially _zero_ for most applications where it's more-or-less critical(networking,etc). Networks are VERY slow comparing to computers,even 100Mbit ones.)

Quote:


---

These two when coding relatively large and complex pieces of code, can give you real headache because they don't allways show themselfs.
You might run the program once and it's ok, and after that it crashes, gives out wrong results, or runs for ever. And while you're trying to figure out, what part of the algorithm you implemented wrongly, the mistake is else where, laughing at you =).

Share this post


Link to post
Share on other sites
here is one that not many people realize when using STL containers:

When using STL containers, always explicitly specify copy constructors and/or copy operators!

sample of safe STL structs/classes:


class MyGenericClass {
public:
MyGenericClass( const MyGenericClass& rhs ) {
(*this) = rhs;
}

MyGenericClass& operator=( const MyGenericClass& rhs ) {
m_A = rhs.m_A; // has assign operator defined

// 'SomeClassType_WithOutCopyOperator' doesn't have assign operator so do it manually!
m_B.m_someMember = rhs.m_someMember;
m_B.m_someOtherMember = rhs.m_someOtherMember;

return (*this);
}

public:
SomeClassType_WithCopyOperator m_A;
SomeClassType_WithOutCopyOperator m_B;
};





RULE OF THUMB:
Use this mainly for large classes with non-primitive member variables.


in somecases, it may be better (read: easier) to let the
compiler implicitly define a member-for-member default copy
constructor/operators.

Share this post


Link to post
Share on other sites
Quote:
Original post by silvermace

MyGenericClass( const MyGenericClass& rhs ) {
(*this) = rhs;
}

MyGenericClass& operator=( const MyGenericClass& rhs ) {
m_A = rhs.m_A;
m_B = rhs.m_B;
return (*this);
}





I wouldn't do that in the copy constructor

(construction/initialization) != assignement

even thou the final result is the same they are not the same operation, with assignement the variable always had "some" previous value that is replaced.

Also like to add always prefer the default copy/assignment operations implicitly defined or if doesn't make sense for a type to copy/assign then make them protected/private. Only define what means to copy & assign when you have data members that are raw pointers.

Share this post


Link to post
Share on other sites
Dmytry:

I think you're blowing this a bit of proportion. Null terminating low level strings in c/c++ is what one needs to do for that same string to be usable by many funtions, it's not "hidding bugs", it's actually preventing them.
If you don't particularly enjoy the way bounds checking is done in c/c++, it's ok, but leaving an extra space for the NULL is indeed good practice with c/c++ when dealing with low level strings.

Moderator:

Feel free to delete this post for thread maintenance purposes.

Share this post


Link to post
Share on other sites
Quote:
Original post by snk_kid
***** source block removed *****

I wouldn't do that in the copy constructor

(construction/initialization) != assignement

even thou the final result is the same they are not the same operation, with assignement the variable always had "some" previous value that is replaced.

Also like to add always prefer the default copy/assignment operations implicitly defined or if doesn't make sense for a type to copy/assign then make them protected/private. Only define what means to copy & assign when you have data members that are raw pointers.


Hi,
what do you mean "some" previous value, i thought by that point
inside the code, the class and its variables are already
constructed w/o initial values, wouldnt inline'ing the
copy operator give out the EXACT same code?

eg.

MyGenericClass( const MyGenericClass& rhs ) {
m_A = rhs.m_A;
}

MyGenericClass& operator=( const MyGenericClass& rhs ) {
m_A = rhs.m_A; // same as above ????
return (*this);
}


is it actually bad practice, or do you just not prefer it?

Share this post


Link to post
Share on other sites
Quote:
Original post by Washu
Assert everything. The less likely something will be bad, the more likely you should assert it. The more complex the assert, the more information it will give you. Don't hesitate to write your own assert macro that gives you more information than the standard one does.

I've never used asserts alot except in complex code. I find less clutted code worth the extra time it takes to find out the assert information using the debugger.

Share this post


Link to post
Share on other sites
Quote:
Original post by silvermace
what do you mean "some" previous value


Well i mean this:


int i = 10; //implicitly invokes constructor

i = 30; //assignement


is different from this:


int i(30);
//or
int i = 30; //both are equivalent


Even thou they both eventually have the same result the first version being i is initialized with 10 and is replaced by 30 by assignement so it had a previous value, if in the first version you didn't intialize it to 10 or any number what so ever then it will probably have some garabage value so again it had a previous value.

Quote:
Original post by silvermace
i thought by that point inside the code, the class and its variables are already constructed w/o initial values, wouldnt inline'ing the
copy operator give out the EXACT same code?


From what i remember the c++ standard does not mandate compiler writers to have variables intialized to some default or without initial values.

You cannot make any assumptions thats the whole point of using constructors & constructor intializer lists to intialize user-defined types with either default or non-default values.

Quote:
Original post by silvermace
eg.

MyGenericClass( const MyGenericClass& rhs ) {
m_A = rhs.m_A;
}

MyGenericClass& operator=( const MyGenericClass& rhs ) {
m_A = rhs.m_A; // same as above ????
return (*this);
}



What happens if you have pointers as data members i'm pretty sure that copy construction and assignement will be quite different. The constructor will pull resources (say call new) where as the assignement operator will probably release the current resource back (say call delete) then pull some new resources to prepare and then copy the values.

Quote:
Original post by silvermace
is it actually bad practice, or do you just not prefer it?


bad practicel, prefer constructor intializer lists to intialize members and pull resources in the body of the constructor.

Share this post


Link to post
Share on other sites
Once VS 2005 comes out (both express VC and the full deal), use the secure CRT functions. It requires 0 effort on your part, and provides a pretty minor overhead.

Share this post


Link to post
Share on other sites
Quote:
Original post by White Rabbit
Dmytry:

I think you're blowing this a bit of proportion. Null terminating low level strings in c/c++ is what one needs to do for that same string to be usable by many funtions, it's not "hidding bugs", it's actually preventing them.
If you don't particularly enjoy the way bounds checking is done in c/c++, it's ok, but leaving an extra space for the NULL is indeed good practice with c/c++ when dealing with low level strings.

Moderator:

Feel free to delete this post for thread maintenance purposes.

Adding 0s at end of string is OK. Adding zero to zero-terminated string is not OK. Adding extra size to the end of all arrays is not OK too.
For string,one 0 is needed for it's normal work,but it's not extra 0s you need to optionally add at the end "to be sure".

edit:and i know,many people have problems with counting from 1 versus from 0 (i also had) . And will just allocate bigger arrays because they are +/- 1 unsure what size they need. It's not good.

Share this post


Link to post
Share on other sites
If you do any sort of string handling and security is a concern then for crying out loud use a string class. It doesn't have to be std::string, just something that will protect you against overruns. If speed is a concern then consider making a StringRef class that provides class semantics atop an otherwise stack-allocated buffer.

Ditto for stream classes. No excuse for C++ programs to have buffer over/underflows nowadays. Avoid unsafe CRT functions like the plague.

Share this post


Link to post
Share on other sites
Don't use signed values for sizes!

This mistake is made too often, and it is a security break waiting to happen.


int size;
char *buf;

fread(&size, sizeof(int), 1, infile);

if (size < 1024) //WOOHOO
buf = (char *)malloc(size);





The code reads in a size from file to an int, then checks that it is not greater than 1k, and then allocs the specified size.

That might be a good idea if size was unsigned int, but it is not, and can therefore be smaller than 0.

edit: now the code seems right

Share this post


Link to post
Share on other sites
Quote:
Original post by doho
Quote:
Original post by Washu
Assert everything. The less likely something will be bad, the more likely you should assert it. The more complex the assert, the more information it will give you. Don't hesitate to write your own assert macro that gives you more information than the standard one does.

I've never used asserts alot except in complex code. I find less clutted code worth the extra time it takes to find out the assert information using the debugger.

In my professional experiance I've found it to be quite the opposite. A well placed assert can save me many hours of debugging. Especially as the codebase becomes more and more complex. An assert doesn't have to be ugly you know.

Quote:

Not entirely true. Sometimes, using assembly is the best way to go. Consider, for example, prefetching. The compiler doesn't know what is good code/data to prefetch, and can't actually make use of this feature too well.

Other things, such as vectorization, would also benefit from assembly, but that requires some work at the initial implimentation to do well. It helps if you have an understanding of what the processor does and when, of course.

You are right, there are times when assembly can be the "right" thing to use. My point was that optimization should be looked at from a higher level than that. Because chances are that you will get a greater gain from optimizing something else. While assembly can be useful (I won't deny that), it tends to be more wasteful writing something in assembly than just optimizing an algorithm, or changing it out entirely.

Share this post


Link to post
Share on other sites
Quote:
Original post by Washu
Quote:
Original post by Nurgle
Not entirely true. Sometimes, using assembly is the best way to go. Consider, for example, prefetching. The compiler doesn't know what is good code/data to prefetch, and can't actually make use of this feature too well.

Other things, such as vectorization, would also benefit from assembly, but that requires some work at the initial implimentation to do well. It helps if you have an understanding of what the processor does and when, of course.

You are right, there are times when assembly can be the "right" thing to use. My point was that optimization should be looked at from a higher level than that. Because chances are that you will get a greater gain from optimizing something else. While assembly can be useful (I won't deny that), it tends to be more wasteful writing something in assembly than just optimizing an algorithm, or changing it out entirely.


Nurgle, I disagree with you (not about the need of assembly language - of course we still need it, since some code construct cannot be written using high level languages - but about the examples you gave). Todays compiler are far more efficient than the one I used to use in the past days. The Intel compiler and the Codeplay VectorC compiler, for example, can vectorize your code - and they perform very well. It can also unroll small loops - and most of them are good candidate to prefetch. Most assembly programming is governed by rules : "if you do this like that, then you'll have some very fast code". Those rules are also applied by compilers when they can - of course, this "when they can" is the Big Point. IMHO it is far better to give hints to the compiler and to use its intrisics in order to enable him to generate better code than to write hand assembly routine - and it is less time consuming, of course, which is always good :)

Washu++ - as coders, the first things we need to optimize are our algorithms.

And since we are still speaking about code statbiliy and security : even this should not be done before testing. The code development process is usually :


  • design
  • write code
  • test
  • debug
  • profile
  • optimize
  • test
  • debug
  • release


(with a lot of loops everywhere :)

Share this post


Link to post
Share on other sites
Quote:
Original post by Emmanuel Deloget

Nurgle, I disagree with you (not about the need of assembly language - of course we still need it, since some code construct cannot be written using high level languages - but about the examples you gave). Todays compiler are far more efficient than the one I used to use in the past days. The Intel compiler and the Codeplay VectorC compiler, for example, can vectorize your code - and they perform very well. It can also unroll small loops - and most of them are good candidate to prefetch. Most assembly programming is governed by rules : "if you do this like that, then you'll have some very fast code". Those rules are also applied by compilers when they can - of course, this "when they can" is the Big Point. IMHO it is far better to give hints to the compiler and to use its intrisics in order to enable him to generate better code than to write hand assembly routine - and it is less time consuming, of course, which is always good :)



Disclaimer: I used to work at Codeplay, so I am well aware of the advantages and limitations VectorC.



Prefetching is a very subjective thing, and it isn't a good idea to let the compiler decide what to prefetch and what to ignore. Finding this info out can only be done at runtime when you have identified bottlenecks using your profiler of choice. Your second point about hints, however, is redundant, as this pretty much equates to the same thing in the long run (except it isn't proc. specific).




Share this post


Link to post
Share on other sites
Reuse your code, you don't have to do the same thing in several member functions as long as you know where you should do each part of it...
class Type
{
public:
Type()
: mpValue(new Value())
{
}

Type(const Type& rhs)
: mpValue(new Value(*rhs.mpValue))
{
}

Type& operator=(const Type& rhs)
{
if (this != &rhs)
Type(rhs).swap(*this);
return *this;
}

~Type()
{
delete mpValue;
}

void swap(Type& rhs)
{
std::swap(mpValue, rhs.mpValue);
}

private:
Value* mpValue;
};

Share this post


Link to post
Share on other sites
Quote:
Original post by doho
I've never used asserts alot except in complex code. I find less clutted code worth the extra time it takes to find out the assert information using the debugger.

Using assert effectively doesn't have to clutter your code. You can (and should, IMO) bury it completely in member functions. I use asserts very heavily, but once I have created my classes for a particular project, I never see them again unless one fails.

Recently I created some classes to run some poker simulations. I created classes for cards, decks, and so on. All of them internally check that they are valid via asserts. A card always makes sure it is a valid card. A deck makes sure it only has valid cards, that all cards are present, that duplicates aren't present, and so on.

One example I recently created was a bounded value, which is basically an int wrapper that asserts that the value of the int is within certain bounds. Something like this.


// digit will cause an assert to fail if it takes on a value
// outside the range [0,9] inclusive
BoundedValue<0,9> digit;
digit = 0; // OK
digit++; // OK
digit += 42; // Not OK, assert fails



An assert is meant to check an invariant. Invariants belong in classes. That is one of the main reasons for creating a class, to establish and protect an invariant. Therefore all asserts should live inside a class.

That's my current theory anyway, but I don't have the professional experience to test that theory. What do you experienced professionals think?

Share this post


Link to post
Share on other sites
Make your functions perform a specific logical action. Don't try to stuff as much crap into your function as possible.

Also, read and live by Code Complete.

Share this post


Link to post
Share on other sites

This topic is 4811 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this