Jump to content

  • Log In with Google      Sign In   
  • Create Account

Interested in a FREE copy of HTML5 game maker Construct 2?

We'll be giving away three Personal Edition licences in next Tuesday's GDNet Direct email newsletter!

Sign up from the right-hand sidebar on our homepage and read Tuesday's newsletter for details!


Like
17Likes
Dislike

Grounded Pointers

By Andrey Karpov | Published Sep 13 2013 09:10 AM in General Programming
Peer Reviewed by (Endurion, Vilem Otte, Dave Hunt)

c++ swamp static analysis

Not so long ago one of our colleagues left the team and joined one company developing software for embedded systems. There is nothing extraordinary about it: in every firm people come and go, all the time. Their choice is determined by bonuses offered, the convenience aspect, and personal preferences. What we find interesting is quite another thing. Our ex-colleague is sincerely worried about the quality of the code he deals with in his new job. And that has resulted in us writing a joint article. You see, once you have figured out what static analysis is all about, you just don't feel like settling for "simply programming".

Forest Reserves


I find an interesting phenomenon occurring in the world nowadays. What happens when a software development department turns into a secondary entity not closely related to the company's basic area of activity? A forest reserve appears. However significant and critical the company's area of activity may be (say, medicine or military equipment), a small swamp appears anyway, where new ideas get stuck and 10-year old technologies are in use.

Here you are a couple of extracts from the correspondence of a man working in the software development department at some nuclear power plant:

And then he says, "What for do we need git? Look here, I've got it all written down in my paper notebook."

...

And do you have any version control at all?

2 men use git. The rest of the team use numbered zip's at best. Though it's only 1 person with zip's I'm sure about.

Don't be scared. Software developed at nuclear power plants may serve different purposes, and no one has abolished hardware security yet. In that particular department, people collect and process statistical data. Yet the tendency to swamping is pretty obvious. I don't know why it happens, but the fact is certain. What's interesting, the larger the company, the more intense the swamping effect.

I want to point it out that stagnation in large companies is an international phenomenon. Things are quite the same abroad. There is an article on the subject, but I don't remember its title. I spent quite a time trying to find it, but in vain. If somebody knows it, give me the link please so that I could post it. In that article, a programmer is telling a story about him having worked at some military department. It was - naturally - awfully secret and bureaucratic - so much secret and bureaucratic that it took them several months to agree on which level of access permissions he could be granted to work on his computer. As a result, he was writing a program in Notepad (without compiling it) and was soon fired for inefficiency.

Foresters


Now let's get back to our ex-colleague. Having come to his new office, he was struck with kind of a cultural shock. You see, after spending so much time and effort on studying and working with static analysis tools, it's very painful to watch people ignore even compiler warnings. It's just like a separate world where they program according to their own canons and even use their own terms. The man told me some stories about it, and most of all I liked the phrase "grounded pointers" common among the local programmers. See how close they are to the hardware aspect?

We are proud of having raised inside our team a skilled specialist who cares about the code quality and reliability. He hasn't silently accepted the established situation; he is trying to improve it.

As a start, he did the following. He studied the compiler warnings, then checked the project with Cppcheck and considered preventing typical mistakes in addition to making some fixes.

One of his first steps was preparing a paper aiming at improving the quality of the code created by the team. Introducing and integrating a static code analyzer into the development process might be the next step. It will certainly not be PVS-Studio: first, they work under Linux; second, it's very difficult to sell a software product to such companies. So, he has chosen Cppcheck for now. This tool is very fine for people to get started with the static analysis methodology.

I invite you to read the paper he has prepared. It is titled "The Way You Shouldn't Write Programs". Many of the items may look written pretty much in the Captain Obvious style. However, these are real problems the man tries to address.

The Way You Shouldn't Write Programs


Issue 1


Ignoring compiler warnings. When there are many of them in the list, you risk easily missing genuine errors in the lately written code. That's why you should address them all.

Issue 2


In the conditional statement of the if operator, a variable is assigned a value instead of being tested for this value:

if (numb_numbc[i] = -1) { }

The code is compiled well in this case, but the compiler produces a warning. The correct code is shown below:

if (numb_numbc[i] == -1) { }

Issue 3


The statement using namespace std; written in header files may cause using this namespace in all the files which include this header, which in turn may lead to calling wrong functions or the occurrence of name collisions.

Issue 4


Comparing signed variables to unsigned ones:

unsigned int BufPos;
std::vector<int> ba;
....
if (BufPos * 2 < ba.size() - 1) { }

Keep in mind that mixing signed and unsigned variables may result in:
  • overflows;
  • occurrence of always true or always false conditions and, as a consequence, infinite loops;
  • a value larger than INT_MAX may be written into a signed variable (and it will be negative);
  • an int-variable participating in addition/subtraction/etc. with an unsigned variable becomes unsigned too (so that negative values turn into large positive ones);
  • other unexpected nice things
The foregoing code sample incorrectly handles the situation of the ba array being empty. The expression ba.size() - 1 evaluates to an unsigned size_t value. If the array contains no items, the expression evaluates to 0xFFFFFFFFu.

Issue 5


Neglecting usage of constants may lead to overlooking hard-to-eliminate bugs. For example:

void foo(std::string &str)
{
  if (str = "1234")
  {
  }
}

The = operator is mistakenly used instead of ==. If the str variable were declared as a constant, the compiler would not even compile the code.

Issue 6


Pointers to strings are compared instead of strings themselves:

char TypeValue [4];
...
if (TypeValue == "S") {}

Even if the string S is stored in the variable TypeValue, the comparison will always return false. The correct way to compare strings is to use the special functions strcmp or strncmp.

Issue 7


Buffer overflow:

memset(prot.ID, 0, sizeof(prot.ID) + 1);

This code may cause several bytes of the memory area right after prot.ID to be cleared as well.

Don't mix up sizeof() and strlen(). The sizeof() operator returns the complete size of an item in bytes. The strlen() function returns the string length in characters (without counting the null terminator).

Issue 8


Buffer underflow:

struct myStruct
{
  float x, y, h;
};
myStruct *ptr;
 ....
memset(ptr, 0, sizeof(ptr));

In this case only N bytes will be cleared instead of the whole *ptr structure (N is the pointer size on the current platform). The correct way is to use the following code:

  myStruct *ptr;
  ....
  memset(ptr, 0, sizeof(*ptr));

Issue 9


Incorrect expression:

if (0 < L < 2 * M_PI) { }

The compiler doesn't see any error here, yet the expression is meaningless, for you will always get either true or false when executing it, the exact result depending on the comparison operators and boundary conditions. The compiler generates a warning for such expressions. The correct version of this code is:

 if (0 < L && L < 2 * M_PI) { }

Issue 10


unsigned int K;
....
if (K < 0) { }
...
if (K == -1) { }

Unsigned variables cannot be less than zero.

Issue 11


Comparing a variable to a value it can never reach. For example:

short s;
...
If (s==0xaaaa) { }

The compiler produces warnings against such things.

Issue 12


Memory is allocated with the help of new or malloc, while forgotten to be freed through delete/free correspondingly. It may look something like this:

void foo()
{
  std::vector<int> *v1 = new std::vector<int>;
  std::vector<int> v2;
  v2->push_back(*v1);
  ...
}

Perhaps it was the pointer to std::vector<int> that used to be saved into v2 before. Now, due to modifications of some code parts, it is no longer needed and there are just int values being saved. At the same time, memory allocated for v1 is not freed, as that was not needed in earlier times. To fix the code we should add the statement delete v1 at the end of the function, or use smart pointers.

Even better is to bring refactoring to an end, making v1 a local object, since you no longer need to pass it anywhere:

void foo()
{
  std::vector<int> v1;
  std::vector<int> v2;
  v2->push_back(v1[0]);
  ...
}

Issue 13


Memory is allocated through new[] and freed through delete. Or, vice versa, memory is allocated through new and freed through delete[].

Issue 14


Using uninitialized variables:

int sum;
...
for (int i = 0; i < 10; i++)
{
  sum++;
}

In C/C++, variables are not initialized to zero by default. Sometimes code only seems to run well, which is not so - it's merely luck.

Issue 15


A function returns a reference or pointer to local objects:

char* CreateName()
{
  char FileName[100];
  ...
  return FileName;
}

On leaving the function, FileName will refer to an already-freed memory area, since all the local objects are created on the stack, so it's impossible to handle it correctly any further.

Issue 16


Values returned by functions are not checked, while they may return an error code or -1 in case of an error. It may happen that a function returns an error code, us continuing to work without noticing and reacting to it in any way, which will result in a sudden program crash at some point. Such defects take much time to debug after that.

Issue 17


Neglecting usage of special static and dynamic analysis tools, as well as creation and usage of unit-tests.

Issue 18


Being too greedy for adding some parentheses in math expressions, which results in the following:

D = ns_vsk.bit.D_PN_ml + (int)(ns_vsk.bit.D_PN_st) << 16;

In this case, addition is executed in the first place and only then left-shift is. See "Operation priorities in C/C++". Judging by the program logic, the order the operations must be executed in is quite reverse: shift first, then addition. A similar error occurs in the following fragment:

#define A 1
#define B 2
#define TYPE A | B
if (type & TYPE) { }

The error here is this: the programmer forgot to enclose the TYPE macro in parentheses. This results in first executing the type & A expression and only then the (type & A ) | B expression. As a consequence, the condition is always true.

Issue 19


Array index out of bounds:

int mas[3];
mas[0] = 1;
mas[1] = 2;
mas[2] = 3;
mas[3] = 4;

The mas[3] = 4; expression addresses a non-existing array item, since it follows from the declaration of the int mas[N] array that its items can be indexed within the range [0...N-1].

Issue 20


Priorities of the logical operations && and || are mixed up. The && operator has a higher priority. Example of bad code:

if (A || B && C) { }

This may not conform to the required execution logic. It's often assumed that logical expressions are executed from left to right. The compiler generates warnings for such suspicious fragments.

Issue 21


An assigned value will have no effect outside the function:

void foo(int *a, int b)
{
  If (b == 10)
  {
    *a = 10;
  }
  else
  {
    a = new int;
  }
}

The pointer a cannot be assigned a different address value. To do that, you need to declare the function in the following way:

void foo(int *&a, int b) {....}

or:

void foo(int **a, int b) {....}

References:


  1. "Enough Rope to Shoot Yourself in the Foot. Rules for C and C++ Programming". Allen I. Holub;
  2. "C++ Coding Standards: 101 Rules, Guidelines, and Best Practices". Herb Sutter, Andrei Alexandrescu;
  3. "Code Complete". Steve McConnel;
  4. "C++ Gotchas: Avoiding Common Problems in Coding and Design". Stephen C. Dewhurst;
  5. "Effective C++: 50 Specific Ways to Improve Your Programs and Designs". Scott Meyers.

Conclusion


I haven't drawn any specific and significant conclusions. I'm only sure that in one particular place the situation with software development is beginning to improve. And that's pleasant.

On the other hand, it makes me sad that many people haven't even heard of static analysis. And these people are usually responsible for serious and important affairs. The area of programming is developing very fast. As a result, those who are constantly "working at work" fail to keep track of contemporary tendencies and tools in the industry. They eventually grow to work much less efficiently than freelance programmers and programmers engaged in startups and small companies.

Thus we get a strange situation. A young freelancer can do his work better (because he has knowledge: TDD, continuous integration, static analysis, version control systems, and so on) than a programmer who has worked for 10 years at Russian Railways/nuclear power plant/… (add your variant of some large enterprise). Thank God, it's so far not always like that. But still it happens.

Why am I feeling sad about this? I wish we could sell PVS-Studio to them. But they don't even have a slightest suspicion about existence and usefulness of such tools. :)



License


GDOL (Gamedev.net Open License)




Comments

Good articles :)

This article was good. I would like to check out PVS-Studio mostly for Issue 14!

 

I would like to suggest that most of these issues can be completely avoided by simply using C++ rather than C techniques such as replacing char*, malloc and raw arrays (via new[]).

 

Issue 14 is a tough one though. Clang++ can detect an unused variable but not an unassigned one. With GCC it is a free for all :/

These are some nice guidelines for less experienced programmers, and probably some good reminders for some who are more experienced.

 

I have a couple criticisms however.

 

Firstly, rather than saying "Issue 1...Issue 2...Issue N", make each of these a concise statement of the rule. Instead of "Issue 1" say "Don't ignore compiler warnings", instead of "Issue 2" say "Beware = and ==", instead of "Issue 3" say Don't use 'using' in headers".

 

Secondly, the advice given in the body of the section is often too brief, or gives incomplete advice. I like that brevity keeps things simple, but simplicity to the point of being incomplete or to the point of lacking any context at all is not much help, and can give those who don't know better a false sense of confidence in code they really ought to be suspicious of. For example, in "Issue 2" you warn users to beware mixing up = and ==, which is perfectly good advice -- if, in addition to that, you also tell them they should put their literals (or constants) on the left side of the operator, the compiler will help you spot this kind of mistake.

 

Lastly, some of the bodies sound like a problem statement alone, and don't provide a solution or any guidance (like Issue 15). Try to take the form: State the rule, state the problem, state the solution or offer guidance.

 

One other suggestion is that since you are trying to be brief, consider linking to outside resources for further reading if it seems appropriate, especially if and where you veer into matters of best practices over absolute correctness. You probably will not be able to clearly and effectively state the case for a best practice in just a paragraph or two.

I disagree here : 
 

Issue 8 

struct myStruct
{
  float x, y, h;
};
myStruct *ptr;
 ....
memset(ptr, 0, sizeof(ptr)); 

the correct(imho) way is :  memset(ptr, 0, sizeof(myStruct));

 

Just because when I see that line I always ask my self a question : "How ptr is defined?"

imoogiBG what if ptr is a base class pointer to a derived class to? Then sizeof the base class pointer would have predictable results (clearing the portion of pointed-to memory for the size of base class) but it would have undesirable results for the portion of pointed-to memory that is the derived class.

Good article!
 
I always cringe when I see code like this:
 
if (0 < L && L < 2 * M_PI) { }
Since I really don't want to keep the complete operator precedence list in my head I prefer to put paranthesis and make the inteded groups clearer:
 
if ( ( 0 < L ) 
&&   ( L < 2 * M_PI ) )
{ }
Paranthesis also auto-fix issue 18 and 20.



And I also prefer imoogiBGs way for memset. Put the type in the sizeof call.
And if you're using memset in conjunction with derived structs you should know the correct type.

imoogiBG what if ptr is a base class pointer to a derived class to? Then sizeof the base class pointer would have predictable results (clearing the portion of pointed-to memory for the size of base class) but it would have undesirable results for the portion of pointed-to memory that is the derived class.

 

It is badd practice to *clear the object state via memset. I'm using memset rearly only on *basic structures.

 

When i need to clear the state of a *complex object i prefer to write a method called Clear() that will clear evrything the right way! calling memset on *comlex objects will clear the virtual table ptr which is really dangerous.

After reading this article I installed static code analysis plugins for my personal projects.
I already used sonar for my thesis project, but didn't bother to set any S.C.A.T. up for my personal projects.
Even if I (surprisingly) had just 2 minor warning, definitely the situation in programming is getting better

Interesting article, I wasn't actually aware that  && and || had different priorities in C++.  Although diligent use of parentheses would avoid those kinds of errors, useful to know! 

Issue 20 is wrong. B && C will NOT be executed first. A will be executed first. B will be executed second. C will be executed third. Assuming no boolean logic short circuiting occurs. The way that statement is actually parsed is:

 

if (A || (B && C)) {}

 

This executes and checks A first. If A returns false, then it executes B. If B returns true, then it executes C. Observe.

 

Also, Issue 14 is a little misleading. Global variables are actually zero-initialized. But local variables are not. However, you can value-initialize them and then they will be defaulted to zero, which means if you do *ptr = myStruct(); you will fill the structure with zeros (unless it has a default constructor defined, in which case it will call that, and it's the constructor's job to then initialize them).

DISCARD THAT COMMENT. DISCARD THAT COMMENT. DISCARD THAT COMMENT

 

 

Also, Issue 14 is a little misleading. Global variables are actually zero-initialized. But local variables are not. 

 

This is compiler and compiler options(probably) dependant . 

 

DISCARD THAT COMMENT. IT IS NOT COMPILER DEPENDANT. (See the comment below)

 

 

Also, Issue 14 is a little misleading. Global variables are actually zero-initialized. But local variables are not. 

 

This is compiler and compiler options(probably) dependant . 

No, it is not. It is mandated by the standard. Section 3.6.2, paragraph 2, specifically. I'll link you to a nice SO Q&A (the only answers worth reading there are Chubsdad's and AndreyT's).

Also, I want to point out a text encoding issue. The part about comparing string pointers instead of string contents reads:

 

Ãhar TypeValue [4];

 

I'm gonna go ahead and guess 'Ã' was supposed to be a 'c'.

 

(and as a formatting thing, shouldn't the References come after the Conclusion?)

Good catch on the encoding error.

 

And no, it's not required that references come after the conclusion

And no, it's not required that references come after the conclusion

Required, perhaps not, but it's the idiomatic way of structuring an article (and I don't just mean articles on GDNet).

 

 

 

Also, Issue 14 is a little misleading. Global variables are actually zero-initialized. But local variables are not. 

 

This is compiler and compiler options(probably) dependant . 

No, it is not. It is mandated by the standard. Section 3.6.2, paragraph 2, specifically. I'll link you to a nice SO Q&A (the only answers worth reading there are Chubsdad's and AndreyT's).

 

Well thanks, Im shure i've tried it several years ago on gcc(i think). I've tried it today on msc and you were right.

Well thanks, Im shure i've tried it several years ago on gcc(i think). I've tried it today on msc and you were right.

It's been required since C++ was first standardized in 1998. In the C++98 standard, it's found in section 3.6.2, paragraph 1. It's possible you either tried it before C++ was standardized, or perhaps on a compiler that had a bug that has since been fixed.

The big lesson I see: most businesses are too hard at work to do anything other than what they've always done. BAD. I call it "head in sand OR grow". Nearly impossible to do both simultaneously. Yes, it's important to get the work done, but more important to try and keep up with the world and your contemporaries (and/or competition).

 

Imagine the power that stale swamp could yield if the developers felt a new wave of creativity and empowerment due to new tools and improved efficiency! Show everyone the fast way to get "caught up" so they can start fixing things that have nagged at them for years, decades, or since last century.

As cornstalks already pointed out issue 20 doesn't seem to be correct. Never heard of 'higher priority' myself.

 

&& has higher precedence so the expression A || B && C is executed as ( A || ( B && C ) ). However as far as I know this doesn't mean at all that B && C would be executed first. 

 

Normally the || and && operators are considered sequence points and any side effects of the A expression should have taken place before B and then C is evaluated.

 

In other words with any compiler I have seen A is always executed first then B and then C.

 

Admittedly I do not have an official version of the C++ standard but I have yet to see a compiler that doesn't comply with it.

As cornstalks already pointed out issue 20 doesn't seem to be correct. Never heard of 'higher priority' myself.
 
&& has higher precedence so the expression A || B && C is executed as ( A || ( B && C ) ). However as far as I know this doesn't mean at all that B && C would be executed first. 
 
Normally the || and && operators are considered sequence points and any side effects of the A expression should have taken place before B and then C is evaluated.
 
In other words with any compiler I have seen A is always executed first then B and then C.
 
Admittedly I do not have an official version of the C++ standard but I have yet to see a compiler that doesn't comply with it.

Yes, if any compiler does not execute A first then it is a bug in the compiler.
 
(The following is not necessarily directed at you specifically)
 
Operator precedence does not actually determine which operator is executed first (as some believe). Instead, it determines how operators and operands are grouped in an expression. For example, consider
 
a = b + c * d;
 
* has higher precedence than +, however, * is not necessarily executed first. Instead, the operator precedence determines that the expression is parsed as:
 
a = (b + (c * d))
 
Or, in a parse tree (which eliminates the need for parentheses):
 

  =
 / \
a   +
   / \
  b   *
     / \
    c   d

The parse tree helps clarify things. When evaluating this parse tree, there's no guarantee about the order of its evaluation. Will the left or right side be evaluated first? What about the subtrees? We simply aren't guaranteed anything here about which side is evaluated first. However, operator precedence is what allows us to disambiguate the potential parse trees, because a different operator precedence could have yielded:
 

  =
 / \
a   *
   / \
  +   d
 / \
b   c

(or in code: a = (b + c) * d)

Now, for logical operators || and &&, we do have a guaranteed order of evaluation (unlike + and *). The left hand side is always evaluated first, and the right hand side is only evaluated if the left hand side is false or true, respectively. Evaluation is left to right. In the expression

a || b && c

The parse tree is:
 

  ||
 /  \
a   &&
   /  \
  b    c

Operator precedence merely determines how to build that tree, in order to eliminate this alternative parse tree (that uses different (wrong) operator precedence):
 

    &&
   /  \
  ||   c
 /  \
a    b

The precedence itself doesn't determine order of evaluation. The order of evaluation of && and || is always left to right. Operator precedence just determines the parse tree.

I feel like most comments are really ignoring the main point of the article. It's not really about the technical issues listed. Those are there to prove a point about, in general, FTE's and technology at bigger companies. I'll say that I'm falling into this category. I'm not at a software shop but we make software for our own internal needs.

 

I started as a desktop programmer in .NET 2.0. They also have me being basically a DBAish type role also. There are months where I spend more time in a database then VS. Because of this the web craze has passed me by and I'm playing catch-up. Things like Linq & the Entity framework are all something new/younger employees all know and use but I've been left behind because my nose has been in the grinder for a long time. Companies often don't give time for training leaving you to do it on your own. Well I have a wife, kids, staying in shape, hobbies, etc. A life outside of programming. I'm aloud to have that am I not? I think this underlying problem is a major issue and is the reason for the issues you see above in this situation. It's a sad situation.

 

I find often work life balance is a hard thing. The IT industry makes it even harder because it moves so fast. When I was younger I kept up with technology on my own. As time goes by and more and more things take my attention away, I find it much harder to stay up with the times. 

Issue 20 is wrong. B && C will NOT be executed first. A will be executed first. B will be executed second. C will be executed third. Assuming no boolean logic short circuiting occurs. The way that statement is actually parsed is:
 
if (A || (B && C)) {}
 
This executes and checks A first. If A returns false, then it executes B. If B returns true, then it executes C. Observe.


Thank you. Written by stupidity. I'm a little fix this.

Those should be shown on schools. Good work, thanks :)

Just a remark on issue 14 where you write "In C/C++, variables are not initialized to zero by default. Sometimes code only seems to run well, which is not so - it's merely luck."

 

I would actually argue that it it would be better if it was immediately obvious that it was inaccurate because depending on what the variable is being used for you could have really hard to track down bugs (maybe you can't imagine it happening but what if the unitialized variable was used as an index to an array ? It could happen and that would lead to difficult bug tracking in any program large enough where you don't write outside your address space). Of course that they didn't initalise it to begin with (and note that at least with gcc default options will not warn you of unitialised variables!) is a problem but as you point out it is an issue that is all too common (and unfortunately it isn't always warned against).

 

The point of ignoring warnings (compiler) cannot be overstated except for when compiling really large projects like gcc itself (they enable many, many warnings and there is also the serious stress tests it does after compiling -- unless you decide to not run that but that is naive -- so... no need usually to worry about those warnings).

 

Overall, a very nice article and some thing that I think about a lot (and more programmers should even -- some of the issues covered are security risks, e.g., the buffer overflows, even). Are there any schools that actually really truthfully teach secure programming? I would be surprised if there were anything of the sort -- programming is enough for them. Sadly experience is also a huge part of it when you're writing software that others (arguably not others) will use and unfortunately the mentality of "I have a diploma so I surely must be well capable." is too prevelant (a piece of paper is not experience).

Good article. I'm pretty sure I was guilty of some of those at Uni... Not any more!

RE the = and == confusion point

if (numb_numbc[i] = -1) { }

if you always write your constant on the left and variable on the right, then you will get an error when you accidentally miss out an =

if (-1 == numb_numbc[i]) { } // correct
if (-1 = numb_numbc[i]) { } // error

Note: Please offer only positive, constructive comments - we are looking to promote a positive atmosphere where collaboration is valued above all else.




PARTNERS