Taking High Level Programming for Granted

Published July 15, 2011
Recently there have been some posts from people considering using C over C++, for god knows what reasons. As usual, the forum crowd advises them to stay away from C and just learn C++. This is great advice in theory because, despite the insane complexity of C++, it is overall a much safer language to use than C. C is a very elegant language because of its simplicity. It is a tiny language that is extremely cross-platform (more so than C++) and has a very straightforward and tiny standard library. This makes C very easy to learn but at the same time very difficult to master, because you have to do a lot of things by hand: there is often no standard library equivalent or language feature that covers the case for you. C++ is safer in a lot of situations because of type safety. Both languages check types at compile time, but C++ features such as templates and references let the compiler keep checking types in places where C code falls back on void pointers and casts, which throw the type information away.

With that little blurb aside, I personally feel a lot of programmers should learn C. Not as a first language, but at some point I think they should learn it. It allows you to understand how the high level features of modern languages work, and people take these things for granted nowadays. Today there are not that many programmers who actually understand what templates are doing for them and what advantages and disadvantages they have. The same goes for objects: a lot of programmers fail to understand how objects work internally. This information can make you a much better programmer overall.

Over the last few years I have spent a lot of time in C compared to C++ or other high level languages. This is not only to understand the internals of high level features I have used in the past, but also because I am preparing for an upcoming project I am designing. This project almost has to be done in C for portability, performance, and interoperability reasons. It will be targeted not only at the Linux desktop but possibly at embedded devices as well. So today I am going to show something that C++ gives you that C does not, and how to get the same functionality in C anyway. Then I will explain why the C version is more efficient than the C++ version but at the same time not as safe, because of the potential for programmer error. We will keep the example simple: instead of making a generic stack we are going to make a generic swap function, and for simplicity's sake I am going to keep the two examples as close as possible.

C++ gives us a feature known as templates. Templates are a powerful metaprogramming feature that generates code for us based on the type of data it receives. They can do more than just this, but this is a very common use. The main downside of creating swap this way is that if you call swap with 50 different types during the course of the application, the compiler generates 50 different functions. With that said, here is a generic swap function in C++ using templates.

[source lang="cpp"]
template <typename T>
void swap(T &v1, T &v2)
{
    T temp = v1;  // keep a copy of the first value
    v1 = v2;
    v2 = temp;
}
[/source]

There are two C++-specific features in use here. First, we are using templates to generalize the type we are swapping; second, we are using references so that we swap the caller's actual variables rather than copies of them. This means we can take in any type of variable and exchange the values the two variables hold. When this is compiled, C++ generates a type-specific function for each different version of swap we call.
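
To make "a type-specific function for each version" concrete, here is a rough C sketch of what the compiler effectively produces when the template is used with int and with double. The names swap_int and swap_double are made up for illustration (a real C++ compiler picks its own mangled names), and pointers stand in for the references since C has none.

```c
/* Hand-written equivalents of two template instantiations.
   swap_int and swap_double are hypothetical names for illustration. */

/* Roughly what swap<int> becomes: pointers replace the C++ references. */
void swap_int(int *v1, int *v2)
{
    int temp = *v1;
    *v1 = *v2;
    *v2 = temp;
}

/* Roughly what swap<double> becomes: same logic, separate machine code. */
void swap_double(double *v1, double *v2)
{
    double temp = *v1;
    *v1 = *v2;
    *v2 = temp;
}
```

Every additional type used with the template adds one more function of this shape to the program.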

Now we need to make the equivalent of this function in C. The first thing to note is that C does not have templates, does not have the concept of references, and does not retain type information after compile time. With some C trickery and clever assumptions based on the spec we can achieve the same result. There are other ways to do this, but I am going to do it the 100% portable way, which is both ANSI and POSIX compliant. Here is the code; the explanation of why I can do what I am doing follows afterwards.

[source lang="cpp"]
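#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Swap two objects of `size` bytes each through a temporary heap buffer. */
void swap(void *vp1, void *vp2, size_t size)
{
    char *buffer = (char *)malloc(size);  /* sizeof(char) is 1 by definition */
    assert(buffer != NULL);               /* bail out if the allocation failed */
    memcpy(buffer, vp1, size);            /* buffer <- first object */
    memcpy(vp1, vp2, size);               /* first  <- second object */
    memcpy(vp2, buffer, size);            /* second <- saved first object */
    free(buffer);
}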
[/source]

OK, so there is a lot there. First, a void pointer is a generic pointer: it can hold the address of any object, whatever its type. We can get away with this because a pointer only tells us where the storage is, so the compiler does not need to know what is stored there. (The size of a pointer depends on the platform, commonly 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, but that does not matter here.) Since we also don't know how big the data being pointed to is, we need to pass in the size. Now we need a replacement for the temp variable we used in C++. We don't know what type is behind our void pointer, so we cannot declare a temp of that type; we don't care what is stored, we just want to hold that bit pattern. Because a char in C is by definition exactly 1 byte, an array of size chars is exactly as large as the object we want to save. So we dynamically allocate an array of char to store our bit pattern. We also do an assertion to make sure the allocation did not return NULL before we attempt to copy data into it; the assertion will bail if no space was allocated. (Keep in mind that assert is compiled out when NDEBUG is defined.) Next we use memcpy to copy the bit patterns around for us. Lastly we need to make sure we free our temp storage.
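
As one of the "other ways" to do this, the temporary buffer can be avoided altogether by exchanging the two objects one byte at a time. The sketch below assumes the two regions do not partially overlap, and the name swap_bytes is made up for illustration; it is not the function from this post.

```c
#include <stddef.h>

/* Alternative sketch: swap two objects of `size` bytes one byte at a time,
   so no temporary heap buffer is needed. swap_bytes is a hypothetical name. */
void swap_bytes(void *vp1, void *vp2, size_t size)
{
    unsigned char *p1 = vp1;
    unsigned char *p2 = vp2;
    size_t i;

    for (i = 0; i < size; i++) {
        unsigned char tmp = p1[i];  /* save one byte of the first object */
        p1[i] = p2[i];
        p2[i] = tmp;
    }
}
```

It is called exactly like the malloc-based version, for example swap_bytes(&val1, &val2, sizeof(int)), and it cannot fail on allocation because it never allocates.
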
The main advantage of the void pointer approach is that the application does not generate a new function for each different type we call it with. The same assembly is used no matter what we pass in. This efficiency does come at a price. If swap is not called properly we don't know what we will get back. Also, because we are using void pointers the compiler will not complain; void pointers actually suppress what little type checking we do have. You must also keep in mind that if the two types being swapped are actually different, say a double and an int, or an int and a char *, we enter the realm of undefined behavior and have no idea what will happen.
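
One way to reduce the caller-side risk is to wrap the call in a macro that takes the variables themselves, works out the size, and rejects mismatched sizes at compile time. This is only a sketch: the SWAP macro is hypothetical, it assumes a C11 compiler for _Static_assert (so it gives up the strict ANSI C portability of the function itself), and it catches size mismatches only, so an int and a float of the same size would still slip through.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The generic swap from this post, reproduced (with size_t for the size)
   so this sketch is self-contained. */
void swap(void *vp1, void *vp2, size_t size)
{
    char *buffer = malloc(size);
    assert(buffer != NULL);
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
    free(buffer);
}

/* Hypothetical wrapper (not from the post): takes the variables themselves,
   derives the size from the first one, and refuses to compile if the two
   sizes differ. Requires C11 for _Static_assert. */
#define SWAP(a, b)                                                   \
    do {                                                             \
        _Static_assert(sizeof(a) == sizeof(b),                       \
                       "SWAP arguments must have the same size");    \
        swap(&(a), &(b), sizeof(a));                                 \
    } while (0)
```

With this in place, SWAP(val1, val2) replaces swap(&val1, &val2, sizeof(int)), and trying to swap an int with a double stops the build instead of corrupting memory at run time.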

When calling swap with 2 ints you would call it as

swap(&val1, &val2, sizeof(int));

If you are swapping 2 character strings you need to call it as

swap(&val1, &val2, sizeof(char *));

With character strings you still need to pass the address of each variable, and the size you pass is the size of the pointer, not the length of the string. This is important because a character string, or char *, is actually a pointer to an array of characters. The variable itself only holds an address, so what gets swapped is the two pointers; the characters they point to never move.
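
Putting the calls above into a small self-contained sketch makes the difference between swapping pointers and swapping characters visible. The swap_demo function and its variables are invented for illustration; swap is the function from earlier in the post, repeated (with size_t for the size) so the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The generic swap from this post, repeated so the example stands alone. */
void swap(void *vp1, void *vp2, size_t size)
{
    char *buffer = malloc(size);
    assert(buffer != NULL);
    memcpy(buffer, vp1, size);
    memcpy(vp1, vp2, size);
    memcpy(vp2, buffer, size);
    free(buffer);
}

/* Hypothetical demo: returns 1 if all three swaps behaved as described. */
int swap_demo(void)
{
    int val1 = 3, val2 = 7;
    const char *str1 = "first";   /* each variable holds only an address */
    const char *str2 = "second";
    char buf1[8] = "abc";         /* arrays hold the characters themselves */
    char buf2[8] = "xyz";

    swap(&val1, &val2, sizeof(int));
    swap(&str1, &str2, sizeof(const char *));  /* exchanges the two pointers */
    swap(buf1, buf2, sizeof buf1);             /* exchanges the array contents */

    return val1 == 7 && val2 == 3
        && strcmp(str1, "second") == 0 && strcmp(str2, "first") == 0
        && strcmp(buf1, "xyz") == 0 && strcmp(buf2, "abc") == 0;
}
```

The third call shows the other case: when the characters live in real arrays of equal size, passing the arrays and the array size moves the characters themselves.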

With all that said, you can see how C++ makes things like this very easy, at the price of generating duplicate instructions. With C you can see a very efficient way to do the same thing, with its own set of drawbacks on the caller side. It is very similar to what the C++ would do internally behind the scenes; the only difference is that C++ carries the type information through, so it can generate exactly typed code and you retain your type safety. This is a great and simple demonstration of what we take for granted when we use the various high level features of different programming languages. So next time you use these features, stop and say thank you to the designers, because without their efforts your features would not exist and you would have to do pointer arithmetic on a daily basis.

One last note: if you read this and you are still thinking of using C over C++, the decision is ultimately up to you. Personally I love C. It is a very elegant and clean language and I really enjoy using it. However, ask yourself if it is the right tool for the job, because in C you have to reinvent the wheel constantly to achieve the functionality that newer languages give you almost for free.

Comments

Aardvajk
Interesting post, and agree with the majority, but, to probably state the obvious but for the sake of less educated readers:

[quote]
It is very similar to what the C++ would do internally behind the scenes[/quote]

Only for POD types. Obviously the C version works because that is all you have in C, but in C++, the compiler could be generating code using operator=() for class types and all the similarities break down at this point. The C++ template could effectively be doing anything and a generic solution without templates and differently generated versions for different types becomes impossible.

To be perfectly honest, I think the only reason it is helpful to understand what is going on behind the scenes if you aren't actually writing library code is because we are stuck working with such bastardised messes as C++ and very obscure bugs arise that require this understanding.

I kind of doubt that an understanding of how, say, a string actually works is anything like as important in a more modern language for the majority of programming domains. But we are stuck with these older languages for my lifetime so your point holds.
July 16, 2011 01:36 AM
nickariku
I always thought that templates were preprocessor definitions.
I would have written something like this:

#define DEFINE_SWAP(TYPE) void swap__##TYPE(TYPE *t1, TYPE *t2)\
{ TYPE tmp;\
OPERATORMOV__##TYPE(tmp, *t1);\
OPERATORMOV__##TYPE(*t1, *t2);\
OPERATORMOV__##TYPE(*t2, tmp);\
}

#define SWAP(TYPE, t1, t2) swap__##TYPE(&(t1), &(t2))

And then for example with int
I first have to define an operator mov for this type

#define OPERATORMOV__int(i1, i2) i1=i2


for a struct it would look like

#define OPERATORMOV__astruct(s1, s2) (s1).anything = (s2).anything /* and whatever else */
or with a call to a function
#define OPERATORMOV__astruct(s1, s2) mov(&(s1), s2)

then somewhere I have to call DEFINE_SWAP for a specific type to actually create the function
DEFINE_SWAP(int)

and I could call swap like this

SWAP(int, int1, int2);

Not tested, but I always thought it actually worked like this.
I mean the compiler creates specific code based on the given template.
July 16, 2011 12:31 PM
Storyyeller
Another point you didn't mention. The "efficiency" has the downside of being harder to inline as well as easy to mess up and less generic. So it's not inconceivable that the C++ version is actually more efficient in some cases.
July 16, 2011 11:14 PM
blewisjr
The C version is not difficult for the C compiler to inline, and it is very generic: it can handle anything you can construct in C because it is not actually handling anything more than memory addresses. The only messing up that happens is when calling the swap function, if someone fails to pass in the actual address. Even if you have a pointer to some array, you have to remember to pass the address of the pointer variable itself, not the address of the first element it points to. Hence when dealing with a char * you technically need to pass in a double pointer, and that is why you need the &.
July 17, 2011 12:55 AM