How were new features added to C?

20 comments, last by BitMaster 11 years, 7 months ago
I was recently reading about how the first C++ compiler was CFront, which basically translated the C++ code into C code and then compiled it. Well, I want to know how it translated classes and other new features in C++ which were not there in C. In fact, how were/are new features added to a programming language in the first place? I saw something about preprocessors. Well, what are they exactly and how do you write them?

I would appreciate it if you could answer in an easy-to-understand way. I am asking here because I have googled this and not understood most of it.

EDIT: Also, isn't the preprocessor in C/C++ the hash symbol (#)?
What's This?: basically, it's my blog. Click on it.
There's no real magic to it: most features in C++ can be implemented relatively straightforwardly in C. For example, a virtual function in a class is not very different from a function pointer within a struct. The syntax to achieve it is different (much more verbose and error-prone in C), but the effect is the same. In this way, C++ is largely just a nicer syntax for doing things you could have done anyway.
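A minimal sketch of the function-pointer idea in plain C. The names (Shape, rect_area, triangle_area, shape_area) are made up for illustration, and storing the pointer in every object is the simplest version of the idea, not necessarily what a C++ compiler generates.

```c
/* A struct whose "area" operation is chosen at run time through a
   function pointer, the hand-written C analogue of a C++ class with
   one virtual function. All names here are hypothetical. */
struct Shape {
    double (*area)(const struct Shape *self);  /* the "virtual" function */
    double w, h;
};

static double rect_area(const struct Shape *self) {
    return self->w * self->h;
}

static double triangle_area(const struct Shape *self) {
    return 0.5 * self->w * self->h;
}

/* The caller does not know which function runs: the call is dispatched
   through the stored pointer, much like a virtual call in C++. */
double shape_area(const struct Shape *s) {
    return s->area(s);
}
```

Usage: `struct Shape r = { rect_area, 3.0, 4.0 };` makes `shape_area(&r)` return 12.0, while the same call on a Shape initialised with `triangle_area` returns 6.0.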

Other things are a little more novel. For example, the access specifiers public, private and protected don't have a real equivalent in C (or at least not a straightforward one), as far as I'm aware. Those are essentially just new rules for the compiler to enforce at compile time.


As for preprocessors, all that really means is that the tool performs some kind of (relatively trivial) source-to-source translation. It takes in source code containing special markers and spits out source code in the same language. The C preprocessor, for example, expands macros and conditionally includes or omits specially marked sections of code.
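To make macro expansion and conditional inclusion concrete, here is a small sketch; the macro and function names are hypothetical. Running `gcc -E` on a file like this prints the preprocessed text, so you can see exactly what the compiler proper receives.

```c
/* Object-like and function-like macros (hypothetical names). */
#define BUFFER_SIZE 64
#define SQUARE(x) ((x) * (x))
#define BAD_SQUARE(x) x * x      /* no parentheses: a classic mistake */

/* Conditional inclusion: only one branch survives preprocessing,
   depending on whether DEBUG_LOGGING was defined (e.g. via -DDEBUG_LOGGING). */
#ifdef DEBUG_LOGGING
#define LOG_ENABLED 1
#else
#define LOG_ENABLED 0
#endif

int squared_buffer_size(void) {
    /* After preprocessing this is roughly: return ((64) * (64)); */
    return SQUARE(BUFFER_SIZE);
}

int bad_square_of_three(void) {
    /* After preprocessing: return 1 + 2 * 1 + 2;  which is 5, not 9 */
    return BAD_SQUARE(1 + 2);
}

int logging_enabled(void) {
    return LOG_ENABLED;
}
```

BAD_SQUARE shows that the substitution is purely textual: the preprocessor knows nothing about expressions or operator precedence.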

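The early C++ compiler, which worked much like an advanced preprocessor, would have translated a C++-like object syntax (classes, member functions, etc.) into vanilla C code (structs, free functions, etc.). (Strictly speaking, Cfront was a full compiler front end that parsed and type-checked C++; it simply used C as its output language.)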

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

This is basic compiler theory. A compiler is a computer program which translates one computer language into another. Suppose you have a working compiler that converts C code to machine language (this is your C compiler), and you want to get your new language into machine language. You have at least two choices: write a compiler that translates your new language directly into machine code, or write a compiler that translates your new language into another language for which you already have a compiler. The latter is what was done in the first C++ compiler. C++ is a new language which needs to be turned into machine code somehow. C is a very flexible language in which C++ constructs can be expressed, even though expressing them may require a lot of boilerplate code that would be quite uncomfortable for a C programmer to write by hand.

So if you wrote your C++ program and compiled it with such an "oldschool" C++ compiler, the compiler would read in your program, "think hard" about how to express your program in C and write the generated C program to a file. The generated C program would most probably be quite hard to read for a human, because it has all the management and decoration required to express the C++ features in C in it.

The hash symbol in C and C++ is just an indicator for the preprocessor that this line has a special meaning, defined by the preprocessor. So before your C (or C++) compiler compiles your program, another step runs which reads the file, watches for the famous # symbol and does some text processing, outputting another C (or C++) program that no longer contains any # directives. Now your C (or C++) file has been preprocessed and can be compiled by the real C (or C++ ;-)) compiler without any further fiddling. For example, if you write #include "xyz.h" in your C file, the preprocessor replaces the #include line with the contents of xyz.h, whose contents are preprocessed as well, and so on.
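As a rough answer to "how do you write one", here is a toy sketch of just the #include step. It reads from a hypothetical in-memory table of files instead of the disk, only understands the `#include "file"` form at the very start of a line, and ignores macros, `<...>` includes, search paths and include guards; the depth limit is a crude stand-in for real cycle detection.

```c
#include <string.h>

/* Hypothetical in-memory "files", so the sketch needs no disk I/O. */
struct VFile {
    const char *name;
    const char *text;
};

static const struct VFile files[] = {
    { "xyz.h",  "int xyz(void);\n" },
    { "main.c", "#include \"xyz.h\"\nint main(void) { return xyz(); }\n" },
};

static const char *lookup(const char *name) {
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strcmp(files[i].name, name) == 0)
            return files[i].text;
    return NULL;
}

/* Append n bytes of s to the NUL-terminated buffer out, truncating if full. */
static void append(char *out, size_t cap, const char *s, size_t n) {
    size_t len = strlen(out);
    if (len + n >= cap)
        n = cap - len - 1;
    memcpy(out + len, s, n);
    out[len + n] = '\0';
}

/* Copy file `name` into out, replacing each line that starts with
   #include "other" by the recursively expanded contents of other. */
static int expand(const char *name, char *out, size_t cap, int depth) {
    static const char directive[] = "#include \"";
    const size_t dlen = sizeof directive - 1;
    const char *p = lookup(name);

    if (p == NULL || depth > 16)   /* missing file, or probably a cycle */
        return -1;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) + 1 : strlen(p);

        if (len > dlen && strncmp(p, directive, dlen) == 0) {
            const char *start = p + dlen;
            const char *end = memchr(start, '"', len - dlen);
            char included[64];
            size_t n;

            if (end == NULL)
                return -1;
            n = (size_t)(end - start);
            if (n >= sizeof included)
                return -1;
            memcpy(included, start, n);
            included[n] = '\0';
            if (expand(included, out, cap, depth + 1) != 0)
                return -1;
        } else {
            append(out, cap, p, len);   /* ordinary line: copy it through */
        }
        p += len;
    }
    return 0;
}

int preprocess(const char *name, char *out, size_t cap) {
    if (cap == 0)
        return -1;
    out[0] = '\0';
    return expand(name, out, cap, 0);
}
```

Calling `preprocess("main.c", buf, sizeof buf)` leaves buf holding the declaration from xyz.h followed by the rest of main.c, with no # lines left.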

So you can actually compare the C preprocessor to the old C++ compiler: both take a file written in one language (C with preprocessor directives, or a C++ program) and write a file in another language (plain C). The only difference is that the C++ compiler has a much more difficult job to do than the C preprocessor.

I was recently reading about how the first C++ compiler was CFront, which basically translated the C++ code into C code and then compiled it. Well, I want to know how it translated classes and other new features in C++ which were not there in C.

As the earlier posts point out, the features WERE there in C; they just took extra work to implement. Both languages are equally powerful. It is just a little easier to do certain things in one language than in the other.

Don't take that to mean the languages are the same. There is a lot of overlap, but most complicated C programs will not compile in a C++ compiler, nor will most C++ applications compile in a C compiler. There are ways to make them work together, but the languages diverged from each other in the 1980s.


In fact, how were/are new features added to a programming language in the first place?
People see something in the language that is missing or is difficult. They work with the compiler makers to get a non-standard extension added to the compiler. When the standards committee meets, they look at extensions that have been added to multiple compilers and make them standards. They also look for commonly used libraries and incorporate the most useful commonly used functionality into the standard libraries.


I saw something about preprocessors. Well, what are they exactly and how do you write them?
EDIT: Also, isn't the preprocesser in C/C++ the hash symbol (#) ?
In C and C++, preprocessing is one phase of compiling.

Preprocessing directives are handled before the main compilation phase. Generally yes, they start with the hash symbol.

For example, #include tells the preprocessor to bring in the contents of another file, and #define sets up a macro, which has a search-and-replace effect. There are also operators for manipulating tokens and strings (# turns a macro argument into a string literal, ## pastes two tokens together), and directives that affect the main compiler phase, such as #pragma (e.g. the widely supported but non-standard #pragma pack for structure packing) and #error for emitting error messages, and much more. The preprocessor is a small language of its own, and preprocessing code can become quite involved, just like any program.
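A short sketch of the # and ## operators mentioned above; STRINGIFY, MAKE_GETTER and struct Point are made-up names for illustration.

```c
/* Stringizing (#): the argument's spelling becomes a string literal. */
#define STRINGIFY(x) #x

/* Token pasting (##): "get_" and the argument are glued into one identifier. */
#define MAKE_GETTER(field) \
    int get_##field(const struct Point *p) { return p->field; }

struct Point { int x, y; };

/* Expands to: int get_x(const struct Point *p) { return p->x; } */
MAKE_GETTER(x)
/* Expands to: int get_y(const struct Point *p) { return p->y; } */
MAKE_GETTER(y)

/* Expands to the string literal "1 + 2"; no arithmetic is done. */
static const char sum_text[] = STRINGIFY(1 + 2);
```

This kind of code generation is what makes preprocessor code "quite involved": the compiler proper only ever sees the expanded functions get_x and get_y.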
Thanks for the replies, they have cleared some things up.

So basically, if I wanted to, I could download (for example) the source code for GCC and modify it to add a new feature into C++?
Also, this simple Wikipedia article says that to write machine code (which is what compilers do) you would need an assembly language, a hex editor or a high-level programming language. Is this correct? If so, which high-level programming language is used?
Let us take a hypothetical example of the "first" version of C++. Perhaps it added classes, and a class could have simple member functions.

Here is a relatively trivial example of a first program in our imaginary C++ 0.1:

// foo.hpp
#ifndef FOO_HPP
#define FOO_HPP

class Foo
{
public: // class members are private by default
    int frobnicate(int);
    int bar;
};

#endif

// foo.cpp
#include "foo.hpp"

int Foo::frobnicate(int x) {
    bar += x;
    return bar / 2;
}

// main.cpp
#include "foo.hpp"

int main() {
    Foo foo = { 42 }; // aggregate initialisation: sets bar to 42
    return foo.frobnicate(13);
}


Here is what the first CFront compiler might* have output:

// foo.h
#ifndef FOO_H
#define FOO_H

struct Foo
{
    int bar;
};

int __cfront_Foo_frobnicate(struct Foo *, int);

#endif

// foo.c
#include "foo.h"

// The member function becomes a free function with an explicit "this" pointer.
int __cfront_Foo_frobnicate(struct Foo *__cfront_this, int x) {
    __cfront_this->bar += x;
    return __cfront_this->bar / 2;
}

// main.c
#include "foo.h"

int main() {
    struct Foo foo = { 42 };
    return __cfront_Foo_frobnicate(&foo, 13);
}

Something like that. Adding more advanced features like virtual functions would generate more and more code.

* If Bjarne had a time machine and could compile it into modern C.
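As a rough illustration of that extra code, here is one way a translator could express a virtual function in C: a per-class table of function pointers (often called a vtable) plus a hidden pointer to it in each object. This is a hand-written sketch with hypothetical names (Animal, Dog, dog_init, call_speak), not actual Cfront output; real C++ object layouts are implementation-specific.

```c
/* Forward declaration so the table below can mention struct Animal. */
struct Animal;

/* One table of function pointers per class, shared by all its objects. */
struct AnimalVtbl {
    const char *(*speak)(const struct Animal *self);
};

/* "Base class": each object carries a hidden pointer to its class's table. */
struct Animal {
    const struct AnimalVtbl *vptr;
    const char *name;
};

/* "Derived class": the base is the first member, so a Dog * can be
   converted to an Animal * and back. */
struct Dog {
    struct Animal base;
    int good_boy;
};

static const char *animal_speak(const struct Animal *self) {
    (void)self;
    return "...";
}

static const char *dog_speak(const struct Animal *self) {
    /* The "this" pointer arrives as a base pointer and is converted back. */
    const struct Dog *dog = (const struct Dog *)self;
    return dog->good_boy ? "Woof" : "Grr";
}

static const struct AnimalVtbl animal_vtbl = { animal_speak };
static const struct AnimalVtbl dog_vtbl = { dog_speak };

/* What a C++ constructor does implicitly: install the right table. */
void animal_init(struct Animal *a, const char *name) {
    a->vptr = &animal_vtbl;
    a->name = name;
}

void dog_init(struct Dog *d, const char *name) {
    animal_init(&d->base, name);
    d->base.vptr = &dog_vtbl;   /* the derived constructor overrides it */
    d->good_boy = 1;
}

/* A C++ call like a->speak() becomes an indirect call through the table. */
const char *call_speak(const struct Animal *a) {
    return a->vptr->speak(a);
}
```

Compared with storing a function pointer in every object, the shared table costs one extra indirection per call but only one pointer per object, however many virtual functions the class has.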
OK, that makes even more sense than it did before. I also realise what it meant when an article I was reading said that the C code Cfront produced was hard for humans to read.

Thanks for the replies, they have cleared some things up.

So basically, if I wanted to, I could download (for example) the source code for GCC and modify it to add a new feature into C++?
Also, this simple Wikipedia article says that to write machine code (which is what compilers do) you would need an assembly language, a hex editor or a high-level programming language. Is this correct? If so, which high-level programming language is used?

Yes, people make changes to GCC all the time to add new extensions. Many of the newer standard C and C++ features were first tried out as GCC extensions before being proposed to the ISO standards committees.

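GCC itself is written mostly in C, and is now moving to C++. The compiler proper emits assembly language, which the assembler then turns into machine code stored as binary data in an object-file format specific to the target platform (e.g. Executable and Linkable Format (ELF) on Linux; Common Object File Format (COFF) object files and Portable Executable (PE) images on Windows).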

Stephen M. Webb
Professional Free Software Developer


So basically, if I wanted to, I could download (for example) the source code for GCC and modify it to add a new feature in to C++?

Yes, but you wouldn't even have to go that far. You could just add your own preprocessing step that catches special symbols or words and replaces them with C++ code before you pass the file to GCC.
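A minimal sketch of such a preprocessing step in C, assuming a made-up keyword `forever` that should become `for (;;)`. It matches whole words only but, unlike a real tool, does not skip string literals or comments. A driver program (not shown) would read the source file, call the hypothetical replace_word function, write the result to a temporary file and hand that file to GCC.

```c
#include <ctype.h>
#include <string.h>

/* True for characters that can appear inside a C/C++ identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Replace every whole-word occurrence of `word` in `src` with `repl`,
   writing to `out` (capacity `cap`). Returns 0 on success, -1 if the
   arguments are unusable or the result would not fit. */
int replace_word(const char *src, const char *word, const char *repl,
                 char *out, size_t cap) {
    size_t wlen = strlen(word), rlen = strlen(repl), n = 0;
    const char *p = src;

    if (cap == 0 || wlen == 0)
        return -1;
    while (*p != '\0') {
        int at_word_start = (p == src) || !is_ident_char(p[-1]);
        if (at_word_start && strncmp(p, word, wlen) == 0 &&
            !is_ident_char(p[wlen])) {
            if (n + rlen >= cap)
                return -1;
            memcpy(out + n, repl, rlen);   /* emit the replacement text */
            n += rlen;
            p += wlen;
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = *p++;               /* copy ordinary text through */
        }
    }
    out[n] = '\0';
    return 0;
}
```

With this, `forever { tick(); }` becomes `for (;;) { tick(); }`, while an identifier such as `forevermore` is left alone because it is not a whole-word match.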

Suppose (for whatever reason) you wanted to allow functions to return multiple results instead of just one.

You could have code like this:
[int, float] MyFunction(stuff)
{
    //Do stuff...

    return[0] myInt;
    return[1] myFloat;
}

int main()
{
    MyFunction(stuff);
    int myInt = MyFunction[0];
    float myFloat = MyFunction[1];

    return 0;
}


The generated C++ might look like this (after your preprocessor processes it, and before it hands it to the C++ compiler):
// Globals generated for: [int, float] MyFunction(stuff)
// (Using globals keeps the translation simple, but it is not re-entrant or thread-safe.)
int MyFunction_return_0;
float MyFunction_return_1;

//[int, float] MyFunction(stuff)
void MyFunction(stuff)
{
    //Do stuff...

    //return[0] myInt;
    MyFunction_return_0 = myInt;

    //return[1] myFloat;
    MyFunction_return_1 = myFloat;
}

int main()
{
    MyFunction(stuff);

    //int myInt = MyFunction[0];
    int myInt = MyFunction_return_0;

    //float myFloat = MyFunction[1];
    float myFloat = MyFunction_return_1;

    return 0;
}


Before the C++ compiler gets it, your preprocessor program could run, read the file, replace the extended code with normal C++, and then send the code on to GCC for compiling into assembly. If you were using CFront (in theory), you'd be going from "Marked-up C++" -> process -> "C++" -> process -> "C" -> process -> "Assembly".

Qt adds extensions (signals and slots, for example) to the C++ language using a similar method: its Meta-Object Compiler (moc) reads your headers and generates extra C++ code. This is called a "build step". Your build might go through a number of steps before the final output is generated. C++ itself goes through several steps (first processing the C preprocessor (#) language, then compiling to object code, then linking, then compiling to assembly, if I recall correctly (the actual translation to C no longer occurs)).

You can even have GCC output the files from the intermediate steps if you want to view them. The object-code stage outputs .o files, but by passing certain options to GCC you can also see the files after they have been preprocessed into one large file and before they get compiled.
Go ahead and add -save-temps to your GCC call, and then look at the .ii files (.i for C sources) it generates in the current directory. The .o files are binary and won't be legible; the .s files contain assembly, which is readable text but heavy going. The .ii files, however, are ordinary preprocessed C++.

then linking, then compiling to assembly


It's the other way round. Compilation to assembly (and assembling that into object files) happens first; the linker is then responsible for "linking together" (whence the name) the fully compiled object files. At least for C-derived languages like C++, the linker is not part of the compiler proper but a separate program, usually invoked by the compiler driver or the build system.

