How were new features added to C?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

21 replies to this topic

#1 TheVirtualDragon   Members   -  Reputation: 237

Posted 28 August 2012 - 02:26 AM

I was recently reading about how the first C++ compiler was CFront, which basically translated the C++ code into C code and then compiled it. Well, I want to know how it translated classes and other new features in C++ which were not there in C. In fact, how were/are new features added to a programming language in the first place? I saw something about preprocessors. Well, what are they exactly and how do you write them?

I would appreciate it if you could answer in an easy-to-understand way. I am asking here because I have googled this and not understood most of it.

EDIT: Also, isn't the preprocessor in C/C++ the hash symbol (#)?

Edited by TheVirtualDragon, 28 August 2012 - 02:35 AM.

What's This?: basically, it's my blog. Click on it.

#2 Ravyne   GDNet+   -  Reputation: 8188

Posted 28 August 2012 - 02:41 AM

There's no real magic to it -- most features in C++ can be implemented in a relatively straightforward way in C. For example, a virtual function in a class is not very different from a function pointer within a struct. The syntax to achieve it is different--much more verbose and error-prone in C--but the effect is the same. In this way, C++ is largely just a nicer syntax for doing things you could have done anyway.
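To make the function-pointer comparison concrete, here is a minimal sketch in C. The names (Shape, rect_area, triangle_area, shape_area) are made up for illustration and are not taken from any real compiler output; the point is only that a call through a pointer stored in the struct behaves like a call to a virtual function.

```c
#include <assert.h>
#include <stddef.h>

/* A "class" with one overridable operation: the function pointer plays
   the role of a C++ virtual function. All names here are hypothetical. */
struct Shape {
    double (*area)(const struct Shape *self); /* the "virtual" slot */
    double width;
    double height;
};

/* Two different "overrides" of the same operation. */
double rect_area(const struct Shape *self) {
    return self->width * self->height;
}

double triangle_area(const struct Shape *self) {
    return 0.5 * self->width * self->height;
}

/* Callers dispatch through the pointer without knowing the concrete kind. */
double shape_area(const struct Shape *s) {
    assert(s != NULL && s->area != NULL);
    return s->area(s);
}
```

A caller would fill in the pointer when creating the object, for example `struct Shape r = { rect_area, 4.0, 3.0 };`, and then call `shape_area(&r)`. In C++ the compiler does that bookkeeping for you.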

Other things are a little more novel, for example the access specifiers public, private and protected don't have a real equivalent in C (or at least a straight-forward one), at least as far as I'm aware. Those things are essentially just new rules for the compiler to follow.


As for preprocessors, all that really means is that it implements some kind of (relatively trivial) source-to-source translation. It sucks in source code with special markers, and spits out source code in the same language. The C pre-processor expands macros and conditionally includes/elides specially marked code segments, for example.
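As a small illustration of macro expansion and conditional inclusion, consider the sketch below. SQUARE, MODE_NAME and FEATURE_FAST are hypothetical names invented for this example; only `assert` and `NDEBUG` come from the standard library.

```c
#include <assert.h> /* assert is itself a macro, compiled out when NDEBUG is defined */

/* Macro expansion is textual substitution, hence the defensive parentheses. */
#define SQUARE(x) ((x) * (x))

/* FEATURE_FAST is a hypothetical switch. Defining it (for example with
   -DFEATURE_FAST on the command line) makes the preprocessor keep the
   first branch and drop the second before the compiler sees the code. */
#ifdef FEATURE_FAST
#define MODE_NAME "fast"
#else
#define MODE_NAME "safe"
#endif

int area_of_square(int side) {
    assert(side >= 0);
    /* After preprocessing this line reads: return ((side) * (side)); */
    return SQUARE(side);
}

const char *mode_name(void) {
    return MODE_NAME; /* one of the two string literals, chosen at preprocessing time */
}
```

The compiler proper never sees SQUARE or the #ifdef lines; it only sees the text that is left after the preprocessor has finished.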

The early C++ compiler, essentially an advanced preprocessor, would have translated a C++-like object syntax (classes, member functions, etc) into vanilla C code (structs, free functions, etc).

#3 rnlf   Members   -  Reputation: 1185

Posted 28 August 2012 - 02:52 AM

This is basic compiler theory. A compiler is a computer program which translates one computer language into another one. So you have a working compiler to convert C code to machine language (this is your C compiler). Then you want to write a compiler to translate your new language into machine language. Now you have at least two choices: Write a compiler to translate from your new language directly to machine code or write a compiler which translates your new language into another language for which you already have a compiler. That's what was done in the first C++ compiler. C++ is a new language which needs to be somehow turned into machine code. C is a very flexible language which allows C++ constructs to be expressed with it, even though the way in which it is possible to express such constructs may require some boilerplate code in C and would make it quite uncomfortable for a C programmer to write all this code by hand.

So if you wrote your C++ program and compiled it with such an "oldschool" C++ compiler, the compiler would read in your program, "think hard" about how to express your program in C and write the generated C program to a file. The generated C program would most probably be quite hard to read for a human, because it has all the management and decoration required to express the C++ features in C in it.

The hash symbol in C and C++ is just an indicator for the preprocessor that this line has some meaning, which is defined by the preprocessor. So in reality, before your C (or C++) compiler compiles your program, it runs another program which reads the file, watches for the famous # symbol and does some text processing before outputting another C (or C++) program which no longer contains any # lines. Now your C (or C++) file has been preprocessed and may be compiled by the real C (or C++ ;-)) compiler without any further fiddling. For example, if you write #include "xyz.h" in your C file, the preprocessor will replace the #include line with the contents of the xyz.h file, whose contents will be preprocessed as well, and so on.

So you can actually compare the C preprocessor to the old C++ compiler: both take a file written in one language (C with preprocessor directives OR a C++ program) and write a file in another language (plain C). The only difference is that the C++ compiler has a much more difficult job to do than the C preprocessor.

my blog (German)


#4 frob   Moderators   -  Reputation: 22833

Posted 28 August 2012 - 04:53 AM

I was recently reading about how the first C++ compiler was CFront which basically translated the C++ code in to C code and then compiled it. Well, I want to know how it translated classes and other new features in C++ which were not there in C.

As the earlier posts point out, the features WERE there in C, they just took extra work to complete. Both languages are equally powerful. It is just a little easier to do certain things in one language than the other.

Don't take that to mean the languages are the same. There is a lot of overlap, but most complicated C programs will not compile in a C++ compiler, nor will most C++ applications compile in a C compiler. There are ways to make them work together, but the languages diverged from each other in the 1980s.

In fact, how were/are new features added to a programming language in the first place?

People see something in the language that is missing or is difficult. They work with the compiler makers to get a non-standard extension added to the compiler. When the standards committee meets, they look at extensions that have been added to multiple compilers and make them standards. They also look for commonly used libraries and incorporate the most useful commonly used functionality into the standard libraries.

I saw something about preprocessors. Well, what are they exactly and how do you write them?
EDIT: Also, isn't the preprocessor in C/C++ the hash symbol (#)?

In C and C++, preprocessing is one phase of compiling.

Certain symbols and commands get run before main processing. Generally yes, they start with the hash symbol.

For example, #include tells the preprocessor to bring in another file. #define causes a search-and-replace effect. There are also operators for string and token manipulation, and directives (such as #pragma and #error) that affect the main compiler phase by adjusting packing formats, error messages and much more. The preprocessor is a small language in its own right, and preprocessing commands can become quite involved, just like any program.
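The string and token manipulation mentioned above can be sketched with the two standard preprocessor operators, `#` (stringify) and `##` (token pasting). The macro and type names below (STRINGIFY, MAKE_GETTER, Point) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* '#' turns a macro argument into a string literal. */
#define STRINGIFY(x) #x

/* '##' pastes two tokens into one identifier, so a single macro can
   generate a whole family of similarly named functions. */
#define MAKE_GETTER(field) \
    int get_##field(const struct Point *p) { assert(p != NULL); return p->field; }

struct Point { int x; int y; };

MAKE_GETTER(x) /* expands to: int get_x(const struct Point *p) { ... return p->x; } */
MAKE_GETTER(y) /* expands to: int get_y(const struct Point *p) { ... return p->y; } */

const char *field_name_x(void) {
    return STRINGIFY(x); /* becomes the string literal "x" */
}
```

This is still plain text substitution: the preprocessor has no idea what a struct or a function is, it only glues tokens together.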

Edited by frob, 28 August 2012 - 04:57 AM.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I write about assorted stuff.


#5 TheVirtualDragon   Members   -  Reputation: 237

Posted 28 August 2012 - 05:32 AM

Thanks for the replies, they have cleared some things up.

So basically, if I wanted to, I could download (for example) the source code for GCC and modify it to add a new feature in to C++?
Also, this simple Wikipedia article says that to write machine code (which is what compilers do) you would need an assembly language, a hex editor or a high-level programming language. Is this correct? If so, which high-level programming language is used?

#6 rip-off   Moderators   -  Reputation: 8764

Posted 28 August 2012 - 05:50 AM

Let us take a hypothetical example of the "first" version of C++. Perhaps it added classes, and a class could have simple member functions.

Here is a relatively trivial example of a first program in our imaginary C++ 0.1:
// foo.hpp
#ifndef FOO_HPP
#define FOO_HPP

class Foo
{
public: // assumed here, so that main() may call frobnicate() and brace-initialise bar
    int frobnicate(int);
    int bar;
};

#endif

// foo.cpp
#include "foo.hpp"

int Foo::frobnicate(int x) {
   bar += x;
   return bar / 2;
}

// main.cpp
#include "foo.hpp"

int main() {
    Foo foo = { 42 };
    return foo.frobnicate(13);
}

Here is what the first CFront compiler might* have output:
// foo.h
#ifndef FOO_H
#define FOO_H

struct Foo
{
    int bar;
};

int __cfront_Foo_frobnicate(struct Foo *, int );

#endif

// foo.c
#include "foo.h"

int __cfront_Foo_frobnicate(struct Foo *__cfront_this, int x) {
   __cfront_this->bar += x;
   return  __cfront_this->bar / 2;
}

// main.c
#include "foo.h"

int main() {
    struct Foo foo = { 42 };
    return __cfront_Foo_frobnicate(&foo, 13);
}
Something like that. Adding more advanced features like virtual functions would generate more and more code.

* If Bjarne had a time machine so he was compiling it into modern C.
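To give an idea of the extra code that virtual functions would need, here is a hedged sketch of the usual "table of function pointers" technique written by hand in C. It is not what Cfront actually emitted; the names (Animal, Bird, AnimalVTable, vptr and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct Animal; /* forward declaration */

/* One table of function pointers per "class", shared by all its objects. */
struct AnimalVTable {
    int (*legs)(const struct Animal *self);
};

struct Animal {
    const struct AnimalVTable *vptr; /* hidden pointer a translator would add */
};

/* "class Bird : public Animal" becomes a struct whose first member is the base. */
struct Bird {
    struct Animal base;
    int extra_legs; /* hypothetical data member */
};

int animal_legs_impl(const struct Animal *self) {
    (void)self;
    return 4;
}

int bird_legs_impl(const struct Animal *self) {
    /* Valid cast: a pointer to a struct also points to its first member. */
    const struct Bird *bird = (const struct Bird *)self;
    return 2 + bird->extra_legs;
}

const struct AnimalVTable animal_vtable = { animal_legs_impl };
const struct AnimalVTable bird_vtable = { bird_legs_impl };

/* Roughly what a call like "animal->legs()" would be translated to. */
int animal_legs(const struct Animal *a) {
    assert(a != NULL && a->vptr != NULL);
    return a->vptr->legs(a);
}
```

Every object creation also has to store the right table pointer, for example `struct Bird b = { { &bird_vtable }, 0 };`. That is exactly the kind of boilerplate a C++ compiler generates for you.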

#7 TheVirtualDragon   Members   -  Reputation: 237

Posted 28 August 2012 - 06:05 AM

OK, that makes even more sense than it did before. I also realise what it meant when an article I was reading said that the C code Cfront produced was harder for humans to read.

#8 Bregma   Crossbones+   -  Reputation: 5505

Posted 28 August 2012 - 06:09 AM

Thanks for the replies, they have cleared some things up.

So basically, if I wanted to, I could download (for example) the source code for GCC and modify it to add a new feature in to C++?
Also, this simple Wikipedia article says that to write machine code (which is what compilers do) you would need an assembly language, a hex editor or a high-level programming language. Is this correct? If so, which high-level programming language is used?

Yes, people make changes to GCC all the time to add new extensions. Most of the newer standard C and C++ extensions were first tested by adding them to GCC and then presenting them to the ISO standards bodies.

GCC itself is written mostly in C, and is now moving to C++. It emits the machine language as binary data in a format specific to the platform it's compiling for (e.g. Executable and Linkable Format (ELF) on Linux, Common Object File Format (COFF) on Windows).

Edited by Bregma, 28 August 2012 - 06:10 AM.

Stephen M. Webb
Professional Free Software Developer

#9 Servant of the Lord   Crossbones+   -  Reputation: 21217

Posted 28 August 2012 - 12:27 PM

So basically, if I wanted to, I could download (for example) the source code for GCC and modify it to add a new feature in to C++?

Yes, but you wouldn't even have to go that far. You could just add your own pre-processing step, and catch special symbols or words and replace them with C++ code before you pass the file to GCC.

Suppose (for whatever reason), you wanted to allow functions to return multiple return results instead of just one.

You could have code like this:
[int, float] MyFunction(stuff)
{
     //Do stuff...

     return[0] myInt;
     return[1] myFloat;
}

int main()
{
     MyFunction(stuff);
     int myInt = MyFunction[0];
     float myFloat = MyFunction[1];

     return 0;
}

The generated C++ might look like this: (after your pre-processor processes it, and before it hands it to the C++ compiler).
//[int, float] MyFunction(stuff) becomes two globals plus a void function
int MyFunction_return_0;
float MyFunction_return_1;

void MyFunction(stuff)
{
     //Do stuff...

     //return[0] myInt;
     MyFunction_return_0 = myInt;

     //return[1] myFloat;
     MyFunction_return_1 = myFloat;
}

int main()
{
     MyFunction(stuff);

     //int myInt = MyFunction[0];
     int myInt = MyFunction_return_0;

     //float myFloat = MyFunction[1];
     float myFloat = MyFunction_return_1;

     return 0;
}

Before the C++ compiler gets it, your pre-processor program could run, read the file, replace the extended code with normal C++, and then send the code on to GCC for compiling into assembly. If you were using CFront (in theory), you'd be going from "Marked up C++" -> process -> "C++" -> process -> "C" -> process -> "Assembly".
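A custom pre-processing step of this kind can be very small. The sketch below, written in C, rewrites a made-up keyword "forever" into the standard "for (;;)". The keyword and the function name translate_forever are assumptions for the example, and the matching is deliberately naive.

```c
#include <assert.h>
#include <string.h>

/* Copies src to dst, replacing every occurrence of the made-up keyword
   "forever" with "for (;;)". Returns 0 on success, -1 if dst is too small.
   A sketch only: it does not skip string literals, comments, or longer
   identifiers such as "forever_young". */
int translate_forever(const char *src, char *dst, size_t cap) {
    static const char keyword[] = "forever";
    static const char replacement[] = "for (;;)";
    const size_t klen = sizeof keyword - 1;
    const size_t rlen = sizeof replacement - 1;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (strncmp(src, keyword, klen) == 0) {
            if (out + rlen >= cap) return -1; /* keep room for the terminator */
            memcpy(dst + out, replacement, rlen);
            out += rlen;
            src += klen;
        } else {
            if (out + 1 >= cap) return -1;
            dst[out++] = *src++;
        }
    }
    if (out >= cap) return -1;
    dst[out] = '\0';
    return 0;
}
```

A build script would run a tool like this over each source file and hand the result to the real compiler. A serious version would tokenize the input first instead of matching raw text.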

Qt adds extensions (signals and slots, for example) to the C++ language using such a method. This is called a "Build step". Your build might go through a number of steps before the final output is generated. C++ itself goes through several steps (first processing the C preprocessor (#) language, then compiling to object code, then linking, then compiling to assembly, if I recall correctly (the actual translation to C language no longer occurs)).

You can even have GCC output some files showing the intermediate steps if you want to view them. The object code stage outputs .o files, but by passing certain parameters to GCC, you can also see the files before they get compiled to object code, but after they get pre-processed into one large file.
Go ahead and add -save-temps to your GCC call, and then go view the preprocessed files it generates next to your .cpps (.ii for C++ sources, .i for C sources). The .o files won't be legible and the .s assembly files take some practice to read, but the preprocessed files are plain source code.
It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.
All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.
Of Stranger Flames - [indie turn-based rpg set in a para-historical French colony] | Indie RPG development journal

[Fly with me on Twitter] [Google+] [My broken website]

[Need web hosting? I personally like A Small Orange]


#10 rnlf   Members   -  Reputation: 1185

Posted 28 August 2012 - 01:02 PM

then linking, then compiling to assembly


It's the other way round. The linker is the one responsible for "linking together" (whence the name) fully compiled source files. This step is (at least in C derived languages like C++) not part of the compiler but a separate step called by the build system.


#11 Servant of the Lord   Crossbones+   -  Reputation: 21217

Posted 28 August 2012 - 01:48 PM

It's the other way round. The linker is the one responsible for "linking together" (whence the name) fully compiled source files. This step is (at least in C derived languages like C++) not part of the compiler but a separate step called by the build system.

It's linking together the assembly files? I thought it was linking together the object files, then assembling them. Googling shows you're correct, thanks for the insight.


#12 Aardvajk   Crossbones+   -  Reputation: 6284

Posted 28 August 2012 - 04:19 PM


It's the other way round. The linker is the one responsible for "linking together" (whence the name) fully compiled source files. This step is (at least in C derived languages like C++) not part of the compiler but a separate step called by the build system.

It's linking together the assembly files? I thought it was linking together the object files, then assembling it. Googling shows you correct, thanks for the insight.


Unless you ask the compiler to do so, it is unlikely to have assembly involved at all. The compiler builds object files, which are machine code along with additional information about their contents. The linker takes these and builds an exe, lib, DLL, whatever which is also machine code.

If assembly is involved, it comes between the compiler and the production of the object files, but there is no particular reason for a compiler to compile via assembly unless a human wishes to inspect it.

#13 Lightness1024   Members   -  Reputation: 739

Posted 28 August 2012 - 05:30 PM

A little detour into Lex and Yacc, the historical tools to help dummies write compilers: instead of modifying GCC, it is less difficult to start with those.

#14 TheVirtualDragon   Members   -  Reputation: 237

Posted 29 August 2012 - 06:20 AM

Thanks for the suggestion Lightness1024, although Lex and Yacc appear outdated and Flex and Bison are the suggested alternative. This has turned out to actually be an interesting topic.

#15 swiftcoder   Senior Moderators   -  Reputation: 10450

Posted 29 August 2012 - 07:19 AM

It's also worth mentioning that modern compilers actually work quite similarly (at least conceptually) to those early C++ compilers: most compilers still target some sort of intermediate language that provides a level of abstraction over the machine code.

The toolkit du jour is LLVM, which provides an assembly-like intermediate language, and a backend compiler that turns LLVM IR into machine code for a variety of different platforms (see: clang, llvm-gcc).

But mainline GCC also uses a pair of related intermediate languages (called GENERIC and GIMPLE). Mono/.NET provide the 'Common Intermediate Language', to which C#, VB.net and various other languages are compiled. Java compiles to bytecode for the JVM, Perl 6 targets Parrot, and so forth...
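The idea of an intermediate language can be shown with a toy example. The sketch below defines a made-up three-instruction stack language and a function that executes it; it is not LLVM IR, GIMPLE or CIL, and all names (OpCode, Instr, run_ir) are hypothetical. A front end would produce the instruction array, and a back end would either run it, as here, or turn it into machine code.

```c
#include <assert.h>
#include <stddef.h>

/* A made-up three-instruction intermediate language for a stack machine. */
enum OpCode { OP_PUSH, OP_ADD, OP_MUL };

struct Instr {
    enum OpCode op;
    int operand; /* used by OP_PUSH only */
};

/* A stand-in for a back end: executes the IR instead of emitting machine code. */
int run_ir(const struct Instr *code, size_t count) {
    int stack[64];
    size_t top = 0;

    for (size_t i = 0; i < count; ++i) {
        switch (code[i].op) {
        case OP_PUSH:
            assert(top < 64);
            stack[top++] = code[i].operand;
            break;
        case OP_ADD:
            assert(top >= 2);
            stack[top - 2] = stack[top - 2] + stack[top - 1];
            --top;
            break;
        case OP_MUL:
            assert(top >= 2);
            stack[top - 2] = stack[top - 2] * stack[top - 1];
            --top;
            break;
        }
    }
    assert(top == 1); /* a well-formed program leaves exactly one result */
    return stack[0];
}
```

The expression 1 + 2 * 3 would be represented as PUSH 1, PUSH 2, PUSH 3, MUL, ADD. Any source language that can be lowered to this form can share the same back end, which is the main attraction of the design.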

Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#16 TheVirtualDragon   Members   -  Reputation: 237

Posted 29 August 2012 - 10:47 AM

So if I wanted to create a small language, like Squirrel (but one that has a compiler instead of an interpreter), what would be my best option:
a) Modify the GCC source to match my needs,
b) Write my own compiler using Flex and Bison, or
c) Write my own compiler from scratch?

Basically, what would be the easiest option and take the least time? (Sorry if it is getting too offtopic)

#17 rnlf   Members   -  Reputation: 1185

Posted 29 August 2012 - 10:54 AM

Understanding the inner workings of a monster-sized project like GCC will likely take you as long as it will take you to write your own compiler using some existing parser generator tool or library.

But honestly, the kind of questions you are asking leads me to believe that you may not yet be ready for writing a compiler. There's much more to it than just reading a source file and spitting out object code. You will have to learn about formal language theory, semantic analysis, code generation (with or without intermediate assembler steps: you will need a strong understanding of your target processor's assembly language and of the processor as a whole), object and executable file formats, optimization, operating system APIs and about everything there is to know about common implementation techniques for your language features.

If this does not scare you, go ahead. You will learn a hell of a lot by doing so. But don't expect it to be an easy task and don't expect to finish it anytime soon. If you don't have any knowledge in these topics, expect it to take you years, rather than months.

Edited by rnlf, 29 August 2012 - 10:57 AM.


#18 TheVirtualDragon   Members   -  Reputation: 237

Posted 29 August 2012 - 11:22 AM

I know I am not ready for this kind of thing; which is why I am making a game instead and why I am here on GameDev. I was thinking ahead for what to do when I am old and bored. :D

#19 swiftcoder   Senior Moderators   -  Reputation: 10450

Posted 29 August 2012 - 01:25 PM

But honestly, the kind of questions you are asking leads me to believe that you may not yet be ready for writing a compiler. There's much more to it than just reading a source file and spitting out object code. You will have to learn about formal language theory, semantic analysis, code generation (with or without intermediate assembler steps: you will need a strong understanding of your target processor's assembly language and with the processor as a whole), object and executable file formats, optimization, operating system APIs and about everything there is to know about common implementing techniques for your language features.

I'm not sure that's actually true (disclaimer: I have a fair background in all those areas). It is definitely helpful to learn the formal stuff (graduate compiler theory courses are great), but eventually you need to dive in and actually write a compiler - and I don't think it matters if you start with that instead.

The majority of simple compilers are basically a filter to transform from source language A to target language B. Provided you pick a suitably featured language for B (i.e. something like LLVM IR, or CLI), the transform need not be any more complicated than a Python script using regular expressions (for example), and it can be a great learning experience.

I highly recommend diving into lexing/parsing (preferably with an easy framework like ANTLR), and gradually working your way up to a full-fledged compiler. It'll be a great learning exercise, if nothing else.
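For a taste of what hand-written parsing looks like before reaching for a framework, here is a minimal recursive-descent sketch in C. It translates infix arithmetic over single digits, with + and * and parentheses, into postfix text. The grammar and all names (Parser, infix_to_postfix and the parse_ functions) are assumptions made for this example, and errors are only caught by assert, which a real tool would replace with proper diagnostics.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state: input cursor plus an output buffer for the translated text. */
struct Parser {
    const char *in;
    char *out;
    size_t len;
    size_t cap;
};

static void emit(struct Parser *p, char c) {
    assert(p->len + 1 < p->cap);
    p->out[p->len++] = c;
    p->out[p->len] = '\0';
}

static void skip_spaces(struct Parser *p) {
    while (*p->in == ' ') ++p->in;
}

static void parse_expr(struct Parser *p);

/* factor := digit | '(' expr ')' */
static void parse_factor(struct Parser *p) {
    skip_spaces(p);
    if (*p->in == '(') {
        ++p->in;
        parse_expr(p);
        skip_spaces(p);
        assert(*p->in == ')');
        ++p->in;
    } else {
        assert(isdigit((unsigned char)*p->in));
        emit(p, *p->in++);
        emit(p, ' ');
    }
}

/* term := factor ('*' factor)* */
static void parse_term(struct Parser *p) {
    parse_factor(p);
    skip_spaces(p);
    while (*p->in == '*') {
        ++p->in;
        parse_factor(p);
        emit(p, '*');
        emit(p, ' ');
        skip_spaces(p);
    }
}

/* expr := term ('+' term)* */
static void parse_expr(struct Parser *p) {
    parse_term(p);
    skip_spaces(p);
    while (*p->in == '+') {
        ++p->in;
        parse_term(p);
        emit(p, '+');
        emit(p, ' ');
        skip_spaces(p);
    }
}

/* Translates src into postfix form in dst and returns the number of characters written. */
size_t infix_to_postfix(const char *src, char *dst, size_t cap) {
    struct Parser p = { src, dst, 0, cap };
    assert(src != NULL && dst != NULL && cap > 0);
    dst[0] = '\0';
    parse_expr(&p);
    return p.len;
}
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which. The postfix output maps directly onto a stack-machine style target, so this is already a (very small) compiler front end.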


#20 rnlf   Members   -  Reputation: 1185

Posted 30 August 2012 - 01:16 AM

Yes, you're right. It may be as easy as doing some text processing. But I don't think that's what he really wants. If I hear someone talk about compiler programming, I immediately think about a "real" compiler instead of merely a preprocessor. But all in all you're right.
