
D Bits

Binding D to C Part Five

Posted by , in D, Binding to C · 31 December 2012 · 2,806 views

This is the fifth and final part of a series on creating bindings to C libraries for the D Programming Language.

In part one, I introduced the difference between dynamic and static bindings and some of the things to consider when choosing which kind to implement. In part two, I talked about the different linkage attributes to be aware of when declaring external C functions in D. In part three, I showed how to translate C types to D. In part four, I wrapped up the discussion of type translation with a look at structs. Here in part five, I'm going to revisit the static vs. dynamic binding issue, this time looking at implementation differences.


Back in part one, I gave this definition of static vs. dynamic bindings.

By static, I mean a binding that allows you to link with C libraries or object files directly. By dynamic, I mean a binding that does not allow linking, but instead loads a shared library (DLL/so/dylib/framework) at runtime.

Those two unfortunate sentences did not clearly express my meaning. Only one person demonstrated a misunderstanding of the above in a comment on the post, but in the intervening 12 months I've found a few more people using the two terms in the wrong way. Before explaining more clearly what it is I actually mean, let me make emphatically clear what I don't.

When I talk of a static binding, I am not referring to static linking. While the two terms are loosely related, they are not the same at all. A static binding can certainly be used to link with static libraries, but, and this is what I failed to express clearly in part one, they can also be linked with dynamic libraries at compile time. In the C or C++ world, it is quite common when using shared libraries to link them at compile time. On Windows, this is done via an import library. Given an application "Foo" that makes use of the DLL "Bar", when "Foo" is compiled it will be linked with an import library named Bar.lib. This will cause the DLL to be loaded automatically by the operating system when the application is executed. The same thing can be accomplished on Posix systems by linking directly with the shared object file (extending the example, that would be libBar.so in this case). So with a static binding in D, a program can be linked at compile time with the static library Bar.lib (Windows) or libBar.a (Posix) for static linkage, or the import library Bar.lib (Windows) or libBar.so (Posix) for dynamic linkage.

A dynamic binding can not be linked to anything at compile time. No static libraries, no import libraries, no shared objects. It is designed explicitly for loading a shared library manually at run time. In the C and C++ world, this technique is often used to implement plugin systems, or to implement hot swapping of different application subsystems (for example, switching between an OpenGL and D3D renderer) among other things. The approach used here is to declare exported shared library symbols as pointers, call into the OS API for loading shared libraries, then manually extract the exported symbols and assign them to the pointers. This is exactly what a dynamic binding does. It sacrifices the convenience of letting the OS load the shared library for more control over when and what is loaded.

So to reiterate, a static binding can be used with either static libraries or shared libraries that are linked at compile time. Both cases are subject to the issue with object file formats that I outlined in part one (though the very-soon-to-be-released DMD 2.061 alleviates this a good deal, as the 64-bit version knows how to work with the Visual Studio linker). A dynamic binding cannot be linked to the bound library at compile time, but must provide a mechanism to manually load the library at run time.

Now that I've hopefully gotten that point across, it's time to examine the only difference in implementing the two types of bindings. In part two, I foreshadowed this discussion with the following.

Although I'm not going to specifically talk about static bindings in this post, the following examples use function declarations as you would in a static binding. For dynamic bindings, you'll use function pointers instead.

Static Bindings

In D, we generally do not have to declare a function before using it. The implementation is the declaration. And it doesn't matter if it's declared before or after the point at which it's called. As long as it is in the currently visible namespace, it's callable. However, when linking with a C library, we don't have access to any function implementations (nor, actually, to the declarations--hence the binding). They are external to the application. In order to call into that library, the D compiler needs to be made aware of the existence of the functions that need to be called so that, at link time, it can match up the proper address offsets to make the call. This is the only case I can think of in D where a function declaration isn't just useful, but required.

I explained linkage attributes in part two. The examples I gave there, coupled with the details in part three regarding type translation, are all you need to know to implement a function declaration for a static D binding to a C library. But I'll give an example anyway.
// In C, foo.h
extern int foo(float f);
extern void bar(void);

// In D
extern( C )
{
    int foo(float);
    void bar();
}
Please be sure to read parts two through four completely. With all of that, and the info in this post up to this point, you have almost everything you need to know to implement a static binding in D, barring any corner cases that I've failed to consider or have yet to encounter myself. Oh, and function pointers. But that's coming in the next section.

Interlude -- Function Pointers

I could have covered function pointers in part three. After all, it isn't uncommon to encounter C libraries that use function pointers as typedefed callbacks (or declared inline in a function parameter list), or as struct members. And there's no difference in declaring function pointers for callbacks or for loading function symbols from a shared library. The syntax is identical. But, there are some issues that need to be considered in the different situations. And it's important to understand this before we get into dynamic bindings.

Let's look first at the syntax for declaring a function pointer in D.

int function() MyFuncPtr;
Very simple: return type->function keyword->parameter list->function pointer name. Though it's possible to use MyFuncPtr directly, it's often convenient to declare an alias:

alias int function() da_MyFuncPtr;
da_MyFuncPtr MyFuncPtr;
What's the difference? Let's see.

int foo(int i) { return i; }

void main()
{
    int function(int) fooPtr;
    fooPtr = &foo;

    alias int function(int) da_fooPtr;
    da_fooPtr fooPtr2 = &foo;

    import std.stdio;
    writeln(fooPtr(1), " ", fooPtr2(1));
}
There is none! At least, not on the surface. I'll get into that later. Let's look at another example: translating a C callback into D.

// In C, foo.h
typedef int (*MyCallback)(void);
// In D
extern( C ) alias int function() MyCallback;
Notice that I used the alias form here. Anytime you declare a typedefed C function pointer in D, it should be aliased so that it can be used the same way. Finally, the case of function pointers declared inline in a parameter list.

// In C, foo.h
extern void foo(int (*BarPtr)(int));

// In D.
// Option 1
extern( C ) void foo(int function(int) BarPtr);

// Option 2
extern( C ) alias int function(int) BarPtr;
extern( C ) void foo(BarPtr);
Personally, I prefer option 2. Also, I generally prefer to use extern blocks to include multiple declarations so that I don't have to type extern( C ) or extern(System) all the time (as I did in the previous example).
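To see the aliased-callback pattern end to end, here's a minimal, self-contained sketch. The setCallback function and its storage stand in for a hypothetical C library and are implemented in D here so the example runs on its own; in a real binding the C side would, of course, live in the C library.

```d
import std.stdio;

// The aliased C callback type, as in the examples above.
extern( C ) alias int function() MyCallback;

// Stand-in for the C side: a library function that stores a callback.
__gshared MyCallback callback;
extern( C ) void setCallback(MyCallback cb) { callback = cb; }

// Functions passed to C code as callbacks must themselves be extern( C ).
extern( C ) int myHandler() { return 42; }

void main()
{
    setCallback(&myHandler);
    writeln(callback()); // invokes myHandler through the stored pointer
}
```

Note that myHandler itself is declared extern( C ); passing a D-linkage function where C code expects a C function pointer is a mistake the compiler will catch thanks to the alias.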

Now that the function pointer intro is out of the way, it's time to look at dynamic bindings.

Dynamic Bindings

At this point, I would very much like to say that you know everything you need to know about dynamic bindings. But that would be untrue. As it turns out, simply declaring function pointers is not enough. There are two issues to take into consideration. The first is function pointer initialization.

In one of the examples above (fooPtr), I showed how a function pointer can be declared and initialized. But in that example, it is obvious to the compiler that the function foo and the pointer fooPtr have the same basic signature (return type and parameter list). Now consider this example.

// This is all D.
int foo() { return 1; }

void* getPtr() { return cast(void*)&foo; }

void main()
{
    int function() fooPtr;

    fooPtr = getPtr();
}
Try to compile this and you'll see something like:

fptr.d(10): Error: cannot implicitly convert expression (getPtr()) of type void* to int function()

Now, obviously this is a contrived example. But I'm mimicking what a dynamic binding has to go through. OS API calls (like GetProcAddress or dlsym) return function pointers of void* type. So this is exactly the sort of error you will encounter if you try to directly assign the return value to a function pointer declared in this manner.

The first solution that might come to mind is to go ahead and insert an explicit cast. So, let's see what that gets us.

fooPtr = cast(fooPtr)getPtr();
The error here might be obvious to an experienced coder, but certainly not to most. I'll let the compiler explain.

fptr.d(10): Error: fooPtr is used as a type

Exactly. fooPtr is not a type; it's a variable. This is akin to declaring int i = cast(i)x. You can't do that. So the next obvious solution might be to use an aliased function pointer declaration. Then it can be used as a type. And that is, indeed, one possible solution (and, for reasons I'll explain below, the best one).

alias int function() da_fooPtr;
da_fooPtr fooPtr = cast(da_fooPtr)getPtr();
And this compiles. For the record, the 'da_' prefix is something I always use with function pointer aliases. It means 'D alias'. You can do as you please.

I implied above that there was more than one possible solution. Here's the second one.

int foo() { return 1; }

void* getPtr() { return cast(void*)&foo; }

void bindFunc(void** func) { *func = getPtr(); }

void main()
{
    int function() fooPtr;
    bindFunc(cast(void**)&fooPtr);
}
Here, the address of fooPtr is being taken (giving us, essentially, a foo**) and cast to void**. Then bindFunc is able to dereference the pointer and assign it the void* value without a cast. When I first implemented Derelict, I used the alias approach. In Derelict 2, Tomasz Stachowiak implemented a new loader using the void** technique. That worked well. And, as a bonus, it eliminated a great many alias declarations from the codebase. Until something happened that, while a good thing for many users of D on Linux, turned out to be a big headache for me.

For several years, DMD did not provide a stack trace when exceptions were thrown. Then, some time ago, a release was made that implemented stack traces on Linux. The downside was that it was done in a way that broke Derelict 2 completely on that platform. To make a long story short, the DMD configuration files were preconfigured to export all symbols when compiling any binaries, be they shared objects or executables. Without this, the stack trace implementation wouldn't work. This caused every function pointer in Derelict to clash with every function exported by the bound libraries. In other words, the function pointer glClear in Derelict 2 suddenly started to conflict with the actual glClear function in the shared library, even though the library was loaded manually (which, given my Windows background, makes absolutely no sense to me whatsoever). So, I had to go back to the aliased function pointers. Aliased function pointers and variables declared of their type aren't exported. If you are going to make a publicly available dynamic binding, this is something you definitely need to keep in mind.

I still use the void** style to load function pointers, despite having switched back to aliases. It was less work than converting everything to a direct load. And when I implemented Derelict 3, I kept it that way. So if you look at the Derelict loaders...

// Instead of seeing this
foo = cast(da_Foo)getSymbol("foo");

// You'll see this
bindFunc(cast(void**)&foo, "foo");
I don't particularly advocate one over the other when implementing a binding with the aid of a script. But if you're doing it by hand, the latter is much more amenable to quick copy-pasting.
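A bindFunc along those lines might look like the following sketch. The getSymbol function and its backing symbol table are hypothetical stand-ins for a platform call such as dlsym or GetProcAddress, stubbed out here so the example is self-contained and runnable.

```d
// Hypothetical symbol table standing in for a loaded shared library.
__gshared void*[string] symbols;

void* getSymbol(string name)
{
    auto p = name in symbols;
    return p is null ? null : *p;
}

// A Derelict-style bindFunc: assigning through void** means no cast
// to a per-pointer alias type is needed inside the loader itself.
void bindFunc(void** ptr, string name)
{
    void* sym = getSymbol(name);
    if (sym is null)
        throw new Exception("Failed to load symbol: " ~ name);
    *ptr = sym;
}

int foo() { return 7; }

void main()
{
    // Pretend the library exporting foo has been loaded.
    symbols["foo"] = cast(void*)&foo;

    int function() fooPtr;
    bindFunc(cast(void**)&fooPtr, "foo");
    assert(fooPtr() == 7);
}
```

The call site stays uniform for every function in the binding, which is exactly what makes the copy-paste workflow pleasant.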

There's one more important issue to discuss. Given that a dynamic binding uses function pointers, the pointers are subject to D's rules for variable storage. And by default, all variables in D are stashed in thread-local storage. What that means is that, by default, each thread gets its own copy of the variable. So if a binding just blindly declares function pointers, and they are loaded in one thread but called in another... boom! Thankfully, D's function pointers are default initialized to null, so all you get is an access violation and not a call into random memory somewhere. The solution here is to let D know that the function pointers need to be shared across all threads. We can do that using one of two keywords: shared or __gshared.

One of the goals of D is to make concurrency easier than it traditionally has been in C-like languages. The shared type qualifier is intended to work toward that goal. When using it, you are telling the compiler that a particular variable is intended to be used across threads. The compiler can then complain if you try to access it in a way that isn't thread-safe. But like D's immutable and const, shared is transitive. That means if you follow any references from a shared object, they must also be shared. There are a number of issues that have yet to be worked out, so it hasn't seen a lot of practical usage that I'm aware of. And that's where __gshared comes in.

When you tell the compiler that a piece of data is __gshared, you are saying, "Hey, Mr. Compiler, I want to share this data across threads, but I don't want you to pay any attention to how I use it, mmkay?" Essentially, it's no different from a normal variable in C or C++. If you want to share a __gshared variable across threads, it's your responsibility to make sure it's properly synchronized. The compiler isn't going to help you.
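The difference between the default thread-local storage and __gshared is easy to demonstrate with a minimal sketch:

```d
import core.thread;

__gshared int sharedCounter; // one copy, shared by every thread
int tlsCounter;              // thread-local: each thread gets its own copy

void main()
{
    auto t = new Thread({
        sharedCounter = 42; // writes the single global copy
        tlsCounter = 42;    // writes only the new thread's private copy
    });
    t.start();
    t.join(); // join() also provides the synchronization we need here

    assert(sharedCounter == 42); // the write is visible in the main thread
    assert(tlsCounter == 0);     // main's copy was never touched
}
```

A function pointer declared without __gshared behaves like tlsCounter: loaded in one thread, it remains null in every other.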

So when implementing a dynamic binding, a decision has to be made: thread-local (default), shared, or __gshared. My answer is __gshared. If we pretend that our function pointers are actual functions, which are accessible across threads anyway, then there isn't too much to worry about. Care still needs to be taken to ensure that the functions are loaded before any other threads try to access them and that no threads try to access them after the bound library is unloaded. In Derelict, I do this with static module constructors and destructors (which can still lead to some issues during program shutdown, but I'll cover that in a separate post). Here's an example.

extern( C )
{
    alias void function(int) da_foo;
    alias int function() da_bar;
}

__gshared
{
    da_foo foo;
    da_bar bar;
}
Finally, there's the question of how to load the library. That, I'm afraid, is an exercise for the reader. In Derelict, I implemented a utility package (DerelictUtil) that abstracts the platform APIs for loading shared libraries and fetching their symbols. The abstraction is behind a set of free functions that can be used directly or via a convenient object interface. In Derelict itself, I use the latter since it makes managing the loading of an entire library easier. But in external projects, I often use the free-function interface for loading one or two functions at a time (such as certain Win32 functions that aren't available in the ancient libs shipped with DMD). It also supports selective loading, which is a term I use for being able to load a library even if specific functions are missing (the default behavior is to throw an exception when an expected symbol fails to load).
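For the curious, here's a bare-bones sketch of what such a loader does, using the Posix dl API from druntime. It is Posix-only and assumes glibc's libm.so.6 is present so the example stands alone; a real loader like DerelictUtil also handles Windows (LoadLibrary/GetProcAddress), error collection, and unloading.

```d
// Posix-only loader sketch; Windows would use LoadLibrary/GetProcAddress.
import core.sys.posix.dlfcn;
import std.string : toStringz;

extern( C ) alias double function(double) da_cos;
__gshared da_cos myCos;

void bindFunc(void* lib, void** ptr, string name)
{
    void* sym = dlsym(lib, name.toStringz);
    if (sym is null)
        throw new Exception("Failed to load symbol: " ~ name);
    *ptr = sym;
}

void main()
{
    // Load the shared library at run time...
    void* lib = dlopen("libm.so.6", RTLD_NOW);
    if (lib is null)
        throw new Exception("Failed to load libm");

    // ...then fetch the symbol and assign it through void**.
    bindFunc(lib, cast(void**)&myCos, "cos");
    assert(myCos(0.0) == 1.0);

    dlclose(lib);
}
```

Throwing on a null symbol is what gives a dynamic binding the chance to present its own friendly error message instead of a cryptic linker failure.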


Overall, there's a good deal of work involved in implementing any sort of binding in D. But I think it's obvious that dynamic bindings require quite a bit of extra effort. This is especially true given that the automated tools I've seen so far are all geared toward generating static bindings. I've only recently begun to use custom scripts myself, but they still require a bit of manual preparation because I don't want to deal with a full-on C parser. That said, I prefer dynamic bindings myself. I like having the ability to load and unload at will and to have the opportunity to present my own error message to the user when a library is missing. Others disagree with me and prefer to use static bindings. That's perfectly fine.

At this point, static and dynamic bindings exist for several popular libraries already. Deimos is a collection of the former and Derelict 3 the latter. You'll find some bindings for the same library in both and several that are in one project but not the other. Use what you need and are comfortable with. And I hope that, if the need arises, you can use the advice I've laid out in this series of posts to help fill in the holes and develop static or dynamic bindings yourself.

Given that I'm just under 4 hours from 2013 as I write this in Seoul, Korea, I want to wish you a Happy New Year. May you start and finish multiple projects in the coming year!

Fun with GCC (or A Lesson Relearned)

Posted by , 22 December 2012 · 1,754 views

This story has nothing really to do with D except peripherally, but it's a tale worth telling as a warning to others. This is the best place for me to tell it.

I had a collection of C code that I'd built up over the years. I suppose I still have it, but it's sitting on the hard drive of a closeted dormant computer that I don't want to bother setting up. Besides, I'm not really happy with that bit of code anymore. My style has evolved over the years and I don't write C in quite the same way as I used to. Plus, I've learned a thing or two since I started putting all of that together. Using it again without rewriting it would just bug me too much. So recently, I set out to start over with it.

In a nutshell, I'm putting together a package of C libraries that I can use for different sorts of apps. Granted, most of my hobby coding is in D nowadays, but I don't want to lose my skill at C. It took me too many years to get to where I am to just give it up completely (I did toy with doing my rewrite in C++, but gave up on that rather quickly -- D has me too spoiled to touch C++ anymore). It's all organized rather neatly under a self-contained directory tree. These days, I prefer to avoid IDEs and work from the command line and a text editor (a licensed version of SublimeText 2 is my editor of choice), so I need a good tool to manage my build process. I went with premake4.

I've been using premake for quite a while without any difficulty, so it was only natural to use it again for the rewrite. There are different ways to use premake, one master config file for every project in the "workspace" or a separate config file for each project (my preference). Both have their pros and cons, but in the end the way I want to set up my source tree made me realize early on that I'm going to hit the cons no matter which approach I take. Nothing major, mind you, but when working from the command line every bit of convenience counts. I did make an attempt to restructure things and combine my multiple premake files into one master script, but I managed to run afoul of GCC's pickiness with the order in which libraries are specified on the command line (see below). I realized I was either going to have to make the config script even more complex with some custom Lua, deal with the annoyances of the multiple premake script approach, or do something different.

For my D projects these days, I use a custom build script for compilation, written in D. It was simple to put together and I can copy it from project to project with only minor modifications. And it's dead easy to maintain across projects. I've actually got several different versions of it, as it improves with each new project I create. I realized it wouldn't take too much to knock up a version that could compile my C projects. So that's what I did. Less than 10 minutes after I opened the file I had it automatically pulling in C files and compiling libraries just fine. Then it came time for the executables.

If you've ever used gcc before, you've likely encountered a situation where you get "undefined references" all over the place even though you are 100% certain you specified the correct library in the linker settings of your IDE and that you configured the library path properly. After consulting Google, you will have learned that gcc processes its list of libraries in the reverse of the order in which you passed them along. So if library B depends on library A, you have to pass them in this order: "gcc -lB -lA...". Doing it in the reverse will cause linker errors. This is a lesson I learned long ago, so I rarely run into that sort of error anymore. But, as it turns out, I didn't understand the issue as well as I'd thought.

In my build script, I have separate functions to compile, archive (create a static lib) and link (create an executable). All source file names are pulled from a directory, appended to an array, and passed to the compile function for individual compilation. If the project currently being processed is a library, then the file name extensions are changed from ".c" to ".o", all of the names globbed together into one string, and passed along to the archive function where the library is created. If the project is an executable, a list of required libraries is handed off to the link function along with the object file names for the executable to be created. It was here that things broke down. More undefined references.

This one really had me stumped. The premake build system had been working fine until I combined the configurations into one, resulting in undefined references when compiling. Using the verbose option showed me that the libraries were not being passed in the correct order. But with my D build script, I was certain they were properly set up. The verbose option confirmed it. So what the hell was going on?

Google, unfortunately, showed me nothing. I was at a total dead end. I kept staring at my script, changing things here and there randomly. Staring at the directory tree, moving things around. Sitting in my chair and doing nothing but thinking and cursing. After over an hour of mounting frustration, a thought suddenly popped into my head. A small adjustment to my build script and compilation was successful.

The problem that bit me was that any object files you pass to gcc during the link step need to be specified on the command line before the libraries on which they depend. Given that the libraries are just collections of objects, that makes perfect sense. I'd just never known it or had to care about it before. The string I was sending to the OS originally looked something like "gcc -lfoo -lbar baz.o buz.o...", whereas it should have been "gcc baz.o buz.o -lfoo -lbar...". It doesn't matter in which order the object files are specified, they just need to appear in the list before the libraries. How many years now have I been using gcc?

So that's how what should have been a less-than-20-minute side project turned into a nearly hour-and-a-half time sink. If you are going to work on the command line, it pays to know your compiler inside and out.

TicTacToe and Modules in D

Posted by , in D, Projects · 02 December 2012 · 2,022 views

Given that my BorderWatch project has languished on github for months without any updates beyond the first few days of random hacking, it's going to be a while before it can serve as an example of game programming in D. So I had some free time recently and decided to do something different. I put together a simple TicTacToe game, that I call T3, and put the source up on github.

T3 is not a complete game. When it starts, two players can play, one with the mouse and one on the keypad. You can press space after each game to clear the board, and Esc to exit the game at any time. That's all it is or, in the master branch at least, will ever be. It shows some very basic features of D and I hope it can be used as a starting point for people new to D to play around with. Maybe by adding a menu screen and a UI, some AI, networking... whatever.

I've never programmed a TicTacToe game before and didn't consult any source for this one, so don't be surprised if you see something weird. However, there are a couple of architectural points that I want to highlight.

One of the features of D whose usefulness can be overlooked is that of the module. In C++, it is quite common for each class to have its own source file, sometimes even multiple files. And the declaration and implementation are often separated between header and source. C++ classes that are "friends" can have access to each other's internals. It is even more common in Java to see one class per file (and without the declaration/implementation divide), but without the help of the friend keyword. In D, it's much more useful to think in terms not of files or classes, but of modules.

Modules are designed to be used as something analogous to a C++ namespace. You group common functionality into a common module, the difference being that one module equates to one file. It was quite natural for me to start thinking in terms of modules (to an extent... see below), as that's what I have always done when programming with C -- individual C source files are often referred to as modules, and a common C idiom is to group common functionality in a single file. Others may not have such an easy time with it.

So as a result of this focus on modules, you find features that might be surprising. For example, you can have protection attributes (such as public, package and private) at the module level. Modules can have their own (static) constructors. But one module feature that really trips people up is that private class or struct members are visible throughout the entire module in which they are declared.

Take, for example, the tt.game module in T3. The abstract Player class declares a private member, _mark (on line 173 as of this writing). Scroll down a bit to the Game class and you'll see that it accesses this member directly. At first blush, that appears to be a severe violation of the encapsulation principle of OOP. But if you think about it, it really isn't. In this case, the Game class needs to perform gets and sets of a player's mark. I could have used D's @property attribute to provide a convenient getter/setter pair in the Player class, but that would be rather pointless given that both classes are in the same module. One of the oft-cited reasons (and a good one) for encapsulation is that it can hide changes to an implementation. But here, anyone changing the implementation, maybe by adding some sort of calculation every time _mark is accessed, will always have access to the Game class because it's in the same module. Make the changes, get compiler errors, search/replace in the same module to fix them, done. If the outside world needed access to Player's _mark, then property accessors would be the right thing (especially if it were in a library).
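The idea reduces to a small sketch. The names here are hypothetical stand-ins for the T3 code, not the actual tt.game module:

```d
module game_example; // hypothetical stand-in for tt.game

class Player
{
    private char _mark = ' '; // private to the *module*, not just the class
}

class Game
{
    // Legal: code in the same module sees Player's private members.
    void assign(Player p, char mark) { p._mark = mark; }
    char markOf(Player p) { return p._mark; }
}

void main()
{
    auto p = new Player;
    auto g = new Game;
    g.assign(p, 'X');
    assert(g.markOf(p) == 'X');
}
```

Move Player to its own module and the direct access in Game becomes a compile error, which is exactly the boundary D intends.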

The converse is true, too. In C++, I rarely, if ever, had variables in a source file that were declared outside of a class, static to the module, but accessed by that class. In D, I do it all the time. An example of that is seen in tt.main, where the module-private _game instance is used by the HumanPlayer class and also by the module-level functions below it. These days, on the rare occasions when I toy around with C++, I often find myself declaring certain variables in an anonymous namespace to be shared by two or more classes in the same file.

Another consequence of thinking in modules is that free functions become less of an issue. In earlier posts on this blog, I had a dilemma over whether or not to use free functions for a library I was wanting to develop. This is the one issue I had with fully embracing the module concept. In C, this problem just doesn't manifest because you're always dealing with free functions. But mixing free functions with objects just feels... dirty. Plus there's the potential for name clashes, and no one likes the this_is_a_function syntax common in C. In C++, there is an easy solution: wrap the free functions in a namespace. There are different ways to handle this in D. The most straightforward is just to not worry about namespace conflicts. When they do arise, the fully-qualified package name can be prefixed to the function name at the call site. I mostly accept this now, but as you can see in tt.gfx and tt.audio, I still add a prefix to free functions whose names I am certain will have a conflict (gfxInit and audioInit, for example). I have no practical reason for avoiding typing tt.gfx.init() at the call site. It's just a personal quirk.

If you do decide to do something with the code, please make sure to give the README a once-over. I would like to highlight the following paragraph.

Pull requests with new game features will likely not be accepted. I would like to keep this simple and useful as a toy for new D users. I might very well create a branch with multiplayer and AI for myself to play around with, but I really want to keep the master as-is. However, I'll happily accept pull requests for bug fixes and improvements to the build script (gdc & ldc support, for example).

This code is just a playground for anyone looking to experiment with D, so while I would love to see people do cool stuff with it, I'd rather not put any of that cool stuff into the master branch.

As an aside, I just realized that I didn't add a license to the repository. I'll take care of that pronto. For the record, I'm releasing this as public domain, with the exception of the Derelict bindings in the import directory which are released under the Boost Software License.

I'm overdue for part 5 of my Binding D to C series. I'll be sure to get to that at some point over the next couple of weeks. Thanks for reading.
