Jump to content

  • Log In with Google      Sign In   
  • Create Account

D Bits

Binding D to C Part Five

Posted by , in D, Binding to C 31 December 2012 - - - - - - · 2,849 views

This is the fifth and final part of a series on creating bindings to C libraries for the D Programming Language.

In part one, I introduced the difference between dynamic and static bindings and some of the things to consider when choosing which kind to implement. In part two, I talked about the different linkage attributes to be aware of when declaring external C functions in D. In part three, I showed how to translate C types to D. In part four, I wrapped up the discussion of type translation with a look at structs. Here in part five, I'm going to revisit the static vs. dynamic binding issue, this time looking at implementation differences.

Terminology

Back in part one, I gave this definition of static vs. dynamic bindings.

By static, I mean a binding that allows you to link with C libraries or object files directly. By dynamic, I mean a binding that does not allow linking, but instead loads a shared library (DLL/so/dylib/framework) at runtime.

Those two unfortunate sentences did not clearly express my meaning. Only one person demonstrated a misunderstanding of the above in a comment on the post, but in the intervening 12 months I've found a few more people using the two terms in the wrong way. Before explaining more clearly what it is I actually mean, let me make emphatically clear what I don't.

When I talk of a static binding, I am not referring to static linking. While the two terms are loosely related, they are not the same at all. A static binding can certainly be used to link with static libraries, but, and this is what I failed to express clearly in part one, they can also be linked with dynamic libraries at compile time. In the C or C++ world, it is quite common when using shared libraries to link them at compile time. On Windows, this is done via an import library. Given an application "Foo" that makes use of the DLL "Bar", when "Foo" is compiled it will be linked with an import library named Bar.lib. This will cause the DLL to be loaded automatically by the operating system when the application is executed. The same thing can be accomplished on Posix systems by linking directly with the shared object file (extending the example, that would be libBar.so in this case). So with a static binding in D, a program can be linked at compile time with the static library Bar.lib (Windows) or libBar.a (Posix) for static linkage, or the import library Bar.lib (Windows) or libBar.so (Posix) for dynamic linkage.

A dynamic binding can not be linked to anything at compile time. No static libraries, no import libraries, no shared objects. It is designed explicitly for loading a shared library manually at run time. In the C and C++ world, this technique is often used to implement plugin systems, or to implement hot swapping of different application subsystems (for example, switching between an OpenGL and D3D renderer) among other things. The approach used here is to declare exported shared library symbols as pointers, call into the OS API for loading shared libraries, then manually extract the exported symbols and assign them to the pointers. This is exactly what a dynamic binding does. It sacrifices the convenience of letting the OS load the shared library for more control over when and what is loaded.

So to reiterate, a static binding can be used with either static libraries or shared libraries that are linked at compile time. Both cases are subject to the issue with object file formats that I outlined in part one (though the very-soon-to-be-released DMD 2.061 alleviates this a good deal, as the 64-bit version knows how to work with the Visual Studio linker). A dynamic binding cannot be linked to the bound library at compile time, but must provide a mechanism to manually load the library at run time.

Now that I've hopefully gotten that point across, it's time to examine the only difference in implementing the two types of bindings. In part two, I foreshadowed this discussion with the following.

Although I'm not going to specifically talk about static bindings in this post, the following examples use function declarations as you would in a static binding. For dynamic bindings, you'll use function pointers instead.

Static Bindings

In D, we generally do not have to declare a function before using it. The implementation is the declaration. And it doesn't matter if it's declared before or after the point at which its called. As long as it is in the currently visible namespace, it's callable. However, when linking with a C library, we don't have access to any function implementations (nor, actually, to the declarations--hence the binding). They are external to the application. In order to call into that library, the D compiler needs to be made aware of the existence of the functions that need to be called so that, at link time, it can match up the proper address offsets to make the call. This is the only case I can think of in D where a function declaration isn't just useful, but required.

I explained linkage attributes in part two. The examples I gave there, coupled with the details in part three regarding type translation, are all you need to know to implement a function declaration for a static D binding to a C library. But I'll give an example anyway.
// In C, foo.h
extern int foo(float f);
extern void bar(void);

// In D
extern( C )
{

    int foo(float);

    void bar();
}
Please be sure to read parts 2 - 4 completely. With all of that, and the info in this post up to this point, you have almost everything you need to know to implement a static binding in D, barring any corner cases that I've failed to consider or am yet to encounter myself. Oh, and function pointers. But that's coming in the next section.

Interlude -- Function Pointers

I could have covered function pointers in part three. After all, it isn't uncommon to encounter C libraries that use function pointers as typedefed callbacks (or declared inline in a function parameter list), or as struct members. And there's no difference in declaring function pointers for callbacks or for loading function symbols from a shared library. The syntax is identical. But, there are some issues that need to be considered in the different situations. And it's important to understand this before we get into dynamic bindings.

Let's look first at the syntax for declaring a function pointer in D.

int function() MyFuncPtr;
Very simple: return type->function keyword->parameter list->function pointer name. Though it's possible to use MyFuncPtr directly, it's often convenient to declare an alias:


alias int function() da_MyFuncPtr;
da_MyFuncPtr MyFuncPtr;
What's the difference? Let's see.


int foo(int i)
{
  return i;
}

void main()
{
    int function(int) fooPtr;
    fooPtr = &foo;


    alias int function(int) da_fooPtr;
    da_fooPtr fooPtr2 = &foo;


    import std.stdio;
    writeln(fooPtr(1));
    writeln(fooPtr2(2));
}
There is none! At least, not on the surface. I'll get into that later. Let's look at another example. Translating a C callback into D.

// In C, foo.h
typedef int (*MyCallback)(void);
 
// In D
extern( C ) alias int function() MyCallback;
Notice that I used the alias form here. Anytime you declare a typedefed C function pointer in D, it should be aliased so that it can be used the same way. Finally, the case of function pointers declared inline in a paramter list.

// In C, foo.h
extern void foo(int (*BarPtr)(int));

// In D.
// Option 1
extern( C ) void foo(int function(int) BarPtr);

// Option 2
extern( C ) alias int function(int) BarPtr;
extern( C ) void foo(BarPtr);
Personally, I prefer option 2. Also, I generally prefer to use extern blocks to include multiple declarations so that I don't have to type extern( C ) or extern(System) all the time (as I did in the previous example).

Now that the function pointer intro is out of the way, it's time to look at dynamic bindings.

Dynamic Bindings

At this point, I would very much like to say that you know everything you need to know about dynamic bindings. But that would be untrue. As it turns out, simply declaring function pointers is not enough. There are two issues to take into consideration. The first is function pointer initialization.

In one of the examples above (fooPtr), I showed how a function pointer can be declared and initialized. But in that example, it is obvious to the compiler that the function foo and the pointer fooPtr have the same basic signature (return type and parameter list). Now consider this example.

// This is all D.
int foo() { return 1; }

void* getPtr() { return cast(void*)&foo; }

void main()
{

    int function() fooPtr;

    fooPtr = getPtr();
}
Try to compile this and you'll see something like:

fptr.d(10): Error: cannot implicitly convert expression (getPtr()) of type void* to int function()

Now, obviously this is a contrived example. But I'm mimicking what a dynamic binding has to go through. OS API calls (like GetProcAddress or dlsym) return function pointers of void* type. So this is exactly the sort of error you will encounter if you try to directly assign the return value to a function pointer declared in this manner.

The first solution that might come to mind is to go ahead and insert an explicit cast. So, let's see what that gets us.

fooPtr = cast(fooPtr)getPtr();
The error here might be obvious to an experienced coder, but certainly not to most. I'll let the compiler explain.

fptr.d(10): Error: fooPtr is used as a type

Exactly. fooPtr is not a type, it's a variable. This is akin to declaring int i = cast(i)x; You can't do that. So the next obvious solution might be to use an aliased function pointer declaration. Then it can be used as a type. And that is, indeed, one possible solution (and, for reasons I'll explain below, the best one).

alias int function() da_fooPtr;
da_fooPtr fooPtr = cast(da_fooPtr)getPtr();
And this compiles. For the record, the 'da_' prefix is something I always use with function pointer aliases. It means 'D alias'. You can do as you please.

I implied above that there was more than one possible solution. Here's the second one.

int foo() { return 1; }void* getPtr() { return cast(void*)&foo; }void bindFunc(void** func) { *func = getPtr(); }void main()
{
    int function() fooPtr;
    bindFunc(cast(void**)&fooPtr);
}
Here, the address of fooPtr is being taken (giving us, essentially, a foo**) and cast to void**. Then bind func is able to dereference the pointer and assign it the void* value without a cast. When I first implemented Derelict, I used the alias approach. In Derelict 2, Tomasz Stachowiak implemented a new loader using the void** technique. That worked well. And, as a bonus, it eliminated a great many alias declarations from the codebase. Until something happened that, while a good thing for many users of D on Linux, turned out to be a big headache for me.

For several years, DMD did not provide a stack trace when exceptions were thrown. Then, some time ago, a release was made that implemented stack traces on Linux. The downside was that it was done in a way that broke Derelict 2 completely on that platform. To make a long story short, the DMD configuration files were preconfigured to export all symbols when compiling any binaries, be they shared objects or executables. Without this, the stack trace implementation wouldn't work. This caused every function pointer in Derelict to clash with every function exported by the bound libraries. In other words. the function pointer glClear in Derelict 2 suddenly started to conflict with the actual glClear function in the shared library, even though the library was loaded manually (which, given my Windows background, makes absolutely no sense to me whatsoever). So, I had to go back to the aliased function pointers. Aliased function pointers and variables declared of their type aren't exported. If you are going to make a publicly available dynamic binding, this is something you definitely need to keep in mind.

I still use the void** style to load function pointers, despite having switched back to aliases. It was less work than converting everything to a direct load. And when I implemented Derelict 3, I kept it that way. So if you look at the Derelict loaders...

// Instead of seeing this
foo = cast(da_Foo)getSymbol("foo");

// You'll see this
foo = bindFunc(cast(void**)&foo, "foo");
I don't particularly advocate one over the other when implementing a binding with the aid of a script. But if you're doing it by hand, the latter is much more amenable to quick copy-pasting.

There's one more important issue to discuss. Given that a dynamic binding uses function pointers, the pointers are subject to D's rules for variable storage. And by default, all variables in D are stashed in Thread-Local Storage. What that means is that, by default, each thread gets its own copy of the variable. So if a binding just blindly declares function pointers, then they are loaded in one thread and called in another... boom! Thankfully, D's function pointers are default initialized to null, so all you get is an access violation and not a call into random memory somewhere. The solution here is to let D know that the function pointers need to be shared across all threads. We can do that using one of two keywords: shared or __gshared.

One of the goals of D is to make concurrency easier than it traditionally has been in C-like languages. The shared type qualifier is intended to work toward that goal. When using it, you are telling the compiler that a particular variable is intended to be used across threads. The compiler can then complain if you try to access it in a way that isn't thread-safe. But like D's immutable and const , shared is transitive. That means if you follow any references from a shared object, they must also be shared. There are a number of issues that have yet to be worked out, so it hasn't seen a lot of practical usage that I'm aware of. And that's where __gshared comes in.

When you tell the compiler that a piece of data is __gshared, you are saying, "Hey, Mr. Compiler, I want to share this data across threads, but I don't want you to pay any attention to how I use it, mmkay?" Essentially, it's no different from a normal variable in C or C++. If you want to share a __gshared variable across threads, it's your responsibility to make sure it's properly synchronized. The compiler isn't going to help you.

So when implementing a dynamic binding, a decision has to be made: thread-local (default), shared, or __gshared. My answer is __gshared. If we pretend that our function pointers are actual functions, which are accessible across threads anyway, then there isn't too much to worry about. Care still need be taken to ensure that the functions are loaded before any other threads try to access them and that no threads try to access them after the bound library is unloaded. In Derelict, I do this with static module constructors and destructors (which can still lead to some issues during program shutdown, but I'll cover that in a separate post). Here's an example.

extern( C )
{
    alias void function(int) da_foo;
    alias int function() da_bar;
}

__gshared
{
    da_foo foo;
    da_bar bar;
}
Finally, there's the question of how to load the library. That, I'm afraid, is an exercise for the reader. In Derelict, I implemented a utility package (DerelictUtil) that abstracts the platform APIs for loading shared libraries and fetching their symbols. The abstraction is behind a set of free functions that can be used directly or via a convenient object interface. In Derelict itself, I use the latter since it makes managing loading an entire library easier. But in external projects, I often use the free-function interface for loading one or two functions at a time (such as certain Win32 functions that aren't available in the ancient libs shipped with DMD). It also supports selective loading, which is a term I use for being able to load a library if specific functions are missing (the default behavior is to throw an exception when an expected symbol fails to load).

Conclusion

Overall, there's a good deal of work involved in implementing any sort of binding in D. But I think it's obvious that dynamic bindings require quite some extra effort. This is especially true given that the automated tools I've seen so far are all geared toward generating static bindings. I've only recently begun to use custom scripts myself, but they still require a bit of manual preparation because I don't want to deal with a full-on C parser. That said, I prefer dynamic bindings myself. I like having the ability to load and unload at will and to have the opportunity to present my own error message to the user when a library is missing. Others disagree with me and prefer to use static bindings. That's perfectly fine.

At this point, static and dynamic bindings exist for several popular libraries already. Deimos is a collection of the former and Derelict 3 the latter. You'll find some bindings for the same library in both and several that are in one project but not the other. Use what you need and are comfortable with. And I hope that, if the need arises, you can use the advice I've laid out in this series of posts to help fill in the holes and develop static or dynamic bindings yourself.

Given that I'm just under 4 hours from 2013 as I write this in Seoul, Korea, I want to wish you a Happy New Year. May you start and finish multiple projects in the coming year!


Binding D to C Part Four

Posted by , in D, Binding to C 05 August 2012 - - - - - - · 3,444 views

This is the fourth part of a series on creating bindings to C libraries for the D Programming Language.

In part one, I introduced the difference between dynamic and static bindings and some of the things to consider when choosing which kind to implement. In part two, I talked about the different linkage attributes to be aware of when declaring external C functions in D. In part three, I showed how to translate C types to D. Here in part four, I'll wrap up the dscussion of type translation with a look at structs.

A D Struct is a C Struct
For the large majority of cases, a C struct can be directly translated to D with little or no modification. The only major difference in the declarations is when C's typedef keyword is involved. The following example shows two cases, with and without typedef. Notice that there is no trailing semi-colon at the end of the D structs.

// In C
struct foo_s
{
	int x, y;
};

typedef struct
{
	   float x;
	   float y;
} bar_t;

// In D
struct foo_s
{
	int x, y;
}

struct bar_t
{
	float x;
	float y;
}

Most cases of struct declarations are covered by those two examples. Sometimes, a slight deviation may be encountered. Such as a struct with two names, one in the struct namespace and one outside of it (the typedef). In that case, the typedefed name should always be used.

// In C
typedef struct foo_s
{
	int x;
	struct foo_s *next;
} foo_t;

// In D
struct foo_t
{
	int x;
	foo_t *next;
}

Another common case is that of what is often called an opaque struct (in C++, more commonly referred to as a forward reference). The translation from C to D is similar to that above.

// In C
typedef struct foo_s foo_t;

// In D
struct foo_t;

Member Gotchas
When translating the types of struct members, the same rules as outlined in Part 3 should be followed. But there are a few gotchas to be aware of.

The first gotcha is relatively minor, but annoying. I've previously mentioned in this series that I believe it's best to follow the C library interface as closely as possible when naming types and functions in a binding. This makes translating code using the library much simpler. Unfortunately, there are cases where a struct might have a field which happens to use a D keyword for its name. The solution, of course, is to rename it. I've encountered this a few times with Derelict. My solution is to prepend an underscore to the field name. For publicly available bindings, this should be prominantly documented.

// In C
typedef struct
{
	// oops! module is a D keyword.
	int module;
} foo_t;

// In D
struct foo_t
{
	int _module;
}

The next struct gotcha is that of versioned struct members. Though rare in my experience, some C libraries wrap the members of some structs in #define blocks. I find this practice rather annoying (libpng, I'm looking at you), because it can cause problems not only with language bindings but also with binary compatibility issues when using C as well. Thankfully, translating this idiom to D is simple. Using it, on the other hand, can get a bit hairy.

Here's an example.

// In C
typedef struct
{
	float x;
	float y;
	#ifdef MYLIB_GO_3D
	float z;
	#endif
} foo_t;

// In D
struct foo_t
{
	float x;
	float y;
	// Using any version identifier you want -- this is one case where I advocate breaking
	// from the C library. I prefer to use an identifier that makes sense in the context of the binding.
	version(Go3D) float z;
}

Then, to make use of the versioned member, the '-version=Go3D' is passed on the command line when compiling. And this is where the headache begins.

If the binding is compiled as a library, then any D application linking to that library will also need to be compiled with any version identifiers the library was compiled with, else the versioned members won't be visible. Furthermore, the C library needs to be compiled with the equivalent defines. So to use foo_t.z from the example above, the C library must be compiled with -DMYLIB_GO_3D, the D binding with -version=Go3D, and the D app with -version=Go3D. And when making a binding like Derelict that loads shared libraries dynamically, there's no way to ensure that end users will have a properly compiled copy of the C shared library on their system unless it is shipped with the app. Not a big deal on Windows, but rather uncommon on Linux. Also, if the binding is intended for public consumption, the versioned sections need to be documented.

Read more about D's version conditions in the D Programming Language documentation.

The final struct member gotcha, and a potentially serious one, is bitfields. The first issue here is that D does not have bitfields. In D2, we have a library solution in std.bitmanip, but for a C binding it's not a silver-bullet solution because of the second issue. And the second issue is that the C standard leaves the ordering of bitfields undefined.

Consider the following example from C.

typedef struct
{
	int x : 2;
	int y : 4;
	int z: 8;
} foo_t;

There are no guarantees here about the ordering of the fields or where or even if the compiler inserts padding. It can vary from compiler to compiler and platform to platform. This means that any potential solution in D needs to be handcrafted to be compatibile with a specific C compiler version in order to guarantee that it works as expected.

Using std.bitmanip.bitfields might be the first approach considered.

// D translation using std.bitmanip.bitfields
struct foo_t
{
	mixin(bitfields!(
		int, "x", 2,
		int, "y", 4,
		int, "z", 8,
		int, "", 2)); // padding
}

Bitfields implemented this way must total to a multiple of 8 bits. In the example above, the last field, with an empty name, is 2 bits of padding. The fields will be allocated starting from the least significant bit. As long as the C compiler compiles the C version of foo_t starting from the least significant bit and with no padding in between the fields, then this approach might possibly work. I've never tested it.

The only other alternative that I'm aware of is to use a single member, then implement properties that use bit shift operations to pull out the appropriate value.

struct foo_t
{
	int flags;
	int x() @property { ... }
	int y() @property { ... }
	int z() @property { ... }
}

The question is, what to put in place of the ... in each property? That depends upon whether the C compiler started from the least-significant or most-significant bit and whether or not there is any padding in between the fields. In otherwords, the same difficulty faced with the std.bitmanip.bitfields approach.

In Derelict, I've only encountered bitfields in a C library one time, in SDL 1.2. My solution was to take a pass. I use a single 'flags' field, but provide no properties to access it. Given that Derelict is intended to be used on multiple platforms with C libraries compiled by multiple compilers, no single solution was going to work in all cases. I decided to leave it up to the user. Anyone needing to access those flags could figure out how to do it themselves. I think that's the best policy for any binding that isn't going to be proprietary. Proprietary bindings, on the other hand, can be targeted at specific C compilers on specific platforms.

Conclusion
I believe that's all I wanted to say about structs. In Part 5, which I'm quite certain will be the final installment, I'll talk about how to declare functions for both dynamic and static bindings and some of the issues that need to be considered when doing so. I'll also tie off any loose ends I think of.


Binding D to C Part Three

Posted by , in D, Binding to C 19 May 2012 - - - - - - · 4,514 views

This is the (long overdue) third part of a series on creating bindings to C libraries for the D programming language.

In part one, I introduced the difference between dynamic and static bindings and some of the things to consider when choosing which kind to implement. In part two, I talked about the different linkage attributes to be aware of when declaring external C functions in D. Here in part three, I'm going to begin discussing how to translate C type declarations into D. I'll continue the discussion in part four.

First off, I want to mention a particular page over at dlang.org called Converting C .h Files to D Modules. This is required reading for anyone planning to work on a D binding to a C library. This series should be considered a companion to that page.

Dealing With Types

When translating C types to D, the large majority take only a handful of forms:

* typedefs
* #defines
* struct declarations
* function parameters

All of the translation guidelines discussed in this post cover all four situations, but there are special cases regarding function parameters that will be covered in part five when I talk about function declarations. It's also possible to find global variable declarations in a C header. But, in my experience, they aren't generally encountered when creating library bindings.

Typedefs, Aliases, and Native Types

D used to have typedefs. And they were strict in that they actually created a new type. Given an int typdefed to a Foo, a type Foo would actually be created rather than it being just another name for int. But D also has alias, which doesn't create a new type but just makes a new name for an existing type. In D2, typedef was deprecated. Now we are left with alias.

alias is what should be used in D when a typedef is encountered in C, excluding struct declarations (more on that in part four). Most C headers have a number of typedefs that create alternative names for native types. For example, you might see something like this in a C header.

typedef int foo_t;
typedef float bar_t;

In a D binding, it's typically a very good idea to preserve the original typenames. The D interface should match the C interface as closely as possible. That way, existing C code from examples or other projects can be easily ported to D. So the first thing to consider is how to translate the native types int and long into D.

Fortunately, on the dlang page I mentioned above, there is a table that lists how all the C native types translate to D. If you look it up, you'll see that an int is an int, a float is a float, and so on. So to port the two declarations above, simply replace typedef with alias and all is well.

alias int foo_t;
alias float bar_t;

One thing I'd like to point out about that table, though. It lists the D int as equivalent to the C long. In most cases, this is true. But there is a possibility that the C long type could actually be 64-bits on some platforms, whereas D's int type is always 32-bits and D's long type is always 64-bits. As a measure of protection against this possible snafu, it's prudent to use a couple of handy aliases on the D side that are declared in core.stdc.config: c_long and c_ulong.

// In the C header
typedef long mylong_t;
typedef unsigned long myulong_t;

// In the D module
import core.stdc.config;

// Although the import above is private to the module, the aliases are public
// and visible outside of the module.
alias c_long mylong_t;
alias c_ulong myulong_t;

One more thing. If you are translating typedefs that use types from C's stdint.h, you have two options for the aliases. You can use native D types, since the sizes are fixed, or you can include core.stdc.stdint, which mirrors the C header, and just replace typedef with alias. For example, here are some types from SDL2 translated into D.

// From SDL_stdinc.h
typedef int8_t Sint8;
typedef uint8_t Uint8;
typedef int16_t Sint16;
typedef uint16_t Uint16;
...

// In D, without core.stdc.stdint
alias byte Sint8;
alias ubyte Uint8;
alias short Sint16;
alias ushort Uint16;
...

// And with the import
import core.stdc.stdint;

alias int8_t Sint8;
alias uint8_t Uint8;
alias int16_t Sint16;
alias uint16_t Uint16;
...

Enums

Translating anonymous enums from C to D requires nothing more than a copy/paste.

// In C
enum
{
	ME_FOO,
	ME_BAR,
	ME_BAZ
};

// In D
enum
{
	ME_FOO,
	ME_BAR,
	ME_BAZ,
}

Note that enums in D do not require a final semicolon. Also, the last member may be followed by a comma.

For named enums, you may want to do just a bit more than a direct copy/paste. Named enums in D require the name be prefixed when accessing members. Example:

// In C
typedef enum
{
	ME_FOO,
	ME_BAR,
	ME_BAZ
} MyEnum;

// In D
enum MyEnum
{
	ME_FOO,
	ME_BAR,
	ME_BAZ
}

// In some function...
MyEnum me = MyEnum.ME_FOO;

There's nothing wrong with this in and of itself. In fact, there is a benefit in that it gives you some type safety. For example, if a function takes a parameter of type MyEnum, you can't just pass any old int in its place. The compiler will complain that int is not implicitly convertible to MyEnum. That may be acceptable for an internal project, but for a publicly available binding it is bound to cause confusion because it breaks compatibility with existing code samples. One work around that maintains type safety is the following.

alias MyEnum.ME_FOO ME_FOO;
alias MyEnum.ME_BAR ME_BAR;
alias MyEnum.ME_BAZ ME_BAZ;

// Now this works
MyEnum me = ME_FOO;

It's obvious how tedious this could become for large enums. If type safety is not important, there's one more workaround.

alias int MyEnum;
enum
{
	ME_FOO,
	ME_BAR,
	ME_BAZ
}

This will behave exactly as the C version.

#defines

Often in C, #define is used to declare constant values. OpenGL uses this approach to declare values that are intended to be interpreted as the type GLenum. Though these values could be translated to D using the immutable type modifier, there is a better way.

D's enum keyword is used to denote traditional enums and also manifest constants. In D, a manifest constant is an enum that has only one member, in which case you can omit the braces in the declaration. Here's an example:

// This is a manifest constant of type float
enum float Foo = 1.003f;

// We can declare the same thing using auto inference
enum Foo = 1.003f; // float
enum Bar = 1.003; // double
enum Baz = "Baz!" // string

For single #defined values in C, these manifest constants work like a charm. But often, such values are logically grouped according to function. Given that a manifest constant is essentially the same as a one-member enum, it follows that we can group several #defined C values into a single, anonymous D enum.

// On the C side.
#define FOO_SOME_NUMBER 100
#define FOO_A_RELATED_NUMBER 200
#define FOO_ANOTHER_RELATED_NUMBER 201

// On the D side
enum FOO_SOME_NUMBER = 100
enum FOO_A_RELATED_NUMBER = 200
enum FOO_ANOTHER_NUMBER = 201

// Or, alternatively
enum
{
	FOO_SOME_NUMBER = 100,
	FOO_A_RELATED_NUMBER = 200,
	FOO_ANOTHER_NUMBER = 201,
}

Personally, I tend to use the latter approach if there are more than two or three related #defines, and the former if it's only one or two values.

But let's get back to the manifest constants I used in the example up above. I had a float, a double and a string. What if there are multiple #defined strings? Do you have to declare a seperate manifest constant for each one? No, not if you don't want to. D's enums can be typed to any existing type. Even structs.

// In C
#define LIBNAME "Some Awesome C Library"
#define AUTHOR "John Foo"
#define COMPANY "FooBar Studios"

// In D, collect all the values into one enum declaration of type string
enum : string
{
	LIBNAME = "Some Awesome C Library",
	AUTHOR = "John Foo",
	COMPANY = "FooBar Studios",
}

Neat, eh? Again, note the trailing comma on the last enum field. I tend to always include these in case a later version of the C library adds a new value that I need to tack on at the end. A minor convenience.

More to Come

I think that's about enough for this session. In part four, we'll take a look at structs. There are a couple of pitfalls to be aware of when porting them over to D and I'll give you some advice to get around them.


Binding D to C Part Two

Posted by , in Binding to C, D 30 January 2012 - - - - - - · 4,072 views

This is part two of a series on creating bindings to C libraries for the D programming language.

In part one, I discussed the difference between dynamic and static bindings and some of the considerations to take into account when deciding which way to go. Here in part two, I'm going to talk about an important aspect of function declarations: linkage attributes.

When binding to C, it is critical to know which calling convention is used by the C library you are binding. In my experience, the large majority of C libraries use the cdecl calling convention across each platform. Modern Windows system libraries use the stdcall calling convention (older libraries used the pascal convention). See this page on x86 calling conventions if you want to know the differences.

D provides a storage class, extern, that does two things when used with a function. It tells the compiler that the given function is not stored in the current module and it specifies a calling convention via a linkage attribute. The D documentation lists all of the supported linkage attributes, but for C bindings the three you will be working with most are C, Windows and System.

Although I'm not going to specifically talk about static bindings in this post, the following examples use function declarations as you would in a static binding. For dynamic bindings, you'll use function pointers instead.

The C attribute is used on functions that have the cdecl calling convention. If no calling convention is specified in the C headers, it's safe to assume that the default convention is cdecl. There's a minor caveat in that some compilers allow the default calling convention to be changed via the command line. This isn't an issue in practice, but it's a possibility you should be aware of if you don't have control over how the C library is compiled.

// In C
extern void someCFunction(void);

// In D
extern(C) void someCFunction();

The Windows attribute is used on functions that have the stdcall calling convention. In the C headers, this means the function is prefixed with something like __stdcall, or a variation thereof depending on the compiler. Often, this is hidden behind a define. For example, the Windows headers use WINAPI, APIENTRY, and PASCAL. Some third party libraries will use these same defines or create their own.

// In C
#define WINAPI __stdcall
extern WINAPI someWin32Function(void);

// In D
extern(Windows) someWin32Function();

The System attribute (extern(System)) is useful when binding to libraries, like OpenGL, that use the stdcall convention on Windows, but cdecl on other systems. On Windows, the compiler sees it as extern(Windows), but on other systems as extern( C ). The difference is always hidden behind a define on the C side.

// In C
#ifdef _WIN32
#include <windows.h>
#define MYAPI WINAPI
#else
#define MYAPI
#endif

extern MYAPI void someFunc(void);

// In D
extern(System) void someFunc();

The examples above are just examples. In practice, there are a variety of techniques used to decorate function declarations with a calling convention. It's important to examine the headers thoroughly and make no assumptions about what a particular define actually translates to.

One more useful detail to note is that when implementing function declarations on the D side, you do not need to prefix each one with an extern attribute. You can use an attribute block like so:

extern(C)
{
	void functionOne();
	double functionTwo();
}

// Or, if you prefer
extern(C):
	void functionOne();
	void functionTwo();

In part three, I'll talk a bit about builtin types and how they translate between C and D. After that, we'll be ready to look at complete function declarations and how they differ between static and dynamic bindings.


Binding D to C

Posted by , in Binding to C, D, Derelict 08 January 2012 - * * * * * · 6,838 views

This is part one of a series on creating bindings to C libraries for the D programming language.

This is a topic that has become near and dear to my heart. Derelict is the first, and only, open source project I've ever maintained. It's not a complicated thing. There's very little actual original code outside of the Utility package (with the exception of some bookkeeping for the OpenGL binding). The majority of the project is a bunch of function and type declarations. Maintaining it has, overall, been relatively painless. And it has brought me a fair amount of experience in getting D and C to cooperate.

As the D community continues to grow, so does the amount of interest in bindings to C libraries. Recently, a project called Deimos was started over at github to collate a variety of bindings to C libraries. There are several bindings there already, and I'm sure it will grow. People creating D bindings for the first time will, usually, have no trouble. It's a straightforward process. But there are certainly some pitfalls along the way. In this post, I want to highlight some of the basic issues to be aware of. For the sake of clarity, I'm going to ignore D1 (for a straight-up "static" binding, the differences are minor, but they do exist.

The first thing to consider is what sort of binding you want, static or dynamic. By static, I mean a binding that allows you to link with C libraries or object files directly. By dynamic, I mean a binding that does not allow linking, but instead loads a shared library (DLL/so/dylib/framework) at runtime. Derelict is an example of the latter; most (if not all) of the bindings in the Deimos repository the former. There are tradeoffs to consider.

D understands the C ABI, so it can link with C object files and libraries just fine, as long as the D compiler understands the object format itself. Therein lies the rub. On Posix systems, this isn't going to be an issue. DMD (and of course, GDC and, I assume, LDC) uses the GCC toolchain on Posix systems. So getting C and D to link and play nicely together isn't much of a hassle. On Windows, though, it's a different world entirely.

On Windows, we have a variety of object file formats to contend with: COFF, OMF, ELF. DMD, which uses an ancient linker called Optlink, outputs OMF objects. GDC, which uses the MingW backend on Windows, outputs ELF objects. I haven't investigated LDC yet, but it uses whichever backend LLVM is configured to use. Meanwhile, the compiler that ships with Visual Studio outputs objects in the COFF format. What a mess!

This situation will improve in the future, but for now it is what it is. And that means when you make a C binding, you have to decide up front whether you want to deal with the mess or ignore it completely. If you want to ignore it, then a dynamic binding is the way to go. Generally, when you manually load DLLs, it doesn't matter what format they were compiled in, since the only interaction between your app and the DLL happens in memory. But if you use a static binding, the object file format determines whether or not the app will link. If the linker can't read the format, you get no executable. That means you have to either compile the C library you are binding with a compiler that outputs a format your D linker understands, use a conversion tool to convert the libraries into the proper format, or use a tool to extract a link library from a DLL. Will you ship the libraries with your binding, in multiple formats for the different compilers? Or will you push it off on the users to obtain the C libraries themselves? I've seen both approaches.

Whichever way you decide to go really doesn't matter. In my mind, the only drawback to dynamic bindings is that you can't choose to have a statically linked program. I've heard people complain about "startup overhead", but if there is any it's negligble and I've never seen it (you can try it with Derelict -- make an app using DerelictGL/SDL/SDLImage/SDLMixer/SDLNet/SDLttf and see what kind of overhead you get at startup). The only drawback to static bindings is the object file mess. But with a little initial work upfront, it can be minimzed for users so that it, too, is negligible.

Once you decide between static and dynamic, you aren't quite ready to roll up your sleeves and start implementing the binding. First, you have to decide how to create the binding. Doing it manually is a lot of work. Trust me! That's what I do for all of the bindings in Derelict. Once you develop a systematic method, it goes much more quickly. But it is still drastically more time consuming than using an automated approach. To that end, I know people have used SWIG and a tool called htod. VisualD now has an integrated C++-to-D converter which could probably do it as well. I've never used any of them (which is really incredible when I think about it, given how precious little time I usually have), so I can't comment on the pros and cons one way or another. But I do know that any automated output is going to require some massaging. There are a number of corner cases that make an automated one-for-one translation extremely difficult to get right. So regardless of your approach, if you don't want your binding to blow up on you down the road, you absolutely need to understand exactly how to translate D to C. And that's where the real fun begins.

That's going to have to wait for another post, though. It's Sunday evening here and I've got things to do. In part two, I'll talk about function declarations. I think they're easier to cover than types, which I'll save for a third post. Until then, Happy New Year!





December 2016 »

S M T W T F S
    123
4 5 678910
11121314151617
18192021222324
25262728293031

Recent Entries