Language for Rapidly Prototyping Binary Files

12 comments, last by EvincarOfAutumn 14 years ago
In the past, when I've needed to make custom binary data files but wanted something slightly more flexible than a hex editor, I cracked open my assembler, added a bunch of 'db' statements to get the bytes I needed, and assembled the whole thing into what I actually wanted. This was cumbersome and awkward, and I felt like there was a better solution.

So I made one. I call it Protodata, and it's super handy for prototyping custom binary file formats, as you may often find yourself doing in game design.

It's currently in alpha and there are some issues, but it's usable and I'd appreciate feedback. There's a Windows binary that should run anywhere and it's easy enough to build the source file. Yes, just the one.

I intend to add support for specifying endianness, character encoding, and shorthand definitions for repeated structures. Let me know what else is broken and what you'd like to see added. Check out the SourceForge project page and the project home page for more information.
Great idea! I used to make binary files by writing C++ programs to write them... ugh. Unfortunately, I don't have anything to try your program out on right now, but I'll remember it.

Just a couple things. I don't think you need three char types. I would either drop char or specify it as signed and drop schar. It would be cool to have functions, or parameterized repeated structures, as a way to repeatedly generate similar structures with slight variations. Also, it would be good to have a link to the project page or something on the homepage.

But on the whole, I like it. I like your logo, too.
Yeah, this is intended to be a very lightweight solution to a problem that typically gets solved with general-purpose languages that turn out to be unnecessarily complex in most cases. I'll drop schar and specify char as signed; it was late and somehow I thought it was a good idea at the time.

I'll look into more complex data generation directives, maybe including some conditional and looping constructs. I may include a facility for simple calculations, but nothing overly complex, as that would somewhat defeat the purpose of this tool.

And I know the documentation isn't perfect. I just wrote something up quickly for the sake of getting it out there. I'll be fleshing it out soon with better organisation, more examples, and links, to help people get started.

Do you think I should keep the (u)type convention, add an unsigned qualifier to make the syntax more familiar to C-family users, or have both? I already intend to add a kind of qualifier system for endianness, etc.

Thanks for your input and I hope you get to test it soon and find it useful. It does run on non-Windows platforms, by the way, in case that wasn't clear. I'm glad you like my logo, too...haha...if experience has taught me anything, it's that people like the "face" of a good icon on their software.
I usually use Python's struct module for things like this. It lets you pass a format string and the values that need to be serialized, and voilà. It handles endianness and padding as well.
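A minimal sketch of that approach, with made-up values: the first character of the format string picks the byte order, and it also decides whether native alignment padding is applied.

```python
import struct

# '<' is little-endian and '>' is big-endian; both use standard sizes
# and insert no padding between fields.
little = struct.pack('<hi', 1, 2)   # 2-byte short followed by 4-byte int
big = struct.pack('>hi', 1, 2)

print(little.hex())  # 010002000000
print(big.hex())     # 000100000002

# With no prefix (or '@') struct uses the platform's native sizes and
# alignment, so pad bytes may appear between the short and the int.
print(struct.calcsize('<hi'))  # 6
print(struct.calcsize('@hi'))  # platform-dependent, often 8
```

The explicit prefixes are usually the safer choice for file formats, since the result no longer depends on the machine that wrote the file.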

You may also want to take a look at Protobuf.


As for Protodata, I got it to compile with a little bit of work (turned out I didn't have stdint.h, and I had to include <cctype> in main.cpp). I tried a test file:
size 5 5
float 0.0 1.1 2.2 3.3
int 0; char 5 6 7 8
and inspected the resulting binary file. Everything but the floats worked - they were stored as integers.
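One way to check this kind of thing without squinting at a hex dump is to decode the same four bytes both ways. This is only a sketch: the helper name `describe_word` is made up here, and it assumes 4-byte little-endian values, which may not match what Protodata actually wrote.

```python
import struct

def describe_word(raw: bytes) -> str:
    """Show one 4-byte little-endian word as both an int32 and a float32."""
    as_int = struct.unpack('<i', raw)[0]
    as_float = struct.unpack('<f', raw)[0]
    return f"{raw.hex()}  int32={as_int}  float32={as_float:.6g}"

# The same number stored as an IEEE 754 single and as a plain integer
# produces very different byte patterns.
print(describe_word(struct.pack('<f', 2.2)))
print(describe_word(struct.pack('<i', 2)))
```

A word that reads as a small whole number under int32 and as a tiny denormal under float32 was almost certainly written as an integer.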


But the logo is funny, yeah. Made me smile. :)
Create-ivity - a game development blog
Thanks for your feedback! I just wanted something that would run on its own without requiring Python or Perl or anything like that. I'm really following the Unix philosophy of "do one thing, and try to do it well".

Which compiler were you using so I can make the right changes to get it to build in as many places as possible?

The reason that your test file didn't work is that you used "float"; I was going to make "float" synonymous with "single", but forgot to get around to it and change the documentation accordingly. So your floats were stored using the implicit type (size) from the previous statement, and float was ignored as a malformed floating-point number. I'll fix that today.

Thanks for the logo comments...I think I'm going to tweak it to better reflect my original idea of making the "Pd" out of ones and zeroes.
Quote:Original post by EvincarOfAutumn
Thanks for your feedback! I just wanted something that would run on its own without requiring Python or Perl or anything like that. I'm really following the Unix philosophy of "do one thing, and try to do it well".

I guess I'm coming from a different background, but that phrase can be applied to Python's struct module just as well, can't it? These days I prefer to use an existing, tested and documented solution rather than writing, testing and documenting my own. Ah well, I just wanted to give you some alternatives in case you didn't know about them. What do you think about Protobuf?

Quote:Which compiler were you using so I can make the right changes to get it to build in as many places as possible?

I used Microsoft's C++ compiler (MSVC 2005 Express Edition).
Quote:...that phrase can be applied to Python's struct module just as well, can't it?


Sure. And if you happen to like Python and want to include it in your workflow, then by all means do. I just think that this approach is slightly cleaner, and find it more intuitive and legible to write a file like this:

short 1 2; long 3

Or even a command line like this:

echo short 1 2; long 3 | pd output

Than the equivalent Python, something like:

from struct import pack
f = open('output', 'wb')
f.write(pack('hhl', 1, 2, 3))
f.close()


But your mileage may vary. As with any language, if you don't like it, it's probably not for you, and if you do like it, then you'll probably find a legitimate use for it.
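A caveat for anyone going the Python route: with no byte-order prefix, 'hhl' uses native sizes, and a native long is 4 bytes on some platforms and 8 on others. The sketch below pins the layout with '<' and reads the file back to check it. Writing into a temporary directory is just a choice made for this example.

```python
import os
import struct
import tempfile

FORMAT = '<hhl'  # little-endian, standard sizes: 2 + 2 + 4 bytes, no padding

path = os.path.join(tempfile.mkdtemp(), 'output')

# Write two shorts and a long; 'with' closes the file even on error.
with open(path, 'wb') as f:
    f.write(struct.pack(FORMAT, 1, 2, 3))

# Read the bytes back and decode them with the same format string.
with open(path, 'rb') as f:
    data = f.read()

values = struct.unpack(FORMAT, data)
print(len(data), values)
```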

Protobuf looks like a good solution for serialisation, but it has a pretty different problem domain from Protodata. Basically, Protobuf is for standardisation of protocol buffers, while Protodata is for hacking out a data file when you can't afford to write an editor.
I updated the documentation and posted the updated compiler. It should build fine in MSVC++ now, and I fixed the "bug" (just laziness) with semicolons in quoted data. The types make a little bit more sense now, and I added shorthand equivalents for every type, based in part on Python's struct, just for you! [wink]

echo h 1 2; l 3 | pd output
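To make the correspondence with struct concrete, here is a toy translator for statements of that shape. It is not Protodata's implementation or grammar: the type table, the 2-byte and 4-byte sizes, and the little-endian byte order are all assumptions of this sketch, and it handles only the two types that appear in the examples in this thread.

```python
import struct

# Hypothetical subset: long names and shorthands for two integer types.
# Sizes and byte order are assumptions of this sketch, not Protodata's spec.
TYPE_CODES = {'short': 'h', 'h': 'h', 'long': 'l', 'l': 'l'}

def assemble(source: str) -> bytes:
    """Turn text such as 'h 1 2; l 3' into packed little-endian bytes."""
    out = bytearray()
    for statement in source.split(';'):
        tokens = statement.split()
        if not tokens:
            continue  # skip empty statements, e.g. after a trailing ';'
        type_name, values = tokens[0], tokens[1:]
        code = TYPE_CODES[type_name]  # raises KeyError for an unknown type
        for value in values:
            out += struct.pack('<' + code, int(value))
    return bytes(out)

print(assemble('h 1 2; l 3').hex())  # 0100020003000000
```

Unlike the behaviour described earlier in the thread, where an unrecognised type name fell through to the previous statement's type, this sketch simply fails on unknown names.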


[Edited by - EvincarOfAutumn on April 17, 2010 1:06:46 PM]
The documentation is updated again and version 1.0 beta is out. This version sees endianness qualifiers, error behaviour modifiers, and shell-style comments. Coming in the final 1.0 release: repeat counts and user-defined structures.
This seems a bit useless. When do you need to create tiny binary files with only a couple of number types in them?

Wouldn't a plain text format be more suitable for that? It's practically the same thing.

Binary files are created for complex and heavy data, data that could not come from scripting the file, such as a crap ton of vertex data or a database.

The tool that ought to have been written is something that can serialize your C++ or Python code automatically.

For example:
#include <string>
#include <vector>

struct MyData
{
  float Val1;
  int Val2;
  std::string SomeString;
  std::vector< MyData > Lolgoodluckwiththis;
};


Your library ought to be able to take one of those structures and stick it in a binary file. The easier and more automagic you can make that, the more useful.
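For the Python side of that suggestion, a rough sketch of what "automatic" could look like is a recursive serializer over a dataclass mirroring the struct above. The byte layout (little-endian, 32-bit fields, length-prefixed strings and lists) and the `serialize` function are assumptions made for this example, not an existing library's format.

```python
import struct
from dataclasses import dataclass, field, fields, is_dataclass

@dataclass
class MyData:
    val1: float
    val2: int
    some_string: str
    children: list = field(default_factory=list)  # nested MyData instances

def serialize(value) -> bytes:
    """Recursively pack a value; the layout is an assumption of this sketch."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, float):
        return struct.pack('<f', value)            # 32-bit float
    if isinstance(value, int):
        return struct.pack('<i', value)            # 32-bit signed int
    if isinstance(value, str):
        raw = value.encode('utf-8')
        return struct.pack('<I', len(raw)) + raw   # byte count, then bytes
    if isinstance(value, list):
        body = b''.join(serialize(v) for v in value)
        return struct.pack('<I', len(value)) + body  # item count, then items
    if is_dataclass(value):
        # Fields are written in declaration order.
        return b''.join(serialize(getattr(value, f.name)) for f in fields(value))
    raise TypeError(f"cannot serialize {type(value).__name__}")

leaf = MyData(1.5, 7, "hi")
root = MyData(0.0, -1, "root", [leaf])
blob = serialize(root)
print(len(blob), blob.hex())
```

The nested list is the part that makes this harder than it looks: a reader needs the count prefix to know where the children end, and a real tool would also need versioning and a matching deserializer.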

This topic is closed to new replies.
