Creating my own file structure

7 comments, last by Alberth 6 years, 10 months ago

I found very little info about this on the web. Basically I want to create my own file format that stores mesh data (vertices, skinning info, etc).

I don't know where to start and I have very poor knowledge about file structures in general. What's the difference between opening files the normal way and opening them in binary mode? I always thought that binary mode is supposed to prevent people from being able to read the file in notepad, but I'm pretty sure I'm wrong. :lol:

Any tips and sources will be appreciated. thanks!

edit: I found this https://www.gamedev.net/resources/_/technical/game-programming/resource-files-explained-r902, but it's 17 years old. Is it still good?


A good search term is "serialization" - the act of translating runtime objects into a format that can be stored on disk (and deserialization is the opposite process).
You can do this ad hoc with APIs like fopen/fread/fwrite/fclose, or there are plenty of higher-level serialization libraries that make it easier and more robust.

What's the difference between opening files the normal way and opening them in binary mode?

Almost nothing. Text mode may look for specific bytes and change them, for example converting between Windows ("\r\n") and Unix ("\n") newline sequences. Binary mode just reads/writes the exact bytes without any interference. For your own data format you almost always want binary mode :P
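To make that concrete, here is a minimal sketch of a binary-mode round trip with fopen. The helper names writeBytes and readBytes are made up for this example; the only point is that "wb" and "rb" hand back exactly the bytes that were written, including newline bytes.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes the bytes to `path` in binary mode ("wb").
// Returns false if the file could not be opened or fully written.
bool writeBytes(const char* path, const std::vector<unsigned char>& bytes) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), f);
    const bool closed = std::fclose(f) == 0;
    return written == bytes.size() && closed;
}

// Hypothetical helper: reads the whole file back in binary mode ("rb").
// Returns an empty vector if the file could not be opened.
std::vector<unsigned char> readBytes(const char* path) {
    std::vector<unsigned char> bytes;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return bytes;
    int c;
    while ((c = std::fgetc(f)) != EOF) {
        bytes.push_back(static_cast<unsigned char>(c));
    }
    std::fclose(f);
    return bytes;
}
```

Opening the same file with "w" / "r" instead is where a platform is allowed to translate newline bytes, which is harmless for text but corrupts things like vertex data.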

So about the serialization process. You mean I can just have a class that contains a huge amount of data, reinterpret_cast it to void*, and call fwrite()?

You mean I can just have a class that contains a huge amount of data, reinterpret_cast it to void*, and call fwrite()?
Yes and no. You can, but if it contains any pointers, these will no longer be valid after you've quit your app, started it again and re-loaded the file... And if it contains any C++ objects with constructors, those constructors won't get re-run during the deserialization process... And if you recompile your code on a different platform/compiler, the memory layout of the structure could be different, and files written by the other build won't load correctly... etc, etc...

It's much better to save/load one field at a time when using this method.
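A rough sketch of what "one field at a time" could look like for mesh data. The Vertex and Mesh types and the saveMesh/loadMesh names are invented for this example, and it assumes the file is read back on a machine with the same byte order and float format as the one that wrote it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Vertex { float x, y, z; };

struct Mesh {
    std::uint32_t materialId = 0;
    std::vector<Vertex> vertices;  // owns heap memory, so Mesh itself must not be dumped raw
};

// Writes each field explicitly: materialId, vertex count, then x/y/z per vertex.
bool saveMesh(const Mesh& mesh, const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::uint32_t count = static_cast<std::uint32_t>(mesh.vertices.size());
    bool ok = std::fwrite(&mesh.materialId, sizeof mesh.materialId, 1, f) == 1
           && std::fwrite(&count, sizeof count, 1, f) == 1;
    for (std::uint32_t i = 0; ok && i < count; ++i) {
        const Vertex& v = mesh.vertices[i];
        ok = std::fwrite(&v.x, sizeof v.x, 1, f) == 1
          && std::fwrite(&v.y, sizeof v.y, 1, f) == 1
          && std::fwrite(&v.z, sizeof v.z, 1, f) == 1;
    }
    return (std::fclose(f) == 0) && ok;
}

// Reads the fields back in exactly the same order they were written.
bool loadMesh(Mesh& mesh, const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::uint32_t count = 0;
    bool ok = std::fread(&mesh.materialId, sizeof mesh.materialId, 1, f) == 1
           && std::fread(&count, sizeof count, 1, f) == 1;
    mesh.vertices.clear();
    for (std::uint32_t i = 0; ok && i < count; ++i) {
        Vertex v{};
        ok = std::fread(&v.x, sizeof v.x, 1, f) == 1
          && std::fread(&v.y, sizeof v.y, 1, f) == 1
          && std::fread(&v.z, sizeof v.z, 1, f) == 1;
        if (ok) mesh.vertices.push_back(v);
    }
    std::fclose(f);
    return ok;
}
```

Because the vector's contents are written as a count followed by elements, no pointer ever reaches the file, and padding inside the struct never matters.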

More advanced methods of serialization will tolerate all of these things, and even let you load data from different versions of the file, tolerate errors, etc...

So about the serialization process. You mean I can just have a class that contains a huge amount of data, reinterpret_cast it to void*, and call fwrite()?

Yes, if...

... you don't have pointers, you don't have virtual functions, networking and different architectures are of no concern, and you keep a couple of other things in mind (such as different program versions).

For virtual functions, you will have to call placement new on each object's address upon loading, to set up the vtable. For pointers in general, you will have to replace them with something that can be moved around, most likely indices or "handles". Then you can, in principle, just dump the thing to disk and load it from disk again (plus some fixup).
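A small sketch of the "indices instead of pointers" idea, using a skeleton as the example. The Bone layout and the function names are assumptions for illustration. Because a bone refers to its parent by array index, the whole array can be copied byte-for-byte into a blob and back, and the index is only turned into a pointer after loading (the fixup).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct Bone {
    std::int32_t parent;  // index into the bone array, -1 for the root (instead of a Bone*)
    float length;
};
static_assert(std::is_trivially_copyable<Bone>::value,
              "Bone must be safe to copy byte-wise");

// Copies the raw bytes of the bone array into a blob that could be written to disk.
std::vector<unsigned char> dumpBones(const std::vector<Bone>& bones) {
    std::vector<unsigned char> blob(bones.size() * sizeof(Bone));
    if (!blob.empty()) std::memcpy(blob.data(), bones.data(), blob.size());
    return blob;
}

// Rebuilds the bone array from a blob produced by dumpBones on the same platform.
std::vector<Bone> loadBones(const std::vector<unsigned char>& blob) {
    std::vector<Bone> bones(blob.size() / sizeof(Bone));
    if (!bones.empty()) std::memcpy(bones.data(), blob.data(), bones.size() * sizeof(Bone));
    return bones;
}

// The fixup step: resolve a stored index to a pointer into the loaded array.
const Bone* parentOf(const std::vector<Bone>& bones, std::size_t i) {
    const std::int32_t p = bones[i].parent;
    return (p >= 0 && static_cast<std::size_t>(p) < bones.size()) ? &bones[p] : nullptr;
}
```

This still bakes the compiler's struct layout and the CPU's byte order into the blob, so it only suits files that are read by the same build that wrote them.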

Serialization is much more heavyweight, but it also works in a much more general (and robust) way. Add a field two years from now? It will still work. Make something optional so you don't waste bandwidth when it's not used? Serialization will do that. Etc, etc.

Then using some kind of a fileSave(Object *obj) function (and Object* fileLoad() to load) seems a lot more intuitive than serialization, no?

by the way, what do you mean by different program versions?

Different systems run on many different architectures. A Windows XP system is a little bit different from a Windows 10 one, which is a little bit different from a Linux one, which is a little bit different from a Mac OS one. Some systems use a different endianness (the order in which bytes are arranged to build a multi-byte value), and some use type sizes other than 1 byte for char, 2 bytes for short, 4 bytes for int and so on. This means your serializer needs to handle all of this, to ensure that data written as 0x205f is not read back on another system as 0x5f20 and vice versa.
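One common way to sidestep the endianness problem is to pick a byte order for the file and build values from individual bytes, rather than writing an int's memory directly. Here is a minimal sketch that fixes the file format to little endian; the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 16-bit value as two bytes, least significant byte first.
void putU16LE(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
}

// Appends a 32-bit value as four bytes, least significant byte first.
void putU32LE(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
    }
}

// Rebuilds a 16-bit value from two little-endian bytes.
std::uint16_t getU16LE(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Rebuilds a 32-bit value from four little-endian bytes.
std::uint32_t getU32LE(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

The shifts operate on values, not on memory, so the same code produces the same file bytes on both little-endian and big-endian machines. Using the fixed-width types from <cstdint> also takes care of the "int is not always 4 bytes" issue.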

Some file formats like .bsa use little endian as the standard format for data, while other formats like FBX ASCII store information as pure text. Small files might be fine stored as JSON, whereas other formats may be encrypted or compressed and so need to be stored as bytes.

Creating a file format is quite simple; all you need to do is define the file layout and how data is stored in it. As an example, see this:


//File format header

3 byte [PAK] //Identify .pak files
2 byte Version [1 byte Major, 1 byte Minor] //Version of the .pak file
1 byte Flags //Describing the file mode

//File Header

1 byte Entry Type //May be file or folder
1 byte Name //Length of the name string
N byte Name String

   if file

      1 byte Flags //Again flags that describe the file
      2 byte Offset //The offset from beginning of a chunk to read from
      4 byte Chunk ID //The packed chunk ID
      4 byte Length //Size of the file in bytes

   if directory

      4 byte Children //Number of children included in this directory

...

This is a short sketch of the package format I use in my game engine to bundle a game's content. As you can see, it describes the layout of the file so that anyone knows how to write it and how to read / interpret it.
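To show how such a layout turns into code, here is a hedged sketch that encodes and decodes only the 6-byte file format header from the layout above (3 bytes "PAK", 1 byte major version, 1 byte minor version, 1 byte flags). The PakHeader struct and the function names are invented for this example and are not taken from the engine mentioned in the post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// In-memory form of the header; field names are assumptions for this sketch.
struct PakHeader {
    std::uint8_t versionMajor = 0;
    std::uint8_t versionMinor = 0;
    std::uint8_t flags = 0;
};

constexpr std::size_t kPakHeaderSize = 6;  // 3 magic + 2 version + 1 flags

// Produces the exact bytes that go at the start of the file.
std::array<unsigned char, kPakHeaderSize> encodeHeader(const PakHeader& h) {
    return {{'P', 'A', 'K', h.versionMajor, h.versionMinor, h.flags}};
}

// Validates the magic bytes and fills `out`; returns false for anything else.
bool decodeHeader(const unsigned char* data, std::size_t size, PakHeader& out) {
    if (size < kPakHeaderSize) return false;             // truncated file
    if (std::memcmp(data, "PAK", 3) != 0) return false;  // not a .pak file
    out.versionMajor = data[3];
    out.versionMinor = data[4];
    out.flags = data[5];
    return true;
}
```

Every field here is a single byte, so byte order does not come up yet; the multi-byte fields in the entry records (Offset, Chunk ID, Length) would need an agreed byte order.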

Then using some kind of a fileSave(Object *obj) function (and Object* fileLoad() to load) seems a lot more intuitive than serialization, no?

That is serialisation. Or at least it will be, inside those functions, which could well just be using fwrite or whatever. Serialisation is the act of writing something out, byte by byte, hence the name (each byte is written in serial).

by the way, what do you mean by different program versions?
That is where the fun starts. Suppose I write a program, and use your data format for writing and reading files.

All good and well, hooraay!!

Now I change the program, so the data from your file needs to be stored in a different way in my program.

Or I change the data that is being saved and loaded. How does one load an old data file then (assuming I can compute values for the missing data)?

Of course, both of these things happen in real life as you evolve your game.
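One common way to cope with this is to store a version number at the start of the data and branch on it while loading. The sketch below is only an illustration with made-up names: it assumes a Material record whose version 1 stored three colour bytes and whose version 2 added an alpha byte, with old files getting a computed default for the missing field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Material {
    std::uint8_t red = 0, green = 0, blue = 0;
    std::uint8_t alpha = 255;  // hypothetical field added in format version 2
};

constexpr std::uint8_t kCurrentVersion = 2;

// Always writes the newest layout: version byte, then the fields.
std::vector<unsigned char> saveMaterial(const Material& m) {
    return {kCurrentVersion, m.red, m.green, m.blue, m.alpha};
}

// Accepts both version 1 and version 2 data; rejects unknown versions.
bool loadMaterial(const std::vector<unsigned char>& data, Material& out) {
    if (data.empty()) return false;
    const std::uint8_t version = data[0];
    if (version < 1 || version > kCurrentVersion) return false;
    const std::size_t needed = (version == 1) ? 4 : 5;
    if (data.size() < needed) return false;
    out.red = data[1];
    out.green = data[2];
    out.blue = data[3];
    // Version 1 files have no alpha byte, so fall back to fully opaque.
    out.alpha = (version >= 2) ? data[4] : 255;
    return true;
}
```

The program only ever writes the current version but keeps reading the old ones, so existing files stay usable after the format changes.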

