Critique My Serialization API

Started by
25 comments, last by Randy Gaul 4 years, 11 months ago

 

Hi all. I wrote a serialization API in C++ for myself and wanted to ask for a critique from anyone interested. I like the JSON file format and originally wanted to use it. JSON is a good way to support versioning for your serialized stuff, since you can build in mechanisms to handle missing fields, and extra fields can be ignored.

I audited a few JSON options, and the one by sheredom was the best, but I still found it a little lacking. Specifically, I couldn't figure out the API. It's all... weird. And the examples are terrible.

Since I couldn't find exactly what I wanted, I wrote my own. Here is my specific list of requirements:

  • No external dependencies (other than the C runtime).
  • JSON-like text output.
  • Can implement the writer/reader with the same code (instead of two different functions).
  • No annoying linked-lists in the API.
  • Supports arrays.
  • Base64 encoding built-in.
  • Shouldn't do anything special for UTF-8 (strings can just be base64 encoded, or sent as-is).
  • Can inspect a field's type before attempting to read it.
  • Arrays have length prepended.

Example output.


{
    x = 5,
    y = 10.300000,
    str = "Hello.",
    sub_thing = {
        num0 = 7,
        num1 = 3,
    },
    blob_data = "U29tZSBibG9iIGlucHV0LgA=",
    array_of_ints = [8] {
        0, 1, 2, 3, 4, 5, 6, 7,
    },
    array_of_array_of_ints = [2] {
        [3] {
            0, 1, 2
        },
        [3] {
            0, 1, 2
        },
    },
    array_of_objects = [2] {
        {
            some_integer = 13,
            some_string = "Good bye.",
        },
        {
            some_integer = 4,
            some_string = "Oi!",
        },
    },
},
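The blob_data value above is plain base64: it decodes to the 16 characters of "Some blob input." followed by a null byte. For anyone curious what the built-in encoding involves, here is a minimal standalone sketch. The names base64_encode and base64_decode are hypothetical (not the functions inside kv.h), and the decoder simply stops at the first '=' rather than validating padding strictly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Standard base64 alphabet (RFC 4648).
static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encodes raw bytes as base64, padding the last group with '='.
std::string base64_encode(const void* data, std::size_t size) {
    const std::uint8_t* in = static_cast<const std::uint8_t*>(data);
    std::string out;
    out.reserve((size + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 3 <= size; i += 3) {  // each full 3-byte group becomes 4 characters
        std::uint32_t n = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8) | in[i + 2];
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += kBase64Alphabet[(n >> 6) & 63];
        out += kBase64Alphabet[n & 63];
    }
    std::size_t rest = size - i;  // 0, 1 or 2 leftover bytes
    if (rest > 0) {
        std::uint32_t n = std::uint32_t(in[i]) << 16;
        if (rest == 2) n |= std::uint32_t(in[i + 1]) << 8;
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += (rest == 2) ? kBase64Alphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Hypothetical helper: decodes base64 into bytes; returns false on a character
// outside the alphabet.
bool base64_decode(const std::string& text, std::vector<std::uint8_t>* out) {
    out->clear();
    std::uint32_t acc = 0;  // bit accumulator; only the low bits are used
    int bits = 0;           // number of not-yet-emitted bits in acc
    for (char c : text) {
        if (c == '=') break;  // padding marks the end of the data
        const char* p = (c != '\0') ? std::strchr(kBase64Alphabet, c) : nullptr;
        if (p == nullptr) return false;
        acc = (acc << 6) | std::uint32_t(p - kBase64Alphabet);
        bits += 6;
        if (bits >= 8) {  // a full byte is available
            bits -= 8;
            out->push_back(std::uint8_t((acc >> bits) & 0xFF));
        }
    }
    return true;
}
```

Encoding sizeof("Some blob input."), which counts the terminating null, gives exactly the 24-character string shown in the example output.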

The header itself, kv.h.


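#include <stddef.h> // NULL
#include <stdint.h> // fixed-width integer types

// error_t is defined elsewhere in the author's code and is not shown here.
struct kv_t;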

#define CUTE_KV_MODE_WRITE 1
#define CUTE_KV_MODE_READ  0

kv_t* kv_make(void* user_allocator_context = NULL);
void kv_destroy(kv_t* kv);
error_t kv_reset(kv_t* kv, const void* data, int size, int mode);
int kv_size_written(kv_t* kv);

enum kv_type_t
{
    KV_TYPE_NULL   = 0,
    KV_TYPE_INT64  = 1,
    KV_TYPE_DOUBLE = 2,
    KV_TYPE_STRING = 3,
    KV_TYPE_ARRAY  = 4,
    KV_TYPE_BLOB   = 5,
    KV_TYPE_OBJECT = 6,
};

error_t kv_key(kv_t* kv, const char* key, kv_type_t* type = NULL);

error_t kv_val(kv_t* kv, uint8_t* val);
error_t kv_val(kv_t* kv, uint16_t* val);
error_t kv_val(kv_t* kv, uint32_t* val);
error_t kv_val(kv_t* kv, uint64_t* val);

error_t kv_val(kv_t* kv, int8_t* val);
error_t kv_val(kv_t* kv, int16_t* val);
error_t kv_val(kv_t* kv, int32_t* val);
error_t kv_val(kv_t* kv, int64_t* val);

error_t kv_val(kv_t* kv, float* val);
error_t kv_val(kv_t* kv, double* val);

error_t kv_val_string(kv_t* kv, char** str, int* size);
error_t kv_val_blob(kv_t* kv, void* data, int* size, int capacity);

error_t kv_object_begin(kv_t* kv);
error_t kv_object_end(kv_t* kv);

error_t kv_array_begin(kv_t* kv, int* count);
error_t kv_array_end(kv_t* kv);

void kv_print(kv_t* kv);

The implementation is 948 lines of code, which is pretty small!

Here's what it generally looks like to use.


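kv_t* kv = kv_make();
char buffer[1024];
kv_reset(kv, buffer, sizeof(buffer), CUTE_KV_MODE_WRITE);

thing_t thing;
thing.a = 5;
thing.b = 10.3f;
thing.str = (char*)"Hello."; // kv_val_string takes a char**, so str is a char* here.
thing.str_len = 7;

kv_object_begin(kv);
kv_key(kv, "a");
kv_val(kv, &thing.a);
kv_key(kv, "b");
kv_val(kv, &thing.b);
kv_key(kv, "str");
kv_val_string(kv, &thing.str, &thing.str_len);
kv_object_end(kv);

// The buffer may not be null-terminated, so print only what was written.
printf("%.*s", kv_size_written(kv), buffer);
kv_destroy(kv);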

Which would output:


{
    a = 5,
    b = 10.300000,
    str = "Hello."
}

Depending on whether the mode is set to read or write, the kv_* functions will either write to the buffer or read (parse) from it. This means the serialization routine usually only needs to be written once, with the mode providing the polymorphism.
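To make the single-routine idea concrete without the real kv.h, here is a minimal toy sketch. ToyKV, Mode, Thing and serialize are hypothetical names, and the text format only approximates the example output above; the point is that one serialize() call writes the struct in write mode and fills it back in read mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical toy archive, NOT kv.h: its mode decides whether key()/val()
// append text or parse it back.
enum class Mode { Write, Read };

class ToyKV {
public:
    explicit ToyKV(Mode mode, std::string text = "")
        : mode_(mode), text_(std::move(text)) {}

    const std::string& text() const { return text_; }

    // Write mode: emits "name = ". Read mode: finds "name = " at the start of a line.
    void key(const char* name) {
        std::string needle = std::string(name) + " = ";
        if (mode_ == Mode::Write) { text_ += needle; return; }
        std::size_t at = text_.find(needle);
        while (at != std::string::npos && at != 0 && text_[at - 1] != '\n')
            at = text_.find(needle, at + 1);
        found_ = (at != std::string::npos);
        if (found_) cursor_ = at + needle.size();
    }

    // The same call either writes *v or overwrites it with the parsed value.
    void val(int* v) {
        if (mode_ == Mode::Write) text_ += std::to_string(*v) + ",\n";
        else if (found_) *v = std::atoi(text_.c_str() + cursor_);
    }
    void val(double* v) {
        if (mode_ == Mode::Write) {
            char buf[64];
            std::snprintf(buf, sizeof(buf), "%f,\n", *v);
            text_ += buf;
        } else if (found_) {
            *v = std::strtod(text_.c_str() + cursor_, nullptr);
        }
    }

private:
    Mode mode_;
    std::string text_;
    std::size_t cursor_ = 0;
    bool found_ = false;  // read mode: whether the last key() was present
};

struct Thing { int a = 0; double b = 0.0; };

// Written once, used for both directions.
void serialize(ToyKV& kv, Thing& t) {
    kv.key("a"); kv.val(&t.a);
    kv.key("b"); kv.val(&t.b);
}
```

In this toy, a key missing in read mode leaves the field at whatever value it already held, which gives the caller a natural place to set defaults before calling serialize().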

If anyone was brave enough to read through all this info, allow me to say thanks! I really appreciate it :)


Do you have strict mode or warnings?

Are numbers int or float/double by default?

Arrays have their length prepended: "happy debugging"?

Are arrays maps, so I can iterate by [0...1] keys, or are they vector-like by default?

Is there a size limit?

Do you read everything into memory at once?

Can it also decode Base64, i.e. go the other way?

  1. Errors are reported with a return value.
  2. Numbers are always parsed as int64_t or double. The API can then cast the result down to a more specific type.
  3. Yes, arrays have their length prepended. This was to make parsing a little simpler.
  4. I'll post an array example below.
  5. Yes, everything is in memory at once. There's no size limit other than the range of int.
  6. Yes, it does base64 encoding and decoding.
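Answer 2 implies a narrowing step whenever the caller asks for a uint8_t, int16_t and so on. The thread doesn't say whether kv.h range-checks that step, so the following is only a sketch, under that assumption, of a checked narrowing helper (narrow_int64 is a hypothetical name). Note that uint64_t values above INT64_MAX can't survive a round trip through int64_t at all.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: convert a parsed int64_t into a narrower integer type,
// failing instead of silently truncating when the value does not fit.
template <typename T>
bool narrow_int64(std::int64_t parsed, T* out) {
    static_assert(std::numeric_limits<T>::is_integer, "integer types only");
    typedef std::numeric_limits<T> lim;
    if (lim::is_signed) {
        if (parsed < static_cast<std::int64_t>(lim::min()) ||
            parsed > static_cast<std::int64_t>(lim::max()))
            return false;
    } else {
        if (parsed < 0 ||
            static_cast<std::uint64_t>(parsed) > static_cast<std::uint64_t>(lim::max()))
            return false;
    }
    *out = static_cast<T>(parsed);
    return true;
}
```

A failed conversion leaves the destination untouched, so it could be reported through the same error path as any other read failure.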

Array example.


// Use case.
kv_object_begin(kv);
	kv_key(kv, "array_of_ints");
	kv_array_begin(kv, &array.count);
	for (int i = 0; i < array.count; ++i)
		kv_val(kv, array.data + i);
	kv_array_end(kv);
kv_object_end(kv);

// Output.
{
	array_of_ints = [8] {
		0, 1, 2, 3, 4, 5, 6, 7,
	},
}
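Answer 3 says the prepended length is there to make parsing simpler. A rough sketch of why: the reader learns the element count before touching any element, so it can size the destination up front and detect too few or too many elements. parse_int_array is a hypothetical function for the `[N] { ... }` syntax shown above, not kv.h's parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Skips whitespace and returns the new position.
static const char* skip_space(const char* p) {
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
    return p;
}

// Hypothetical parser for "[N] { v0, v1, ..., }". A real one would also cap N.
bool parse_int_array(const std::string& s, std::vector<long long>* out) {
    const char* p = skip_space(s.c_str());
    if (*p != '[') return false;
    char* end = nullptr;
    long count = std::strtol(p + 1, &end, 10);  // the prepended length
    if (end == p + 1 || count < 0 || *end != ']') return false;
    p = skip_space(end + 1);
    if (*p != '{') return false;
    ++p;
    out->clear();
    out->reserve(static_cast<std::size_t>(count));  // size known before any element
    for (long i = 0; i < count; ++i) {
        long long v = std::strtoll(p, &end, 10);  // strtoll skips leading whitespace
        if (end == p) return false;               // fewer elements than declared
        out->push_back(v);
        p = skip_space(end);
        if (*p == ',') ++p;                       // separator; a trailing comma is fine
    }
    p = skip_space(p);
    return *p == '}';                             // more elements than declared fails here
}
```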

 

If you're actually intending to use C++ instead of C-with-sugar, you should probably...

  • Consider using RAII patterns to manage your object and array constructions.
  • Consider using exceptions to manage errors, since not one line of your sample code seems to demonstrate error-detection (which is almost certainly because it's much harder than it looks the way you've done it).
  • Consider accepting std::string& and std::array& for your kv_key() and kv_array_begin() methods, instead of forcing the user to manage separate "data" and "count" values that must be serialized and deserialized.
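The RAII suggestion amounts to pairing each begin call with its end call through an object's lifetime. Since kv.h isn't available here, this sketch uses a hypothetical FakeKV that only records calls; in real code the guard would call kv_object_begin() and kv_object_end() instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for kv_t that only records which calls were made.
struct FakeKV { std::vector<std::string> log; };
void fake_object_begin(FakeKV* kv) { kv->log.push_back("object_begin"); }
void fake_object_end(FakeKV* kv) { kv->log.push_back("object_end"); }

// RAII guard: begin in the constructor, end in the destructor, so the matching
// end call runs on every exit path, including early returns and exceptions.
class ObjectScope {
public:
    explicit ObjectScope(FakeKV* kv) : kv_(kv) { fake_object_begin(kv_); }
    ~ObjectScope() { fake_object_end(kv_); }
    ObjectScope(const ObjectScope&) = delete;
    ObjectScope& operator=(const ObjectScope&) = delete;

private:
    FakeKV* kv_;
};

void write_thing(FakeKV* kv, bool bail_early) {
    ObjectScope scope(kv);
    kv->log.push_back("key a");
    if (bail_early) return;  // the destructor still emits object_end
    kv->log.push_back("key b");
}
```

One design wrinkle: destructors shouldn't throw, so if the end call itself can fail, the guard would have to record that error in the kv state rather than report it directly.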

Last of all, how does your parser detect or cope with out-of-order keys and/or missing keys, e.g. deserializing a text object with "a", "b", "c" members written, but the code serializes/deserializes them in "d", "c", "b", "a" order?

RIP GameDev.net: launched 2 unusably-broken forum engines in as many years, and now has ceased operating as a forum at all, happy to remain naught but an advertising platform with an attached social media presence, headed by a staff who by their own admission have no idea what their userbase wants or expects. Here's to the good times; shame they exist in the past.

Hi Wyrframe, thanks for the feedback! I'll consider RAII and exceptions. In the meantime, here's what I'm currently thinking for error handling, since it wasn't clear.


// Option 1) check return values of each function.
error_t err = kv_key(kv, "array_of_ints");
if (err.is_error()) {
	const char* details = err.details;
	int code = err.code;
	// handle error here
}

// Option 2) check return value of the kv state when convenient.
kv_object_begin(kv);
	kv_key(kv, "array_of_ints");
	kv_array_begin(kv, &array.count);
	for (int i = 0; i < array.count; ++i)
		kv_val(kv, array.data + i);
	kv_array_end(kv);
kv_object_end(kv);

error_t err = kv_error_state(kv);
if (err.is_error()) {
	const char* details = err.details;
	int code = err.code;
	// handle error here
}

Good idea on std::string and std::array. I'm actually building out a utilities header for extra features: helpers for common cases such as std::string or std::vector, data inheritance, and RAII/exception wrappers. There would be two headers:

  1. kv.h: the header I pasted above
  2. kv_utils.h: a variety of useful higher-level features that use kv.h functions as building blocks

For ordering, I'm copying JSON's style: the order of keys doesn't matter, although the order of elements within an array does. The idea is to look up a key and, if no error was returned, grab the value matching that key, so reordered keys are harmless. A missing key makes kv_key() return an error code, so users can assign default values.
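Here is a minimal sketch of that lookup-by-key behaviour, answering Wyrframe's "a, b, c written, d, c, b, a read" scenario. ParsedObject and lookup_int are hypothetical; they stand in for whatever kv.h does internally, which the thread doesn't show.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical: the fields of one parsed object, stored by name.
using ParsedObject = std::map<std::string, std::string>;

// Looks the key up by name, so the order the fields appeared in the text is irrelevant.
// On a missing key, *out is left untouched: whatever the caller stored first is the default.
bool lookup_int(const ParsedObject& obj, const std::string& key, long long* out) {
    ParsedObject::const_iterator it = obj.find(key);
    if (it == obj.end()) return false;
    *out = std::strtoll(it->second.c_str(), nullptr, 10);
    return true;
}
```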

That error handling seems fragile, and I'd worry about how to handle calls into a kv object which is in an error state (what does each kv_val call do, for example, if there was an error during kv_array_begin which left the inout param corrupted?)  I'd also worry about the user making assumptions about the values they didn't actually deserialize because they failed to read them earlier in the process.

I'd structure things more like this, if using the same code for serialization and deserialization was my design goal...


// Annotated sketch using exceptions, lambdas, and declarative programming.
void my_type::convert(kv_t& kute)
{
    // A kv_t is an interface to read or write a single value-point (of object, array, or value type).
    // A kv_objt is an interface to read or write a map/table/object type.
    // A kv_arrt is an interface to read or write a sequence/array type.

    kute.object("my_type", [this](kv_objt& object_target) {

     // attr(field_name, present_handler[, absent_handler])
     // Declares an attribute. If discovered (which it "always" is during serialize), the present_handler lambda is called.
     // During deserialize, if the attribute is absent and no absent_handler is given, a "missing required attribute" exception is raised.
      
     // attr<T>(field_name, T& bind_reference, T default_value)
     // ... but more often, you don't need a full lambda to do it, you use a well-known type and value.
     object_target.attr<std::string>("name", this->name, "unnamed");
     object_target.attr<int>("age", this->age, 0);

     // attr(field_name, array_length, array_handler)
     // Declares an attribute with an array value. If absent, the array is populated zero-length.
     // The length provided is ignored during deserialize.
     object_target.attr("array_of_ints", this->array.size(), [this](kv_arrt& array_target) {
       // Option 1: bind to an actual std::array, or maybe to anything which is a valid argument for std::begin() and std::inserter()?
       array_target.bind_to(this->array);

       // Option 2: dual-mode iterator; it respects the passed-in length during serialize, and the deserialized length during deserialize.
       // You pass the iterator a bind_to for anything that supports the [] operator, and it does the rest.
       for(auto i = array_target.begin(), n = array_target.end(); i != n; ++i)
         i.bind_to(this->array);

       // Option 3; naive but most flexible indexing.
       for(auto i = 0; i < array_target.size(); ++i) {
         array_target.index(i, this->array[i]);
       }
     });
   });
}

// Again, with just the real code.
void my_type::convert(kv_t& kute)
{
    kute.object("my_type", [this](kv_objt& object_target) {
     object_target.attr<std::string>("name", this->name, "unnamed");
     object_target.attr<int>("age",  this->age, 0);

     object_target.attr("array_of_ints", this->array.size(), [this](kv_arrt& array_target) {
       for(auto i = 0; i < array_target.size(); ++i) {
         array_target.index(i, this->array[i]);
       }
     });
   });
}

 


I looked at your API, but it looks incomplete. Where is the deserialization part? How would you import JSON into objects?

For importing, kv_val() doesn't make sense to me. Or did I miss something?

3 minutes ago, Finalspace said:

[snip]

Thanks for asking! I must not have made this very clear :)

The idea is that when you set up the kv instance, you choose read mode or write mode. That way you only write the serialization function for your object once. Depending on whether the kv instance is reading or writing, kv_val will either read from the pointers you pass it or assign values through them.

So the example code I showed can work for read or write just depending on the setting passed to kv_reset.

1 hour ago, Wyrframe said:

[snip]

Thanks for posting some code! I'll take a closer look tomorrow. You raise a good concern about the error state and calling further functions. I think this is something I need to make really clear in the docs (which I didn't post). The idea is that if there's an unrecoverable error, all subsequent calls immediately return the previous error.

For example, in my own usage I attempt to serialize an entire instance of a game entity. Afterwards I check the kv state. If there was an error, I consider the instance serialization a failure, and raise a fatal error in my game.
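The behaviour described here is sometimes called a sticky error: the first unrecoverable failure is stored, and every later call returns it immediately without doing work or touching its output. A minimal sketch, where Error and StickyReader are hypothetical types loosely modeled on the error_t usage shown earlier (is_error(), code, details), not the real definitions:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical error type loosely modeled on the error_t usage earlier in the thread.
struct Error {
    int code = 0;
    const char* details = "";
    bool is_error() const { return code != 0; }
};

// Hypothetical reader with a sticky error state.
class StickyReader {
public:
    // Parses one integer from text; a malformed value counts as unrecoverable.
    Error val(const char* text, int* v) {
        if (err_.is_error()) return err_;  // skip work, repeat the first error
        char* end = nullptr;
        long parsed = std::strtol(text, &end, 10);  // range check omitted for brevity
        if (end == text) return fail(2, "expected an integer");
        *v = static_cast<int>(parsed);
        return err_;
    }
    Error state() const { return err_; }

private:
    Error fail(int code, const char* details) {
        err_.code = code;
        err_.details = details;
        return err_;
    }
    Error err_;
};
```

This also speaks to Wyrframe's worry: after a failure, later outputs are left exactly as they were, so checking the state once at the end tells you the whole batch is untrustworthy.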

Quote


thing.str = "Hello.";
thing.str_len = 7;

The old and popular "I hate myself and I want to die" design pattern. Please use std::string.

Omae Wa Mou Shindeiru

There seems to be no support for null pointers. How would you serialize (and, more difficult, deserialize) a generic pointer-based tree like this, with arbitrary configurations of live and null pointers that need to be preserved faithfully?


template <typename T, int childCount> class TreeNode {
  std::vector<TreeNode<T, childCount>*> children;
  T* data;
};

Or, even worse, a union?


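template <typename T, int childCount> class TreeNode {
	bool leaf;
	union {
		std::vector<TreeNode<T, childCount>*> children;
		T data;
	};
	// A union with a std::vector member needs a user-written constructor and
	// destructor that construct/destroy whichever member `leaf` says is active.
};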
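One common way to handle the null-pointer case is to serialize the tree in pre-order and write an explicit marker for every empty slot, so the reader rebuilds exactly the same shape. kv.h's enum already has a KV_TYPE_NULL value, though the thread doesn't say how it is meant to be used. This is a minimal sketch with a simplified, hypothetical Node type (int payload, owned children) and a plain token stream instead of kv.h; the union case would additionally need its tag (the leaf flag) written before the payload, which isn't covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for TreeNode: int payload, owned children,
// and any child slot may be null.
struct Node {
    int data = 0;
    std::vector<std::unique_ptr<Node>> children;
};

// Pre-order walk: a null slot becomes "~", a live node becomes
// "<data> <child count>" followed by each of its child slots.
void write_node(const Node* n, std::ostream& out) {
    if (n == nullptr) { out << "~ "; return; }
    out << n->data << ' ' << n->children.size() << ' ';
    for (const std::unique_ptr<Node>& child : n->children)
        write_node(child.get(), out);
}

// Mirrors write_node exactly. No error handling: malformed input makes std::stoi throw.
std::unique_ptr<Node> read_node(std::istream& in) {
    std::string token;
    in >> token;
    if (token == "~") return nullptr;
    std::unique_ptr<Node> n(new Node());
    n->data = std::stoi(token);
    std::size_t count = 0;
    in >> count;
    n->children.resize(count);
    for (std::unique_ptr<Node>& child : n->children)
        child = read_node(in);
    return n;
}
```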

 


This topic is closed to new replies.
