Jump to content
  • Advertisement
Randy Gaul

Critique My Serialization API

Recommended Posts

Posted (edited)
44 minutes ago, LorenzoGatti said:

The old and popular "I hate myself and I want to die" design pattern. Please use std::string.

I dont agree. It should be choice of the user-code to use std::string, but the api should never force you to use std::string. Instead it should support both, std::string and the C-style. If you look at the api you will clearly see, that its designed to support any kind of custom allocations:

kv_make(void* user_allocator_context = NULL);
Edited by Finalspace

Share this post


Link to post
Share on other sites
Advertisement
3 hours ago, Finalspace said:

I dont agree. It should be choice of the user-code to use std::string, but the api should never force you to use std::string. Instead it should support both, std::string and the C-style. If you look at the api you will clearly see, that its designed to support any kind of custom allocations:


kv_make(void* user_allocator_context = NULL);

Custom allocators don't protect against passing incorrect string lengths on serialization. And with separate lengths there's no easy way to convert between std::string and char* (or more precisely, the former becomes as cumbersome and error-prone as the latter).

Share this post


Link to post
Share on other sites
10 minutes ago, LorenzoGatti said:

Custom allocators don't protect against passing incorrect string lengths on serialization. And with separate lengths there's no easy way to convert between std::string and char* (or more precisely, the former becomes as cumbersome and error-prone as the latter).

Its a programming mistake if you pass the wrong length or the wrong pointer, so why care? It should crash or throw a exception.

Also using std::string wont protect you as well, because the ctor std::string(const char *s, size_t n) with passing the wrong ptr or length will result in undefined behavior as well -> See http://www.cplusplus.com/reference/string/string/string/ (Exception safety).

Share this post


Link to post
Share on other sites
Posted (edited)

I am not sure I understand the API. Do I need to set a compile time define to either read or write? Personally I would have read and store functions. The write functions would be by value (not pointer) for the intrinsic types. The read function would take a reference and most importantly a default value. E.g.

int value = kv_read( "Key", 1234 );

This allows to write de-serialization without error checking everything.

Otherwise I like the simplicity of this. If you augment your library with blob (nice!) I would also add a vector type.

 

Edited by Dirk Gregorius
typo

Share this post


Link to post
Share on other sites

@Dirk Gregorius; their idea is, you consistently pass references to the serializer methods, which are then treated as in parameters during serialization and out parameters during deserialization.

Share this post


Link to post
Share on other sites
Posted (edited)

I took everyone's suggestion and added support for std::string and std::vector. The utility functions look like this, and reside in kv_utils.inl.

// Utility functions for std::string or std::vector.
// These are unit tested and ready-to-go :) (unit tests not shown here)

error_t kv_val(kv_t* kv, std::string* val)
{
	const char* ptr = val->data();
	int len = (int)val->length();
	error_t err = kv_val_string(kv, (char**)&ptr, &len);
	if (err.is_error()) return err;
	val->assign(ptr, len);
	return error_success();
}

template <typename T>
error_t kv_val(kv_t* kv, std::vector<T>* val)
{
	int count = (int)val->size();
	kv_array_begin(kv, &count);
	val->resize(count);
	for (int i = 0; i < count; ++i)
	{
		kv_val(kv, &(*val)[i]);
	}
	kv_array_end(kv);
	return kv_error_state(kv);
}

And here is what it looks like to use these functions on a std::string or std::vector.

// Read or write a string
kv_key(kv, "book_title");
kv_val(kv, &my_string);

// Read or write a vector
std::vector<int> my_vector;
kv_key(kv, "my_vector");
kv_val(kv, &my_vector);

It seems like the biggest confusion for the API is read mode vs writing mode. Perhaps the best thing is to take Dirk's suggestion and write down separate functions for read/write. Implementing this optional API with the current API is trivial.

// Separated read and write functions - an alternative API.

int kv_read(kv, const char* key, int default_value)
{
	int val = default_value;
	kv_key(kv, key);
	kv_val(kv, &val);
	return val;
}

error_t kv_write(kv, const char* key, int val)
{
	kv_key(kv, key);
	kv_val(kv, &val);
	return kv_error_state(kv);
}

I wrote down an example of what I'm thinking for making kv_val use polymorphism for *both read and write*. Maybe it is clear this time how I am trying to use polymorphism for the common serialization case.

// Serializing an object, supports both read and write.

error_t serialize_object(kv_t* kv, object_t* object)
{
	kv_object_begin(kv);

	kv_key(kv, "x");
	kv_val(kv, &object->x);

	kv_key(kv, "y");
	kv_val(kv, &object->y);

	kv_key(kv, "data_vector");
	kv_val(kv, &object->data_vector);

	kv_object_end(kv);

	return kv_error_state(kv);
}

int write_to_disk()
{
	object_t* get_object_list();
	kv_t* kv = kv_make();
	kv_reset(kv, buffer, buffer_size, CUTE_KV_MODE_WRITE);
	while (object) {
		serialize_object(kv, object);
		object = object->next;
	}
	return kv_error_state(kv).is_error() ? -1 : 0;
}

int read_from_disk()
{
	kv_t* kv = kv_make();
	kv_reset(kv, buffer, buffer_size, CUTE_KV_MODE_READ);
	while (kv_has_data(kv)) {
		object_t* object = allocate_object();
		set_initial_params(object);
		error_t err = serialize_object(kv, object);
		if (!err.is_error()) return -1;
		add_to_object_list(object);
	}
	return kv_error_state(kv).is_error() ? -1 : 0;
}

@Dirk Gregorius Yep there's blob and vector support! :) 

Edited by Randy Gaul

Share this post


Link to post
Share on other sites
Posted (edited)
Quote

Yep there's blob and vector support! :) 

You probably do this already and I haven't checked your implementation, but if you use a simple union for your vector type note that you can use the 16 bytes of the vector for small strings. Meaning you only need allocations if strings are larger than this. 

Edited by Dirk Gregorius

Share this post


Link to post
Share on other sites
Posted (edited)
10 hours ago, Randy Gaul said:

The idea is if there's an unrecoverable error, all subsequent calls will immediately return the previous error.

This is part of the careful design you need to have which I mean. You can't immediately return, doing no other work, because the caller will expect certain things to happen after each method call, error or not. For example, after calling kv_array_begin(), they expect their array-length member to be updated. Your code would have to update it to zero, to prevent leaving it uninitialized and used by the immediately-following calls to kv_val() which expect to extract a now-undefined number of array members.

The only way to avoid needing to work extremely hard to make all your methods fail-safe for all possible invoking user code paths is to use the language tools that manipulate code paths; exceptions, callbacks (in the form of lambdas), and RAII types which will be safely deconstructed automatically at every scope transition, regardless of why it happened.

If you organize those well, you might even be able to make your parser recoverable, where e.g. an exception during deserialization of one object will skip the parser forward to the matching }, and potentially allow user code to catch the error, discard the offending object, and continue safely deserializing the document.

void my_container::convert(kv_t& stream) {
  stream.kv_array(my_vector.size(), [this](kv_arrt& target_array) {
    if( target_array.isReading() ) my_vector.clear();
    for(auto i = 0, n = target_array.size(); i < n; ++i) {
      try {
        // Cast included just to document we're expecting an int array here.
        // Your kv_val would have to push_back() the deserialized value, to extend the
        // vector to the new index length, since the [] operator cannot change vector length.
        target_array.kv_val( (std::vector<int>)my_vector, i );
      } catch( kv_error& exn ) {
        // Discard and ignore this array member. Try again with the next.
        n--; i--;
      }
    }
  });
}

In this example, if a non-integer is found in the document's array, kv_val() pitches a fit, but attempts to skip to the next well-matched "," (because it's kv_arrt::kv_val, which knows it's expecting an array element, which is always followed by a "," or "}"), and can continue parsing from there.

Edited by Wyrframe
typo

Share this post


Link to post
Share on other sites
1 minute ago, Dirk Gregorius said:

I haven't checked your implementation, but if you use a simple union for your vector type note that you can use the 16 bytes of the vector for small strings. Meaning you only need allocations if they are larger than this. 

This is a good suggestion.

Luckily for me though, Internally the string is not copied anywhere. I'm actually returning to the user a pointer inside the buffer they pass to kv, so no strings are copied around.

This does mean they would need to copy data out of the returned pointer if they want to keep it around. For std::string it's going to make its own copy, and all is well.

4 minutes ago, Wyrframe said:

This is part of the careful design you need to have which I mean. You can't immediately return, doing no other work, because the caller will expect certain things to happen after each method call, error or not. For example, after calling kv_array_begin(), they expect their array-length member to be updated. Your could would have to update it to zero, to prevent leaving it uninitialized and used by the immediately-following calls to kv_val() which expect to extract a now-undefined number of array members.

The only way to avoid needing to work extremely hard to make all your methods fail-safe for all possible invoking user code paths is to use the language tools that manipulate code paths; exceptions, callbacks (in the form of lambdas), and RAII types which will be safely deconstructed automatically at every scope transition, regardless of why it happened.

If you organize those well, you might even be able to make your parser recoverable, where e.g. an exception during deserialization of one object will skip the parser forward to the matching }, and potentially allow user code to catch the error, discard the offending object, and continue safely deserializing the document.


void my_container::convert(kv_t& stream) {
  stream.kv_array(my_vector.size(), [this](kv_arrt& target_array) {
    if( target_array.isReading() ) my_vector.clear();
    for(auto i = 0, n = target_array.size(); i < n; ++i) {
      try {
        // Cast included just to document we're expecting an int array here.
        target_array.kv_val( (std::vector<int>)my_vector[i] );
      } catch( kv_error& exn ) {
        // Discard and ignore this array member. Try again with the next.
        n--; i--;
      }
    }
  });
}

In this example, if a non-integer is found in the document's array, kv_val() pitches a fit, but attempts to skip to the next well-matched "," (because it's kv_arrt::kv_val, which knows it's expecting an array element, which is always followed by a "," or "}"), and can continue parsing from there.

Oh you bring up a good point about the array initializer. It should definitely be set to zero upon error. I'll have to think a bit more about all your RAII suggestions; thanks for posting them up! I appreciate it :) 

Share this post


Link to post
Share on other sites
Posted (edited)
7 hours ago, LorenzoGatti said:

There seems to be no support for null pointers. How would you serialize (and, more difficult, deserialize) a generic pointer-based tree like this, with arbitrary configurations of live and null pointers that need to be preserved faithfully?


template <typename T, int childCount> class TreeNode{
  std::vector<TreeNode<T,childCount>*> children; 
  T* data;
  }

Or, even worse, a union?


template <typename T, int childCount> class TreeNode{
	bool leaf;
  	union{
  		std::vector<TreeNode<T,childCount>*> children; 
  		T data;
  	}
}

 

This would happen just like serializing in JSON. The tree would have to be flattened somehow. This is a pretty well studied problem, so a quick google search for "serialize tree in json" got my a bunch of good results. Here's an example. So I don't think this is something kv couldn't handle, since this is handled in JSON quite commonly.

I don't support null value like JSON does, but this is just a minor detail. Not critical.

Union can be implemented in a variety of ways. One way is upon reading to use the type param in kv_key. Another way would be to serialize a string representing the type, and query it before your value. I’m sure there are other ways as well.

Edited by Randy Gaul

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!