C++ Loading config files, with extend/inheritance and full error handling


Most of the time when I have wanted to load some sort of config file (be it simple key-values, JSON, XML, whatever), I have had to deal with the type conversions, validation, and reporting of useful errors myself. Generally this also left a lot of error cases unhandled, with the people I am working with struggling to understand what they did wrong when things did not quite work.

I also don't really like the fact that, as well as defining the data struct/class itself, I then need this other bit of code that needs to be perfectly synced with it, repeating all the fields and their data types a second time.

e.g. I may have some code like below.

In the first, simpler case, as well as having to implement this template wrapper around the basic file-loading library to get some basic errors, some errors are just not detected; e.g. if the file contains keys/elements I did not expect, they are silently ignored. And in the second case, that simple logic does not even guarantee that some property was defined at all anywhere in the inheritance chain, leaving it up to the object's constructor to specify useful defaults for everything, or to post-validation to detect invalid "default" values.

Is there any better way?

In Java I've seen this sort of thing done via annotations and reflection, to write a generic loader that can use a Java type's public setters and annotations for extra validation info. But trying to think of ways to do something like that in C++, I just ended up with a mess of magic macros trying to set up template functions and lambdas, without even getting as far as how best to handle embedded collections like VehicleType::weapons.

I know I could manually add things to the code below, like checking the list/set of keys/elements in KeyValues against an "expected_members" constant, and for inheritance I could similarly keep track of a "set_members" collection, but that is yet another thing to manually keep track of (at the very least, another copy of all the member names for each object/struct to keep synced).

//Basic without inheritance


//... specialized for each basic type read from config
template<> const char *ConfigTypeName<bool>::name = "boolean";

//basic version, via std::stringstream since most other solutions seemed to
//either miss error cases or threw exceptions around
//return false on error
template<class T> bool parseString(const std::string &str, T *out);
//Special versions for some types, e.g. "true" and "false" literal strings to bool
bool parseString(const std::string &str, bool *out);
bool parseString(const std::string &str, MyEnum *out);
//templated getter, either for some own simple std::map thing, or wrapping the xml/json/whatever lib
//key is just whatever string was previously used, so can format an error message to throw
//e.g. "'10.5' for 'MyTankgun/Mass' in 'Data/Units/HeavyTank.txt' is not a valid integer"
//Plus lots of overloads/variations to deal with optional items, restrictions like min and max, etc.
template<class T> T KeyValues::get(const std::string &name);

void loadVehicleType(VehicleType *obj, const KeyValues &kv)
{
    obj->cost = kv.getRanged<int>("cost", 0, MAX_ITEM_COST);//0 <= cost <= MAX_ITEM_COST
    obj->myfloat = kv.getRanged<float>("myfloat", 1.0f, 1000.0f);//1.0f <= myfloat <= 1000.0f
    for (auto weaponKv : kv.getSubList("Weapons"))
    {
        obj->weapons.push_back(loadWeaponType(weaponKv));
    }
}
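
For completeness, the parseString functions declared above could be implemented roughly like this (a sketch; treating "whole string consumed" as the success condition is my assumption, not necessarily what your favourite library does):

#include <sstream>
#include <string>

//generic version: succeeds only if the entire string was consumed
template<class T> bool parseString(const std::string &str, T *out)
{
    std::stringstream ss(str);
    ss >> *out;
    return !ss.fail() && ss.eof();
}

//special version: only the literal strings "true" and "false" are accepted
bool parseString(const std::string &str, bool *out)
{
    if (str == "true")  { *out = true;  return true; }
    if (str == "false") { *out = false; return true; }
    return false;
}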

With inheritance example


void loadVehicleType(VehicleType *obj, const KeyValues &kv)
{
    //"extends" optionally names a parent definition to load first
    auto extends = kv.get("extends");
    if (extends) loadVehicleType(obj, KeyValues(extends));
    
    kv.getOpt<int>("cost", &obj->cost, 0, MAX_ITEM_COST);
    kv.getOpt<float>("myfloat", &obj->myfloat, 1.0f, 1000.0f);
    auto weapons = kv.getSubList("Weapons");
    if (weapons)
    {
        obj->weapons.clear();
        for (auto weaponKv : weapons)
        {
...
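
And to illustrate the "expected_members" idea I mentioned, a sketch; keys() and the hand-written name list are hypothetical, and the list is exactly the extra bookkeeping I'd rather avoid:

#include <set>
#include <stdexcept>
#include <string>

//yet another hand-maintained copy of the member names (the duplication I'd rather avoid)
static const std::set<std::string> vehicleTypeMembers =
    { "extends", "cost", "myfloat", "Weapons" };

template<class KV> //KV stands in for KeyValues
void checkNoUnknownKeys(const KV &kv)
{
    for (const std::string &key : kv.keys()) //keys() is assumed, not a real API
    {
        if (vehicleTypeMembers.count(key) == 0)
            throw std::runtime_error(
                "unexpected key '" + key + "' in '" + kv.getFileName() + "'");
    }
}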


If I understand you correctly, you need a schema for your config files: another file that defines a list of all the expected fields and their types, and if necessary the structure. XML has its official schema definition format, while for JSON you may need to create your own. Your config loader reads the schema first and stores the expected fields and types in a separate map.

When you load your config files, then comes the error handling. I would recommend raising errors as early as possible, rather than populating fields with default values that may not even work. Before the data even reaches `loadVehicleType`, KeyValues should contain only valid values, structures, and types. If you want to enforce some min-max range, I would put that check in the object's constructor, as it pertains to application logic.

So your steps look like this:

1. Load schema

2. Load config files

3. Check config files against the schema (fields' existence, type checking, and structure)

4. Pass values to object constructors

5. Object validates for acceptable values.

Alternatively, you can also put range validation in the schema (step 3), if the schema supports it, although I wouldn't do that personally. For example, if I have a field "gravity = -2.8" in my config file, I would just accept it as-is. Shit is going to fly all over the place, but I am not going to dictate whether that is a good or bad thing. The solution to that problem is simply "well, don't put -2.8 for gravity".
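
A rough sketch of steps 1-3, assuming a hypothetical Schema type and a map-like KeyValues (neither matches any real library; the per-field type parsing is elided):

#include <map>
#include <set>
#include <stdexcept>
#include <string>

enum class FieldType { Int, Float, Bool, String };

//hypothetical schema: expected field name -> type, plus which fields must exist
struct Schema
{
    std::map<std::string, FieldType> fields;
    std::set<std::string> required;
};

//step 3: reject unknown fields and missing required fields before any
//object is constructed
void validate(const std::map<std::string, std::string> &kv, const Schema &schema)
{
    for (const auto &entry : kv)
    {
        if (schema.fields.find(entry.first) == schema.fields.end())
            throw std::runtime_error("unexpected field '" + entry.first + "'");
        //...also parse entry.second as the expected type and throw on mismatch
    }
    for (const auto &name : schema.required)
    {
        if (kv.find(name) == kv.end())
            throw std::runtime_error("missing required field '" + name + "'");
    }
}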

Well, you can skip worrying about types if you just use overloads. In your second code snippet, you somewhat pointlessly explicitly specify a template parameter, when the compiler could just as well figure it out automatically. Also, if the only template parameter is the type of the variable you want to write to, there is no need for templates in the first place (though it can make for a useful default).

void getOpt(const char* key, int* value, int min, int max) { /* parse as int */ }
void getOpt(const char* key, float* value, float min, float max) { /* parse as float */ }

Personally, I'd add one more layer to only overload some "parse as X" function and keep the rest a template. Then you can always just call

template<typename T>
void getOpt(const char* key, T* value, T min, T max) {}

getOpt("x", &obj->x, ..);
getOpt("y", &obj->y, ..);

Which is also the first step to introduce black metaprogramming magic that would allow for things like

for (each element in config file)
    getOpt(elementName, object);

This would automatically assign the value of an element of name X to the member of object that is also named X and automatically use the overload for the type of that member. As a bonus, for any types you didn't define overloads for, you could automatically recurse into the object (and config file element) to handle nested constructs.
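One way to approximate that without real reflection is a per-type table of named setter lambdas. The following is only a sketch under that assumption; VehicleType's members come from the earlier snippets, everything else is made up:

#include <functional>
#include <map>
#include <stdexcept>
#include <string>

struct VehicleType { int cost = 0; float myfloat = 1.0f; };

//hypothetical registry: element name -> "parse this string into that member"
using FieldSetter = std::function<void(VehicleType&, const std::string&)>;
static const std::map<std::string, FieldSetter> vehicleFields = {
    { "cost",    [](VehicleType &v, const std::string &s) { v.cost = std::stoi(s); } },
    { "myfloat", [](VehicleType &v, const std::string &s) { v.myfloat = std::stof(s); } },
};

//the body of the "for (each element in config file)" loop above
void assign(VehicleType &obj, const std::string &name, const std::string &value)
{
    auto it = vehicleFields.find(name);
    if (it == vehicleFields.end())
        throw std::runtime_error("unknown element '" + name + "'");
    it->second(obj, value); //std::stoi/std::stof also throw on bad input
}

Note the table is still a second copy of the member list, so it doesn't remove the duplication; it only concentrates it in one place per struct.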

While that level of automation can be nice and convenient, it's also a major pain to debug when things go wrong. See this thread as an example of "just because you can, doesn't necessarily mean you should".

f@dz - http://festini.device-zero.de

I also don't really like the fact that, as well as defining the data struct/class itself, I then need this other bit of code that needs to be perfectly synced with it

Fixing this the way you want has a huge cost in terms of code you'll have to maintain and understand in order to fix any new issues. If anything goes wrong, you won't have just two or three places to check, but a whole system.

So, I can suggest just these things:

  • [generic advice] Optimize for process, not automation. Automation is only a tool; the process is what you have to get right. The easier it is to define the rules that make code work, the easier it is to ensure the code's correctness.
  • [specifically] In each struct you want to read from a configuration, write Load/Save member functions, with the implementation in the header. If anything goes wrong, you can check whether any parameter is out of sync within ~100 densely packed lines of code, which is very easy to do.
  • [ideas] Use templated functions (with a T& input/output parameter) to automate conversion-function selection. In code, you'd only see a list of LoadParam/SaveParam calls, which could be resolved to type-specific ones automatically and could be replaced with other code if the type-based generic version doesn't fit. Wrap these calls with a macro to reuse the member name both as the parameter-name string and to access the struct member (a sketch follows this list).
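
A minimal sketch of that last idea; LoadContext, LoadParam, and LOAD_PARAM are all hypothetical names, not an existing API:

struct LoadContext { /* wraps KeyValues, collects error messages, etc. */ };

//type-dispatched loaders; add one overload per supported type
void LoadParam(LoadContext &ctx, const char *name, int &out)   { /* parse as int */ }
void LoadParam(LoadContext &ctx, const char *name, float &out) { /* parse as float */ }

//reuse the member name both as the config key string and as the member access
#define LOAD_PARAM(ctx, obj, member) LoadParam((ctx), #member, (obj).member)

struct VehicleType { int cost; float myfloat; };

void load(LoadContext &ctx, VehicleType &v)
{
    LOAD_PARAM(ctx, v, cost);    //expands to LoadParam(ctx, "cost", v.cost)
    LOAD_PARAM(ctx, v, myfloat); //expands to LoadParam(ctx, "myfloat", v.myfloat)
}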

As for error handling, you're free to do it in the loading function; I'm not quite sure what the issue is. You may write error messages to the log immediately, or use a "loading context" class to store the error message(s) for later processing. If some parameters are required to exist in the configuration, add a parameter to the loading function to mark them as such, etc.


Well, you can skip worrying about types if you just use overloads. In your second code snippet, you somewhat pointlessly explicitly specify a template parameter, when the compiler could just as well figure it out automatically. Also, if the only template parameter is the type of the variable you want to write to, there is no need for templates in the first place (though it can make for a useful default).

Yeah, I guess I wasn't too consistent there. The "parseString" function that does the actual conversion work is overloaded, with a template for the std::stringstream case; the templated "get" functions just wrap it, e.g. something like:


template<class T> T KeyValues::get(const std::string &name)
{
    T tmp;
    //get<std::string> is specialized elsewhere to return the raw string value
    auto value = get<std::string>(name);
    if (parseString(value, &tmp))
    {
        return tmp;
    }
    else
    {
        std::stringstream err;
        err << "'" << value << "' for '" << name << "' in '" << getFileName()
            << "' is not a valid " << ConfigTypeName<T>::name << ".";
        throw std::runtime_error(err.str());
    }
}

I guess adding a schema as well is one option, although I found that in the second (inheritance) case this is not a full solution, at least not the way I have seen it done before. At work (so no code this time) we have some XML config files that may "extend" some other XML config file. The result is that in the XML schema nearly every element and attribute had to be optional, and then it loads them recursively like in my example above. So, thinking about it, this does give the basic type and range validation (with just about understandable errors), but we have had issues with it falling into that uninitialised-field case, because neither the schema nor the recursive loader ensures every field was specified at least once.

Although this may just be us missing something in XML Schema? One idea was to use XSLT to do the extend/merge, then have another schema to validate post-merge, but we did not follow it through because it meant yet another set of schemas to maintain and a complex XSLT to maintain, and post-merge error messages were hard to understand for people editing the files, since they would go looking for a line number that does not exist in any source file.

There are already plenty of XML libraries that support schemas. If I were you, checking out those libraries would be my first stop before writing your own validator/schema loader.

And it looks like XML Schema supports ranges (via restriction facets):

http://www.w3schools.com/schema/schema_facets.asp

At work (so no code this time) we have some XML config files that may "extend" some other XML config file. The result is that in the XML schema nearly every element and attribute had to be optional, and then it loads them recursively like in my example above.

Trying to maintain inheritance across all these XML configs sounds like a nightmare. I would treat each of those XML files independently of the others, even though one may inherit from another. For example, I can probably do something like this:


KeyValues kv = loadXML("schema/first.xml", "config/first.xml");

loadXML will throw an exception if config/first.xml does not follow the schema. Then you can have:


kv.merge(loadXML("schema/second.xml", "config/second.xml"));
kv.merge(loadXML("schema/third.xml", "config/third.xml"));
...
loadVehicleType(vehicle, kv);

Where `merge` writes over any existing fields on kv with values from the new XML. The code inside loadVehicleType assumes that all fields are already valid. You just do kv.get("cost"), kv.get("foo"), etc.

You get the idea.
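
If KeyValues is map-like, that merge can be tiny; a sketch with std::map standing in for the real type:

#include <map>
#include <string>

using KeyValues = std::map<std::string, std::string>; //stand-in for the real type

//writes over any existing fields in dst with values from src;
//fields unique to src are simply added
void merge(KeyValues &dst, const KeyValues &src)
{
    for (const auto &entry : src)
        dst[entry.first] = entry.second; //overwrite-or-insert
}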

Without knowing your project and your actual use case, I can't really make an exact recommendation, so take this with a grain of salt.

I think he meant that a required field can be defined in any of the extending files, so he can't use individual schemas for all of them, as almost every field is effectively optional when you look at separate files.

You should probably look for an XML library that can run schema validation on an existing key-value object, instead of during file parsing - that way you can load all the files, merge them, and then validate the result. I don't know of any myself (I prefer JSON), but I suppose there should be plenty of those.

Take a look at something like CodeSynthesis, LMX, or XmlPlus. Given an XSD file, they generate the matching C++ object model, complete with serialization code. You no longer have to worry about all that mundane boilerplate, and it's not too difficult to integrate into a build process.

XmlPlus has the least restrictive licensing of the three, so you'll probably want to start there. We use an in-house solution that does essentially the same thing as these tools, except it only needs to handle input serialization and is limited to the precise feature set we need.

Hi.

Just been looking into some file formats myself, and SQLite is looking good. And there are tutorials available.


I think he meant that a required field can be defined in any of the extending files, so he can't use individual schemas for all of them, as almost every field is effectively optional when you look at separate files.

Yes. So while in that project we do have schemas, and one of the Java tools to validate and load an XML file into a generated class, nearly every element and attribute is "optional" in the schema, and there is no effective validation of anything related to file inheritance/extension.


Take a look at something like CodeSynthesis, LMX, or XmlPlus. Given an XSD file, they generate the matching C++ object model, complete with serialization code. You no longer have to worry about all that mundane boilerplate, and it's not too difficult to integrate into a build process.

I wasn't aware of XmlPlus. I suppose I may be able to get the LGPL to work, if it is practical to make it into a self-contained DLL without any inclusion of the main code. I think a commercial licence would be a hard sell in most of the cases where I have run into this issue, though, since there are only a small number of custom data files/schemas involved. Also, it doesn't look like these tools solve the inheritance/extension issue.


Just been looking into some file formats myself, and SQLite is looking good. And there are tutorials available.

Is there any compelling reason to use SQL as the primary data definition format? I have used SQL in many contexts, but only as something for the software to manipulate. It seems a big ask for people (often non-programmers) to define content directly in SQL, especially since you end up with a fairly nasty foreign-key structure when things contain any kind of list/collection.

