"Blueprint" Serialization Format

Started by Shaarigan September 04, 2019 07:58 AM

5 comments, last by Shaarigan 4 years, 7 months ago

Shaarigan

Author

September 04, 2019 07:58 AM

Hey all,

We are currently discussing different data formats to serialize a visual data structure similar to Unreal Blueprints, and I wanted to get more input from other people to make a better decision. We are currently refactoring our build tool to provide a more flexible way of defining configurable pipelines or pipeline pieces, and decided on a visual node-based approach. Our nodes are of different types, perform different actions and have their own input/output pins, as in this example:

[Image: NodeExample.png]

Some pins (including outputs) are optional, but most of the input pins have to be connected. Those processors are used only in the build tool for now (there will be other pipelines in the future that we want to configure in the same way) and are connected automatically at runtime by an actor-based system that scans the input and output pins of every predefined processor to connect them into a task graph. This way we keep the flexibility we currently have but gain much more control over edge cases.
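To illustrate the idea of wiring processors by scanning their pins, here is a minimal Python sketch. Everything in it is made up for the example: the `Processor` class, the pin and type names, and the rule that an input is matched to the first output of the same data type are assumptions, not a description of the real system.

```python
from dataclasses import dataclass, field

@dataclass
class Processor:
    name: str
    inputs: dict = field(default_factory=dict)    # pin name -> (data type, required?)
    outputs: dict = field(default_factory=dict)   # pin name -> data type

def auto_connect(processors):
    """Return (edges, unconnected required inputs), matching pins by data type."""
    # Index every output pin by the data type it produces.
    producers = {}
    for proc in processors:
        for pin, dtype in proc.outputs.items():
            producers.setdefault(dtype, []).append((proc.name, pin))

    edges, missing = [], []
    for proc in processors:
        for pin, (dtype, required) in proc.inputs.items():
            candidates = [p for p in producers.get(dtype, []) if p[0] != proc.name]
            if candidates:
                # Assumption: take the first producer; a real system needs a tie-break rule.
                edges.append((candidates[0], (proc.name, pin)))
            elif required:
                missing.append((proc.name, pin))
    return edges, missing

# Hypothetical processors for a small build pipeline.
scan = Processor("ScanSources", outputs={"files": "SourceFiles"})
compile_ = Processor("Compile",
                     inputs={"sources": ("SourceFiles", True),
                             "defines": ("Defines", False)},
                     outputs={"objects": "ObjectFiles"})
link = Processor("Link",
                 inputs={"objects": ("ObjectFiles", True),
                         "libs": ("Libraries", True)})

edges, missing = auto_connect([scan, compile_, link])
```

In this sketch the optional `defines` pin is silently left open, while the required `libs` pin with no matching producer ends up in `missing`, which is the kind of edge case such a scan can report.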

Because we aren't tied to a single programming language, there will be some kind of compiler that processes the result of our configuration and either generates source code for a specific language (like C# for the tooling or C++ for the engine) or compiles it straight into a plugin assembly that can be loaded by the build tool. For this reason, and because we want to keep the possibility of writing those files without graphical tools, we need a grammar that can handle the complexity of those node graphs but is at the same time simple enough (and not binary) to be edited on disk with a text editor.
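As a rough sketch of what the source-generating path of such a compiler could look like: sort the nodes so that every producer comes before its consumers, then emit one statement per node. The graph layout, the node names and the generated C#-style `Run(...)` calls are all hypothetical; `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: node id -> (processor type, {input pin: "source_node.output_pin"})
graph = {
    "scan":    ("ScanSources", {}),
    "compile": ("Compile", {"sources": "scan.files"}),
    "link":    ("Link", {"objects": "compile.objects"}),
}

def emit_csharp(graph):
    """Emit one C#-style statement per node, in dependency order."""
    # Each node depends on the nodes its input pins read from.
    deps = {node: {src.split(".")[0] for src in inputs.values()}
            for node, (_, inputs) in graph.items()}
    lines = []
    for node in TopologicalSorter(deps).static_order():
        proc_type, inputs = graph[node]
        args = ", ".join(f"{pin}: {src}" for pin, src in sorted(inputs.items()))
        lines.append(f"var {node} = new {proc_type}().Run({args});")
    return "\n".join(lines)

generated = emit_csharp(graph)
print(generated)
```

A second backend for C++ would only have to replace the string template, since the ordering step does not depend on the target language.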

I already looked at the most common formats:

JSON seems to get very complex, and the minified form our code produces isn't very readable
XML has been removed from the list because I think its grammar is too old-school and parsing is more complex than JSON
YAML, TOML and AXOM seem to be alternatives because they are slim and simple, but parsing them is a nightmare and I don't really like the strict whitespace handling

So, long story short: does any of you know of an existing "language" that would fit our needs, or do you have an idea for a slim but powerful grammar?

Thanks in advance!

LorenzoGatti

September 06, 2019 04:06 PM

You seem concerned about non-problems and unconcerned about basic issues.

I've never heard about AXOM, but for all the other structured text formats you mention there are good libraries that parse them to very convenient data structures in any decent programming language. Complex parsing is a solved problem; you should only worry about complex and error-prone manual editing, and from this point of view XML with advanced editors becomes a good choice.
Your files are probably small enough to allow carefree use of easy to use parsers that load a whole file to a fancy generic object model, with no need for the extra effort of streaming file content or building frugal specialized data structures to save memory.
As soon as you allow arbitrary user input (as opposed to editing operations allowed by your node-based GUI editor) you need to validate task definitions, regardless of syntax.
I don't think visual editing of nodes and wires and textual editing are compatible authoring approaches. Given that a build pipeline or the like tends to have few and complex nodes and nothing interesting to show graphically, reducing the usefulness of a visual editor, and if editing task definitions by hand is important (why?) you should probably sacrifice node-based visual editing.
For example you could use strictly automatic layout (or, more simply, read-only graphical reports of what a task definition does) to avoid polluting the task definitions with irrelevant graphical details and automatically generated meaningless identifiers.
What's challenging to design in your task definition language is the abstract syntax (what building blocks? What data should they require and produce?) and semantics (e.g. where do tasks look for files? Is the language Turing-complete, or otherwise too powerful?), not the concrete syntax.
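To make the validation point concrete, here is a minimal Python sketch that checks a task definition after it has been parsed, whatever the concrete syntax was. The data layout (`nodes`, `edges`, `required_inputs`) and all node and pin names are assumptions made for the example.

```python
def validate(nodes, edges, required_inputs):
    """Collect problems in a task definition instead of failing on the first one.

    nodes: {node_id: processor_type}
    edges: [(src_node, src_pin, dst_node, dst_pin)]
    required_inputs: {processor_type: set of input pins that must be connected}
    """
    errors = []
    connected = set()
    for src, src_pin, dst, dst_pin in edges:
        for node in (src, dst):
            if node not in nodes:
                errors.append(f"edge references unknown node '{node}'")
        if (dst, dst_pin) in connected:
            errors.append(f"input '{dst}.{dst_pin}' has more than one source")
        connected.add((dst, dst_pin))
    for node_id, proc_type in nodes.items():
        if proc_type not in required_inputs:
            errors.append(f"node '{node_id}' has unknown type '{proc_type}'")
            continue
        for pin in sorted(required_inputs[proc_type]):
            if (node_id, pin) not in connected:
                errors.append(f"required input '{node_id}.{pin}' is not connected")
    return errors

# Hypothetical definition with a typo of the kind hand editing produces.
required = {"ScanSources": set(), "Compile": {"sources"}, "Link": {"objects"}}
nodes = {"scan": "ScanSources", "compile": "Compile", "link": "Link"}
edges = [("scan", "files", "compile", "sources"),
         ("compiler", "objects", "linker", "objects")]

for problem in validate(nodes, edges, required):
    print(problem)
```

A GUI editor can rule most of these mistakes out by construction; a hand-edited file cannot, so the same checks have to run on every load.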

Omae Wa Mou Shindeiru

Shaarigan

Author

September 06, 2019 08:13 PM

3 hours ago, LorenzoGatti said:

I've never heard about AXOM, but for all the other structured text formats you mention there are good libraries that parse them to very convenient data structures in any decent programming language.

I never asked for a library to do the task. I know how to write those parsers on my own; that is not the problem here.

I don't want to use XML, JSON or YAML because they have significant drawbacks in their grammar. XML is so old (even if widely used) that closing tags seem outdated to me from a user perspective. JSON is very formatting-oriented; it suits simple configuration files well, but not complex ones like these. YAML is not suitable for end users either, because of the whitespace counting. So we decided against all of them.

3 hours ago, LorenzoGatti said:

Given that a build pipeline or the like tends to have few and complex nodes and nothing interesting to show graphically, reducing the usefulness of a visual editor, and if editing task definitions by hand is important (why?) you should probably sacrifice node-based visual editing

This is just a very shortsighted view. Both ways of editing can fit well together and are valid side by side. We are creating software for managing a game project, so different kinds of people with technical and non-technical backgrounds will work with that system.

Having a visual editor for the pipelines helps non-developer contributors like artists to introduce their creations to the project without needing to ask the developers for assistance every time. We have worked in professional software development for too long not to take other departments into account on a bigger project. Our designers want to define behavior for our software, just as artists want to optimize workflows and builds. On the other hand, visual editing gives a direct overview of a reasonably large pipeline in a way a pure text format never could.

Text editing of the plain definition files is important as well, because we as developers are comfortable with configuration files of all kinds. We fix small issues in Visual Studio projects, maintain our Yarn builds or host automatic builds as a server-side task, so it is more than important to have a file format that can easily be opened, edited and saved, to save time and effort. Many people struggle with writing make, ninja or other build files, and we think that those tools should be easy to use without taking too much time to configure for every new project.

We could achieve exactly that by providing a well-defined set of tools/tokens to, for example, add a new resource to the project that a generic processor would otherwise ignore. This use case first came up when we wrote some low-level assembly code to support swapping execution contexts in our tasking library. Those files were ignored by the C++ processor by design, so we needed to tell it that this project also contains files that have to be part of the build but differ for each platform and architecture.

This is the reason we built our forge system, which does most of the work on its own without needing a human to manage and watch every step. Our tool is inspired by Unreal, whose developers created UBT, a tool that can be configured by adding C# classes to the project, by the way.

3 hours ago, LorenzoGatti said:

What's challenging to design in your task definition language is the abstract syntax

The syntax itself is quite easy: you have to provide the type of token, the token ID and the parameters you set, along with each connection to determine where input comes from and where output has to be delivered. The issue here is to find an easy grammar that provides this information in a way that is at the same time readable and rich in information, without needing another special code editing tool so as not to lose the overview.
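To show how little is needed to carry exactly that information (type, ID, parameters and connections), here is a sketch of one possible line-based notation and a small Python parser for it. The notation itself (`node <id> : <Type> key=value` and `<dst>.<pin> <- <src>.<pin>`) is purely a hypothetical illustration, not an existing format, and the parser keeps all parameter values as strings.

```python
import shlex

SAMPLE = '''
# type, ID and parameters per node; one line per connection
node scan    : ScanSources root="src/engine"
node compile : Compile     optimize=true
node link    : Link
compile.sources <- scan.files
link.objects    <- compile.objects
'''

def parse(text):
    """Parse the hypothetical notation into (nodes, connections)."""
    nodes, connections = {}, []
    for lineno, raw in enumerate(text.splitlines(), 1):
        # shlex handles quoted values and strips '#' comments for us.
        tokens = shlex.split(raw, comments=True)
        if not tokens:
            continue
        if tokens[0] == "node" and len(tokens) >= 4 and tokens[2] == ":":
            params = dict(t.split("=", 1) for t in tokens[4:])
            nodes[tokens[1]] = {"type": tokens[3], "params": params}
        elif len(tokens) == 3 and tokens[1] == "<-":
            dst_node, dst_pin = tokens[0].split(".")
            src_node, src_pin = tokens[2].split(".")
            connections.append((src_node, src_pin, dst_node, dst_pin))
        else:
            raise SyntaxError(f"line {lineno}: cannot parse {raw!r}")
    return nodes, connections

nodes, connections = parse(SAMPLE)
```

Indentation and alignment carry no meaning here, so the whitespace concerns raised about YAML do not apply, and a visual editor could store node positions as ordinary parameters.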

AtomicWinter

September 06, 2019 09:33 PM

I don't understand why you're so keen on not using a library to handle the parsing for you? Why would you want another set of code that you and your team have to maintain versus using a battle tested library that another team maintains? Also, that eliminates the concerns you have with the difficult parsing?

If you're still keen on reinventing the wheel then it's still hard to say. I say that because you state "...the minified form our code produces..." and we don't know what that means or your pipeline. I don't know why you don't have a visual editor, whatever that is, that not only displays a visual representation of the file, but a textual representation as well. Then your pipeline can do whatever you want it to do and not have to worry about editing the raw files themselves.

With that being said, and with a DB structure out of mind since you mentioned no binary, what about something like a CSV file? It's a flat file and you can easily edit it with Microsoft Excel, OpenOffice, etc. Again, without knowing your pipeline I'm not sure how this would be minified, as minifying may corrupt the file since it relies on line breaks.
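A minimal sketch of how a node graph could be laid out as a flat CSV file and read back in Python. The layout is an assumption for the example: the first column says whether a row is a `node` (ID, type) or an `edge` (source node, source pin, destination node, destination pin), and all names are made up.

```python
import csv
import io

# Hypothetical layout: the first column is the record kind.
CSV_TEXT = """\
node,scan,ScanSources
node,compile,Compile
node,link,Link
edge,scan,files,compile,sources
edge,compile,objects,link,objects
"""

def load_graph(text):
    """Read node and edge rows from CSV text into (nodes, edges)."""
    nodes, edges = {}, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue                      # skip blank lines
        kind, *fields = row
        if kind == "node":
            node_id, proc_type = fields
            nodes[node_id] = proc_type
        elif kind == "edge":
            edges.append(tuple(fields))   # (src, src_pin, dst, dst_pin)
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
    return nodes, edges

nodes, edges = load_graph(CSV_TEXT)
```

Per-node parameters are the awkward part of this layout: they would need either extra columns or a third record kind.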

Either way, best of luck on your project.

JTippetts

September 07, 2019 04:26 AM

I wrote a node graph tool for visually assembling noise functions some time back, and the graphs are saved as Lua tables. The syntax of a table in Lua is quite easily editable by hand, can be neatly prettified, and you can even do trickery such as embedded executable code.
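To give an idea of what such a file can look like, here is a small Python sketch that writes a graph out as a Lua table literal. It is not the code from my tool: the graph layout, the node types and the `to_lua` helper are assumptions for the example, and it expects dictionary keys to be valid Lua identifiers.

```python
def to_lua(value, indent=0):
    """Render nested dicts, lists, strings, numbers and booleans as a Lua table literal."""
    pad, inner = "    " * indent, "    " * (indent + 1)
    if isinstance(value, bool):            # check bool before int: True is an int in Python
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        return f'"{escaped}"'
    if isinstance(value, dict):
        # Assumption: keys are strings that are valid Lua identifiers.
        items = [f"{inner}{k} = {to_lua(v, indent + 1)}," for k, v in value.items()]
    elif isinstance(value, list):
        items = [f"{inner}{to_lua(v, indent + 1)}," for v in value]
    else:
        raise TypeError(f"cannot serialize {type(value).__name__}")
    if not items:
        return "{}"
    return "{\n" + "\n".join(items) + "\n" + pad + "}"

# Hypothetical noise graph with two nodes and one link.
graph = {
    "nodes": [
        {"id": "n1", "type": "Perlin", "frequency": 2.0},
        {"id": "n2", "type": "Scale", "factor": 0.5},
    ],
    "links": [{"from": "n1", "to": "n2", "pin": "source"}],
}
print("return " + to_lua(graph))
```

The printed text is a Lua chunk that returns the table, so on the Lua side it can be loaded with `dofile` and needs no separate parser.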