Tools Dev - Handling file paths
This is only my second public blog post, so bear with me as I learn how to share some of the things I have learned in tools development.
Yes, you have to think about it.
UI design is not the only issue of usability when it comes to tools. There are many unseen details which affect it just as much. Case in point: "Windows Notepad" is a very simple, quick tool for jotting down a couple of notes. But have you ever noticed how its undo system works? It is very simple, and yet it really throws one off. The first time you undo, it undoes your latest text entry. However, hit undo again, and it puts it all back! It undoes the undo, instead of undoing more.
I do plan on talking about undo systems soon. Today, I want to talk about another unseen detail - handling file paths in your tools and data files.
There are no details in tools development (or any software, really) that can be implemented without thinking about them. We all know this. Sometimes it can be surprising, however, what those details are. Even after many years, we can take very simple things for granted. How often do we consider file paths? I'm sure articles or papers on this problem exist, but I don't recall ever seeing one.
It can be a problem.
The issue is actually two-fold, with some details underneath.
- How to store them. Should they be relative or absolute? How do we know the difference?
- How to store them, again. But now, how do we support multiple languages? Unicode, right? So which encoding?
Keep in mind that our focus is mainly tools which create or manipulate game data. It is therefore important to consider what will happen on multiple platforms, in multiple countries, and when software other than the tool reads the data. While this problem has been addressed again and again, there really is no one best solution. Opinions will vary on this, each with sensible arguments. When it comes down to it, though, the answer will ultimately depend on the application.
The Theory of Relativity
So let's address the first question. Should the paths be stored as relative paths, absolute paths, or perhaps support both? Again, this entirely depends on the application. We will look at 3 examples, each with an entirely different solution.
At work, our software captures a number of images from an external camera/microscope system. It may do measurements or analysis on these images, as well as any other type of processing to make the images usable. One can end up with a few small images totaling a few MB maximum, or one can end up with hundreds of thousands of very large multi-plane images totaling many GB. An additional requirement - thanks to the interesting process in determining what is "accepted" and what is "wrong" - is that all image data must remain completely uncompressed.
So, we store all data within a single data file which essentially incorporates its own file system (based on FAT32). All images, measurements, blobs/shapes, results, processes, camera settings, tables, and so on are stored in this data file. Everything is set to a fixed location (or 2 for historical purposes) so that software just knows where to find the various "files" it is looking for. There are paths, but they need not be determined - only checked for.
CodeSync is a very specific tool for comparing two or more folder structures at a time. Naturally, it would not make much sense to store relative paths for this. All saved paths in CodeSync are absolute. If it can't find a path any more, it simply highlights it in red so that you know there is an issue. I do not think much else needs to be said here.
Syncing relative paths would be ... interesting!
Stickimator is an interesting example, as it is our main focus: game data! Stickimator creates 2D bone-based animations, and therefore has a fair amount of both internal data (bones, frames, etc) and external data (texture images for skinning, background reference images). It uses a main data file containing internal data combined with relative paths to the external data.
The first issue is about internal and external data. Why not just put it all into a Stickimator data file? The answer is very simple: you may wish to reuse textures.
As for the second issue - relative paths, let's imagine a complex scenario. Our super awesome game will have a main data file with all its resources. However, we also want it to be able to override with external files on disk for fast iterative development. We also want to be able to mod our super awesome game. Our mods should follow the same pattern: a main data file with all resources which can be overridden by external disk files.
Now our super awesome game has 4 possible search paths when running a mod. It will check, first, the disk path from the mod itself. Then it might check the mod's main data file to see if the path exists within it. Then it may check the main game folder. And finally it might check the main game's data file for the resource. This can only be done with relative paths.
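The search order described above can be sketched as an ordered list of sources that are all asked about the same relative path. This is only an illustration, not how any particular engine does it: `ResourceSource`, `MakeSource` and `FindResource` are hypothetical names, and a set of known paths stands in for both a real folder on disk and a real packed data file.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// One place a resource might live: a folder on disk or a packed data file.
// `contains` answers whether the given relative path exists in that place.
struct ResourceSource {
    std::string name;
    std::function<bool(const fs::path&)> contains;
};

// Stand-in for a real folder or data file: a fixed set of relative paths.
ResourceSource MakeSource(std::string name, std::set<std::string> files)
{
    return ResourceSource{
        std::move(name),
        [files = std::move(files)](const fs::path& p) {
            return files.count(p.generic_string()) > 0;
        }};
}

// Walks the sources in priority order and returns the name of the first one
// that has the resource, or nothing if no source has it.
std::optional<std::string> FindResource(const std::vector<ResourceSource>& sources,
                                        const fs::path& relative)
{
    // The same lookup key is reused for every source, so it must be relative.
    assert(relative.is_relative());
    for (const auto& source : sources) {
        if (source.contains(relative)) {
            return source.name;
        }
    }
    return std::nullopt;
}
```

With the sources listed in the order mod folder, mod data file, game folder, game data file, a texture that exists in several of them is always taken from the earliest one. An absolute path could not serve as the shared key, because it names exactly one location.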
As additional proof, even when just sending your Stickimator data file + texture resources to someone else, the paths to the texture resources will have to be relative or it likely won't work anywhere but your own computer. Imagine your coworker: "F: drive not found? I don't even have an F drive! Why is it looking for an F: drive to load these textures?!" Oops!
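A minimal sketch of that idea, assuming C++17's `std::filesystem` is available: store each texture path relative to the folder that holds the data file, then rebuild the full path at load time from wherever the data file happens to be. `ToStoredPath`, `FromStoredPath` and the `.stk` extension are made up for this example and are not taken from Stickimator.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Path to write into the data file: `resource` expressed relative to the
// folder containing the data file. Purely lexical, so nothing has to exist
// on disk. Forward slashes are used so the stored text is platform neutral.
// Note: generic_string() uses the native narrow encoding; on Windows,
// generic_u8string() would be the UTF-8 counterpart.
std::string ToStoredPath(const fs::path& dataFile, const fs::path& resource)
{
    return resource.lexically_relative(dataFile.parent_path()).generic_string();
}

// Path to open at load time: the stored relative path resolved against the
// folder the data file is in right now, with any ".." segments collapsed.
fs::path FromStoredPath(const fs::path& dataFile, const std::string& stored)
{
    return (dataFile.parent_path() / fs::path(stored)).lexically_normal();
}
```

For example, saving `/proj/anim/walk.stk` with a texture at `/proj/textures/arm.png` stores `../textures/arm.png`. If the whole project is later copied to `/other`, loading `/other/anim/walk.stk` resolves the texture to `/other/textures/arm.png`, and no drive letter ever enters the file.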
For now, we do not need to get too far into the details of internationalization. We will simply decide which Unicode encoding to use. Our major options here are UTF-8, UTF-16, or UTF-32. Many environments (Java, Windows, Python...) were traditionally built, at some level or another, to support UCS-2 rather than UTF-16, but for the most part each has resolved this in its own way.
And therein lies the problem. Did you see that? In its own way.
Win32 favors UTF-16. It also supports basic ASCII, but not UTF-8 directly. Linux handles UTF-8 most naturally and UTF-32 in second place. Mac OS X seems to be UTF-8 then UTF-16, though I have no experience there yet. Java uses UTF-16, though at higher levels it supports most any encoding. And the list goes on and on...
At work, we needed to at least support Japanese and English. The entire 20-year-old codebase used the narrow (ANSI) Win32 functions, so we had to convert the whole source to the wide versions. Because our software is Windows only and very niche, it made the most sense to us to just save our new paths and other text data as UTF-16 strings. This is great and works very well for us, but were someone to want to read our data files on, say, a Linux system, they would now have some extra work on their hands, and their application would probably have an additional dependency.
Some games may choose to use UTF-32 because of its simplicity. Especially if your game does a lot of text processing and/or searching, UTF-32 makes things easier because there is no ambiguity as to number of characters vs number of double-words. One double-word equals one code point! It is very simple. (Combining sequences can still span several code points, but there are no multi-unit encodings to decode.) As long as you have code that supports this (reading file systems, displaying text, accessing internet addresses, etc) you get to do less thinking, and concentrate on something else. The only real drawbacks are limited support if you don't already have code that handles it, and the fact that your text data will often take up much more space than it would otherwise: between 2x and 4x in the average case.
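To make the "no ambiguity" point concrete, here is a small sketch comparing the two extremes. In UTF-32 the code point count is simply the string length, while in UTF-8 it has to be computed by skipping continuation bytes. The function names are invented for this example, and the UTF-8 version assumes its input is already valid.

```cpp
#include <cstddef>
#include <string>

// UTF-8: every code point has exactly one byte that is NOT a continuation
// byte (continuation bytes look like 10xxxxxx), so count those.
// Assumes valid UTF-8; malformed input is not detected.
std::size_t CountCodePointsUtf8(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// UTF-32: one 32-bit unit per code point, so the length is the answer.
std::size_t CountCodePointsUtf32(const std::u32string& text)
{
    return text.size();
}
```

The string "a≈b" is 5 bytes in UTF-8 and 12 bytes in UTF-32, yet both functions report 3 code points. That is the trade-off in miniature: UTF-32 needs no scan, and UTF-8 needs less storage.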
In Stickimator, after much back-and-forth, I have resolved to use UTF-8. Again, it is not the only "right" answer, but it won in my own mind in the end. Although the application is Windows only right now, it is intended to create data that should be easily read in any environment. Because my own second priority is Linux, UTF-16 becomes a bit uncomfortable. While Win32 doesn't directly support UTF-8, it does include a couple of functions (MultiByteToWideChar and WideCharToMultiByte) for converting between UTF-8 and UTF-16, so conversion on loading/saving is very straightforward. Besides this, most file paths' characters tend to be in the Basic Latin range, so the majority of our strings will only use 1-byte UTF-8 characters, thus saving much space. Since OS X also supports UTF-8 very easily, this is win-win. I also didn't have to change the actual file format when adding UTF-8 support to Stickimator. Now it's win-win-win!
Of course, a full Unicode library such as ICU will handle most of your problems. You can pick whichever encoding you want then. Still, if your data file will be read by other people, or if your tool is not part of a closed asset pipeline (i.e., it is not for just one specific game), you will need to consider the above factors and figure out what truly makes the most sense.
In regards to Unicode encoding, there is one more thing that may catch you off guard. If, for some reason, you include any textual data directly in your code, you will actually have to consider your compiler and how it handles Unicode text data! Take this interesting line of code, for example:
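On Windows the conversion on load would normally go through the Win32 conversion functions. For readers on other platforms, here is a hand-rolled, portable sketch of the UTF-8 to UTF-16 direction. It is not Stickimator's code, `Utf8ToUtf16` is a hypothetical name, and the validation is deliberately minimal: overlong forms and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into UTF-16. Throws std::invalid_argument on a bad lead
// byte, a bad or missing continuation byte, or an out-of-range code point.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("bad UTF-8 lead byte");

        for (; extra > 0; --extra) {
            if (i >= in.size()) throw std::invalid_argument("truncated UTF-8");
            const unsigned char next = static_cast<unsigned char>(in[i++]);
            if ((next & 0xC0) != 0x80) {
                throw std::invalid_argument("bad UTF-8 continuation byte");
            }
            cp = (cp << 6) | (next & 0x3F);
        }
        if (cp > 0x10FFFF) throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            // Fits in a single UTF-16 unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Needs a surrogate pair: high unit first, then low unit.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A path made only of Basic Latin characters passes through with one UTF-16 unit per byte, which is why the conversion cost for typical file paths is negligible.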
mTimeL = make_shared<GuiLabel>(CONTROL_ID_STATIC, 230, 3, 10, 18, this, L"≈", GdiFont::GUI_FONT, MakeDelegate(this, &GuiFrames::OnDefaultGui));
In MSVC, you have to make this code a wide-string literal. If you assume it will put a UTF-8 string into a normal string literal, you are wrong. I was wrong when I assumed this! The compiler will whine and complain, and when you run your program only the 1st (or is it the last?) byte in each UTF-8 character will be present.
GCC will gladly take this string literal without the 'L' and include all of the UTF-8 characters' bytes.
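One way to avoid depending on how a given compiler decodes the source file is to keep the literal itself pure ASCII and spell out the character with escapes. The sketch below does that for U+2248 (the "≈" sign); the constant names are invented for this example.

```cpp
#include <string>

// U+2248 written as its three UTF-8 bytes. The source file contains only
// ASCII here, so the bytes do not depend on the compiler's source encoding.
const std::string kAlmostEqualUtf8 = "\xE2\x89\x88";

// The same character as a wide literal, named by its code point rather
// than typed directly into the source.
const std::wstring kAlmostEqualWide = L"\u2248";
```

The narrow string holds exactly the bytes E2 89 88 on any compiler, and the wide string holds the single unit 0x2248 whether `wchar_t` is 16 or 32 bits wide. The price is readability, so a comment naming the character is worth keeping next to each literal.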
Can you find the Unicode squiggly?