Cross platform filepath naming conventions

Started by
10 comments, last by Servant of the Lord 11 years, 8 months ago
Hi, I'm trying to create a set of guidelines for my cross-platform project concerning filepaths and filenames.
I started with the Boost Filesystem portability guide, but I have some additional questions.

First: I mostly only care about Windows XP and Later, Mac OSX and later, and the most common flavors of Linux.
Second: What about Android and iOS limitations? Anyone know (without violating any NDAs) about XBox Arcade limitations?

Is Windows XP still limited to a max path of 260 characters? What about Windows Vista and beyond?

Amongst the common OSes I mentioned, is there any limitation on the size of a single filename or directory name? What is the maximum compatible filename or directory name size?

Is there any limitation still on the "depth" of the folders? Am I still limited to 8 folders deep on Mac OSX, WinXP, or major Linux distros? I can go to at least 15 on Windows 7, just glancing at my current directory.

For fun:
D:\Jamin\Programming\Projects\AdventureFar\GameClient\Game\Data\World\Areas\Tests\MyTestArea\(-1,2)\Floor\Floor 0\Layers

Can I really not have spaces in my path? I can seriously only have 0-9, a-z, A-Z, '_', and '-'?
On the common OSes mentioned above, I can't cross-platformly use < >, ( ), or [ ]?
What about commas, am I banned from those if I want to be cross-platform?

Any help here is much appreciated! Most of the cross-platform filepath articles I find are trying to keep me compatible with OSes from before 2000 or so, which doesn't concern my project.
Windows has a limit of roughly 32*1024 (32,767) wide characters for a full path. But to access paths longer than MAX_PATH (= 260, which includes the terminating nul), you have to use either the \\?\ prefix for paths on mapped drives, or \\?\UNC\ for paths on network shares. The \\?\ and \\?\UNC\ prefixes are generally harmless on paths shorter than MAX_PATH.

32K * 2 bytes is not a particularly friendly size for an array on the stack, so in most cases I tend to just ignore the limit and use a resizable string or similar for storage (wrapped in a path class).
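
Here's a minimal sketch of adding those prefixes with plain string handling. It assumes the input is already an absolute path that uses backslashes; add_long_path_prefix is a made-up helper name, not a Windows API function.

```cpp
#include <string>

// Hypothetical helper: adds the extended-length prefix to an absolute
// Windows path held in a wide string. Only string manipulation is used,
// so it compiles on any platform.
std::wstring add_long_path_prefix(const std::wstring& path) {
    const std::wstring kPrefix = L"\\\\?\\";          // the "\\?\" prefix
    const std::wstring kUncPrefix = L"\\\\?\\UNC\\";  // the "\\?\UNC\" prefix

    // Already prefixed: leave it untouched.
    if (path.compare(0, kPrefix.size(), kPrefix) == 0) {
        return path;
    }
    // "\\server\share\..." becomes "\\?\UNC\server\share\..."
    if (path.compare(0, 2, L"\\\\") == 0) {
        return kUncPrefix + path.substr(2);
    }
    // Assumed to be a drive-letter absolute path such as "C:\dir\file".
    return kPrefix + path;
}
```

Because the result is a resizable std::wstring, nothing here needs a 32K-character buffer on the stack.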

These prefixes also allow filenames to be created that the Windows shell typically doesn't like e.g. "aux.txt", "com1.jpg", anything ending with a dot or space, etc. In fact, some of the Windows API functions will mess with your path if you omit the prefix. For example, if you call CreateFileW with a path such as L"C:\\hello.", it will attempt to open L"C:\\hello" (without the trailing dot). The \\?\ prefix must be used to force Windows to open the file truly requested: L"\\\\?\\C:\\hello.". It's a bit of a mess, especially since the shell functions that open dialog boxes will happily return L"C:\\hello." without a prefix, meaning you have to check all such incoming files.

There are other restrictions on names such as illegal characters "<>:\"/\\|?*" or any code unit less than 0x20. There is a separate list for illegal characters in UNC server names: "~!@#$^&()=+[]{};',<>:\"/\\|?*" or any code unit less than or equal to 0x20 i.e. spaces aren't allowed in UNC server names though they are elsewhere.
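
As a rough sketch, a check for a single file or folder name against these rules could look like the following. It assumes ASCII names, the function name is made up, and the reserved device list (CON, PRN, AUX, NUL, COM1-9, LPT1-9) is the usual one rather than something taken from this thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical check: true if `name` (one path segment, ASCII assumed)
// avoids the Windows naming problems described above.
bool is_windows_safe_segment(const std::string& name) {
    if (name.empty()) return false;

    // Illegal characters, plus any code unit below 0x20.
    const std::string illegal = "<>:\"/\\|?*";
    for (unsigned char c : name) {
        if (c < 0x20 || illegal.find(static_cast<char>(c)) != std::string::npos) {
            return false;
        }
    }

    // Names ending in a dot or a space get altered by some APIs.
    if (name.back() == '.' || name.back() == ' ') return false;

    // Device names are reserved with or without an extension ("aux.txt").
    std::string stem = name.substr(0, name.find('.'));
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    static const char* const reserved[] = {"CON", "PRN", "AUX", "NUL"};
    for (const char* r : reserved) {
        if (stem == r) return false;
    }
    const bool is_port = stem.compare(0, 3, "COM") == 0 || stem.compare(0, 3, "LPT") == 0;
    if (stem.size() == 4 && is_port && stem[3] >= '1' && stem[3] <= '9') {
        return false;
    }
    return true;
}
```

Note that a folder name like "(-1,2)" passes this check: parentheses and commas are not in the illegal set.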

Windows path APIs typically accept both '/' and '\\' as valid segment separators. Strictly speaking the \\?\ prefixes are supposed to change the meaning of forward slashes, but I haven't observed anything going wrong there. Nevertheless, I always normalize them to backslashes when constructing path objects.
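
The separator normalization is tiny; a sketch, with a made-up function name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: rewrites every forward slash as a backslash so the
// path object only ever holds one kind of separator.
std::wstring to_backslashes(std::wstring path) {
    std::replace(path.begin(), path.end(), L'/', L'\\');
    return path;
}
```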

Be careful with things like "C:\..\abc.txt" as some APIs will silently assume you really meant "C:\abc.txt".

Windows paths don't specify a Unicode normalization form for paths, so if you create a file from e.g. L"\x00C0.txt" (UTF-16, NFC) and L"\x0041\x0300.txt" (UTF-16, NFD) you will end up with two files that appear identical in explorer (for example), but are actually separate files.
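
To make that concrete, here is a small sketch showing that the two spellings are different sequences of UTF-16 code units, which is all a code-unit comparison sees. The constant and function names are made up. Actually converting between NFC and NFD needs a Unicode library (ICU, for instance), which isn't shown here.

```cpp
#include <string>

// Precomposed form (NFC): U+00C0 followed by ".txt".
const std::u16string kNfcName = u"\u00C0.txt";
// Decomposed form (NFD): 'A' followed by U+0300 (combining grave), then ".txt".
const std::u16string kNfdName = u"A\u0300.txt";

// Compares raw code units only; no normalization is applied.
bool same_code_units(const std::u16string& a, const std::u16string& b) {
    return a == b;
}
```

Both names render the same way, but same_code_units(kNfcName, kNfdName) is false, so they name two distinct files.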

Other special cases include:

  • "D:abc.txt" really means the file "abc.txt" in the current working directory of the D drive. The lack of slash after the colon gives it this interpretation.
  • "\abc.txt" really means the file "abc.txt" in the root directory of the drive for the current working directory.
  • These two forms are not allowed as far as I can tell after the "\\?\" prefix.
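
A sketch of telling these forms apart before handing a path to the OS. The enum and function names are made up, and it only looks at the first few characters of the string.

```cpp
#include <cwctype>
#include <string>

// Hypothetical classification of the Windows path forms discussed above.
enum class WinPathForm {
    Extended,       // starts with the "\\?\" prefix
    Unc,            // "\\server\share\..."
    DriveAbsolute,  // "D:\abc.txt"
    DriveRelative,  // "D:abc.txt", relative to D:'s current directory
    RootRelative,   // "\abc.txt", root of the current drive
    Relative        // "abc.txt"
};

WinPathForm classify(const std::wstring& p) {
    auto is_sep = [](wchar_t c) { return c == L'\\' || c == L'/'; };

    if (p.compare(0, 4, L"\\\\?\\") == 0) return WinPathForm::Extended;
    if (p.size() >= 2 && is_sep(p[0]) && is_sep(p[1])) return WinPathForm::Unc;
    if (p.size() >= 2 && std::iswalpha(p[0]) && p[1] == L':') {
        // A separator right after the colon makes the path absolute.
        return (p.size() >= 3 && is_sep(p[2])) ? WinPathForm::DriveAbsolute
                                               : WinPathForm::DriveRelative;
    }
    if (!p.empty() && is_sep(p[0])) return WinPathForm::RootRelative;
    return WinPathForm::Relative;
}
```

A path class could reject or explicitly resolve DriveRelative and RootRelative inputs instead of letting them slip through by accident.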

For path comparisons on Windows things are a little tricky. Here I'm defining a path as a route through the file system hierarchy to get to a file, thus making two hardlinks to the same file different paths, for example.

Assuming we're sticking with case-insensitive NTFS volumes, the sanest way I've found to compare paths is to use ChrCmpIW(), looking at corresponding UTF-16 code units in the two paths. All other string comparison functions I've tried fall down in one way or another due to the way they perform internal normalization or are locale dependent, or other such things. I suspect CompareStringOrdinal() does the right thing, but given that it's not available on Windows XP, I didn't explore that option.

At a deeper level, each NTFS volume has a hidden file called $UpCase that contains the mapping for lower to upper case for that volume. If you have files on the same volume and that volume exists and you want truly accurate case insensitive comparisons, you could possibly use that if you can get to it. I feel this would be rather extreme, though.

There's also the fact that under the POSIX subsystem on Windows, paths are treated in a case sensitive fashion.

Then there's non-NTFS volumes, paths on shares that might not even be Windows machines, and so on.

Essentially, path comparisons will always be somewhat fragile.

For OS X, things are a little bit easier, but not much :)

Here, all characters except '/' are allowed in path segments, including funny things like '\n'. Stay away from using ':' in paths if possible as that can sometimes confuse the Finder due to Mac OS 9 baggage. Backslash, '\\', really does mean a backslash character. It's not a separator under any circumstances.

OS X ensures all file names are in Unicode normalization form D. This can result in some odd effects when sharing files between Windows and OS X machines; you can copy a file from Windows to OS X and back to Windows again and end up with two files next to each other that appear to have identical names!

For basic case insensitive path comparisons, Google for Apple's FastUnicodeCompare function. Since OS X used to ask users at installation time whether or not they wanted a case sensitive volume, there are instances of both case sensitive and case insensitive volumes in the real world, so FastUnicodeCompare may not always be applicable.

You'll also need to ensure the strings are in NFD before calling that function. I have found it a good policy to immediately convert to NFD when constructing paths on OS X.

Some of this information comes from this page on MSDN, the rest from scraps on the web and experimentation. I've written path handling stuff three different times now (twice for different jobs, once for home use) so it's pretty deeply wedged into my brain :) FWIW, I have convinced myself over the years that robust path handling must be done with path objects, rather than strings. Paths, especially on Windows, just have too many invariants to maintain consistently with unchecked string manipulation.

For the other platforms you mention, I don't know so much. I suspect Linux is rather similar to OS X but it wouldn't surprise me if there are a number of crucial differences.

If you'd like, I can upload my fs library somewhere for further perusal.
All of that complexity is why a lot of games go with a single data file approach. That is you create a single file, containing all game data, which works like (and often is) a zip file. There are cross platform libraries like http://icculus.org/physfs/ to handle reading zip files.

Of course storing files like that isn't very convenient during development, so you'll also want to support reading the files directly from the filesystem on at least one platform.

You may also find it useful to support more than one zip file, it makes patches easier for example, but the main thing is that you end up with a very small number of files, each with a simple file name that will work on any platform you support.

[quote]Here, all characters except '/' are allowed in path segments, including funny things like '\n'. Stay away from using ':' in paths if possible as that can sometimes confuse the Finder due to Mac OS 9 baggage. Backslash, '\\', really does mean a backslash character. It's not a separator under any circumstances.[/quote]

I just want to add to this, because OS X is a little weird in this regard. In the Finder, you can give a file a name with a '/' character in it, but it gets converted to a ':' character. So if you name a file "a/z" it will show as "a/z" in the Finder, but from the Terminal it will show as "a:z".

Moral of the story: just don't even try to use "/" or ":" in file/folder names.

I'd say more, but edd said it all, pretty much.
I know you can use spaces in file/folder names in XP (I mean, a system installed folder is "Program Files" after all), and with Linux/OS X (which is also a UNIX) you can use spaces (I know on most Linux distros you can even use \b, \r, \n, etc., though it's generally frowned upon since it makes displaying file names fussy). As far as I can tell commas are fine (I just named a folder "Browsers, things"). Colons (':') are not, since Windows uses them for alternate data streams (basically resource forks) and some OSes use them for path separators (I don't know if any of them are still used today, but boost::filesystem deals with them).

What are you planning on using the long paths for? Is it something that could be better served using a file format (world info) or an archive (static game info)? Or do you just want to make sure you can write saved games to your game's data folder?

Edit: Note to self: refresh thread before replying...
...too much stuff to go through in one sitting....

Wow, thanks for that write-up; it contains a lot of the kind of information I'm trying to understand. Some of it is really surprising, like the "D:abc.txt" and "\abc.txt" forms. Both of those meanings are new to me, and seem likely to occur accidentally in day-to-day programming.

I did a poor job of explaining myself: I'm currently using boost::filesystem for the actual OS interaction.
Boost gives a list of "cross platform" recommendations, but it seems exceedingly strict for modern OSes, so I'm wondering where I can ease up the restrictions if I only include things made in the past 10 years.

Since I'm using boost::filesystem, I don't need to directly deal with the nitty-gritty details of Windows or Mac specific function calls (thankfully!), but my concern is about my folder structure and filenames, and where I might have complications in the future in copying the entire data directory of my game onto a Mac or Linux machine for when I make a Mac or Linux port. I would like, as much as possible, that the layout of my datafiles remain constant between Mac and Linux and Windows, for ease of porting and also so player mods of my game work cross-platform. I would rather not ship and release the Windows version and then three months after that, when I complete the Mac port, find out I need to change the whole folder structure and filenames of critical data files, and then have to update the Windows version to match the new minimum requirements of both Mac and Linux (ruining any player mods, as well as just being a nuisance), and then go through that again in another three months when I port to Linux.

I understand there will be inevitable complications when porting from Windows to Mac or Linux, but I'd like to do what I can ahead of time, to at least reduce the complications, and getting the file system as stable as possible would greatly assist that.

My biggest concern at the moment, is I have folders filled with sub-folders with names in the format: "(x,y)"
My recent understanding is that you can't use ( or ), or even commas, on Linux machines (right?), so now I have a problem! Better to catch it now than after I ship for Windows.


[quote]All of that complexity is why a lot of games go with a single data file approach. That is you create a single file, containing all game data, which works like (and often is) a zip file. There are cross platform libraries like http://icculus.org/physfs/ to handle reading zip files.[/quote]

That's a great idea, and I would do that... but for one thing: Modding
My game is very, very moddable, and I do expect users to make use of that. My game's built-in level editor is (untestedly) cross-platform, even.
If a user places a file in a special mod folder that follows the same folder structure as the game, the game will make use of the file instead of the game's original file, allowing users to do things like completely replace or alter the game's graphics, change NPC dialog, make fan-made translations, create their own levels or enemies, or even make an almost entirely new game - all without having to damage or replace the original data files of the game.

I suppose I could have the game data be entirely contained in one zip file, and then each mod be another zip file. It's definitely something to ponder...

What are the limitations of filenames and folder structures for .zips? I see there is a total limit of 4-ish GB, which isn't a problem, and to use forward slashes '/' instead of backslashes, which I already do. The file name, comment, and meta-data combined cannot exceed 65,535 bytes, which is plenty... But are there any limitations on symbols used? I can't see anything in the zip file format documentation about that. Filenames are optionally UTF-8 or the MSDOS ASCII, either of which work fine for me (the filenames don't need to be translated).
I just don't know what limitations there are on filenames. Can I use commas and parentheses and exclamation marks in zip filenames? Other than the 'use forward slashes', I don't see any information on the subject.

If I use zips, I'll let physFS handle it, but I still need to know how to name my files before I zip them up, and with what name to use when loading data from the zip.

[quote name='Firestryke31']What are you planning on using the long paths for? Is it something that could be better served using a file format (world info) or an archive (static game info)? Or do you just want to make sure you can write saved games to your game's data folder?[/quote]
Static game data (world, dialog, etc...) and dynamic data (config files, game saves), but all modifiable by the end-user.
At a basic level, users might just play around with the map editor a little, but at a more advanced level, they'd be manually adjusting config files and creating new files in their own folder structure that mirrors the game's folder structure.

So yes, I suppose an archive file would really be the best option. One for the game, one for each mod or expansion.

[quote]My recent understanding is that you can't use ( or ), or even commas, on Linux machines (right?), so now I have a problem! Better to catch it now than after I ship for Windows.[/quote]


I'm pretty sure that I have seen/used both ( ) and , in linux paths.
I don't think round brackets -- '(' and ')' -- should be a problem anywhere. Same goes for commas.

I think the suggestion of using pack files to distribute data is a good one, with additional search paths for mod content.

Zip files support UTF-8 if a bit is set in one of the headers. However (sigh) many versions of Windows (through to Vista, I think) do a poor job of unzipping such files if characters outside of the ASCII subset are used in path names.
Out of interest, if you're aiming for cross-platform why not go for the most conservative approach? Obviously you need to figure out what the most conservative approach is, but it seems safest. Do you perhaps have automated tools that output assets with a particular filename structure that would make that cumbersome?

Would it perhaps make sense to have a function/method which creates the filenames based on enum values, asset ids, etc? That would at least mean you'd have the code side covered, you could easily change the filename format if you discovered issues. Of course renaming and moving the files would be a pest, but that could be handled with a tool.
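
For example, something along these lines, where the function names and the conservative "x-1_y2" format are just made-up illustrations:

```cpp
#include <string>

// Hypothetical: the only place that knows how an area folder is named.
// This version produces the current "(x,y)" style.
std::string area_folder_name(int x, int y) {
    return "(" + std::to_string(x) + "," + std::to_string(y) + ")";
}

// Hypothetical conservative alternative using only 0-9, a-z, '_' and '-'.
// Switching formats later means changing one function, plus a rename tool
// for the folders that already exist.
std::string area_folder_name_conservative(int x, int y) {
    return "x" + std::to_string(x) + "_y" + std::to_string(y);
}
```

The names stay readable for modders either way, since they still carry the coordinates rather than an opaque id.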

[quote]Out of interest, if you're aiming for cross-platform why not go for the most conservative approach? Obviously you need to figure out what the most conservative approach is, but it seems safest. Do you perhaps have automated tools that output assets with a particular filename structure that would make that cumbersome?[/quote]

I'm trying to go with the lowest common denominator of rules, but I don't know what the rules are.
No automated tools or anything.
[quote]Would it perhaps make sense to have a function/method which creates the filenames based on enum values, asset ids, etc? That would at least mean you'd have the code side covered, you could easily change the filename format if you discovered issues. Of course renaming and moving the files would be a pest, but that could be handled with a tool.[/quote]
Well, I need users to be able to alter things if they want, and IDs for filenames would make it hard for users to understand what is referencing what.
Most of my filenames are perfectly fine, but a few have symbols in them, like the (x,y) folders I mentioned.


[quote]I'm pretty sure that I have seen/used both ( ) and , in linux paths.[/quote]


Really? Someone told me they are used for a special purpose in linux, and that some linux tools choke on them. QtCreator's QMake (a linux tool ported to Windows) messes up with ( ) for example.
If my program can handle it, fine. But does Linux's filesystem allow those symbols or will it cause unexpected issues later on? What about Mac OSX with ( ) and ','?

This topic is closed to new replies.
