Loading file paths and how to access them

Started by
25 comments, last by Kylotan 6 years, 11 months ago

I fully agree with what the previous posters wrote, but I have a few additions. To target multiple platforms, I integrated a set of static functions in the Storage namespace that handle common tasks such as creating, searching, moving and deleting files and folders, and that resolve special path requests for the Application Path, Root Path and User Data. Every path is normalized to use forward slashes everywhere (Windows doesn't have a problem with that), so querying the application path may result in


C:/Program Files/myTest //Windows
/data/data/myTest.myCompany.com //Android
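A minimal sketch of that kind of separator normalization, assuming hypothetical helper names (`NormalizeSeparators` and `JoinPath` are illustrative and not part of the Storage namespace described here):

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: rewrite every backslash as a forward slash so that
// paths compare and concatenate the same way on every platform.
std::string NormalizeSeparators(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}

// Hypothetical helper: join a directory and a relative name with exactly one '/'.
std::string JoinPath(const std::string& dir, const std::string& name) {
    std::string result = NormalizeSeparators(dir);
    if (!result.empty() && result.back() != '/') result += '/';
    std::string tail = NormalizeSeparators(name);
    if (tail.rfind("./", 0) == 0) tail.erase(0, 2);  // drop a leading "./"
    return result + tail;
}
```

With this, joining the Windows application path and "./MyAsset" would give "C:/Program Files/myTest/MyAsset".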

In the next step I set the root directory (the working directory on Windows) to wherever my assets are, and can then call


Storage::File::Open("./MyAsset", Storage::File::Open)

Finally, everything in my environment uses streams instead of plain byte arrays; MemoryStream wraps a byte array, so those can be used too. Every stream is based on a general interface, IDataReader/IDataWriter, that supports reading/writing a single byte or an array of bytes and exposes Position/Length properties. One specialization is StreamReader, which wraps another IDataReader for substream reading, such as a file inside a package.


InFileStream ifs("./MyAsset", Storage::File::Open);
if(ifs)
{
   while(!ifs.Eof())
   {
      //DoStuff: read from ifs here so the loop advances towards Eof
   }
   ifs.Close();
}

This is the standard way to open and read a file from disk in my case.
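The reader interface, MemoryStream and substream wrapper described above might look roughly like the following. This is a reduced sketch, not the poster's actual code: the method names (`Read`, `Seek`, `Position`, `Length`, `Eof`) and signatures are assumptions, and the single-byte read and the writer side are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed shape of the reader interface; the real one is not shown in the post.
struct IDataReader {
    virtual ~IDataReader() = default;
    virtual std::size_t Read(std::uint8_t* dst, std::size_t count) = 0;
    virtual void Seek(std::size_t position) = 0;
    virtual std::size_t Position() const = 0;
    virtual std::size_t Length() const = 0;
    bool Eof() const { return Position() >= Length(); }
};

// Wraps a byte array that the caller keeps alive.
class MemoryStream : public IDataReader {
public:
    MemoryStream(const std::uint8_t* data, std::size_t size) : data_(data), size_(size) {}
    std::size_t Read(std::uint8_t* dst, std::size_t count) override {
        const std::size_t n = std::min(count, size_ - pos_);
        if (n > 0) std::memcpy(dst, data_ + pos_, n);
        pos_ += n;
        return n;
    }
    void Seek(std::size_t position) override { pos_ = std::min(position, size_); }
    std::size_t Position() const override { return pos_; }
    std::size_t Length() const override { return size_; }
private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Exposes the window [offset, offset + length) of another reader as its own
// stream, e.g. one file inside a package.
class StreamReader : public IDataReader {
public:
    StreamReader(IDataReader& inner, std::size_t offset, std::size_t length)
        : inner_(inner),
          offset_(std::min(offset, inner.Length())),
          length_(std::min(length, inner.Length() - offset_)) {}
    std::size_t Read(std::uint8_t* dst, std::size_t count) override {
        const std::size_t n = std::min(count, length_ - pos_);
        inner_.Seek(offset_ + pos_);  // the inner reader may be shared, so reposition it
        const std::size_t got = inner_.Read(dst, n);
        pos_ += got;
        return got;
    }
    void Seek(std::size_t position) override { pos_ = std::min(position, length_); }
    std::size_t Position() const override { return pos_; }
    std::size_t Length() const override { return length_; }
private:
    IDataReader& inner_;
    std::size_t offset_;
    std::size_t length_;
    std::size_t pos_ = 0;
};
```

Because the substream clamps every read to its own window, a parser handed a StreamReader cannot run past the end of its file into the next one in the package.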

For production code it is totally OK to use direct file paths that come from an entry point file. Without one, your program wouldn't know which file to load or where to start from. This entry point may be an asset package or an .ini file; that depends entirely on your game environment.

This is a common technique, used for example by The Elder Scrolls games to load the game's content (the Oblivion.bsa asset package, for example), which may be replaced by modders to change the game completely.

Once you have processed your asset entry point, you need to hand it to the asset manager for loading levels, the main screen, whatever. I personally prefer asset hashes too, as Hodgman already wrote, because they have some advantages: they help protect your assets (modders wouldn't know so quickly what kind of file it is and what it is used for) and they save memory by using 4 bytes instead of a long string of undefined length.

Again in production you could use strings or a file table hash <=> path conversion to get your assets loaded.
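As a rough illustration of the hash <=> path table, here is a sketch that uses 32-bit FNV-1a as the 4-byte hash. The choice of FNV-1a and the names `HashAssetName` and `AssetTable` are assumptions made for the example; the thread does not say which hash function is used.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// 32-bit FNV-1a, picked here only as an example of a 4-byte asset hash.
constexpr std::uint32_t HashAssetName(std::string_view name) {
    std::uint32_t hash = 2166136261u;          // FNV offset basis
    for (char c : name) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;                     // FNV prime
    }
    return hash;
}

// Hypothetical file table mapping an asset hash back to its path.
class AssetTable {
public:
    // Returns false if the hash is already taken by a different path (a collision).
    bool Register(const std::string& path) {
        auto [it, inserted] = paths_.emplace(HashAssetName(path), path);
        return inserted || it->second == path;
    }
    // Returns nullptr when the hash is unknown.
    const std::string* Find(std::uint32_t id) const {
        auto it = paths_.find(id);
        return it == paths_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::uint32_t, std::string> paths_;
};
```

Since `HashAssetName` is constexpr, game code can write `HashAssetName("textures/wall.png")` and have the 4-byte id computed at compile time, while the table is only needed where the path itself matters. With only 32 bits, collisions are possible, which is why `Register` reports them.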

At the very least, I would recommend using some kind of internally managed asset storage (e.g. a package). I use an aligned, signed and encrypted custom format for this, but you may also use .zip to start with. My format is built from 64 KB aligned chunks, with an algorithm that minimizes fragmentation to speed up asset reads from disk. The package handler contains a small file table in its header that points to the chunk and offset of each file. A stream then only needs to jump to position (chunk * 64 KB + offset) and read N bytes, where N is the file size.
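The position arithmetic can be sketched as follows. The `FileTableEntry` layout and the function names are hypothetical; only the 64 KB chunk size and the formula come from the post.

```cpp
#include <cstdint>

constexpr std::uint64_t kChunkSize = 64 * 1024;  // 64 KB chunks, as described above

// Hypothetical file table entry: where a file starts and how long it is.
struct FileTableEntry {
    std::uint32_t chunk;   // index of the chunk the file starts in
    std::uint32_t offset;  // byte offset inside that chunk
    std::uint32_t size;    // file size in bytes
};

// Absolute stream position of the first byte: chunk * 64 KB + offset.
constexpr std::uint64_t AbsolutePosition(const FileTableEntry& e) {
    return e.chunk * kChunkSize + e.offset;
}

// Number of chunks the file touches, useful when deciding how much to read or map.
constexpr std::uint64_t ChunksSpanned(const FileTableEntry& e) {
    return (e.offset + std::uint64_t{e.size} + kChunkSize - 1) / kChunkSize;
}
```

For example, an entry with chunk 3 and offset 128 starts at byte 3 * 65536 + 128 = 196736 of the package.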

Finally, my asset reader functions are also static functions that take an instance of the abstract IDataReader (or sometimes IDataWriter) interface and output processed data, such as a texture from an image or a mesh, which can then be handled however the engine requires.

Very nice answer ... thanks


Another issue to keep in mind when designing a file system API is whether you want it to be blocking or asynchronous.

The typical C-style interface of fopen/fread/fclose is terrible for game asset loaders, because they block the CPU while they're operating. Filesystem interactions can be very expensive operations (milliseconds), so you don't want the CPU to grind to a halt while they occur.

Likewise you don't really want to use an API that returns a fully constructed object in one go, as this kind of design also requires the use of blocking, e.g.
Texture* myTexture = load("myTexture.tex");

Of course, this depends on what you want from your loading screens. If you're ok with having static loading screens while loading assets, then blocking is fine... but if you want to have smoothly animating loading screens, or no loading screens (world streaming), then you need an asynchronous asset loader.

I break my asset loading up into 4 stages -- measuring, allocating, loading and parsing.
When you load an asset, you immediately get back an Asset*, which has a boolean member function IsLoaded. You can query that function to tell if it has finished all 4 stages yet or not.
In the background, the blob loader determines the file size (measuring) and then calls a function on the factory to allocate enough memory to store the asset data (allocating). The blob loader then starts streaming the asset data into the allocation that was returned from the factory (loading). Once that streaming is complete, the blob loader calls a factory function to let it know that the file has been loaded and can now be deserialized (parsing). After that has completed, the Asset is marked as loaded.
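A heavily simplified sketch of those four stages, using `std::async` as the background worker. All names apart from `Asset` and `IsLoaded` are assumptions, and the factory callbacks described above are collapsed into placeholder steps inside the worker.

```cpp
#include <atomic>
#include <cstddef>
#include <fstream>
#include <future>
#include <memory>
#include <string>
#include <vector>

// The caller gets this back immediately and polls IsLoaded().
class Asset {
public:
    bool IsLoaded() const { return loaded_.load(std::memory_order_acquire); }
    void MarkLoaded() { loaded_.store(true, std::memory_order_release); }
    std::vector<char> bytes;     // storage produced by the "allocating" stage
    std::size_t parsedSize = 0;  // stand-in for a real deserialized object
private:
    std::atomic<bool> loaded_{false};
};

// Hypothetical blob loader; Load() is meant to be called from one thread.
class BlobLoader {
public:
    std::shared_ptr<Asset> Load(const std::string& path) {
        auto asset = std::make_shared<Asset>();
        pending_.push_back(std::async(std::launch::async, [asset, path] {
            std::ifstream file(path, std::ios::binary | std::ios::ate);
            if (!file) return;                         // asset simply never becomes loaded
            const std::streamoff size = file.tellg();  // 1. measuring
            if (size < 0) return;
            asset->bytes.resize(static_cast<std::size_t>(size));  // 2. allocating
            file.seekg(0);
            if (size > 0 && !file.read(asset->bytes.data(), size)) return;  // 3. loading
            asset->parsedSize = asset->bytes.size();   // 4. parsing (placeholder)
            asset->MarkLoaded();                       // publish the result to pollers
        }));
        return asset;
    }
    // Blocks until every outstanding load has finished or failed.
    void WaitAll() {
        for (auto& f : pending_) f.wait();
        pending_.clear();
    }
private:
    std::vector<std::future<void>> pending_;
};
```

A game loop would call `Load` once, keep rendering, and check `IsLoaded()` each frame; `WaitAll` is only there for shutdown or for a blocking loading screen. The release/acquire pair on the flag is what makes it safe to read `bytes` after `IsLoaded()` returns true.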

On Windows, you can use the CreateFile (with FILE_FLAG_OVERLAPPED flag) / ReadFileEx (with OVERLAPPED structure) / CloseHandle functions to open a file and read its contents asynchronously. Even though these functions are themselves async, they can sometimes still block for several milliseconds when loading many files at the same time, so I also take the precaution of calling them from a background thread.

The issue I have is that the loading, for example, is often implemented by other libraries. They therefore return a type that is pretty much ready to use. It feels weird to transform that into e.g. a byte array and then cast it back to its original type. It feels like unnecessary overhead. I will need to do some benchmarks on how expensive this can become.

When you read data out of a file, it's always a byte array to begin with, because files themselves are just byte arrays...

Typically libraries have two ways to load their objects from a file -- either you give them a filename and they do all the OS file loading work:
fancy_object = library_load_object_from_file("filename");
Or they let you do the file loading and they do the deserialization:
bytes = read_entire_file("filename")
fancy_object = library_load_object_from_memory(bytes)
free(bytes)

Or they define an abstract "stream" interface and then you implement it, which is just a fancy spin on the second option.

It may seem that reading an entire file into memory and then decoding it is very wasteful compared to decoding+reading at the same time.
e.g.
file_handle = open_file("filename")
int foo = read_int(file_handle)
int bar = read_int(file_handle)

vs
bytes = read_entire_file(file_handle)
int foo = *(int*)(bytes+0)
int bar = *(int*)(bytes+4)

However, disks do not like to stop and start. Transferring data from disk is slow, but starting a transfer is extremely slow. Because of this, most engines will always try to keep the disk active with large file reading operations, instead of issuing many small file reading operations. Reading a whole file into memory before processing any of it is usually faster than mixing reading/decoding tasks together.
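In C++ the "read everything, then decode" variant might be sketched like this. The function names are illustrative, and `memcpy` is used instead of a pointer cast so the decode step does not depend on the buffer's alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// Reads the whole file with a single large read instead of many small ones.
std::optional<std::vector<char>> ReadEntireFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return std::nullopt;
    const std::streamoff size = file.tellg();  // opened at the end, so this is the size
    if (size < 0) return std::nullopt;
    std::vector<char> bytes(static_cast<std::size_t>(size));
    file.seekg(0);
    if (size > 0 && !file.read(bytes.data(), size)) return std::nullopt;
    return bytes;
}

// Decodes one 32-bit integer (native byte order) from the in-memory copy.
bool ReadInt32(const std::vector<char>& bytes, std::size_t offset, std::int32_t& out) {
    if (offset > bytes.size() || bytes.size() - offset < sizeof(out)) return false;
    std::memcpy(&out, bytes.data() + offset, sizeof(out));
    return true;
}
```

All disk access happens inside `ReadEntireFile`; every later `ReadInt32` call only touches memory.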

There's also a very advanced form of asset storage that many game engines use, where you use the exact same data structures in memory and on-disk (i.e. no serialization/deserialization steps). If you can achieve that, then loading data is just a typecast, which is free:
bytes = read_file(file_handle)
MyStruct* foo = (MyStruct*)bytes; // no decoding step :o

This kind of thing is very complex though, as it means you can't use pointers at all inside your structures, have to be hyper aware of padding and alignment issues in your structures, and very careful with data types. It's typically only used in low-level engine asset structures, and not as a general purpose scheme for high level game files.

You must be careful about loading data directly into memory. Many data types must be properly aligned. Some data types on some systems will crash if they aren't properly aligned. Others on other systems may still run but with a performance penalty.

Typically 32-bit values like int or float must be on 32-bit boundaries. 64-bit values must be on 64-bit boundaries, the same for SIMD data of 128-bit, 256-bit, or 512-bit sizes.

Byte arrays can be manipulated at any offset, so the easiest (albeit slow) method is to create the raw data type with the compiler automatically aligning it, then using bitwise operations to put it in place. For example:

int foo = (buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];

It is faster but more work to ensure your data is completely aligned correctly in advance, then create a buffer, find the proper offset to match the expected alignment, then load the data to the properly aligned position within the buffer. If you can't be sure then you'll need to pack and unpack your data during serialization.
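The options above can be sketched as three small helpers; the names are illustrative. The byte-wise version widens each byte to an unsigned 32-bit value before shifting, so a high first byte is never shifted into the sign bit of an `int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assemble a big-endian 32-bit value byte by byte; works at any offset.
std::uint32_t ReadU32BigEndian(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Native-byte-order read from a possibly unaligned position; memcpy lets the
// compiler pick an access that is safe on the target CPU.
float ReadFloatUnaligned(const unsigned char* p) {
    float value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

// Round an offset up to the next multiple of a power-of-two alignment, for
// laying data out in a buffer so it can later be read in place.
constexpr std::size_t AlignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, `AlignUp(5, 4)` gives 8, so a 32-bit value that would otherwise start at byte 5 of a buffer is placed at byte 8 instead, provided the buffer itself starts on a suitably aligned address.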

Therefore, they will return a type that pretty much is ready-to-use. It feels weird to transform such to an e.g. byte-array and then cast it back to its original type. It feels like a weird overhead. I will need to do some benchmarks on how expensive this can become

I think Hodgman has already covered everything on the conversion topic :wink:

Performance is something you shouldn't worry about yet, but if your environment is able to run multithreaded, just stay away from the overlapped IO stuff and use memory-mapped files instead. This maps the file into memory pages managed by the OS and returns a pointer to that location, with whatever type of access you request (read-only would be enough in this case, I think).

Taking that pointer and e.g. wrapping it in a MemoryStream lets multiple threads access the same data at the same time.

I use a mix of memory-mapped chunks from the asset package (for things like scenes) and buffered streams to the hard drive, in a multithreaded, task- and event-based fashion to speed things up. This needs a little synchronisation: check whether a chunk that fits the requested position and size is already loaded into a mapped location; otherwise check whether it is a small asset that a stream read could cover; or else open a mapping for this chunk and maybe the next one.

Thank you!

I have one more question about accessing the textures, as I have assets that are part of the main game, assets that have been created by a user (and encapsulated in their own custom/level_x/assets/texture file path) and assets that got added via updates.

Do you prioritise customised content, if a custom-level is being played? Because if the player decided to name one texture "wall.png" - coincidentally being a match to an actual texture of the main game - there is a chance of loading the texture from the main game instead of the intended one.

Would it be a better approach to give the derived LoadBlob type all possible texture paths and maybe a priority predicate? Then custom paths would have priority.

The only ugly moment is when both textures would be used.

This is what a file describing a game object looks like at the moment:


custom_object_example =
{
texture = "wall.png"
-- other values
}

Are there any common solutions to this?

Prioritising custom content is a common solution.

Allowing custom configuration of asset loading priority is another solution. (e.g. An ordered list of directories or archives.)

I can imagine a system where a mod's config file contains a list of content sources, e.g. In my 'Kylomod' package I might have an assets.cfg file, like:


asset_paths = {
    "/": "kylomod/base_asset_overrides.zip",
    "/kylomod": "kylomod/data.zip"
}

The first line says "anything in kylomod/base_asset_overrides.zip is assumed to be located at the root of the asset directory", and would therefore override built-in data. The second line says "anything in kylomod/data.zip is located at /kylomod", and so won't clash with anything else (probably).

This is working on the assumption that this mod has its paths added at a high priority to the asset loader system.

That's just one way of trying to accommodate both overrides and new data in a single system; I'm sure there are others.
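One possible shape for such a prioritised lookup is sketched below. The archives are modelled as in-memory sets of file names so the example stays self-contained; `ContentSource`, `AssetResolver` and the "base.zip" archive are hypothetical names, not part of any real engine.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical content source: an archive, where it is mounted, and what it contains.
struct ContentSource {
    std::string name;             // e.g. "kylomod/data.zip"
    std::string mountPoint;       // "/" or "/kylomod"
    std::set<std::string> files;  // paths inside the archive, e.g. "wall.png"
};

class AssetResolver {
public:
    // Sources mounted later take priority over earlier ones.
    void Mount(ContentSource source) { sources_.push_back(std::move(source)); }

    // Returns the name of the highest-priority source that provides virtualPath.
    std::optional<std::string> Resolve(const std::string& virtualPath) const {
        for (auto it = sources_.rbegin(); it != sources_.rend(); ++it) {
            std::string prefix = it->mountPoint;
            if (prefix.empty() || prefix.back() != '/') prefix += '/';
            if (virtualPath.compare(0, prefix.size(), prefix) != 0) continue;  // not under this mount
            if (it->files.count(virtualPath.substr(prefix.size())) != 0) return it->name;
        }
        return std::nullopt;
    }
private:
    std::vector<ContentSource> sources_;
};
```

Mounting the base game first and the mod's two archives afterwards means "/wall.png" resolves to the mod's override archive, "/kylomod/wall.png" resolves to the mod's own data, and anything the mod does not touch still comes from the base game.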

Instead of explicit mount points, it might be better to use absolute paths directly in the packages and assume they're all mounted at the root. This allows user-defined content to unambiguously reference both its own assets and external assets (i.e. provided by other packages and/or the base game) without requiring any additional configuration.

Sounds simpler to me! But also rather prone to breaking. What if the level name changes? I doubt the player would rename all their classes. It seems to be a dependency that should not exist, purely from a user-friendliness standpoint.

Changing asset names (which is essentially a remove/add as far as any versioning scheme is concerned) is prone to breakage regardless of the loading approach you use. It's often much better to just create a new asset and deprecate the old one, rather than remove anything. This is especially true if there's user content you don't necessarily control.

Not entirely sure what you mean. If the user decides to rename their level, I would have to iterate through all their asset-files and change all file-paths.

What do you mean by "create a new asset"? Seems weird to change or create new assets because of an altered level-name.

I probably misunderstood something, though.

This topic is closed to new replies.
