Angelic Ice

Loading file paths and how to access them


Hello forum!

Quick disclaimer: I think this topic is language-agnostic, but whenever I refer to programming concepts (classes, objects, instantiation, etc.), I mean them in the C++ sense.

How do you handle fixed file paths in your software/game? E.g., when you want to load a texture, how do you tell your loading system where to find all the textures?

I do not want to hardcode my paths, so I would be happy to hear about more flexible approaches, especially moving them all into a single file. The actual issue is more about how to pass them around.

Do you instantiate one class at the very beginning and simply access your "file-path"-file? Do you pass this around? Did you consider working with globals for this? Or do you simply instantiate the object again and again wherever it is needed?

I'm not really sure if I want them to be part of the dependency injection, but that is after all just a weird feeling. It might be the only sane choice in the end.

Thanks for your time!

 

 


Oh, interesting.

But I do not really understand what a BlobLoader is after all.

My exact problem is that during prototyping, I called my file-loaders like this:

load("asset/texture/" + texture_name)

But now I would like to change this to:

load(textures + texture_name)

I usually pass dependencies via constructor already : )
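A minimal C++17 sketch of the constructor-injection approach, assuming all directory prefixes live in one small struct that is filled once at startup (hard-coded defaults here; they could equally be read from a config file). `AssetPaths` and `TextureLoader` are hypothetical names, not from any library.

```cpp
#include <filesystem>
#include <string>
#include <utility>

// Hypothetical: every fixed asset directory in one place, filled once at startup.
struct AssetPaths {
    std::filesystem::path textures = "asset/texture";
    std::filesystem::path sounds   = "asset/sound";
};

// Hypothetical loader that receives its directory through the constructor,
// so it never needs to know where the path came from.
class TextureLoader {
public:
    explicit TextureLoader(std::filesystem::path textureDir)
        : m_textureDir(std::move(textureDir)) {}

    // Builds the full path; a real loader would open and decode the file here.
    std::filesystem::path Resolve(const std::string& textureName) const {
        return m_textureDir / textureName;
    }

private:
    std::filesystem::path m_textureDir;
};
```

The call site then becomes `TextureLoader loader(paths.textures); loader.Resolve("wall.png");`, and only the code that builds `AssetPaths` knows the actual directory layout.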

 

So the BlobLoader implements the abstract loading behaviour for the factory, is that a correct assumption?

m_myMaterial = sys.m_materialFactory.Load( AssetName("myMaterial.mat"), m_levelScope );

Let me try to guess the rest of the loading procedure when the object is not cached. At some point the factory probably calls "load" on its LoadBlob, and the LoadBlob implementation supplies the custom behaviour.

If the LoadBlob is nothing but a packed set of data, it will search it for the content whose name matches the given hash. It might also just be another loading call to the OS, with a file-path prefix that is already built in.

This sounds really neat, but I would have to restructure my code. I have a namespaced file with commonly used file-system calls to the OS, e.g. Load-File-Content, which felt pretty KISS to me. However, it requires the calling class to supply the full file path.


There is actually quite a bit written about the sort of loader that Hodgman described, if you want to look it up. Right now I only load textures (via stb_image.h) and text files. In the future I am going to do what Hodgman describes, so it's easier to package my assets for distribution and to support multiple systems (mobile, console, PC/Mac, etc.) without doing much more than mentioning the filename and having the loader handle the rest for me.

Different OSes have different rules for file locations, though. On PC I keep mine under the default directory (GetDirectory()), e.g. assets\textures\some_texture.png or assets\map\some_map.txt. But I know you're supposed to put save files in a specific directory to comply with Windows rules (though nothing currently stops you from doing otherwise).

Two options I can think of for you off the top of my head:

1) Pass in a list of file locations to look in, and run through it until you find the asset or fail.  So you would get a directory "assets\fonts\" and append the filename to that.

2) Pass in the list as before, but also tell it which file extensions each location handles, to limit the searches.


Hard-coded asset paths aren't necessarily a problem if they are considered paths within an abstract asset database rather than a filesystem. The default implementation may well be a filesystem, but as a trivial improvement it's very common to have a virtual filesystem that automatically treats zip files (or other archives) as directories, or one where you explicitly mount zip files (to avoid ambiguity of the automatic approach). It's also common to have multiple targets available for a given path with different priorities (e.g. so that a new asset can be tested locally before committing it into the built data archive). This can be useful for more than art assets (e.g. configuration files).
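A rough sketch of the prioritised "multiple targets for one path" idea, assuming a hypothetical `AssetSource` interface. `MemorySource` is only a stand-in for a mounted archive or directory; the class names are illustrative, not an existing API.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Anything that can answer "give me the bytes for this virtual path":
// a plain directory, a mounted zip archive, a network share, ...
class AssetSource {
public:
    virtual ~AssetSource() = default;
    virtual std::optional<Bytes> Read(const std::string& virtualPath) const = 0;
};

// Stand-in for an archive: a path -> contents table held in memory.
class MemorySource : public AssetSource {
public:
    void Add(const std::string& path, Bytes data) { m_files[path] = std::move(data); }
    std::optional<Bytes> Read(const std::string& path) const override {
        auto it = m_files.find(path);
        if (it == m_files.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, Bytes> m_files;
};

// Asks each source in priority order (lower number first), so a locally
// edited asset can shadow the copy inside the built data archive.
class VirtualFileSystem {
public:
    void Mount(int priority, std::shared_ptr<const AssetSource> source) {
        m_sources.emplace(priority, std::move(source));
    }
    std::optional<Bytes> Read(const std::string& path) const {
        for (const auto& entry : m_sources)
            if (auto data = entry.second->Read(path)) return data;
        return std::nullopt;
    }
private:
    std::multimap<int, std::shared_ptr<const AssetSource>> m_sources;
};
```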

There is actually quite a bit written about that sort of loader that hodgman described if you want to look it up

I tried finding it, what is the actual name of this loading-concept? Blob Loader gave me no results.

Thanks everyone! This gave me a new point of view onto the loading-process, I will dig deeper into this, too. :)


But I do not really understand what a BlobLoader is after all.
So, the BlobLoader implements the abstract behaviour of loading to the factory, is that right to assume? Let me attempt to assume the further loading-procedure

Sounds like you understand :)
Yeah my Blob Loader is an abstract interface that you can give a name to, and it gives back an array of bytes. When paired with a factory, you can convert a name into some kind of useful object instead of just bytes.

For my simplest blob loader, you pass a base path into the constructor and it reads files in that directory. Other implementations could read files out of a ZIP archive, or read files over TCP/IP (that last one is very useful when developing a game for another platform - console/mobile/etc)
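A minimal sketch of that split, assuming a blob loader that returns `std::optional` bytes and a trivial factory that turns them into text. `BlobLoader` follows the name used in the thread; `DirectoryBlobLoader` and `TextFactory` are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Blob = std::vector<char>;

// Abstract interface: give it a name, get back an array of bytes (or nothing).
class BlobLoader {
public:
    virtual ~BlobLoader() = default;
    virtual std::optional<Blob> Load(const std::string& name) const = 0;
};

// Simplest implementation: a base path passed to the constructor,
// names are resolved as files inside that directory.
class DirectoryBlobLoader : public BlobLoader {
public:
    explicit DirectoryBlobLoader(std::filesystem::path base) : m_base(std::move(base)) {}
    std::optional<Blob> Load(const std::string& name) const override {
        std::ifstream in(m_base / name, std::ios::binary);
        if (!in) return std::nullopt;
        return Blob(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    }
private:
    std::filesystem::path m_base;
};

// A factory converts the bytes into a useful object (here just a string).
// It only sees the BlobLoader interface, so a ZIP- or TCP-backed loader
// could be swapped in without changing it.
class TextFactory {
public:
    explicit TextFactory(const BlobLoader& loader) : m_loader(loader) {}
    std::optional<std::string> Load(const std::string& name) const {
        auto blob = m_loader.Load(name);
        if (!blob) return std::nullopt;
        return std::string(blob->begin(), blob->end());
    }
private:
    const BlobLoader& m_loader;
};
```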


Ah! So the byte array is probably there to provide a general type that every blob loader can translate its "loaded" objects into. Using so many high-level libraries makes passing around byte arrays feel a bit weird. I'm really anxious about reinterpret_casts, haha :')

Just out of curiosity, are there any other ways to pass these loaded objects back to the factory? If I could make sure that every object loaded by every blob loader derives from a common base class, I guess that could work. Nonetheless, I really don't want to modify my libraries, and I may not even be permitted to.


I completely agree with what the previous posters wrote, but I have something to add. To support multiple platforms, I wrote a set of static functions inside a storage namespace. They handle common operations like creating/searching/moving/deleting files and folders, and they resolve special path requests for the Application Path, Root Path and User Data. Every path is normalized to use forward slashes (Windows has no problem with that), so querying the application path may result in

C:/Program Files/myTest //Windows
/data/data/myTest.myCompany.com //Android

Next, I set the root directory (the working directory on Windows) to wherever my assets are, and can then call

Storage::File::Open("./MyAsset", Storage::File::Open)

Everything in my environment uses streams instead of plain byte arrays; MemoryStream wraps a byte array, so byte arrays can still be used too. Every stream is based on a general IDataReader/IDataWriter interface that supports reading/writing a single byte or an array of bytes, plus Position/Length properties. One specialization is StreamReader, which wraps another IDataReader for sub-stream reading, e.g. of one file inside a package.
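A minimal sketch of such a stream hierarchy, assuming only the read side. The names `IDataReader`, `MemoryStream` and `StreamReader` follow the post, but the method set below is a guess for illustration, not the poster's actual interface (and this `StreamReader` is unrelated to any standard-library type).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Minimal reading interface in the spirit of the one described above.
class IDataReader {
public:
    virtual ~IDataReader() = default;
    // Reads up to `count` bytes into `dst`; returns how many were actually read.
    virtual std::size_t Read(void* dst, std::size_t count) = 0;
    virtual std::size_t Position() const = 0;
    virtual void Seek(std::size_t position) = 0;
    virtual std::size_t Length() const = 0;

    bool ReadByte(std::uint8_t& out) { return Read(&out, 1) == 1; }
};

// Wraps an existing byte array.
class MemoryStream : public IDataReader {
public:
    explicit MemoryStream(std::vector<std::uint8_t> data) : m_data(std::move(data)) {}
    std::size_t Read(void* dst, std::size_t count) override {
        const std::size_t n = std::min(count, m_data.size() - m_pos);
        if (n > 0) std::memcpy(dst, m_data.data() + m_pos, n);
        m_pos += n;
        return n;
    }
    std::size_t Position() const override { return m_pos; }
    void Seek(std::size_t p) override { m_pos = std::min(p, m_data.size()); }
    std::size_t Length() const override { return m_data.size(); }
private:
    std::vector<std::uint8_t> m_data;
    std::size_t m_pos = 0;
};

// Exposes the window [offset, offset + length) of another reader as its own
// stream, e.g. one file inside a package.
class StreamReader : public IDataReader {
public:
    StreamReader(IDataReader& inner, std::size_t offset, std::size_t length)
        : m_inner(inner), m_offset(offset), m_length(length) {}
    std::size_t Read(void* dst, std::size_t count) override {
        m_inner.Seek(m_offset + m_pos);  // the inner stream may be shared
        const std::size_t n = m_inner.Read(dst, std::min(count, m_length - m_pos));
        m_pos += n;
        return n;
    }
    std::size_t Position() const override { return m_pos; }
    void Seek(std::size_t p) override { m_pos = std::min(p, m_length); }
    std::size_t Length() const override { return m_length; }
private:
    IDataReader& m_inner;
    std::size_t m_offset, m_length, m_pos = 0;
};
```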

InFileStream ifs("./MyAsset", Storage::File::Open);
if(ifs)
{
   while(!ifs.Eof())
   {
      //DoStuff: read from ifs here; each read advances the position towards Eof
   }
   ifs.Close();
}

This is the standard way to open and read a file from disk in my case.

In production code it is totally OK to use direct file paths for the entry-point file you need; otherwise your program wouldn't know which file to load first. This entry point may be an asset package or an .ini file, depending entirely on your game environment.

This is a common technique, used for example by The Elder Scrolls games to load the game's content (e.g. the Oblivion.bsa asset package), which modders may replace to change the game completely.

Once you have processed your asset entry point, you pass it to the asset manager for loading levels, the main screen, whatever. I personally prefer asset hashes too, as Hodgman already wrote, because they have some advantages: they help protect your assets (modders won't find out as quickly what kind of file something is and what it is used for), and they save memory, using 4 bytes instead of a string of arbitrary length.

Again, in production you can use either strings or a file table that converts hash <=> path to get your assets loaded.
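A sketch of 4-byte asset ids plus a hash <=> path table, assuming 32-bit FNV-1a as the hash. FNV-1a is one common choice picked here for illustration; the thread does not say which hash is used, and `AssetTable` is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// 32-bit FNV-1a: a simple, well-known non-cryptographic string hash.
constexpr std::uint32_t Fnv1a32(std::string_view s) {
    std::uint32_t h = 2166136261u;           // FNV offset basis
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 16777619u;                      // FNV prime
    }
    return h;
}

// Build-time table mapping ids back to paths (useful for tools and debugging);
// shipped data could omit it and keep only the 4-byte ids.
class AssetTable {
public:
    // Returns false if a *different* path already produced the same hash.
    bool Add(const std::string& path) {
        auto [it, inserted] = m_paths.emplace(Fnv1a32(path), path);
        return inserted || it->second == path;
    }
    std::optional<std::string> PathFor(std::uint32_t id) const {
        auto it = m_paths.find(id);
        if (it == m_paths.end()) return std::nullopt;
        return it->second;
    }
private:
    std::unordered_map<std::uint32_t, std::string> m_paths;
};
```

Checking for collisions when the table is built matters, because two paths hashing to the same 4 bytes would otherwise silently load the wrong asset.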

At the very least, I would recommend using some kind of internally managed asset storage (e.g. a package). I use an aligned, signed and encrypted custom format for this, but you can also use .zip to start with. My format is built from 64 KB aligned chunks, with an algorithm that minimizes fragmentation to speed up asset reads from disk. The package handler keeps a small file table in its header that points to each file's chunk and offset, so a stream only needs to jump to position (chunk * 64 KB + offset) and read N bytes, where N is the file size.
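The file-table arithmetic above, sketched under the assumption that the whole package is already in memory. `FileTableEntry` and `ReadPackedFile` are hypothetical; a disk-backed version would seek to the same position instead of slicing a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::uint64_t kChunkSize = 64 * 1024; // 64 KB alignment unit

// One entry of the package's file table.
struct FileTableEntry {
    std::uint32_t chunk;   // index of the 64 KB chunk the file starts in
    std::uint32_t offset;  // byte offset inside that chunk
    std::uint32_t size;    // file size in bytes
};

// Absolute byte position of a file inside the package: chunk * 64 KB + offset.
constexpr std::uint64_t PackagePosition(const FileTableEntry& e) {
    return std::uint64_t(e.chunk) * kChunkSize + e.offset;
}

// Copies one file out of an in-memory package; fails if the table points
// past the end of the data (e.g. a corrupt or truncated package).
std::optional<std::vector<unsigned char>> ReadPackedFile(
        const std::vector<unsigned char>& package, const FileTableEntry& e) {
    const std::uint64_t begin = PackagePosition(e);
    if (begin + e.size > package.size()) return std::nullopt;
    const auto first = package.begin() + static_cast<std::ptrdiff_t>(begin);
    return std::vector<unsigned char>(first, first + static_cast<std::ptrdiff_t>(e.size));
}
```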

Finally, my asset reader functions are also static functions that take an instance of the abstract IDataReader (or sometimes IDataWriter) interface and output processed data, like a texture from an image or a mesh, which the engine can then handle however it needs.


Ah, that is similar to what I'm doing at the moment. I have no interfaces up yet, but consider doing so.

The issue I have is that loading is often implemented by other libraries, so they return a type that is pretty much ready to use. It feels weird to transform it into, e.g., a byte array and then cast it back to its original type; it seems like unnecessary overhead. I will need to do some benchmarks on how expensive this can become.

But I'm really thankful for your ideas and concepts that definitely changed my point of view on this topic overall.


Very nice answer ... thanks


Another issue to keep in mind when designing a file system API is whether you want it to be blocking or asynchronous.

The typical C-style interface of fopen/fread/fclose is terrible for game asset loaders, because they block the CPU while they're operating. Filesystem interactions can be very expensive operations (milliseconds), so you don't want the CPU to grind to a halt while they occur.

Likewise you don't really want to use an API that returns a fully constructed object in one go, as this kind of design also requires the use of blocking, e.g.
Texture* myTexture = load("myTexture.tex");

Of course, this depends on what you want from your loading screens. If you're ok with having static loading screens while loading assets, then blocking is fine... but if you want to have smoothly animating loading screens, or no loading screens (world streaming), then you need an asynchronous asset loader.

I break my asset loading up into 4 stages: measuring, allocating, loading and parsing.
When you load an asset, you immediately get back an Asset*, which has a boolean member function IsLoaded. You can query that function to tell if it has finished all 4 stages yet or not.
In the background, the blob loader determines the file size (measuring) and then calls a function on the factory to allocate enough memory to store the asset data (allocating). The blob loader then starts streaming the asset data into the allocation that was returned from the factory (loading). Once that streaming is complete, the blob loader calls a factory function to let it know that the file has been loaded and can now be deserialized (parsing). After that has completed, the Asset is marked as loaded.

On Windows, you can use the CreateFile (with FILE_FLAG_OVERLAPPED flag) / ReadFileEx (with OVERLAPPED structure) / CloseHandle functions to open a file and read its contents asynchronously. Even though these functions are themselves async, they can sometimes still block for several milliseconds when loading many files at the same time, so I also take the precaution of calling them from a background thread.
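A portable sketch of the "return a handle immediately, finish the four stages in the background" idea, using standard C++ file I/O on a worker thread rather than the Windows overlapped APIs. `Asset`, `AsyncLoader` and the one-thread-per-request design are assumptions made to keep the example short; a real engine would use a job system or a dedicated I/O thread, and the factory (not the loader) would own allocation and parsing.

```cpp
#include <atomic>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
#include <system_error>
#include <thread>
#include <vector>

// Handle returned immediately by Load(); its data is filled in later.
class Asset {
public:
    bool IsLoaded() const { return m_loaded.load(std::memory_order_acquire); }
    bool Failed() const { return m_failed.load(std::memory_order_acquire); }
    // Only meaningful once IsLoaded() has returned true.
    const std::string& Text() const { return m_text; }
private:
    friend class AsyncLoader;
    std::vector<char> m_bytes;  // raw file contents
    std::string m_text;         // "parsed" result; a real factory would build a texture, mesh, ...
    std::atomic<bool> m_loaded{false};
    std::atomic<bool> m_failed{false};
};

// Hypothetical loader: Load() returns at once, the stages run on a worker thread.
class AsyncLoader {
public:
    ~AsyncLoader() { for (std::thread& t : m_threads) t.join(); }

    std::shared_ptr<Asset> Load(const std::filesystem::path& file) {
        auto asset = std::make_shared<Asset>();
        m_threads.emplace_back(&AsyncLoader::Run, asset, file);
        return asset;
    }
private:
    static void Run(std::shared_ptr<Asset> asset, std::filesystem::path file) {
        std::error_code ec;
        const auto size = std::filesystem::file_size(file, ec);      // 1. measuring
        if (ec) { asset->m_failed.store(true, std::memory_order_release); return; }
        asset->m_bytes.resize(static_cast<std::size_t>(size));       // 2. allocating
        std::ifstream in(file, std::ios::binary);                      // 3. loading
        if (!in.read(asset->m_bytes.data(), static_cast<std::streamsize>(size))) {
            asset->m_failed.store(true, std::memory_order_release);
            return;
        }
        asset->m_text.assign(asset->m_bytes.begin(), asset->m_bytes.end()); // 4. parsing
        asset->m_loaded.store(true, std::memory_order_release);
    }
    std::vector<std::thread> m_threads;
};
```

Game code polls `IsLoaded()` each frame (for example, to keep a loading animation running) instead of blocking on the read.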

The issue I have is that the e.g. loading is often implemented by other libraries. Therefore, they will return a type that pretty much is ready-to-use. It feels weird to transform such to an e.g. byte-array and then cast it back to its original type. It feels like a weird overhead. I will need to do some benchmarks on how expensive this can become.

When you read data out of a file, it's always a byte array to begin with, because files themselves are just byte arrays...

Typically libraries have two ways to load their objects from a file -- either you give them a filename and they do all the OS file loading work:
fancy_object = library_load_object_from_file("filename");
Or they let you do the file loading and they do the deserialization:
bytes = read_entire_file("filename")
fancy_object = library_load_object_from_memory(bytes)
free(bytes)

Or they define an abstract "stream" interface and then you implement it, which is just a fancy spin on the second option.
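A sketch of the `read_entire_file` step from the second option, in standard C++17 with one large read. `ReadEntireFile` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <optional>
#include <vector>

// Reads a whole file into memory with a single large read.
std::optional<std::vector<unsigned char>> ReadEntireFile(const std::filesystem::path& file) {
    std::ifstream in(file, std::ios::binary | std::ios::ate); // opened positioned at the end
    if (!in) return std::nullopt;
    const std::streamoff size = in.tellg();                   // end position == file size
    if (size < 0) return std::nullopt;
    in.seekg(0, std::ios::beg);
    std::vector<unsigned char> bytes(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(reinterpret_cast<char*>(bytes.data()), size))
        return std::nullopt;
    return bytes;
}
```

The resulting buffer is what a from-memory API consumes; for example, stb_image (mentioned earlier in the thread) offers both `stbi_load` (from a filename) and `stbi_load_from_memory` (from a buffer).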

It may seem that reading an entire file into memory and then decoding it is very wasteful compared to decoding+reading at the same time.
e.g.
file_handle = open_file("filename")
int foo = read_int(file_handle)
int bar = read_int(file_handle)

vs
bytes = read_entire_file(file_handle)
int foo = *(int*)(bytes+0)
int bar = *(int*)(bytes+4)

However, disks do not like to stop and start. Transferring data from disk is slow, but starting a transfer is extremely slow. Because of this, most engines will always try to keep the disk active with large file reading operations, instead of issuing many small file reading operations. Reading a whole file into memory before processing any of it is usually faster than mixing reading/decoding tasks together.
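Once the whole file is in memory, fields can be decoded sequentially from the buffer. This sketch uses a hypothetical `BufferReader` with `std::memcpy` instead of the `*(int*)(bytes + 4)` cast, which avoids alignment and aliasing problems while typically compiling to a plain load. It assumes the file was written with the reading machine's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Decodes values one after another from a buffer that was read in one go.
class BufferReader {
public:
    explicit BufferReader(const std::vector<unsigned char>& bytes) : m_bytes(bytes) {}

    template <typename T>
    T Read() {
        static_assert(std::is_trivially_copyable<T>::value, "only raw-copyable types");
        if (m_pos + sizeof(T) > m_bytes.size())
            throw std::out_of_range("BufferReader: read past end of buffer");
        T value;
        std::memcpy(&value, m_bytes.data() + m_pos, sizeof(T)); // safe at any offset
        m_pos += sizeof(T);
        return value;
    }

private:
    const std::vector<unsigned char>& m_bytes;
    std::size_t m_pos = 0;
};
```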

There's also a very advanced form of asset storage that many game engines use, where you use the exact same data structures in memory and on-disk (i.e. no serialization/deserialization steps). If you can achieve that, then loading data is just a typecast, which is free:
bytes = read_file(file_handle)
MyStruct* foo = (MyStruct*)bytes;//no decoding step

This kind of thing is very complex though, as it means you can't use pointers at all inside your structures, have to be hyper aware of padding and alignment issues in your structures, and very careful with data types. It's typically only used in low-level engine asset structures, and not as a general purpose scheme for high level game files.

You must be careful about loading data directly into memory. Many data types must be properly aligned. Some data types on some systems will crash if they aren't properly aligned. Others on other systems may still run but with a performance penalty.

Typically 32-bit values like int or float must be on 32-bit boundaries. 64-bit values must be on 64-bit boundaries, the same for SIMD data of 128-bit, 256-bit, or 512-bit sizes.

Byte arrays can be accessed at any offset, so the easiest (albeit slow) method is to declare an ordinary variable of the target type, which the compiler aligns automatically, and then assemble its value from individual bytes with bitwise operations. For example, for a big-endian 32-bit value:

uint32_t foo = ((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16) | ((uint32_t)buffer[2] << 8) | (uint32_t)buffer[3]; // buffer is unsigned char*

It is faster but more work to ensure your data is completely aligned correctly in advance, then create a buffer, find the proper offset to match the expected alignment, then load the data to the properly aligned position within the buffer. If you can't be sure then you'll need to pack and unpack your data during serialization.
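Both techniques as small helper functions, as a sketch. `ReadU32BE` and `AlignUp` are hypothetical names; `AlignUp` assumes the alignment is a power of two, which holds for the 4/8/16/32/64-byte cases above.

```cpp
#include <cstddef>
#include <cstdint>

// Assembles a big-endian 32-bit value from any (possibly unaligned) offset.
// Unsigned bytes and an unsigned result avoid sign extension and shifting
// into the sign bit.
constexpr std::uint32_t ReadU32BE(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Rounds an offset up to the next multiple of `alignment` (a power of two),
// e.g. to find where a float or SIMD block may be placed inside a buffer.
constexpr std::size_t AlignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}
```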


Therefore, they will return a type that pretty much is ready-to-use. It feels weird to transform such to an e.g. byte-array and then cast it back to its original type. It feels like a weird overhead. I will need to do some benchmarks on how expensive this can become

 

I think Hodgman has already covered the conversion topic ;)

Performance is something you shouldn't worry about yet, but if your environment can run multithreaded, I'd stay away from the overlapped IO approach and use memory-mapped files instead. Mapping a file makes its contents appear in your process's address space and returns a pointer to that location, with whatever access you request (read access should be enough here, I think).

Taking that pointer and e.g. wrap it into a MemoryStream lets multiple threads access the same data at the same time.

I use a mix of memory-mapped chunks from the asset package (for things like scenes) and buffered streams to the hard drive, in a multithreaded, task- and event-based fashion, to speed things up. This needs a little synchronisation: check whether a chunk has already been mapped to a location that covers the requested position and size; otherwise, if it is a small asset, check whether a stream traversal could fit it; otherwise, open a mapping for this chunk and maybe the next one.


Thank you!

I have one more question about accessing the textures, as I have assets that are part of the main game, assets that have been created by a user (and encapsulated in their own custom/level_x/assets/texture file path) and assets that got added via updates.

Do you prioritise customised content, if a custom-level is being played? Because if the player decided to name one texture "wall.png" - coincidentally being a match to an actual texture of the main game - there is a chance of loading the texture from the main game instead of the intended one.

It is probably a better approach to give the derived LoadBlob type all possible texture-paths and maybe a priority-predicate? Then, custom-paths would have priority.

The only ugly case is when both textures need to be used.

This is what a file describing a game object looks like at the moment:

custom_object_example =
{
texture = "wall.png"
-- other values
}

Are there any common solutions to this?


Prioritising custom content is a common solution.

Allowing custom configuration of asset loading priority is another solution. (e.g. An ordered list of directories or archives.)

I can imagine a system where a mod's config file contains a list of content sources, e.g. In my 'Kylomod' package I might have an assets.cfg file, like:

asset_paths = {
    "/": "kylomod/base_asset_overrides.zip",
    "/kylomod": "kylomod/data.zip"
}

The first line says "anything in kylomod/base_asset_overrides.zip is assumed to be located at the root of the asset directory, and would therefore override built-in data". The second line says "anything in kylomod/data.zip is located at /kylomod, and so won't clash with anything else (probably)".

This is working on the assumption that this mod has its paths added at a high priority to the asset loader system.

That's just one way of trying to accommodate both overrides and new data in a single system; I'm sure there are others.
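One possible reading of that config as code, sketched under the assumption that archives are already opened into simple name-to-contents maps (a stand-in for real zip access). Mounts are searched in order, so the mod's overrides are listed before the base game. The prefix is written as "/kylomod/" with a trailing slash so that a path like "/kylomodX/..." does not match. `Archive`, `Mount` and `ReadVirtual` are hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for an opened archive such as kylomod/data.zip: relative path -> contents.
using Archive = std::map<std::string, std::string>;

struct Mount {
    std::string prefix;      // virtual location, e.g. "/" or "/kylomod/"
    const Archive* archive;  // where files under that prefix come from
};

// Tries each mount in order; the first one that has the file wins.
std::optional<std::string> ReadVirtual(const std::vector<Mount>& mounts,
                                       const std::string& virtualPath) {
    for (const Mount& m : mounts) {
        if (virtualPath.compare(0, m.prefix.size(), m.prefix) != 0)
            continue; // path is not under this mount point
        const std::string relative = virtualPath.substr(m.prefix.size());
        auto it = m.archive->find(relative);
        if (it != m.archive->end()) return it->second;
    }
    return std::nullopt;
}
```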


Instead of explicit mount points, it might be better to use absolute paths directly in the packages and assume they're all mounted at the root. This allows user-defined content to unambiguously reference both its own assets and external assets (i.e. provided by other packages and/or the base game) without requiring any additional configuration.


Sounds simpler to me! But also rather prone to break. What if the level-name-changes? I doubt the player would rename all their classes. Seems to be a dependency that should not exist, solely from a standpoint of user-friendliness?


Changing asset names (which is essentially a remove/add as far as any versioning scheme is concerned) is prone to breakage regardless of the loading approach you use. It's often much better to just create a new asset and deprecate the old one, rather than remove anything. This is especially true if there's user content you don't necessarily control.


Not entirely sure what you mean. If the user decides to rename their level, I would have to iterate through all their asset-files and change all file-paths.

What do you mean by "create a new asset"? Seems weird to change or create new assets because of an altered level-name.

I probably misunderstood something, though.


Allow me to clarify:

As the game developer, you're providing the base set of assets available to all user content. So it's risky to remove any of the base assets, because there could be any number of published user mods that depend on the removed content, and they would break. If the base game needs to fix or change an asset, you're better off creating a new version and allowing the old one to remain so user mods don't break outright. If user mods record which version of the game they're compatible with, then you can easily display a warning to the user when the game is updated but a mod hasn't been republished and could be using old assets.

User content on the other hand doesn't generally depend on other user content (unless such a dependency is explicitly stated), so they can usually remove/rename assets without much fear of breakage. Since they own the content, they can also fix up any references. You as the developer aren't responsible for this.


Oh, I totally agree with your point on "do not delete deprecated assets from the core-set".

But in general, I find it weird that renaming the level-name (which also would change a file-path, e.g. root/creator_name/level_name/[..]) would kill asset-loading - it would break literally every customised asset. Assets, at least in my opinion, should not depend on an absolute file-path. When I create a level, create a ton of assets and link to them in the new created entities and then decide to change the level-name from a WIP-name to one that is worth publishing, it would be an enormous punishment to me.

Also, what happens if I want to use customised assets from another customised level? I would have to fix all these file-paths.

It is surely a way to deal with this by saying "it is not my responsibility", but not one that I would like to take.

This is what a namespace hierarchy is for.

You can set up paths like /game/asset_group_name/asset_name for stuff that belongs to a broader scope than just one "level."

For things that are highly customized, you can do /game/level_name/asset_name or even /game/package_name/level_name/asset_name or what have you.

Use the implicit hierarchy of your path structure as an organizational tool, and if you plan ahead a little, you won't need to sacrifice modularity or flexibility.


Do you prioritise customised content, if a custom-level is being played? Because if the player decided to name one texture "wall.png" - coincidentally being a match to an actual texture of the main game - there is a chance of loading the texture from the main game instead of the intended one.

That's a feature question, not a technical question :)
If, as a feature, you want users to be able to replace game textures, then sure make a layered file system where the loader searches through each set of assets in turn. If you don't want that feature, then let assets explicitly specify which set of assets they will be loaded from.

e.g. On one game that I worked on, they wanted to allow modders to change some base textures but not others, so they used a whitelisting system. Any asset not on the whitelist would always load from the official asset pack, while whitelisted names would check the user's directory first and the official asset pack second.
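The whitelisting rule fits in one function. This sketch assumes each asset location is represented as a callable returning the asset's contents or nothing; `Source` and `LoadWithWhitelist` are hypothetical names.

```cpp
#include <functional>
#include <optional>
#include <set>
#include <string>

// Stand-in for "try to read this asset from one location".
using Source = std::function<std::optional<std::string>(const std::string&)>;

// Whitelisted names check the user's directory first, then the official pack;
// every other name always comes from the official pack.
std::optional<std::string> LoadWithWhitelist(const std::string& name,
                                             const std::set<std::string>& moddable,
                                             const Source& userDir,
                                             const Source& officialPack) {
    if (moddable.count(name) != 0) {
        if (auto data = userDir(name)) return data;
    }
    return officialPack(name);
}
```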

What if the level-name-changes?

Your mileage may vary with this one, but I've made the decision on our latest engine to get rid of file paths completely on the engine side :)
The artists work with source art files that they can store however they like (e.g. a Maya scene). They then export them into intermediate file formats (e.g. an FBX or collada model), again in any folder structure that they like... Then the engine's data compiler takes all the files required by the game and converts them into custom file formats. At the same time, it loses all information about file paths and just retains the filenames.
This means that each filename has to be unique -- e.g. if an artist creates level1/concrete.tga and level2/concrete.tga then the tool complains about this situation with an error message. On the plus side, if that artist renames level1 to level3, then the engine doesn't care -- the file is still just called "concrete.tga" in all situations.
We've been using this system for about 4 years now and we still like it :D
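The data-compiler rule described above (keep only filenames, reject duplicates) can be sketched as a single pass over the source paths. `FlattenResult` and `FlattenAssetNames` are hypothetical; a real tool would also convert each file into the engine's format.

```cpp
#include <filesystem>
#include <map>
#include <string>
#include <vector>

// Result of dropping directory information from a set of source assets.
struct FlattenResult {
    std::map<std::string, std::filesystem::path> byName; // "concrete.tga" -> source path
    std::vector<std::string> errors;                     // one message per duplicate name
};

FlattenResult FlattenAssetNames(const std::vector<std::filesystem::path>& sources) {
    FlattenResult result;
    for (const auto& src : sources) {
        const std::string name = src.filename().string(); // folder structure is discarded
        auto [it, inserted] = result.byName.emplace(name, src);
        if (!inserted)
            result.errors.push_back("duplicate asset name '" + name + "': " +
                                    it->second.generic_string() + " and " +
                                    src.generic_string());
    }
    return result;
}
```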


Oh, that sounds interesting.

But it's also a bit more complex to create a custom file format and drop the file-path logic. I think I will look into this in a separate project : )

At the moment I implemented a derived "LoadBlob"-type that takes multiple paths with a priority-value. Custom-paths will be added with priority 1 and everything else with 2. That should work for now. There will only be a small overhead for custom-levels, because default users tend not to create their own assets but use original ones for a fair amount of time : )
