### #ActualHodgman

Posted 31 March 2012 - 11:52 PM

(Slightly) harder for me to work with
Increased save times when modifying the file (which the game won't do, but I will in my editor, so it's a con for me, though users won't experience this)

In my engine, I only use packs/archives for retail/shipping builds. In development builds, each asset is stored in a separate file in the data directory. When making a shipping build, the data directory is "zipped" into an archive, and the code is compiled to use a different asset-loading class.
This lets me have the benefits of archives on the end-user's machine, while having the benefit of easy content iteration during development.

I find working with "files" much harder than working with assets. Yes, in development my assets are stored as files, so when I say Load("foo.texture"), that does turn into CreateFile("data/foo.texture", GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, 0)...

However, because I never think of these assets as "files", I'm free to change the behind-the-scenes behavior. Maybe I'll look in the OS's file system first, then in a patch archive, and then in the shipping archive. Maybe I'll pre-load a chunk of an archive that contains several assets at once, so that when the user asks for any of those assets, I've already got the data sitting there, etc...
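As a rough illustration of that layered lookup, here is a minimal Python sketch (not the engine's actual code). The class names `DirectorySource`, `MemoryArchiveSource` and `AssetLoader` are hypothetical, and the "archive" is just an in-memory dictionary standing in for a real pack file.

```python
from pathlib import Path
from typing import Dict, List, Optional, Protocol


class AssetSource(Protocol):
    def read(self, name: str) -> Optional[bytes]: ...


class DirectorySource:
    """Loose files on disk, as used in development builds."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read(self, name: str) -> Optional[bytes]:
        path = self.root / name
        return path.read_bytes() if path.is_file() else None


class MemoryArchiveSource:
    """Hypothetical stand-in for a patch or shipping archive: name -> bytes."""

    def __init__(self, entries: Dict[str, bytes]) -> None:
        self.entries = entries

    def read(self, name: str) -> Optional[bytes]:
        return self.entries.get(name)


class AssetLoader:
    """Asks each source in priority order; the first one that has the asset wins."""

    def __init__(self, sources: List[AssetSource]) -> None:
        self.sources = sources

    def load(self, name: str) -> bytes:
        for source in self.sources:
            data = source.read(name)
            if data is not None:
                return data
        raise FileNotFoundError(name)
```

Game code only ever calls `loader.load("foo.texture")`; whether the bytes came from a loose file, a patch or the shipping archive is decided by the order of the `sources` list.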

Also, because I don't think of them as files, I don't author them as files. I never copy files into the data directory, and I never use File -> Save As to create any of the data files.
Instead, we have a content directory, which does contain files, and a data-compiler, which scans the content directory for modifications, compiles any modified files, and writes the results into the data directory. This means that if, for example, I want to change my textures to use a different DXT compression algorithm, or I decide that materials should be embedded into level files, then I change the data-compiler's rules and recompile the data directory.
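The scan-and-recompile step could look something like the sketch below. It is only an assumption about how such a tool might detect changes (by comparing modification times); `compile_modified` and `compile_fn` are made-up names, and for simplicity the output keeps the same relative path as the source instead of mapping, say, a .png to a .texture.

```python
from pathlib import Path
from typing import Callable, List


def compile_modified(content_dir: Path, data_dir: Path,
                     compile_fn: Callable[[bytes], bytes]) -> List[Path]:
    """Recompile every content file that is newer than its compiled output."""
    rebuilt = []
    for src in sorted(content_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = data_dir / src.relative_to(content_dir)
        # Rebuild when the output is missing or older than its source.
        if not dst.exists() or dst.stat().st_mtime_ns < src.stat().st_mtime_ns:
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_bytes(compile_fn(src.read_bytes()))
            rebuilt.append(dst)
    return rebuilt
```

Changing the compilation rules then amounts to changing `compile_fn` and clearing the data directory so everything is rebuilt.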

The association between an asset name (e.g. "foo.texture") and the content file path (e.g. "d:\myProject\content\foo.png") isn't hard-coded; our compilation routines are written in C#, the build steps are described in Lua, and regular expressions are used to find a suitable file to use as input when building an asset.
For example, the following Lua script tells the asset-compiler that:
* If the asset "foo.geo" is required, then use the GeometryBuilder plugin (a C# class) and load "temp/foo.daebin" as input.
* If "temp/foo.daebin" is required, then use the DaeParser plugin and search the content directory recursively for "foo.dae" (a COLLADA model).
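```lua
-- temp/<name>.daebin is built by DaeParser from <name>.dae
local DaeParser = Builder("DaeParser")
Rule(DaeParser, "temp/(.*).daebin", "$1.dae")
-- data/<name>.geo is built by GeometryBuilder from temp/<name>.daebin
local GeometryBuilder = Builder("GeometryBuilder")
Rule(GeometryBuilder, "data/(.*).geo", "temp/$1.daebin")
```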
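To make the rule matching concrete, here is a small Python sketch of how a compiler might resolve the two rules above into a chain of build steps. This is a guess at the mechanics, not our actual C# implementation: `resolve` and `build_chain` are hypothetical names, and it assumes the pattern must match the whole output path and that "$1" is replaced by the first capture group.

```python
import re
from typing import List, Optional, Tuple

# (builder name, output pattern, input template), mirroring the Rule(...) calls.
RULES: List[Tuple[str, str, str]] = [
    ("DaeParser", r"temp/(.*).daebin", "$1.dae"),
    ("GeometryBuilder", r"data/(.*).geo", "temp/$1.daebin"),
]


def resolve(output: str) -> Optional[Tuple[str, str]]:
    """Return (builder, input path) for the first rule whose pattern matches."""
    for builder, pattern, template in RULES:
        match = re.fullmatch(pattern, output)
        if match:
            return builder, template.replace("$1", match.group(1))
    return None


def build_chain(output: str) -> List[Tuple[str, str, str]]:
    """Follow rules from the requested output back to a plain source file."""
    steps = []
    while (rule := resolve(output)) is not None:
        builder, source = rule
        steps.append((builder, source, output))
        output = source
    return steps[::-1]  # earliest build step first


for builder, source, target in build_chain("data/foo.geo"):
    print(f"{builder}: {source} -> {target}")
```

Requesting "data/foo.geo" yields the DaeParser step first and the GeometryBuilder step second; the remaining input "foo.dae" matches no rule, so it is treated as a source file to be found in the content directory.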

These kinds of data compilers can also be used to ensure that only the data that's actually used by the game ends up in the data directory. Instead of compiling every file inside the content directory, we only compile the required files. We start off with the hard-coded asset-names used in the game's source code (ideally this number is quite small), then we find the linked assets (e.g. a material links a texture and a shader), and repeat until we've got a full dependency tree.
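Finding the required set is essentially a graph traversal. A minimal sketch, assuming the links between assets are already available as a dictionary (in a real pipeline they would be discovered by parsing each asset); `required_assets` and the asset names in the usage note are invented for illustration.

```python
from typing import Dict, Iterable, List, Set


def required_assets(roots: Iterable[str], links: Dict[str, List[str]]) -> Set[str]:
    """Collect every asset reachable from the root names by following links."""
    required: Set[str] = set()
    pending = list(roots)
    while pending:
        name = pending.pop()
        if name in required:
            continue  # already visited; also guards against cyclic links
        required.add(name)
        pending.extend(links.get(name, []))
    return required
```

With `links = {"level1.level": ["rock.material"], "rock.material": ["rock.texture", "lit.shader"]}` and the single root "level1.level", the result contains those four assets and nothing else, so anything unreferenced in the content directory is never compiled.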
Another neat feature you can add to a system like this is asset-refreshing -- the data-compiler is already scanning for file modifications to rebuild new data, so when it re-builds a file it can check if the game is currently running, and if so, send a message to the game instructing it to reload the modified asset.
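How the compiler talks to the game is up to you. One simple possibility, sketched below purely as an assumption, is a fire-and-forget UDP message on localhost that the game polls once per frame; the "reload <name>" message format and both function names are made up for this example.

```python
import socket
from typing import List


def notify_reload(asset_name: str, port: int, host: str = "127.0.0.1") -> None:
    """Compiler side: tell a running game to reload one asset."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"reload {asset_name}".encode("utf-8"), (host, port))


def poll_reload_requests(sock: socket.socket) -> List[str]:
    """Game side: drain pending messages without blocking the frame loop."""
    names = []
    sock.setblocking(False)
    while True:
        try:
            message, _ = sock.recvfrom(4096)
        except BlockingIOError:
            return names  # nothing (more) waiting this frame
        command, _, name = message.decode("utf-8").partition(" ")
        if command == "reload":
            names.append(name)
```

If the game isn't running, the datagram is simply dropped, so the compiler doesn't need a separate "is the game running?" check in this variant.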

In the industry, every company I've worked for over the past 6 years has used some kind of automated asset pipeline like this, and I just can't imagine going back to manually placing files in the game's data directory -- to me, that seems like a lot more of a hassle.

Faster read times? (I've heard it can help not thrash the hard drive so much, but is this really much of a concern today on modern operating systems, and does it really help a significant amount?)

Assuming a non-SSD drive, it can give a significant reduction in loading times. With 1000 separate files, the OS can keep each one defragmented, so that each individual file load can be done without wasteful seek periods. However, the OS doesn't know the order in which you want to load all of those files, so you'll pay a seek penalty between each file and likely won't benefit from automatic pre-caching.
If you pack all the files end-to-end, in the order in which you want to load them, you can spend more time reading and less time seeking.
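A minimal sketch of that packing idea, assuming a trivial format where the blobs are simply concatenated and the index (name to offset and size) is kept separately; a real archive would also store the index in the file. `write_pack` and `read_asset` are hypothetical names.

```python
import io
from typing import BinaryIO, Dict, List, Tuple

Index = Dict[str, Tuple[int, int]]  # asset name -> (offset, size)


def write_pack(out: BinaryIO, assets: List[Tuple[str, bytes]]) -> Index:
    """Write blobs back to back in the given (load) order and return the index."""
    index: Index = {}
    for name, blob in assets:
        index[name] = (out.tell(), len(blob))
        out.write(blob)
    return index


def read_asset(pack: BinaryIO, index: Index, name: str) -> bytes:
    """Seek to one asset's recorded offset and read exactly its bytes."""
    offset, size = index[name]
    pack.seek(offset)
    return pack.read(size)


pack = io.BytesIO()
index = write_pack(pack, [("level.geo", b"GEOM"), ("rock.texture", b"PIXELS")])
print(index)
```

Because the assets are laid out in load order, loading them in that same order walks the file front to back, which is what keeps the drive reading instead of seeking.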

As for modern OSes helping out: either way, make sure you're using the OS's native file system API (e.g. on Windows, CreateFile/ReadFileEx instead of fopen/fread). By using these APIs, you can take advantage of modern features like threadless background loading (DMA), file (pre)caching, or memory mapping.
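To show what memory mapping buys you without going through the Win32 calls, here is a small sketch using Python's standard `mmap` module, which wraps the OS's own mapping facility. It is a swapped-in illustration rather than what the engine does, and `read_mapped` is a made-up helper name.

```python
import mmap


def read_mapped(path: str, offset: int, size: int) -> bytes:
    """Map the whole file read-only and copy out one asset's byte range."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + size]
```

Only the pages that are actually touched get read from disk, and the OS is free to cache them, which pairs well with an archive whose index tells you each asset's offset and size.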
