File handling for a resource manager

Started by
5 comments, last by KulSeran 15 years, 2 months ago
I'm developing a resource manager in C++. I'm designing an archive format, but I have a question about how C++ handles files (streams) that would affect my file format design.

First I'll explain how my ideal archive format would work. I have two types of files (let's call them DATA type and POINTER type). The DATA type would be a huge "array" of binary files. The POINTER type would be a collection of pointers to binary files stored in the DATA file. To exemplify:

POINTER file type:
(...)
id: "texture/coat.bmp"
pointer: 23341
------
id: "audio/test.wav"
pointer: 11931
------

DATA file type:
(...)
position: 11931
data: xxxxxxxxxxxxxxxxxxxxx
(...)

At the beginning of level 1, level1.POINTER would be opened. This would make all DATA ids available to the resource manager (without loading the data, unless specified). At the end of level 1, level1.POINTER would be closed, freeing all loaded resources. This simple design would reduce the overhead of mapping resource ids to their resources (each resource would only be loaded when actually needed), and it would allow different levels to point to the same data chunk.

My question is the following: How does fstream open a file? Does it copy the file completely to memory (making it useless to create a big DATA file; in that case it would be better to separate the data into levels and merge the POINTER into the data file)? Or does it only keep a handle to the file and copy data into memory when asked to (this would be my guess, but I'm not sure...)?

Also, do you think my design is viable, or should I opt for another? And should I use C++ fstream or C file handling? (I would prefer to use C++ whenever possible, but if C is more efficient, tell me!)

Thanks!
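To make the lazy-loading idea concrete, here is a minimal sketch in C++. PointerTable and loadResource are illustrative names, not part of any real format, and error handling is omitted:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one POINTER file: each resource id
// maps to a byte offset inside the DATA file.
struct PointerTable {
    std::map<std::string, std::uint64_t> offsets;
};

// Load a resource lazily: seek to the stored offset in the DATA file
// and read `size` bytes only when the resource is actually requested.
std::vector<char> loadResource(std::ifstream& dataFile,
                               std::uint64_t offset,
                               std::size_t size)
{
    std::vector<char> buffer(size);
    dataFile.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    dataFile.read(buffer.data(), static_cast<std::streamsize>(size));
    return buffer;
}
```

Opening the stream itself is cheap; the read only happens when loadResource is called, which is the behavior the design depends on.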
Quote:Original post by Agnor
How does fstream open a file?

That's implementation defined.

Quote:Does it copy the file completely to memory (making it useless to create a big DATA file, in that case it would be better to separate the data into levels and merge the POINTER into the data file)?

Generally not, but it could for small files.

Quote:Or does it only copy a pointer to the file and only copies to memory when asked to (this would be my guess, but I'm not sure...).

Generally it reads in a small section as a buffer and when you've exhausted the buffer, grabs another chunk, and so on.

Quote:Also do you think my design is viable, or should I opt for another?

I would recommend using a pre-existing library like PhysFS.
Quote:Original post by Agnor
Also do you think my design is viable, or should I opt for another? And should I use C++ fstream or C file handling (I would prefer to use C++ whenever possible, but if C is more efficient, tell me!)

Thanks!


I think your approach is viable. You've just re-invented the header/data design, which is common already (I don't mean that in a negative way -- I mean, *obviously* it's viable).

Beware of using big files that pack everything together; they can end up being a pain for several reasons (source control, distributed work, etc.). I'd recommend finding a good balance between file count and file size. Also, I suggest you don't pack together files that aren't related to each other.

You might want to look at asynchronous file IO if you have to deal with big files.

Thanks for your answers.

Quote:I would recommend using a pre-existing library like PhysFS.


I've looked into the library and liked it, but I'll try to use a design of my own, for learning purposes.

Quote:I think your approach is viable. You've just re-invented the header/data design which is so common already (I don't mean that in a negative way -- I mean, *obviously* it's viable)


I've based my approach on the zip and pak formats, so, yes, it's quite common. However, they don't use separate files for header and data (they store the header after the data, in the same file).

Quote:Beware of using big files that pack everything together

I'm planning the format so it can support multiple DATA files.
Quote:Original post by Agnor

Also do you think my design is viable, or should I opt for another? And should I use C++ fstream or C file handling (I would prefer to use C++ whenever possible, but if C is more efficient, tell me!)


Under Windows:
fstream calls the C run-time library.
The C run-time library calls the OS routines.
Actual IO is performed using the CreateFile/ReadFile/WriteFile API functions.

The overhead of these layers isn't zero, but it's not the bottleneck, considering the flexibility streams offer and the speed of disk access versus a handful of instructions.

If IO does become a problem (say, streaming processing on gigabyte-sized data sets), there's always async IO, but that is platform-specific, since the C++ standard library does not define such operations.

Simply put, use the most convenient API (streams can be incredibly handy in C++ with proper operator<< and operator>> overloading); if disk IO becomes the problem, you'll need to drop to the OS API anyway.
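The operator overloading mentioned above can be sketched like this; Entry is a hypothetical record type, shown only to illustrate the convenience, not any fixed format:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record: one id/offset pair from a POINTER-style file.
struct Entry {
    std::string id;
    long offset;
};

// Write the entry as whitespace-separated text.
std::ostream& operator<<(std::ostream& os, const Entry& e)
{
    return os << e.id << ' ' << e.offset;
}

// Read it back with the same layout.
std::istream& operator>>(std::istream& is, Entry& e)
{
    return is >> e.id >> e.offset;
}
```

With these overloads, reading a whole table is just a loop of `while (file >> entry)`.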

Quote:How does fstream open a file? Does it copy the file completely to memory (making it useless to create a big DATA file, in that case it would be better to separate the data into levels and merge the POINTER into the data file)? Or does it only copy a pointer to the file and only copies to memory when asked to (this would be my guess, but I'm not sure...).


fstreams are not some virtual mapping. You use them to read data into your structures; that's it. Whatever they use internally is irrelevant: if you want a raw file image, simply read the entire file into a char array, then work from there.

Anything else, and you're talking about file systems, which are a huge topic.
In more practical terms:

Imagine I have a huge file (say, 100 MB) and I only want to access 500 KB around the middle of the file (retrieving them into a char array).

How much memory (approximately) would the program use?

Also, if I open a file stream (as in fstream file("test.big")) in, say, a constructor, it wouldn't load the full 100 MB file into memory; it would only buffer a small chunk (a few KBs), ready for more extraction?
It will buffer as much or as little as you want. You can always tell the system you know best and force the fstream into unbuffered mode. The default buffers are not huge, and usually tuned to some performance metric (not so big as to make random seeks kill performance, not so small as to cause a disk seek on every read).
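Forcing unbuffered mode can be sketched like this. Note that the setbuf request must be made before any I/O on the stream, and whether it is honored is implementation-defined; openUnbuffered is an illustrative name:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Ask the stream's filebuf for unbuffered operation before opening.
std::ifstream openUnbuffered(const std::string& path)
{
    std::ifstream file;
    // pubsetbuf(nullptr, 0) requests unbuffered mode; must precede I/O.
    file.rdbuf()->pubsetbuf(nullptr, 0);
    file.open(path, std::ios::binary);
    return file;
}
```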

But I'm going to warn you that your idea is probably very bad for performance.
Having one large file that you load entirely into memory is MUCH faster than having one file that you seek around in and read parts of, which is faster still than reading scattered unpacked files off the disk.

Your best bet is to store all the resources of a level in one DATA file (not one "POINTER" file, as you put it) and load the entire data file into a memory buffer in one go. If you have common resources between levels (or types of levels), break them off into a few DATA files that you choose to load based on "this is all the textures for outdoor levels" or "this is all the player character assets". This also means you could keep common data loaded while dumping just the level-specific data file. This can result in the same asset appearing in several DATA files, but that is far preferable to seeking around for a single copy. (Metal Gear Solid 4, and in fact many console games, do this because reading off the CD/Blu-ray is slow, and seeking is very, very slow (and wastes batteries).) So reading one big file that contains copies of a few smaller files makes things simple and fast.

This topic is closed to new replies.
