File Systems! Argh!

Started by
9 comments, last by coldacid 21 years, 10 months ago
I've been trying to figure out what to do for the file subsystem in an engine I'm currently working on. I want the best of both worlds: a single container file a la pak or DocFile, but also the ability to pull out a handle or name of an actual file when one is needed (for example, by my video player function). Originally I was thinking of DocFile (structured storage, yay; pretty useful actually, but only if you know how to use it), but between the lack of quality docs/tutorials and the fact that IStorage/IStream has no way of giving me a filename for a stream when I need one, I've had to discard it. The VFS tutorials from flipCode aren't detailed enough for me to put to serious use (for starters, I want a system I code myself, not a great cut-and-paste adventure). So I'm without a clue on how to do this, which bothers the hell out of me. So, I implore: if anyone has any hints or pointers that'll help me get this out of the way, thanks.

Chris 'coldacid' Charabaruk <ccharabaruk@meldstar.com> <http://www.meldstar.com/ccharabaruk/>
Meldstar Studios <http://www.meldstar.com/> - Creation, cubed.

This message double ROT-13 encrypted for additional security.

Chris 'coldacid' Charabaruk – Programmer, game designer, writer | twitter

Well, I'm implementing something of that sort right now. I decided to go with a container file that holds "records" (so that I don't use the term "file" for both the container file and the contained files). The container file consists of three parts:

1. Header: magic value, version, # of records stored, etc.

2. "Record information area": for each record stored in the file, a RecordInfo struct holds two kinds of data. For the client application, it stores the record ID, its type, and its name; the class can find records by name or by ID, and it's up to the client program to put meaningful values in these fields. RecordInfo also holds the offset at which the record data is stored in the third section (relative to the start of the third section), as well as the data size.

3. "Record data area": basically, all the records are written sequentially in this area. The record information blocks appear in the second section in the same order as the record data blocks appear in the data section.
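To make the relationship between sections 2 and 3 concrete, here is a minimal sketch (not the poster's code). It assumes data blocks are packed back to back with no padding between them. The `DatHeader` fields, the magic value, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical header for section 1; field names and magic value are assumptions.
struct DatHeader {
    std::uint32_t Magic;       // identifies the file as a container
    std::uint32_t Version;     // format version
    std::uint32_t NumRecords;  // number of records stored
};

constexpr std::uint32_t kDatMagic = 0x31544144;  // hypothetical value

// Accepts only files with the expected magic and a version we know how to read.
bool IsValidHeader(const DatHeader& h, std::uint32_t maxSupportedVersion) {
    return h.Magic == kDatMagic && h.Version <= maxSupportedVersion;
}

// Section 3 stores record data back to back, in the same order as the
// information blocks in section 2. Each RecordOffset (relative to the
// start of section 3) is therefore the sum of the sizes of the records before it.
std::vector<std::uint32_t> ComputeRecordOffsets(const std::vector<std::uint32_t>& sizes) {
    std::vector<std::uint32_t> offsets;
    offsets.reserve(sizes.size());
    std::uint32_t next = 0;
    for (std::uint32_t size : sizes) {
        offsets.push_back(next);
        // 32-bit (DWORD) offsets limit the data area to 4 GB.
        assert(size <= std::numeric_limits<std::uint32_t>::max() - next);
        next += size;
    }
    return offsets;
}
```

For sizes 10, 20, and 5, the offsets are 0, 10, and 30. A reader finds record data at (start of section 3) + RecordOffset.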

Presently, clients can add/remove records, find records by ID or name, save all records as files, or load all files within a directory as records. I'll be implementing record enumeration shortly. Although most methods directly support loading data from files into the container file and vice versa, the data stored in the container file can also be generated and used by the application directly, without any external files.

Some implementation details: I'm using Win32 file mappings, which allow easy container file resizing and easy record access. In particular, clients get a pointer to the record data; they don't need to provide a buffer for the class to copy the records into. The second and third sections are aligned (presently at 1 KB and 4 KB, respectively) to avoid frequent reallocations. Record lookup is O(n). Thanks to the file mapping, there is practically no memory overhead: you're referencing the data straight from the file, not from a copy in memory. Since all information blocks are stored at the beginning of the file, you don't need to browse through megabytes of data to find a record. There are also "add all files matching this mask to the container file" and "extract all files" convenience functions.
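The post doesn't spell out the growth policy, so this is only a guess at what "aligned to avoid frequent reallocations" could mean. Each section's reserved size is rounded up to its granularity. The file only has to be enlarged when an edit crosses a boundary. With Win32 file mappings, enlarging generally means unmapping the view and creating a new, larger mapping. `AlignUp` and `MustGrow` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Granularities mentioned in the post: 1 KB for the information area,
// 4 KB for the data area.
constexpr std::uint32_t kInfoAreaAlign = 1024;
constexpr std::uint32_t kDataAreaAlign = 4096;

// Rounds size up to the next multiple of align (align must be a power of two).
// Does not guard against overflow for sizes within align of 4 GB.
std::uint32_t AlignUp(std::uint32_t size, std::uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

// True if growing a section from oldUsed to newUsed bytes crosses an
// alignment boundary, i.e. the reserved space (and the mapping) must grow.
bool MustGrow(std::uint32_t oldUsed, std::uint32_t newUsed, std::uint32_t align) {
    return AlignUp(newUsed, align) > AlignUp(oldUsed, align);
}
```

Appending 100 bytes to a data area that uses 3000 of its 4096 reserved bytes needs no resize. Going from 4000 to 4100 bytes does.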

I haven't had the time to test this class in my real-world apps yet, but it looks very promising.
---visit #directxdev on afternet <- not just for directx, despite the name
I still need a way to take a 'record' (as you put it) and move it into a temporary file, then get that file's name to pass to certain routines in my engine.

If only DirectX (and other libs & APIs I''m using) allowed me to pass IStream objects instead of file names.


Chris 'coldacid' Charabaruk – Programmer, game designer, writer | twitter

quote:Original post by coldacid
I still need a way to take a 'record' (as you put it) and move it into a temporary file, then get that file's name to pass to certain routines in my engine.


No problem. Here's the declaration of a record info:

    struct DatRecordInfo
    {
        // Size of the information block
        DWORD InfoSize;
        // Type of record stored, for application reference
        DWORD RecordType;
        // Offset from the beginning of record data area to this record
        DWORD RecordOffset;
        // Record size
        DWORD RecordSize;
        // Record identifier, for application reference
        DWORD RecordId;
        // Record name, variable length, zero-terminated
        char RecordName[1];
    };

RecordName can be an empty string ("\0"), an arbitrary string identifying the record, a file name, or a fully qualified path name -- whatever you like.
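Because RecordName is variable-length, the information blocks can't be indexed like a plain array. InfoSize is what lets a walk (such as FindNextRecord) step from one block to the next. Below is a portable sketch of that walk.

It rests on three assumptions:

- The five fixed fields are packed with no padding.
- InfoSize covers the fixed fields plus the name and its terminator.
- A non-null name takes priority over the ID in a lookup. The post doesn't say how FindRecordInfo combines the two.

`InfoFixed`, `AppendInfo`, and `FindInfo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Portable mirror of the fixed part of DatRecordInfo (DWORD -> uint32_t).
// The zero-terminated record name follows these 20 bytes directly.
struct InfoFixed {
    std::uint32_t InfoSize;      // fixed part + name + terminating '\0'
    std::uint32_t RecordType;
    std::uint32_t RecordOffset;  // relative to the start of the data area
    std::uint32_t RecordSize;
    std::uint32_t RecordId;
};
static_assert(sizeof(InfoFixed) == 20, "expected no padding");

// Appends one variable-length information block to the info area.
void AppendInfo(std::vector<char>& area, InfoFixed info, const std::string& name) {
    assert(name.find('\0') == std::string::npos);  // names can't contain NULs
    info.InfoSize = static_cast<std::uint32_t>(sizeof(InfoFixed) + name.size() + 1);
    const std::size_t start = area.size();
    area.resize(start + info.InfoSize);
    std::memcpy(area.data() + start, &info, sizeof(InfoFixed));
    std::memcpy(area.data() + start + sizeof(InfoFixed), name.c_str(), name.size() + 1);
}

// Linear (O(n)) lookup that steps from block to block using InfoSize.
// Assumption: a non-null name is matched by name, otherwise by id.
std::optional<InfoFixed> FindInfo(const std::vector<char>& area,
                                  std::uint32_t id, const char* name) {
    std::size_t pos = 0;
    while (pos + sizeof(InfoFixed) <= area.size()) {
        InfoFixed info;
        std::memcpy(&info, area.data() + pos, sizeof(InfoFixed));  // no misaligned reads
        // Stop on a malformed block: too small, past the end, or unterminated name.
        if (info.InfoSize <= sizeof(InfoFixed) || info.InfoSize > area.size() - pos ||
            area[pos + info.InfoSize - 1] != '\0')
            break;
        const char* recName = area.data() + pos + sizeof(InfoFixed);
        const bool match = name ? std::strcmp(recName, name) == 0 : info.RecordId == id;
        if (match) return info;
        pos += info.InfoSize;
    }
    return std::nullopt;
}
```

Keeping all these blocks together at the front of the file is what makes the O(n) scan cheap. It touches only the information area, never the record data.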

Edit: I might as well post the whole class declaration.

    struct CDatFile
    {
    // Construction and destruction
        // Standard constructor
        CDatFile();
        // No opening constructor, because constructors can't return errors
        // Destructor automatically closes the file
        ~CDatFile();

    // Opening and closing
        // Creates or opens the file
        // AccessMode can be one or both of GENERIC_READ and GENERIC_WRITE
        // Disposition can be CREATE_NEW, CREATE_ALWAYS, OPEN_EXISTING, OPEN_ALWAYS, or TRUNCATE_EXISTING
        BOOL Create(LPCTSTR FileName, DWORD AccessMode, DWORD Disposition);
        // Closes the file
        void Close();

    // Statistics
        DWORD GetNumRecords();

    // Read operations
        // Finding records
        const DatRecordInfo *FindRecordInfo(DWORD RecordId, LPCSTR RecordName);
        LPCVOID FindRecordData(DWORD RecordId, LPCSTR RecordName);
        LPCVOID GetRecordDataForRecordInfo(const DatRecordInfo *pRecordInfo);
        // Extracts all named records from the .dat file to a specified directory as files,
        // keeping the directory structure
        // Disposition can be CREATE_NEW, CREATE_ALWAYS, OPEN_EXISTING, OPEN_ALWAYS, or TRUNCATE_EXISTING
        BOOL ExtractAllNamedRecordsTo(LPCTSTR Directory, DWORD Disposition);
        // Record enumeration
        const DatRecordInfo *FindFirstRecord();
        const DatRecordInfo *FindNextRecord(const DatRecordInfo *pCurrentRecord);

    // Write operations
        // Appending or replacing records
        // This function will copy the specified file from the hard drive to the dat file as a record
        BOOL AddFile(DWORD Mode, DWORD RecordType, DWORD RecordId, LPCSTR FileName);
        // This function will copy the record data from memory to the dat file
        BOOL AddRecord(DWORD Mode, DWORD RecordType, DWORD RecordId, LPCSTR RecordName, PVOID RecordData, DWORD RecordDataSize);
        // Removing records
        BOOL RemoveRecord(DWORD RecordId, LPCSTR RecordName);
        // Finds all files in the specified directory and subdirectories and adds them as records to the dat file
        BOOL AddDirectoryFiles(LPCTSTR Directory, LPCTSTR FileMask, BOOL Recurse, DWORD Mode);

    private:
        // yadda yadda
    };



[edited by - IndirectX on June 4, 2002 6:23:47 AM]
---visit #directxdev on afternet <- not just for directx, despite the name
I didn't ask for the struct... What I need is some quick and _fast_ method of copying out a file from storage and plopping it down in the current temp dir, and then getting the filename of the new temporary file.

Chris 'coldacid' Charabaruk – Programmer, game designer, writer | twitter

quote:Original post by coldacid
I didn't ask for the struct... What I need is some quick and _fast_ method of copying out a file from storage and plopping it down in the current temp dir, and then getting the filename of the new temporary file.

Let me ask why you want to extract data from storage to a temporary file. In any event, I would:

- call FindRecordInfo to retrieve the information block for the given filename. This is pretty fast; it only needs a linear search over the DatRecordInfo blocks. This will give you the "real" filename.

- call GetRecordDataForRecordInfo to obtain a pointer to this file's data. This is done in O(1) time.

- call GetTempFileName (with the directory returned by GetTempPath) to generate a temporary filename.

- open that file, WriteFile() the record data to it, and close it. This is as fast as you can make it: data is copied straight from the container file to the hard drive.
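The four steps above can be sketched roughly as follows. This is a portable stand-in using C++17 `std::filesystem` rather than the Win32 calls named in the post. `ExtractToTempFile` and its parameters are hypothetical. `data` and `size` would come from GetRecordDataForRecordInfo and the record's RecordSize. The extension parameter is an assumption: keeping the original extension may matter to APIs that choose a decoder from the file name.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes a record's bytes to a new file in the system temp directory and
// returns its path, or an empty path on failure. Portable stand-in for
// GetTempPath + GetTempFileName + CreateFile/WriteFile/CloseHandle.
fs::path ExtractToTempFile(const void* data, std::size_t size,
                           const std::string& extension) {
    assert(data != nullptr || size == 0);
    std::error_code ec;
    const fs::path dir = fs::temp_directory_path(ec);
    if (ec) return {};

    std::random_device rd;
    std::mt19937_64 rng(rd());
    for (int attempt = 0; attempt < 16; ++attempt) {
        // Random name. Unlike GetTempFileName, checking exists() and then
        // creating the file is not atomic, so a race is still possible.
        fs::path candidate = dir / ("dat" + std::to_string(rng()) + extension);
        if (fs::exists(candidate, ec)) continue;

        std::ofstream out(candidate, std::ios::binary);
        if (!out) continue;
        out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
        out.close();
        if (!out) {  // write or close failed: clean up the partial file
            fs::remove(candidate, ec);
            return {};
        }
        return candidate;
    }
    return {};
}
```

The caller owns the returned file. It should delete it (for example with `fs::remove`) once DirectShow or whichever API needed the filename is done with it.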
---visit #directxdev on afternet <- not just for directx, despite the name
That would do the trick. Why I want this is easy: DirectShow, among other APIs I'm using in this engine, wants filenames, not a pointer to a file's contents in memory, a handle to a file, an IStream object, or anything else.

It's me at the whims of the APIs, not the other way around. If I want to get the most out of them, I play by their rules. That's what I'm doing.


Chris 'coldacid' Charabaruk – Programmer, game designer, writer | twitter

Why not just leave the files that get loaded by the API out of the data package file?

And leave them where they can be tampered with by cheating gamers. No, I don't think so.

Chris 'coldacid' Charabaruk – Programmer, game designer, writer | twitter

quote:Original post by coldacid
And leave them where they can be tampered with by cheating gamers. No, I don't think so.


Do you think they are going to mess with your audio too? The performance cornerstone of container files is the ability of the game to load data directly from them. Quake3, for instance, uses compressed containers and therefore has substantially longer level load times than UT, which doesn't use compression. This is especially true for repeat level loads: if you have enough RAM, UT can pull the data straight from the memory cache, while Q3 must decompress it all over again. You can keep your game data in your custom files and keep the music that DirectShow loads in regular files, which will give you optimal performance. Copying megabytes of audio/video from your custom file to disk is going to waste quite a bit of memory, disk space, and processing power.
---visit #directxdev on afternet <- not just for directx, despite the name
