Best Structure

Started by
7 comments, last by Shael 12 years, 10 months ago
G'day,

I'm trying to figure out what the best structure to use for storing a list of file paths. I want to be able to ask the structure a filename and have it return the full path to the file if it exists. I'm guessing I need some kind of tree structure which gets filled by recursing a directory and adding each file found into the tree. I'm just not sure what the most efficient kind of tree I should be using for the job.

Binary tree? map<> ?

Just looking for suggestions really and why you'd use a particular structure.

Cheers
Advertisement
Can you describe the problem a bit more completely? e.g. do you get a list of files and their paths at the beginning, and then have to do lots of lookups to get a path given a filename? Can one file name exist in multiple paths in your situation? What is the average number of files per unique path?

All of these things (and more!) will have a bearing on what data structure makes the most sense. Without a lot more detail on what exactly you need to do, it's hard to make any particularly relevant suggestions.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

So you want to make a short-file-name to full-path lookup table? What if there are multiple short-file-names that are the same, but for files in different full-paths?

In C++ perhaps a std::multimap would work. In C# you could use a Dictionary<K,List<V>> for this, or just a flat List<T> if you don't mind taking a hit doing linear-time lookups.
If there's no duplicates, then a [font="'Lucida Console"]map<string,string>[/font] will do fine.
The reason for needing this structure is for my file system. When the engine starts up it should recursively scan the engine directory and form some kind of tree for every file that exists. This tree will be used for quick lookups to find the absolute path of a file given a filename. If there is a file with the same name in two places, the second file that is found is ignored. This is a restriction of the file system but the advantage of having the user only have to ask for a filename outweighs it. I'm not sure on the average number of files per path as it can vary, but a guess would be anywhere from 10-30 in most cases.

Hope this helps show where I want to go with this.

Cheers
Hey there!

From what you described I would suggest that you have lots and lots of files which share a reasonably long prefix, right? For example, if you put all your music and sounds into an '/sfx' sub-folder of your engine. I would suggest a trie structure. I don't know/think that this is implemented anywhere in the STL, but it is quite fast and very very space efficient if decently implemented.

- Michael.

PS: Just read your second post. Sounds to me like when you want to allow the 'original content' to be overwritten by providing additional files. Wasn't it done like this with the Half-Life mods back then? ...not sure... Anyway, it depends on what you want to store exactly. If the file names '/base/level.res' and '/addon/level.res' are dependent or somehow related then a trie won't help you (as would any other structure mentioned so far). In this case I would suggest a 'layered' approach, i.e. build your structure (whatever choice) for files in '/base' and then build other structures for other folders like '/addon'. Then you can query the structures in a specific order to achieve the 'one file shadows another' effect.

Just use a map<string, string> as suggested, you probably won't use too much memory. If that uses way too much memory, make a tree* that mirrors the directory tree you scanned and have a map<string, dirtree::iterator> alongside the dirtree for fast lookups by filename.

*Also, use tricks to make it memory-compact.

The reason for needing this structure is for my file system. When the engine starts up it should recursively scan the engine directory and form some kind of tree for every file that exists. This tree will be used for quick lookups to find the absolute path of a file given a filename. If there is a file with the same name in two places, the second file that is found is ignored. This is a restriction of the file system but the advantage of having the user only have to ask for a filename outweighs it. I'm not sure on the average number of files per path as it can vary, but a guess would be anywhere from 10-30 in most cases.

Hope this helps show where I want to go with this.

Cheers


You probably know your needs better, but isn't this a potentially problem-causing decision? It seems to me quite possible that in the future you may need to have two folders containing files with the same names... As I see it, this behavior (igonring files) is too much for the convenience it provides...
Thanks for the suggestions. I had thought about using map but thought maybe a more efficient structure was around and it would seem Trie might be that structure. I'll investigate both.

@Dr_Tr:

This restriction seems to be adopted by a few game engines so it might not be as problematic as it appears?

This topic is closed to new replies.

Advertisement