Omaha

File Identification

Short of a fully qualified path and filename, is there any way to uniquely identify a given file on a Win32 system? I'm writing a resource manager, and to prevent multiple copies of the same file from being loaded into memory I need a way to check whether a particular file has already been loaded. Every resource carries a reference count: if the count is zero, the file truly needs to be read; otherwise the count is just incremented and the previously loaded copy is reused. Originally I assigned every resource a unique ID of my own as it was read, but in the end I'd still need to get that ID from a filename-based lookup table, so that's just an extra step. I could of course just use the filename, but that seems kludgy and overly complex to me.

I really think using the file name is your best bet; it IS 100% unique.

How would that be too complex? Do your files move around?

You could use a checksum algorithm on the files.
When a file is read in, calculate its checksum and record it. For any new file, read it in and calculate its checksum. If it matches a recorded one, the files are almost certainly identical (two different files can occasionally share a checksum).

Or you could just check the files byte for byte against one another.
The first is better, though.

Quote:
Original post by Mulligan
...

How would that be too complex? Do your files move around?


Not TOO complex, just MORE complex.

The code and process of checking whether two paths are the same is naturally and unavoidably more complex than checking whether two values are the same. I'll write it; I was just curious if there was another way, but apparently there isn't a much BETTER way. :P

Now I sleep. Tomorrow I'll write some more.

Assuming that you're using character arrays, just do:

// strcmp is declared in <string.h>
if( strcmp( str1, str2 ) == 0 )
{
    // The two strings match exactly (strcmp is case-sensitive)
}

Yeh... the process of comparing paths is about as simple as it gets. Since you're on Win32, you don't even have to worry about symbolic/hard links.

One way to check if two files are identical is to use a cyclic redundancy check (CRC32), which will give you an identical value if the contents of two files are the same. This would allow you, for example, to detect that the files at c:\image01.bmp and c:\image02.bmp actually hold the same data, just under different names.
Personally, I think that might go a bit overboard, as I often load in two identical textures and then manipulate them in different ways, which would be impossible with this method. But it might be what you're looking for.

Here's a link:
http://www.gamedev.net/reference/articles/article1941.asp


Hope it helps
Spree
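
For anyone who wants to try the CRC32 idea from the post above, here is a minimal sketch of the standard bitwise CRC-32 (the reflected polynomial 0xEDB88320 used by zip and PNG). The function name crc32_of is made up for this example, and the caller is assumed to have already read the file into memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 over a buffer. Short rather than fast; a table-driven
// version is the usual optimization.
std::uint32_t crc32_of(const unsigned char* data, std::size_t length)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // mask is all ones when the low bit is set, otherwise zero.
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

// Convenience overload (hypothetical) for bytes held in a std::string.
std::uint32_t crc32_of(const std::string& bytes)
{
    return crc32_of(reinterpret_cast<const unsigned char*>(bytes.data()),
                    bytes.size());
}
```

Equal CRCs only suggest equal contents, so a byte-for-byte comparison is still needed to be sure. Note too that this detects equal data, which is a different question from whether two paths name the same file.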

Guest Anonymous Poster
Note that on Win2k and XP, the filename is NOT sufficient to uniquely determine whether two files are the same file - these two OSes support the CreateHardLink system call so one file can have more than one name. You'll want to use GetFileInformationByHandle to get a volume serial number and index numbers for the files and compare those to determine whether multiple names refer to the same on-disk object.
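
A rough sketch of that check might look like the following. FileIdentity and get_file_identity are names invented for this example. The Win32 branch uses CreateFileA, GetFileInformationByHandle and the BY_HANDLE_FILE_INFORMATION fields mentioned above. The stat() branch is an addition so the sketch also builds off Windows, where the device and inode numbers play the same role.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#endif

// Hypothetical value type: identifies an on-disk object, not a name.
struct FileIdentity {
    unsigned long long volume;  // volume serial number (st_dev elsewhere)
    unsigned long long index;   // file index (st_ino elsewhere)
};

bool operator==(const FileIdentity& a, const FileIdentity& b)
{
    return a.volume == b.volume && a.index == b.index;
}

// Fills `out` and returns true on success; returns false when the file
// cannot be opened or queried.
bool get_file_identity(const std::string& path, FileIdentity& out)
{
#ifdef _WIN32
    // Access mode 0 is enough to query metadata without reading the file.
    HANDLE h = CreateFileA(path.c_str(), 0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return false;
    BY_HANDLE_FILE_INFORMATION info;
    const BOOL ok = GetFileInformationByHandle(h, &info);
    CloseHandle(h);
    if (!ok) return false;
    out.volume = info.dwVolumeSerialNumber;
    out.index = (static_cast<unsigned long long>(info.nFileIndexHigh) << 32)
                | info.nFileIndexLow;
    return true;
#else
    struct stat st;
    if (stat(path.c_str(), &st) != 0) return false;
    out.volume = static_cast<unsigned long long>(st.st_dev);
    out.index = static_cast<unsigned long long>(st.st_ino);
    return true;
#endif
}
```

Two names refer to the same on-disk object when both calls succeed and the resulting identities compare equal. The cost is an extra open per lookup, so it probably makes sense only as a fallback after a plain path comparison misses.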

People, he doesn't need an algorithm that guarantees two files with the same content yield the same value.

He needs an algorithm that guarantees two paths that point to the same "file" on disk give the same value, for purposes of sharing loaded buffers and such things. If your algorithm uses a simple string comparison, preceded by a case normalization step (convert the string to lower case or upper case before comparing), then you should have no real trouble.

The only trouble would be if some paths were anchored (C:\xxx), some paths were URIs (file:C:/xxx), and some paths were relative (bin\xxx).

The one additional normalization normally performed on paths is converting both / and \ to /.

Good luck.
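
The two normalization steps described in this post could be sketched roughly as below. normalize_path_key is a made-up name, and the sketch folds ASCII letters only. It does not resolve relative paths or file: URIs; on Win32 a relative path could be expanded with GetFullPathName before this step.

```cpp
#include <string>

// Hypothetical helper: builds a comparison key from a path by converting
// backslashes to forward slashes and folding ASCII letters to lower case.
std::string normalize_path_key(const std::string& path)
{
    std::string key;
    key.reserve(path.size());
    for (std::string::size_type i = 0; i < path.size(); ++i) {
        char c = path[i];
        if (c == '\\') {
            c = '/';
        } else if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
        key += c;
    }
    return key;
}
```

Two paths are then treated as the same file when their keys compare equal with an ordinary string comparison.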

Have you considered using a hash map?

Using either the fully qualified file path, or the MD5/SHA-1 hash of the fully qualified file path, as the key of the hash map should be enough, shouldn't it?
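
As a minimal sketch of that idea combined with the reference counting from the original question: the class below keys a std::unordered_map on the path string. Resource, ResourceCache and load_bytes are hypothetical names, the loader is a stand-in that performs no real file I/O, and the key is assumed to be an already normalized, fully qualified path.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache entry: the loaded bytes plus a reference count.
struct Resource {
    std::vector<char> bytes;
    int ref_count;
};

class ResourceCache {
public:
    ResourceCache() : loads_(0) {}

    // Returns the shared resource for `key`, loading it only on first use.
    Resource& acquire(const std::string& key)
    {
        auto it = table_.find(key);
        if (it == table_.end()) {
            Resource fresh;
            fresh.bytes = load_bytes(key);
            fresh.ref_count = 0;
            it = table_.insert(std::make_pair(key, fresh)).first;
        }
        ++it->second.ref_count;
        return it->second;
    }

    // Drops one reference and frees the entry once nobody uses it.
    void release(const std::string& key)
    {
        auto it = table_.find(key);
        if (it == table_.end()) return;
        if (--it->second.ref_count == 0) table_.erase(it);
    }

    std::size_t loaded_count() const { return table_.size(); }
    int load_calls() const { return loads_; }

private:
    // Placeholder for real file I/O: the "contents" are just the key text.
    std::vector<char> load_bytes(const std::string& key)
    {
        ++loads_;
        return std::vector<char>(key.begin(), key.end());
    }

    std::unordered_map<std::string, Resource> table_;
    int loads_;
};
```

Since std::unordered_map already hashes its string key, running the path through MD5 or SHA-1 first should not be necessary for this purpose.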
