[web] Organizing storage of images

Started by Eric. 4 comments, last by Sander 17 years, 6 months ago
I am using a LAMP environment. If I have a gallery system that stores media files on the server and creates a database entry for each image to uniquely identify it (with filename, caption, tags, filesize, etc.), what would be the best way to organize where the files are stored on the server? Is this the best way to store a large amount of media? Could I store all of the images in a single directory? They all have unique filenames. My main concerns are organization, speed, and scalability. I am talking upwards of 25,000 image files, and that number will increase as time goes on. Thanks in advance, Eric
Do not put them all in the same directory. You have to remember that the filesystem is a "program" too and when you want to look up some arbitrary file it has to locate it in some kind of data structure. More files in a directory means slower lookup times.

One way to get around it is to create subdirectories. You could mimic the actual layout of the files in the gallery. But if you just want centralized mass storage try this:

Create a directory called "images" and inside that directory create a subdir for each letter of the alphabet plus 10 more for the digits. Then store the files in a subdir that matches the first letter of the filename. For instance:

levs_picture.jpg

...would get stored under the "L" subdir. That way the strain of file lookups is spread across many smaller directories.

You can expand on that technique by hashing the filenames, or by renaming each file to a 32-character random string to avoid name collisions.
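For example, a rough sketch of that idea in PHP (the "images" directory layout and the random-name scheme here are just assumptions for illustration):

// give the file a random 32-character name (md5 output is 32 hex chars)
$random_name = md5( uniqid( mt_rand(), true ) );

// bucket by the first character, e.g. "images/7/7f3a...e9.jpg"
$path = 'images/' . $random_name[0] . '/' . $random_name . '.jpg';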

A similar system involves sequential numeric naming (my Canon camera actually does this). Every new file gets the next number as its name. For simplicity, just make the filename the numeric database index/key. Start at "0001.jpg" and put it in a folder called "00". Once you reach picture 100, create another folder called "01" and start putting them in there. That way, each folder holds 100 files max and lookups are easy:

// get picture database ID
$picture_db_key = 856;

// get the folder it's in - integer-divide by 100 and zero-pad to two digits
$folder = sprintf( "%02d", floor( $picture_db_key / 100 ) );

// get the file
$picture = GetFileSomehow( "$folder/$picture_db_key.jpg" );
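The write side would be the mirror image of that lookup. A sketch, assuming the row has just been inserted and the key came back from the database via the old mysql_* API of the time:

// the database hands out the key (e.g. via AUTO_INCREMENT)
$picture_db_key = mysql_insert_id();

// same folder rule as the lookup above
$folder = sprintf( "%02d", floor( $picture_db_key / 100 ) );

// create the folder on demand and move the uploaded file into it
if ( !is_dir( "images/$folder" ) ) {
    mkdir( "images/$folder", 0755, true );
}
move_uploaded_file( $_FILES['image']['tmp_name'], "images/$folder/$picture_db_key.jpg" );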
Thanks, that is what I needed to know.
Using the first letter or letters of a filename isn't optimal. The reason you're splitting the files over many directories is that you don't want too many files in a single directory (some filesystems can get unstable if you put too much in a single directory), so you want to spread them over the subdirectories as evenly as possible. The md5 hash of the filename gives a reasonably even distribution across the hexadecimal range [0-9a-f]. I suggest you use that instead. If you want, you can split it further on the second character as well. That's what I did on one of my websites:

files/
    0/
    1/
        10/
        11/
        ...
        1e/
        1f/
    2/
    ...
    e/
    f/
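In code, that lookup might look something like this (just a sketch; the files/ root matches the layout above, and storing the image under its hash name is an assumption):

// hash the original filename; use the first one and two hex chars as directories
$hash = md5( $filename );                                     // e.g. "1e0a9f..."
$dir  = 'files/' . $hash[0] . '/' . substr( $hash, 0, 2 );    // e.g. "files/1/1e"

// store (or later fetch) the image under its hash
$path = "$dir/$hash.jpg";
if ( !is_dir( $dir ) ) {
    mkdir( $dir, 0755, true );
}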

— Sander Marechal

Firstly, some filesystems SHOULD NOT "get unstable" if you put too many files in one directory. No operating system worth its salt should let you put more files in a directory than it can cope with; therefore, IF a limit exists on your filesystem, it should simply refuse to add more once it reaches it.

However, some filesystems will have poor performance if there are too many files in one directory, and some TOOLS (not the filesystem itself though) will have trouble if there are too many files (GUI tools especially, think of Windows Explorer).

Personally I'd store them as BLOBs in a relational database, especially if they're fairly small.
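For example, a minimal sketch of the BLOB approach with PDO and MySQL (the table name, columns, and connection details are assumptions, not part of the original post):

// assumed table: CREATE TABLE images (id INT AUTO_INCREMENT PRIMARY KEY,
//                                     caption VARCHAR(255), data MEDIUMBLOB)
$pdo = new PDO( 'mysql:host=localhost;dbname=gallery', 'user', 'pass' );

// store the image bytes alongside their metadata
$stmt = $pdo->prepare( 'INSERT INTO images (caption, data) VALUES (?, ?)' );
$stmt->bindValue( 1, $caption );
$stmt->bindValue( 2, file_get_contents( $tmp_file ), PDO::PARAM_LOB );
$stmt->execute();

// fetch it back later and send it to the browser
$stmt = $pdo->prepare( 'SELECT data FROM images WHERE id = ?' );
$stmt->execute( array( $image_id ) );
header( 'Content-Type: image/jpeg' );
echo $stmt->fetchColumn();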

Putting them in the filesystem is fine as long as you're aware of the performance characteristics - e.g. some filesystems index their directories, others do not. The latter are likely to perform poorly when a directory holds a lot of files.

Mark
Quote: Original post by markr
Firstly, some filesystems SHOULD NOT "get unstable" if you put too many files in one directory.


Of course not. And neither should they fragment into oblivion or crash for unexpected reasons. In that same vein, an OS should never crash and should have no exploits [grin]

Most modern filesystems are fine with really large numbers of files in a single directory, but older ones aren't always that good.

— Sander Marechal

