When To Put Images Into MySQL?

So, most of the "I want images in MySQL" conversations are terminated with "Don't." Some articles say "MySQL's overhead isn't worth it" and others that say "If you put the images in MySQL you don't have to deal with the filesystem overhead."

My company's site has over 2 million images to store/retrieve, and our current setup is at its limit and does not scale well -- 2 NFS servers. We'd rather avoid adding another NFS server, because it involves changing where images are located whenever we scale. The problem is that with so many images, there are so many inodes that the filesystem cannot keep up. As well, if a server is rebooted or offline for more than an hour (ie, during maintenance) it is extremely slow until the cache catches up (about half an hour).

We believe the best retrieval method is to use MySQL. We will be doing our own speed testing, so I'm not relying on what folks say here. But I'm curious -- what do other folks think? Flickr uses MySQL for image storage.....

Do any of you have any more

Do any of you have any more insight into this problem? How did storing large numbers of files in the DB work out for you?

Very interesting topic! What

Very interesting topic!
What do you guys think about so -called "chunk approach" - when you cut your BLOB in some little chunks, say 64 kb big and store your blob in those smaller chunks in the database instead of one big BLOB. What do you think, theoretically, will it do some good? Can it optimize search or retrieval effectivity?
Thank you all for your thoughts.

Hi Sheeri, Any news on the

Hi Sheeri,

Any news on the speed test? I am currently developing my own site and this would be a very useful info.

Thanks,
Giorgio

Hrm.....I actually meant 36

Hrm.....I actually meant 36 million images, not 2 million! my bad.

[...] Dezbaterea privind

[...] Dezbaterea privind păstrarea fişierelor (audio/video/imagini) in baza de date contra sistem de fişiere continuă cu noi argumente. [...]

Started typing here and

Started typing here and decided to write my own post about my experience with this: http://mike.kruckenberg.com/archives/2006/03/storing_binary.html

Sheeri's blog does not

Sheeri's blog does not support trackback autodetection.

So here is my manual trackback:

http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database...
Serving Images From A Database, The Inside Story

I serve files straight into

I serve files straight into the browser. Well...I set the content-type and all but you can just present them to the broswer and it will know what to do. An alternative method would be to dump to temp file and redirect to that link.

I agree, a very interesting

I agree, a very interesting topic. I will await Shiri's findings with interest. One question I have is how the images are actually "served" from the database. Say I have 3 photos of cats that I wish to serve from my DB. Are they munged into a temporary file locations and the enclosing document (dynamically produced) served up over HTTP or is there a much more clever way that I am missing?

This is definitely an

This is definitely an interesting conversation. I am currently using MySQL InnoDB tables to store all types of blobs. I currently have one DB up to about 30GB (I will be working on compression next) and it is growing. The schema is setup up so that the metadata is not on the same table as the blobs so I only hit the blob table when I want to retrieve.

I like this setup because it is very portable and easy to implement as an application. BUT (there is always a but) I am wondering just how big can I go and will there be performance losses. I plan on these DB's growing to very large numbers.

Has anyone had real experience with this? How big have you gotten these databases and what issues did you see? Are you happy with the setup?

Thanks,
Michael

It seems that everyone here

It seems that everyone here tries to keep you to using files :)

Try to look at how various unix programs keep their cache and indexes, especially at squid.

You'll see a pattern of using a hierarchy of nested directories without more than 256 subdirs or files (leaves in the tree) in any given dir.

As Stewart pointed out its

As Stewart pointed out its best to avoid having a large number of files in any one directory (partly for your own sanity - if you type ls in a directory with many files you can wait a long time for a response). Perhaps you could design a suitable directory tree structure and hashing algorithm to store the images.

As I understand from PDF

As I understand from PDF flickr doesn't use DB for storing images :
http://www.niallkennedy.com/blog/uploads/flickr_php.pdf

I second the thoughts of

I second the thoughts of James, this is one of the ideal situations when using MySQL makes great sense.

I remember last year working on a news system where a leading news provider was delivering us so many images that managing them was becoming a real mess.

If you do go with storing image, like James says, make sure to put thumbnails in a different table than main images.

Best of luck
Frank

If your servers crunch under

If your servers crunch under only 2 million files, you're probably using the wrong file system. Remember, on ext3, things get sad with more than a few hundred files in a directory. XFS (and reiser) do a *lot* better.

One way to "replicate" between filesystems on different machines could be to have a central NFS share for "incoming" with a directory for each slave. Copy the file into an "incoming" directory, hard link into directory for each slave. As a script running on the slaves finds files in its directory, it copies it to its local storage and unlinks (rm) it from its incoming tree.

If you need to wait until a file is copied to a slave - just check for it in the incoming directory of that slave!

To resync/add a new slave - add a directory in the incoming tree (as your scripts are smart about this and look at what directories are in the incoming tree) and rsync the data across!

What's extra tricky dicky is to serve the images directly off these machines - i.e. run an unlta-light web server! This way your application (running in, for example PHP) runs on one box just generating URLs to another - load balancing!

You've given a good example

You've given a good example of a case where storing images in the database makes sense. Images and video can also benefit from the database advantages like replication for spreading the load. Yes, people do store video, in chunks, in MySQL databases.

What you might do is store in a metadata table an object location. You can then specify that location in many ways, one of them being in a particular set of database servers, another being in the filesystem. Then you have the flexibility to do things like have a size or other threshold above which you use the filesystem, as well as a migration strategy, since you can start with them all pointing to the filesystem and migrate them over time.

If you use thumbnails I suggest putting them in a different table from the full size images and both different form the metadata. That is likely to help cache efficiency, since the thumbnails are likely to be far more frequently accessed.

Wikipedia doesn't (yet) store images in the database but it does use this object approach to store text records first uncompressed alone, then compressed combining many together for higher compression, and finally into sets of databases running on regular web servers. Effectively, hierarchical storage.

You should also investigate the MogileFS system developed for use by LiveJournal.