Saturday, September 29, 2007

MogileFS Storage engine!

I came across this today, and it seemed interesting. MogileFS is intended for storage-hungry applications. It's all about spreading your files across cheap devices on different hosts, something like RAID + NFS + data replication.

The idea is nice and simple: you have multiple servers, and every server has multiple devices. You sum up all these storage units into one big storage pool. A tracker application is what you consult when reading from or writing to this pool, and the tracker takes responsibility for saving your data and making sure it stays available even if multiple hosts go offline.
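To make the idea concrete, here is a toy Python model of a tracker. It is only a sketch of the concept described above, not the MogileFS API: the names `Device`, `ToyTracker`, `store`, `read`, and `mark_host_down`, and the default of two copies per file, are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    host: str
    name: str
    files: dict = field(default_factory=dict)  # key -> bytes


class ToyTracker:
    """Toy stand-in for a tracker: remembers which devices hold each key."""

    def __init__(self, devices, replicas=2):
        self.devices = list(devices)
        self.replicas = replicas      # hypothetical copy count per file
        self.locations = {}           # key -> list of Device
        self.down_hosts = set()

    def store(self, key, data):
        # Prefer devices holding fewer files, and use at most one device
        # per host so that losing one host never loses every copy.
        chosen, used_hosts = [], set()
        for dev in sorted(self.devices, key=lambda d: len(d.files)):
            if dev.host in self.down_hosts or dev.host in used_hosts:
                continue
            chosen.append(dev)
            used_hosts.add(dev.host)
            if len(chosen) == self.replicas:
                break
        if len(chosen) < self.replicas:
            raise RuntimeError("not enough live hosts for the requested copies")
        for dev in chosen:
            dev.files[key] = data
        self.locations[key] = chosen
        return chosen

    def mark_host_down(self, host):
        self.down_hosts.add(host)

    def read(self, key):
        # Serve the file from the first copy that sits on a live host.
        for dev in self.locations.get(key, []):
            if dev.host not in self.down_hosts:
                return dev.files[key]
        raise KeyError(f"{key!r} has no copy on a live host")


devices = [Device("host-a", "dev1"), Device("host-a", "dev2"),
           Device("host-b", "dev3"), Device("host-c", "dev4")]
tracker = ToyTracker(devices, replicas=2)
placed = tracker.store("image-42", b"payload")
tracker.mark_host_down(placed[0].host)   # one host goes offline
print(tracker.read("image-42"))          # still readable from the other copy
```

The point of the sketch is the one-copy-per-host rule: the file survives a host failure because the tracker never puts all copies on the same machine.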

This application came just in time. We had just had an idea for a project that takes images of your server and stores them on network storage, so if something goes wrong with your server you can simply take an image and restore it, or even restore it onto a different server to clone it. The challenge was where to store all of these images. A simple calculation: if you have 100 users and every user has a 10 GB image, you have to maintain a terabyte of storage, and scalability will be an issue.
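The calculation is easy to check in a few lines of Python. The function name `raw_storage_gb` and the `copies` parameter are made up for this sketch. The `copies` argument is an assumption on my side: it shows how the total grows if every image is kept more than once, and it is not a figure from the project.

```python
def raw_storage_gb(users, image_gb, copies=1):
    """Total raw storage in GB for `users` images of `image_gb` each."""
    return users * image_gb * copies


print(raw_storage_gb(100, 10))            # 1000 GB, i.e. about 1 TB
print(raw_storage_gb(100, 10, copies=2))  # 2000 GB if each image is stored twice
```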

With MogileFS you gain three advantages here. One, your storage is distributed across cheap disks on cheap servers. Two, you can take advantage of this distribution by installing the application on all of these servers, which gives you high availability. Three, scaling is as simple as adding a server to the farm. So for about half the price of a SAN and its expensive disks, you get high availability for both your storage and your application.

Of course, we will have to manage this distributed environment. One way to tackle it is a no-slave architecture: all servers are masters, and every server can find out which server holds a user's image by consulting the tracker. So when a user logs in, he first lands on any server, chosen by a round-robin algorithm, and from there he is redirected to the server storing his image. He gets served there, which eliminates the network communication overhead.
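The login flow could look roughly like the Python sketch below. Everything in it is hypothetical: the `LoginRouter` class, the server names, and the plain dictionary standing in for the tracker lookup are assumptions for illustration, not part of MogileFS.

```python
from itertools import cycle


class LoginRouter:
    """Round-robin entry point plus a redirect to the server holding the image."""

    def __init__(self, servers, image_location):
        self._entry = cycle(servers)           # endless round-robin over the peers
        self._image_location = image_location  # stand-in for asking the tracker

    def handle_login(self, user):
        entry = next(self._entry)              # any peer can take the login
        target = self._image_location[user]    # where this user's image lives
        redirected = entry != target           # no redirect if it is already local
        return entry, target, redirected


router = LoginRouter(
    ["server-1", "server-2", "server-3"],
    {"alice": "server-3", "bob": "server-1"},
)
print(router.handle_login("alice"))  # ('server-1', 'server-3', True)
print(router.handle_login("bob"))    # ('server-2', 'server-1', True)
print(router.handle_login("alice"))  # ('server-3', 'server-3', False)
```

Since every server runs the same code and can do the lookup, none of them is special, and the user always ends up on the machine that has his image on local disk.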


This architecture can be applied to any storage-intensive application, or any application that used to rely on NFS, as NFS has proven to be unreliable in heavy production environments.

I like this tool very much, and I can't wait to test it on our application, so I will keep you posted on any updates.
