We've been using MongoDB in production at AJJ, Inc. for almost two years now.
We initially chose Mongo because we needed a way to cache large SOAP responses (500k each on average) locally. Everything worked well, and I had no complaints or gripes (unlike many others).
When things started to go haywire last Friday, I wasn't sure where to look. Customers were having trouble accessing the application, though no exceptions were being thrown. After logging into our Mongo server, the problem was immediately apparent:
[FileAllocator] allocating new datafile /var/lib/mongo/regapp_production_db.5, filling with zeroes...
[FileAllocator] creating directory /var/lib/mongo/_tmp
[FileAllocator] FileAllocator: posix_fallocate failed: errno:28 No space left on device falling back
Disk Space Damage Control
Doh. Our disk was out of space. The first thing to note is that when a MongoDB server runs out of disk space, the process immediately goes into read-only mode; any writes to the database (including deletes) will be blocked. To re-enable writes to the database, you first have to restart the mongod service.
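Assuming mongod is managed by a SysV init script (as ours is), the restart is a one-liner; adjust the path for your distribution's service manager:

```shell
# Restart mongod to bring it out of read-only mode
# (only useful once you have freed some disk space)
sudo /etc/init.d/mongod restart
```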
After restarting the database I was able to manually remove some of our oldest cached requests, so that at the very least our application server could resume processing requests. Our application was functioning again, albeit not for long: I knew I had around 3k requests before the disk filled up again.
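For reference, the cleanup can be done from the mongo shell with a single remove. The collection name and date field below are hypothetical stand-ins for your own schema; only the database name comes from the log output above:

```shell
# Delete cached responses older than 30 days. "cached_requests" and
# "created_at" are illustrative names -- substitute your own schema.
mongo regapp_production_db --eval \
  'db.cached_requests.remove({created_at: {$lt: new Date(Date.now() - 30*24*60*60*1000)}})'
```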
Resizing the root filesystem on an Amazon EC2 instance
Our MongoDB instance runs on an Amazon EC2 micro instance. With our filesystem still in poor shape, I immediately scheduled 10 minutes of maintenance downtime with my boss and logged into the AWS console.
Ensuring data integrity
Anytime you mess around with a partition you are risking the integrity of the data on it. The fact that this was a production server with incomplete registrations in flight made me very nervous. From what I had read, others have suggested that EBS snapshots are safe to take on a live volume, but I still did not want to risk it. I safely shut down the mongod server and then the EC2 instance:
sudo /etc/init.d/mongod stop
sudo shutdown -h 0
Snapshotting the EBS volume
Knowing that our MongoDB server was in a safe state, I created a snapshot of the EBS volume I needed to resize.
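I did this through the AWS console, but the same step can be scripted with the AWS CLI. The volume ID below is a placeholder, not the real one:

```shell
# Snapshot the EBS volume before touching it
# (vol-xxxxxxxx is a placeholder for your volume's ID)
aws ec2 create-snapshot \
  --volume-id vol-xxxxxxxx \
  --description "pre-resize backup of mongo data volume"
```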
Creating a new volume
The next step was to create a new volume. It is important to note that the new volume needs to be in the same availability zone as the instance (us-east-1a, in our case). When you create the new volume, choose the ID of the snapshot you created in the previous step.
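With the AWS CLI, that looks roughly like the following; the snapshot ID is a placeholder, and the 75 GB size is illustrative (it matches the ~74G filesystem shown later):

```shell
# Create a larger volume from the snapshot, in the same
# availability zone as the instance (IDs are placeholders)
aws ec2 create-volume \
  --snapshot-id snap-xxxxxxxx \
  --availability-zone us-east-1a \
  --size 75
```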
Attach your new volume
After creating the larger volume, go back to the EC2 Instances tab and detach the old volume. Once the old volume is detached, you can attach the new volume as the root filesystem (/dev/sda1). After that, you will need to go to the Elastic IPs tab and reattach the EIP to the EC2 instance. Once everything is reattached, you are good to restart the EC2 instance.
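The same swap can be sketched from the CLI; every ID and the IP address below are placeholders:

```shell
# Detach the old volume, attach the new one as the root device,
# and re-associate the Elastic IP (all identifiers are placeholders)
aws ec2 detach-volume --volume-id vol-oldxxxxx
aws ec2 attach-volume \
  --volume-id vol-newxxxxx \
  --instance-id i-xxxxxxxx \
  --device /dev/sda1
aws ec2 associate-address --instance-id i-xxxxxxxx --public-ip 203.0.113.10
```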
Resizing the volume
When you restart and reconnect to the EC2 instance, if you do a df -h you'll see that nothing has changed, yet:
~ $ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       7.9G  7.8G   .1G  99% /
The final step is to resize the filesystem so it can actually use the extra space:
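On an ext3/ext4 root filesystem (typical for these AMIs), the online grow is a one-liner; the device name here is assumed from the df output above:

```shell
# Grow the filesystem to fill the new, larger volume.
# Assumes an ext3/ext4 filesystem on /dev/sda1.
sudo resize2fs /dev/sda1
```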
Now when you do a df -h you should see:
~ $ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        74G  8.1G   65G  12% /
After this, you're good to restart MongoDB and start your application server back up.
It is worth noting that while I had planned for this to take only 10 minutes, from now on I will multiply my estimates by 4. That way, when things run longer than expected, I'll already have extra time allotted; and when they take the proper amount of time, I'll look better because I finished well ahead of schedule.