How EC2 gave us a 130x throughput increase in generating millions of images

April 8, 2011

This post originally appeared on SlideShare's engineering blog

Images are a key part of the zillions of websites out there, and SlideShare is no different. A few weeks back, we made some optimizations to lazily load images below the fold. In this post, we discuss a few more optimizations we have made to user profile pictures.

Across SlideShare, profile images are used in primarily 3 sizes – 100×100, 50×50 and 32×32. But because the site evolved over a period of time, we did not have all 3 sizes for every user's picture. Running a query through our db revealed that 2 million profile images existed in a single size – 100×100. In places where we needed a smaller size, we were letting the browser do the resizing.

At this stage, we were faced with the mammoth task of generating 3 different sizes for each of these 2 million images. For each image, we had to make an HTTP call to Amazon's S3 storage service, read the image into memory, use RMagick to generate the 3 variants, and upload each variant back to S3. We put together a small Ruby script to loop over the 2 million images and do all of these tasks.

A preliminary test on my dev machine showed us that it would take 27 days to run through all the images if only 1 machine and 1 process were used! The throughput we were achieving was roughly 1 image per second.

27 days was far too long; we *had* to bring the time down to a few hours. Amazon EC2 instances to the rescue. We first tested a single EC2 instance, running 4 processes simultaneously over 100,000 images. It got through them in under three and a half hours, a throughput of around 7 images per second, which is not bad at all! But even at that rate, the total time would only drop from 27 days to 4 days. Not good enough.
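The 4-process setup was a simple fan-out over the list of image IDs. A minimal sketch of how such a script might split the work across forked workers (the names `partition`, `run_workers`, and `process_image` are illustrative, not from our actual script):

```ruby
# Distribute the ids round-robin so each worker gets a nearly equal share.
def partition(ids, workers)
  slices = Array.new(workers) { [] }
  ids.each_with_index { |id, i| slices[i % workers] << id }
  slices
end

# Fork one child process per slice and wait for them all to finish.
# `process_image` stands in for the fetch/resize/upload work per image.
def run_workers(ids, workers: 4)
  pids = partition(ids, workers).map do |slice|
    Process.fork do
      slice.each { |id| process_image(id) }
    end
  end
  pids.each { |pid| Process.wait(pid) }
end
```

With 4 workers, each child churns through its own quarter of the IDs independently, which is what lets a single instance roughly quadruple the single-process throughput.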

We then created an AMI (Amazon Machine Image) from the first instance and launched 19 more replicas of it, multiplying the processing power at hand by 20. The numbers looked good now: a combined throughput of 140 images per second! With each EC2 instance taking care of 100,000 images, the entire exercise was complete in a matter of 4-5 hours!

| Number of machines | Number of processes | Images handled per second | Total time taken |
|---|---|---|---|
| 1 dev machine | 1 | 1 | 27 days |
| 1 EC2 instance | 4 | 7 | 4 days |
| 20 EC2 instances | 4 | 140 | 5 hours |
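The scaling math above checks out with some quick arithmetic on the numbers from the table:

```ruby
# Sanity-check the scaling figures from the table above.
images      = 2_000_000
per_machine = 7.0   # images/second for one instance with 4 processes
machines    = 20

combined = per_machine * machines     # combined fleet throughput
hours    = images / combined / 3600   # total wall-clock time

puts "#{combined.round} images/sec, ~#{hours.round(1)} hours total"
# prints "140 images/sec, ~4.0 hours total"
```

The ~4 hours this predicts lines up with the 4-5 hours we observed; the gap is setup time plus the usual variance in S3 round-trips.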

The core part of the code was this:

# Generates the 3 image sizes and uploads them back to S3
# @param [String] login login of the user whose pics are to be generated
# @param [Magick::Image] img the RMagick image object to be resized
def generate_and_upload(login, img)
  SIZES.each_value do |size|
    suffix   = "." + img.format.downcase
    headers  = { "Content-Type" => "image/#{img.format.downcase}" }
    filename = "profile-photo-#{login}-#{size}x#{size}"
    write_to = $ss_convert_store + filename + suffix

    # generate the resized variant on local disk
    newfile = img.resize(size, size)
    newfile.write(write_to)

    # upload the variant to S3
    $awsHelper.put_file_with_key(write_to, filename)

    # clean up the local copy and log progress
    FileUtils.rm(write_to)
    $progress.write(size.to_s + " ")
  end
end
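For context, `SIZES` above is a hash of the three target dimensions. A plausible definition (our reconstruction for illustration, not the original constant), together with the S3 key scheme the function produces:

```ruby
# Hypothetical values; the real script defined this constant elsewhere.
SIZES = { large: 100, medium: 50, small: 32 }

# Mirrors the key naming used in generate_and_upload above.
def variant_key(login, size)
  "profile-photo-#{login}-#{size}x#{size}"
end

SIZES.each_value { |size| puts variant_key("jsmith", size) }
# prints profile-photo-jsmith-100x100, -50x50 and -32x32
```

Encoding the size into the key itself means the app servers can construct the URL for any variant directly, with no lookup required.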

At the end of the exercise, we reached a state where every user on SlideShare had these 3 image variants, and a couple of days later our codebase was updated to use the new images instead of the original 100×100 variant. This is a nice performance win, and combined with lazy loading, our slide-view page is now even faster. We also made a few tweaks to the image-upload mechanism to give users more control: we now generate all required sizes as soon as a new profile image is uploaded.

Overall, the project was an exciting one, with work ranging from frontend jQuery plugins to EC2-S3 interactions and Ruby scripts running at web scale.
