Economical Use of Amazon S3 with Ruby on Rails


Building on his Introduction to AWS for Ruby Developers tutorial, this second article from Robert Dempsey shows you how to efficiently make requests to Amazon S3.

AWS Products Used: Amazon S3
Language(s): Ruby
Date Published: 2007-09-18

By Robert Dempsey of Atlantic Dominion Solutions, LLC

Amazon Simple Storage Service (Amazon S3) is a great utility--virtually limitless storage at a fraction of what it would cost to purchase and maintain storage servers. As of June 1, 2007, the Amazon S3 pricing plan includes a small charge for each request made to the service. These fees, though small, can add up. In the first Amazon Web Services (AWS) article, Introduction to AWS for Ruby Developers, we saw how easy it was to use Amazon S3 in our Ruby on Rails applications. This tutorial goes a step further by proposing a way to keep your Amazon S3 costs down, so that integrating Amazon S3 with Ruby on Rails is both easy and economical.

This article shows you step-by-step how easy it is to integrate Amazon S3 into your Rails application, using freely available tools, and shows you how to keep your total costs to a minimum. It assumes an intermediate-level understanding of the Ruby programming language and the Ruby on Rails framework.

Amazon S3 Pricing

Before figuring out how to keep our costs down, let's look at the costs of using Amazon S3. The pricing structure is simple:

Storage
$0.15 per GB-month of storage used

Data Transfer
$0.10 per GB - all data transfer in

$0.18 per GB - first 10 TB/month data transfer out
$0.16 per GB - next 40 TB/month data transfer out
$0.13 per GB - data transfer out/month over 50 TB

For those not familiar with the terms "data transfer in" and "data transfer out," this means that you pay for the size of the data (such as images or files) that your application sends to or requests from Amazon S3.

Note: Amazon does not charge for data transferred between Amazon Elastic Compute Cloud (Beta) (Amazon EC2™) and Amazon S3. Bonus!

Requests
$0.01 per 1,000 PUT or LIST requests
$0.01 per 10,000 GET and all other requests (but no charge for DELETE requests)
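
The price list above can be turned into a few lines of Ruby for quick back-of-the-envelope checks. This is only a sketch: the method and constant names are ours, the rates are the 2007 figures quoted in this article, and we assume 1 TB = 1,024 GB when applying the data transfer out tiers.

```ruby
# Rates copied from the price list above (2007 figures, USD).
GB_PER_TB = 1024 # assumption: 1 TB = 1,024 GB

# Data transfer out is tiered: [tier size in GB, price per GB].
TRANSFER_OUT_TIERS = [
  [10 * GB_PER_TB, 0.18],   # first 10 TB per month
  [40 * GB_PER_TB, 0.16],   # next 40 TB per month
  [Float::INFINITY, 0.13]   # everything over 50 TB per month
].freeze

# Walks the tiers from cheapest volume upward, billing each slice at its rate.
def transfer_out_cost(gb)
  remaining = gb.to_f
  cost = 0.0
  TRANSFER_OUT_TIERS.each do |tier_size, price|
    break if remaining <= 0
    billed = [remaining, tier_size].min
    cost += billed * price
    remaining -= billed
  end
  cost
end

# $0.01 per 1,000 PUT/LIST requests, $0.01 per 10,000 GET and other requests.
def request_cost(put_and_list_count, get_and_other_count)
  (put_and_list_count / 1_000.0) * 0.01 + (get_and_other_count / 10_000.0) * 0.01
end

puts format('%.2f', transfer_out_cost(100))       # prints 18.00
puts format('%.2f', request_cost(10_000, 10_000)) # prints 0.11
```

Note how the same number of requests costs ten times more as PUTs than as GETs.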

Got your spreadsheet open and ready to calculate your estimated costs? Close it, and go to the AWS Simple Monthly Calculator. We can use this calculator for Amazon S3, and if we later deploy our Rails application to Amazon EC2, we can estimate that cost there as well. For now, let's look at how we can keep our costs down.

The Obvious Solution

The most obvious solution appears to be deploying our application on Amazon EC2 and using Amazon S3 for the storage. This way, we don't pay for any data transfer between the two. For the sake of argument, let's say that we aren't yet ready to make that jump. How else can we use these tools economically? Look again at the request prices in the Amazon S3 Pricing section: PUT and LIST requests cost $0.01 per thousand, while GET and all other requests cost $0.01 per ten thousand. A PUT is ten times as expensive as a GET, so the fewer PUTs we make, the less we pay. The solution: pack and ship. That's where our savings will come from.

Pack It Up and Ship It Out

The "pack and ship" solution entails the following:

  • Bundling - bundle multiple files into a single package
  • Compression - compress the packages before sending
  • Timed Delivery - send a compressed package to Amazon S3 each time we have 10 files to package
  • Local Package Tracking - maintain a local list of what files are in which package

Ultimately, while this solution keeps the number of GET requests the same, it reduces the number of PUT requests and therefore our overall cost. Don't believe it? Let's take a look at the numbers. Let's say we have a total of 10,000 files, each of which is 1 MB in size.

Standard data transfer

Number of files | # of PUTs | # of GETs | PUTs cost | GETs cost | Storage cost | Data transfer cost | Total cost
10,000          | 10,000    | 10,000    | $0.10     | $0.01     | $1.46        | $0.98              | $2.55

Pack and ship data transfer - We will store 10 files in each package and assume 30% compression

Number of files | # of PUTs | # of GETs | PUTs cost | GETs cost | Storage cost | Data transfer cost | Total cost
10,000          | 1,000     | 10,000    | $0.01     | $0.01     | $1.03        | $0.68              | $1.73

That's a huge difference! By packaging and compressing our files, we've reduced our total cost by $0.82, or about 32%. Compression shrinks both the storage and the data transfer charges, and bundling trims the PUT charge from $0.10 to $0.01. Let's get started making this happen. We list much of the code in the following section; however, you can download the entire application and follow along if you prefer.
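
If you would like to check the two tables above, here is a minimal sketch of the arithmetic. It assumes what the table figures imply: 1 GB is counted as 1,024 MB, storage is billed for one month, and the data transfer column covers the upload only (data transfer in). The method name and its parameters are ours.

```ruby
MB_PER_GB = 1024.0 # assumption: the tables count 1 GB as 1,024 MB

# Monthly cost of uploading and storing `files` files of `mb_each` MB,
# bundled `per_package` at a time, with the given compression ratio.
def monthly_cost(files:, mb_each:, per_package: 1, compression: 0.0)
  put_count = (files.to_f / per_package).ceil
  get_count = files # the article keeps the number of GETs unchanged
  gb = files * mb_each / MB_PER_GB * (1.0 - compression)

  costs = {
    puts:     put_count / 1_000.0 * 0.01,
    gets:     get_count / 10_000.0 * 0.01,
    storage:  gb * 0.15, # per GB-month
    transfer: gb * 0.10  # data transfer in
  }
  costs.merge(total: costs.values.sum)
end

standard = monthly_cost(files: 10_000, mb_each: 1)
packed   = monthly_cost(files: 10_000, mb_each: 1, per_package: 10, compression: 0.30)

[['standard', standard], ['pack and ship', packed]].each do |label, c|
  puts format('%-14s PUTs $%.2f  GETs $%.2f  storage $%.2f  transfer $%.2f  total $%.2f',
              label, c[:puts], c[:gets], c[:storage], c[:transfer], c[:total])
end

saving = standard[:total] - packed[:total]
puts format('saving: $%.2f (%.0f%%)', saving, saving / standard[:total] * 100)
```

Changing per_package or compression lets you see how much each part of pack and ship contributes on its own.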

Pack and Store

If you read our first article on using Amazon S3 with Ruby on Rails, then you will be familiar with some of the code that we will use here. As in the previous article, we are going to need the following:

  • A text editor of your choice
  • Ruby 1.8.4 or later
  • Rails 1.2.3
  • An active Amazon Web Services account
  • An Amazon S3 Access Key ID
  • An Amazon S3 bucket to store our files
  • The attachment_fu plug-in
  • tar - the utility, not the hot stuff
  • An active Internet connection to install the RubyGems and plug-ins

Let's start the coding!

Note: Although I use a Mac for development, the following commands are the same regardless of your operating system of choice. You just need to have the tar utility installed on your computer and be able to run the "tar" command in your command-line tool. The following code was created using Ruby 1.8.6 and Ruby on Rails 1.2.3.

First, let's install all the gems we will need. Open a command-line tool and type the following:

$ sudo gem install aws-s3 archive-tarsimple

Now that we have all of the gems we need, we create a new Rails application. Go to the directory where you keep all of your Rails applications and type the following:

$ rails economical_s3

To be able to use the archive-tarsimple Ruby gem, we have to configure our application controller to provide access to all of its resources. Open app/controllers/application.rb and make it look like the following:

require "archive/tarsimple"

class ApplicationController < ActionController::Base
  include Archive
  
  session :session_key => '_economical_s3_session_id'
end

That was easy. Now, let's go into the economical_s3 directory and install the attachment_fu plug-in. We will use attachment_fu for the main product file upload, but for Amazon S3, we are going to use the Amazon S3 gem directly.

$ cd economical_s3
$ script/plugin install http://svn.techno-weenie.net/projects/plugins/attachment_fu/

To use Amazon S3, we need to have an Amazon Web Services account, and then set up a bucket to hold our files. If you haven't already created your free Amazon Web Services account, do that now. When we all have an AWS account, let's set up a bucket. Go back to your command-line tool and type the following:

$ s3sh

The s3sh tool is like irb for the Amazon S3 gem. We will use this tool to connect to Amazon Web Services and create our bucket. First, we need to connect to Amazon S3 using our AWS credentials.

>> AWS::S3::Base.establish_connection!(
:access_key_id => 'your access key',
:secret_access_key => 'your secret key'
)

Now that we are connected, let's create our development bucket. Keep in mind that every bucket name must be unique, so name yours something fun.

>> Bucket.create('economicals3_development')

Now we have our bucket. Type exit and press ENTER. Open your favorite text editor, and then open config/amazon_s3.yml. Enter your bucket_name, access_key_id, and your secret_access_key where required. You can find the access key values in your Amazon Web Services portal. When you have the keys entered and a bucket created, we are ready to code our application.
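
For orientation, here is a small sketch of how a file like config/amazon_s3.yml can be read from Ruby. We assume the three settings are grouped under the Rails environment name, which is how the controller code later in this article looks them up. The values are placeholders, the s3_settings helper is hypothetical, and the key conversion is a plain-Ruby stand-in for Rails' symbolize_keys so that the example runs outside Rails.

```ruby
require 'yaml'

# Placeholder contents in the assumed shape of config/amazon_s3.yml.
CONFIG_TEXT = <<~CONFIG
  development:
    bucket_name: economicals3_development
    access_key_id: your access key
    secret_access_key: your secret key
CONFIG

# Returns the settings for one environment with symbol keys.
# Hash#fetch raises KeyError if the environment is missing from the file.
def s3_settings(yaml_text, environment)
  settings = YAML.safe_load(yaml_text).fetch(environment)
  settings.each_with_object({}) { |(key, value), out| out[key.to_sym] = value }
end

conf = s3_settings(CONFIG_TEXT, 'development')
puts conf[:bucket_name]
```

Failing loudly on a missing environment is a deliberate choice here: a typo in the file shows up immediately instead of as a confusing error at upload time.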

Let's take stock of where we are now. We installed our RubyGems, created a Rails application, installed our plug-ins, configured our application for Amazon S3, connected to Amazon S3, and created our bucket. In the previous tutorial we used the attachment_fu plug-in to upload our attachments directly to Amazon S3. Today we will use it to temporarily store our product files in the file system. Every time a new file is uploaded, we check to see if we have 10 we can package together. When we hit 10, we create a package, ship it to Amazon S3, and remove the files from the file system. Believe it or not, this is a lot simpler (thanks to Ruby and Rails) than we might think. Let's get to it!
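
Before we touch Rails, the count-to-10 idea described above can be sketched in plain Ruby. The Batcher class is hypothetical and keeps everything in memory; the application in this article stores the counter in the database and the files on disk instead.

```ruby
# Collects items and hands them off in fixed-size batches.
class Batcher
  attr_reader :pending

  def initialize(batch_size, &on_full)
    @batch_size = batch_size
    @on_full = on_full
    @pending = []
  end

  # Adds one item and ships the batch as soon as it is full.
  def add(item)
    @pending << item
    return unless @pending.size >= @batch_size

    batch = @pending
    @pending = [] # start a fresh batch before handing the full one off
    @on_full.call(batch)
  end
end

shipped = []
batcher = Batcher.new(10) { |batch| shipped << batch }
25.times { |i| batcher.add("file_#{i}.dat") }

puts shipped.size         # prints 2
puts batcher.pending.size # prints 5
```

Twenty-five uploads produce two packages of 10, and the remaining 5 files wait for the next batch.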

As with any Rails application, we need to create our databases and configure our connection. Let's keep it simple and name the database economical_s3_development. Then, open config/database.yml and type your server credentials. After we've done this, we create our scaffold files for our file management. Run this little command and observe all the files that Rails happily generates for us:

$ script/generate scaffold_resource ProductFile description:text size:integer content_type:string filename:string package_id:integer uploaded_on:datetime created_at:datetime updated_at:datetime

Next we need a place to keep track of our packages. These packages will contain our product files. After we generate the scaffold code for the packages, we will take a look at the relationships between product files and packages.

$ script/generate scaffold_resource Package filename:string created_at:datetime updated_at:datetime

Note that we are using the scaffold_resource generator instead of the scaffold generator. This will create our migrations and all of our forms with the database columns we want, all ready to go. One command does it all for us. Let's make sure that is true with a quick check of our migration files. Looks good. Let's run them:

$ rake db:migrate

So far, so good. Now that we have code and database tables, let's configure our ProductFile and Package models accordingly. First, we need to set up our relationships.

class ProductFile < ActiveRecord::Base
  belongs_to :package
end

class Package < ActiveRecord::Base
  has_many :product_files
end

Next we configure the ProductFile model to use attachment_fu. We don't need to do anything with the Package model.

Note: The following code limits the size of the files to between 1 KB and 500 MB. You can easily change this. Please see the README file that comes with the attachment_fu plug-in for all the details.

class ProductFile < ActiveRecord::Base
  belongs_to :package
  has_attachment :storage => :file_system, :path_prefix => 'public/files/product_files', :size => 1.kilobyte..500.megabytes
  validates_as_attachment
end

class Package < ActiveRecord::Base
  has_many :product_files
end

The last step before delving into the controllers and making the magic happen is to update the "new" view of our product files. Open app/views/product_files/new.rhtml, remove all of the form fields except description, add one more field for uploaded_data, and change the form_for tag.

  <h1>New product_file</h1>
  <%= error_messages_for :product_file %>
  <% form_for(:product_file, :url => product_files_path, :html => { :multipart => true }) do |f| %>
    <p>
      <b>Description</b><br />
      <%= f.text_area :description %>
    </p>
    <p>
      <b>File</b><br />
      <%= f.file_field :uploaded_data %>
    </p>
    <p>
      <%= submit_tag "Create" %> or <%= link_to 'Back', product_files_path %>
    </p>
  <% end %>

Just for fun, let's run script/server at the root of our Rails application and see if we can upload a file. Done? Great. Our next step is to add the functionality to keep track of the number of product files we have, and when that number hits 10, we create a package and send it to Amazon S3. The first thing we need is a place to store our number. We will keep it simple and store that number in our database. Create the migration, create the model, and then add the counter to the table.

$ script/generate migration CreateProductFileTrackingTable
$ script/generate model ProductFileCount

Open the migration we just created and edit as follows:

  class CreateProductFileTrackingTable < ActiveRecord::Migration
    def self.up
      create_table :product_file_counts do |t|
        t.column :file_count, :integer, :default => 0, :null => false
        t.column :last_uploaded_file, :integer, :default => 0, :null => false
        t.column :updated_at, :datetime
      end
      
      ProductFileCount.create(:file_count => 0)
    end

    def self.down
      drop_table :product_file_counts
    end
  end

This will generate our table and, because we already created our ProductFileCount model, it will create a blank record to work with in our product_file_counts table. Delete the migration that was generated when we created our ProductFileCount model (004), and run the new migration.

$ rake db:migrate

Now we need a method in our product_files controller that increments our counter every time we upload a file. The same method checks the value in the database and, when we reach 10, creates a package and ships it to Amazon S3. This code is the heart of our savings plan. It is heavily commented and writes log statements at every step so we can see what is going on.

After we define our method we create a counter variable and use it to increment our file_count column in the product_file_counts table.

Note: All of the following code is a single method in product_files_controller.rb.

  def increment_and_package
    @incrementer = ProductFileCount.find(:first)
    @incrementer.file_count += 1
    @incrementer.save

When that is done, we run our check to see if we have hit 10 files. If we have, we go to the database to get the last 10 files that have been uploaded. It is these 10 files that we will pack and ship.

    if @incrementer.file_count == 10
      # If last_uploaded_file isn't 0, a package has been shipped before,
      # so take the files uploaded since then
      if @incrementer.last_uploaded_file != 0
        # Grab the last 10 files
        logger.info "*** We are grabbing the last 10 files ***"
        @files_to_pack = ProductFile.find(:all, :conditions => ['id > ?', @incrementer.last_uploaded_file], :order => 'id', :limit => 10)
      else
        # Nothing has been shipped yet, so take the first 10 files
        logger.info "*** We are grabbing the first 10 files ***"
        @files_to_pack = ProductFile.find(:all, :conditions => ["id <= ?", 10], :order => 'id')
      end

When we have the 10 files, we create a tar file locally and add the 10 files to it. Because every file in a single bucket needs a unique name, we take the SHA-1 (Secure Hash Algorithm 1) hash of the current time and use that for the file name.

      # Create a new tar file - use the time to make it unique
      enc = Digest::SHA1.hexdigest(Time.now().to_s)
      tar_file = Archive::Tar.new(enc + ".tar")
      # Add the files to it
      logger.info "*** Time to pack it up ***"
      for prod_file in @files_to_pack
        tar_file.add_to_archive("public" + prod_file.public_filename)
      end

Now, to save space, we not only pack up the files, we also compress the resulting tar file. We are going to use bzip2 (.bz2) compression.

Note: Depending on the file type, the amount of compression will vary.

      # Compress the file for packaging
      logger.info "*** Time to compress it ***"
      tar_file.compress_archive("bzip2")
      # The compressed file is opened later, when we upload it
      filename = tar_file.archive_name + ".bz2"

After we have our compressed file, we need to read in the Amazon S3 config file we created.

      # Read the config file and get the info to connect to S3
      logger.info "*** Reading the S3 config file ***"
      conf_file = RAILS_ROOT + '/config/amazon_s3.yml'
      # RAILS_ENV is always defined by Rails, even when ENV['RAILS_ENV'] is not set
      s3_conf = YAML.load_file(conf_file)[RAILS_ENV].symbolize_keys

Now we connect to Amazon S3 and find the bucket we will be using.

   
      # Connect to S3
      logger.info "*** Connecting to S3 ***"
      conn = AWS::S3::Base.establish_connection!(:access_key_id => s3_conf[:access_key_id], :secret_access_key => s3_conf[:secret_access_key]) 
      
      # Find our bucket
      logger.info "*** Finding our bucket ***"
      AWS::S3::Bucket.find(s3_conf[:bucket_name])

After we are connected and we have our bucket, upload the file.

     
      # Upload our file
      logger.info "*** Uploading our file ***"
      logger.info "*** The file name is: " + filename
      if AWS::S3::S3Object.store(filename, open(filename), s3_conf[:bucket_name])
        logger.info "*** File upload successful ***"

If the upload was successful, we create a new package using the name of our file, and we update our counter.

        # Create a new package record from the filename
        new_package = Package.create(:filename => filename)
        # Update product_file_counts.last_uploaded_file with the id of the last product_file in the array
        logger.info "*** Incrementer time ***"
        @incrementer.last_uploaded_file = @files_to_pack.last.id
        @incrementer.file_count = 0
        @incrementer.save

Next we update the record of each product file we uploaded with the upload time, and remove it from the file system. Lastly we remove our compressed file.

        # Update the product_files in @files_to_pack with the new package id and remove the file
        logger.info "*** Updating and deleting the product files ***"
        for prod_file in @files_to_pack
          prod_file.package = new_package
          prod_file.uploaded_on = Time.now()
          prod_file.save
          # Remove the product_files from the file system
          File.delete(RAILS_ROOT + "/public" + prod_file.public_filename)
        end
        
        # Finally remove the local bz2 file
        File.delete(filename)
      end # end file upload
      
    end # end @incrementer.file_count
  end # end increment_and_package
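
One caveat about the package name: Time.now.to_s only has one-second resolution, so two packages created within the same second would hash to the same name. With one package per 10 uploads that is unlikely, but if it worries you, a hedged variation is to mix some random bytes into the hashed string. The unique_package_name helper below is hypothetical and not part of the downloadable application.

```ruby
require 'digest/sha1'
require 'securerandom'

# Hashes the time plus random bytes, so two packages created within
# the same second still get different names.
def unique_package_name(now = Time.now)
  seed = "#{now.to_f}-#{SecureRandom.hex(8)}"
  Digest::SHA1.hexdigest(seed) + '.tar'
end

fixed_time = Time.at(0)
first_name  = unique_package_name(fixed_time)
second_name = unique_package_name(fixed_time)

puts first_name
puts second_name # differs from the line above despite the identical time
```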

When we have our method in place, we update the create method to make it all happen.

  @product_file = ProductFile.new(params[:product_file])

  respond_to do |format|
    if @product_file.save
      increment_and_package
      # ... the rest of the scaffold-generated create action stays as it was

If we have done our job, when we hit 10 uploads, the last 10 uploaded files will be packaged, sent to Amazon S3, and the locally stored files will be removed after updating the package_id field.
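
If you want to experiment with bundling and compression outside the Rails application, here is a self-contained sketch that uses only Ruby's standard library. Please note the substitutions: it uses the tar writer that ships with RubyGems rather than the archive-tarsimple gem, gzip rather than bzip2 (bzip2 is not in the standard library), and it works in memory rather than on disk. The pack and unpack method names are ours.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Bundles { name => contents } into one gzip-compressed tar archive (a String).
def pack(files)
  tar_io = StringIO.new(''.b)
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |name, contents|
      tar.add_file_simple(name, 0o644, contents.bytesize) { |io| io.write(contents) }
    end
  end
  Zlib.gzip(tar_io.string)
end

# Reverses pack: returns { name => contents }.
def unpack(archive)
  files = {}
  Gem::Package::TarReader.new(StringIO.new(Zlib.gunzip(archive))) do |tar|
    tar.each { |entry| files[entry.full_name] = entry.read if entry.file? }
  end
  files
end

# Ten sample "product files" with repetitive ASCII contents.
files = (1..10).to_h { |i| ["product_#{i}.txt", "sample product data\n" * 500] }
archive = pack(files)
raw_size = files.values.sum(&:bytesize)

puts "raw: #{raw_size} bytes, packed: #{archive.bytesize} bytes"
puts unpack(archive).keys.sort == files.keys.sort
```

Ten files go in, one archive comes out, which is exactly the trade that turns 10 PUT requests into 1. As noted earlier, the amount of compression depends on the file type; this repetitive sample text compresses far better than images or already-compressed files would.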

The Next Level

I am sure that you noticed a few things missing, such as the ability to retrieve the packages. Well, if we showed you how to do everything, what fun would there be for you? Another way to take this application to the next level is to move the Amazon S3 upload out of the request and run it with BackgrounDRb, fully automating the upload and keeping the UI responsive. What else do you think you could add?
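
To give a feel for the background upload idea, here is a tiny in-process illustration using a Ruby thread and a queue. This is not BackgrounDRb's API, only a sketch of the pattern: the request enqueues a package name and returns, while a worker does the slow part. The upload lambda is a hypothetical stand-in for the real call to Amazon S3, and the package names are made up.

```ruby
# Hypothetical stand-in for the S3 upload; a real worker would send
# the package to Amazon S3 here.
uploaded = []
upload = ->(package_name) { uploaded << package_name }

jobs = Queue.new
worker = Thread.new do
  # Queue#pop blocks until a job arrives; :done is our stop signal.
  while (job = jobs.pop) != :done
    upload.call(job)
  end
end

# The request cycle only enqueues a name and returns immediately.
%w[a1.tar.bz2 b2.tar.bz2 c3.tar.bz2].each { |name| jobs << name }

jobs << :done
worker.join
puts uploaded.inspect
```

A real deployment would want the work to survive a server restart, which is one reason to reach for a separate background process rather than a thread inside the web server.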

Conclusions

There is no doubt that with scalable, on-demand, pay-as-you-go storage, the Amazon Web Services tool Amazon S3 is shaking up traditional hosting models. Freely available RubyGems and Rails plug-ins make Ruby on Rails the ideal platform for creating web-scale applications that easily take advantage of these services. Using the built-in capabilities of Ruby, we lower our total costs of storage even further and receive a faster return on investment.

Learning More About Amazon Web Services

This article highlights a few aspects of working with Amazon Web Services. Here are a few more resources available to Ruby and Rails developers to help you learn more:

Common AWS Resources

  • Introduction to AWS for Ruby Developers - A great introduction to using Amazon Web Services and Ruby on Rails.
  • Amazon Web Services - Learn more about each web service on the AWS web site.
  • Developer Connection - The community web site for AWS developers includes forums on AWS, a Solutions Catalog for examples of what your peers have built, and more.
  • Resource Center - Part of the Developer Connection web site, the Resource Center has links to tutorials, code samples, technical documentation, and other resources for building your application on AWS.


About the Author

After eight years as an MCSE and project manager, Robert Dempsey jumped from IT management and PHP/Visual Basic.NET development to Ruby on Rails. He is the project director of Atlantic Dominion Solutions, a Ruby on Rails development firm, and has recently launched Rails For All, a not-for-profit organization dedicated to promoting the use of Ruby on Rails. In addition, Robert presents on a regular basis at the Orlando Ruby Users Group, and has begun giving talks to Java user groups on topics including JRuby and Ruby on Rails.



Related Documents
Type: Articles & Tutorials Introduction to AWS for Ruby Developers
Type: Sample Code Code Sample for the Economical Use of S3 with Ruby on Rails Article
