Patching AMI Instances with Updates on Amazon S3

Making a new AMI is time consuming and requires significant temporary storage space and CPU cycles. This article shows you how updating your AMIs' content at startup from Amazon S3 is a cost-effective way to maintain web sites and other services on Amazon EC2.

AWS Products Used: Amazon EC2, Amazon S3
Language(s): Ruby, Other
Date Published: 2008-04-14

By PJ Cabrera, freelance software developer

Change is Constant

When using the Amazon EC2 service, you typically create an AMI to go with the specific service you are trying to host. You pick your favorite distribution, load it with the software and content for your service, and save everything onto an AMI stored on Amazon S3.

However, most content and software change over time. Clients want their sites updated, and software needs to be modified to support new features, usually in a long cycle with no real perceived end. Sites and software stop being updated only when they are no longer important, and nobody runs a venture with the goal of becoming unimportant. Change is the only constant in the life of the systems we create.

Let's be honest, sometimes you don't even need a change request. You may update the operating system, utilities, or configuration to protect against known or perceived vulnerabilities. Or, you might upgrade or add new development libraries to use a new feature in your favorite development environment.

The Cost of Change

Regenerating AMIs for each content change, however small, seems to be an immutable law of using Amazon EC2. If you don't regenerate your AMI, your changes will be lost if the instance gets shut down! However, every change requires CPU, bandwidth, and storage expenses to transfer the changed files, generate the new image, and store it on Amazon S3. And just to be safe, you keep the old AMI around in case you need to roll back the system to its previous state.

Let's assume, for the sake of argument, that the minimum size of an AMI with operating system, web server, development environment of choice, libraries, system utilities, and content is 600 MB. If you have several systems running on Amazon EC2, each with its own tailored AMI, you need to generate and store another 600 MB AMI each time you make a system change, no matter how small the change. This requirement is wasteful in the extreme.

If you decided to build a new house each time you wanted to rearrange the furniture or put a new painting on the wall, the cost in materials, time, and effort would compel you to limit changes to those that were absolutely necessary. Fortunately, you have alternatives.

Minimizing the Cost

Rather than create and store a new AMI for every change to the systems you maintain, you can create generic AMIs that contain most of your software and store the content that is most likely to change outside the AMI. This approach minimizes the amount of storage that you must churn through to record a small change: for content or software changes, you update only the storage area outside the AMI. You need to regenerate the AMI, which is typically much larger than your content and site-specific software, only when the base system must be updated, for example to apply operating system patch releases.

Using Amazon S3 for storage in Amazon EC2 deployments is free of transfer costs. This makes Amazon S3 the ideal place to store changing content. I'll also show you how to automate the use of Amazon S3 for your software changes so that your AMI will update itself at startup, with content from an Amazon S3 bucket.

As the article "Using Parameterized Launches to Customize Your AMIs" explains, the last startup script to run on the Fedora Core Linux distribution is called /etc/rc.local. You can modify this script in your AMIs to download, from a specific Amazon S3 bucket, the sets of changes to perform on your system and to apply them to an instance.

The changesets can consist of any kind of file. The example in this article relies on a script called autorun.sh, tucked inside a compressed .tar file, along with any other files you want to include. You have all the flexibility you need to manage your system changes. The important part is storing changes in Amazon S3, downloading them at instance startup, and running the script to apply the content to the running instance.

Implementing the System

To keep things simple, you'll use the s3sync utilities introduced in the article "Using Amazon S3 from Amazon EC2 with Ruby." These utilities simplify uploading to and downloading from protected Amazon S3 buckets. Install the s3sync utilities to the /usr/local/s3sync folder, and make sure that this folder is added to the system path:


## Edit /etc/profile and add /usr/local/s3sync to the $PATH
# vi /etc/profile

# cd /usr/local

# wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz

# tar xzf s3sync.tar.gz

Besides installing s3sync, you'll need to modify /etc/rc.local. In this example, you'll hard code the name of the Amazon S3 bucket and Amazon authentication keys, but you could use this system along with parameterized launches to make the script much more flexible.


## Copy the following to the end of your /etc/rc.local file

# The keys are exported so that s3sync.rb can read them from its environment.

# replace xxxx with your AWS access key
export AWS_ACCESS_KEY_ID=xxxx

# replace yyyy with your AWS secret
export AWS_SECRET_ACCESS_KEY=yyyy

# change this to your own S3 bucket name
S3_CHANGESETS_BUCKET=pjc_changesets

mkdir -p /tmp/changesets
cd /tmp/changesets/

## This downloads all compressed .tar files stored in the Amazon S3 bucket.
## To keep things simple, this script does not recurse into folders.
s3sync.rb $S3_CHANGESETS_BUCKET:/*.tgz /tmp/changesets/

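for file in *.tgz
do
   # Skip the literal pattern when no archives were downloaded
   [ -e "$file" ] || continue

   tar xzpf "$file"

   # Each archive is expected to unpack into a folder named after itself
   dirname=`basename "$file" .tgz`

   if [ -d "$dirname" ]; then
      pushd "$dirname"

      if [ -e autorun.sh ]; then
         /bin/bash autorun.sh
      fi

      popd
   fi
done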

The script downloads all files with extension .tgz into the folder /tmp/changesets/. Then the script loops over the .tgz files, in alphanumeric order, extracting the content and executing the autorun.sh file inside. Really simple. All the magic is in autorun.sh and in the other content that you've stored in the compressed .tar file.
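
The ordering behavior can be checked locally without touching Amazon S3. The following is a minimal sketch, not part of the original script: the apply_changesets function name, the demo01 and demo02 changesets, and the temporary folders are all assumptions made for illustration. It builds two tiny changesets and applies them in the same way as the loop in /etc/rc.local.

```shell
#!/bin/bash
# Sketch only: apply_changesets is a hypothetical helper that mirrors the
# /etc/rc.local loop so that it can be tried in a scratch folder.

# The body runs in a subshell, so the caller's working folder is untouched.
apply_changesets() (
   cd "$1" || exit 1

   for file in *.tgz
   do
      [ -e "$file" ] || continue          # no archives: nothing to do
      tar xzpf "$file"
      dirname=`basename "$file" .tgz`     # archive name without extension

      if [ -e "$dirname/autorun.sh" ]; then
         # Run autorun.sh from inside its own folder
         ( cd "$dirname" && /bin/bash autorun.sh )
      fi
   done
)

# Build two throwaway changesets whose autorun.sh only records its name.
WORK=`mktemp -d`
export CHANGESET_LOG="$WORK/applied.log"
mkdir -p "$WORK/build" "$WORK/changesets"

# Created in reverse order on purpose, to show that names decide the order.
for name in demo02 demo01
do
   mkdir -p "$WORK/build/$name"
   echo "echo $name >> \"\$CHANGESET_LOG\"" > "$WORK/build/$name/autorun.sh"
   tar czf "$WORK/changesets/$name.tgz" -C "$WORK/build" "$name"
done

apply_changesets "$WORK/changesets"
cat "$CHANGESET_LOG"
```

Running the sketch should list demo01 before demo02, because the shell expands *.tgz in sorted order regardless of when each archive was created.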

For example, let's say you have an Amazon EC2 instance serving up a Rails application, and you want to pull fixes from Amazon S3. To start, create one compressed .tar file called myrailsapp01.tgz. Because the startup script looks for autorun.sh inside a folder named after the archive, the file must unpack into a folder called myrailsapp01 that contains the complete Rails application and the following script to deploy the application:

## autorun.sh
##
## Deploy the contents of the compressed .tar file. The Rails application
## is in a folder named myrailsapp. You need to move it to /var/railsapps
## and make a symlink to its public folder in the web server document
## root. Then, move a web server configuration file that sets up the
## virtual host.

# Make sure the target folder exists before moving the application into it
mkdir -p /var/railsapps

mv myrailsapp /var/railsapps/

ln -s /var/railsapps/myrailsapp/public /var/www/myrailsapp

mv myrailsapp-virtualhost.conf /etc/httpd/conf.d/
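
One way to pack such a changeset is sketched below. The make_changeset helper and the staging paths are hypothetical names introduced here, not part of the original article or of s3sync. The helper puts the content under a single top-level folder named after the archive, which is where the startup loop looks for autorun.sh.

```shell
#!/bin/bash
# Sketch only: make_changeset is a hypothetical helper, not an s3sync command.

# make_changeset NAME SOURCE_DIR OUTPUT_DIR
# Packs SOURCE_DIR (which must contain autorun.sh) into OUTPUT_DIR/NAME.tgz
# so that the archive unpacks into a single top-level folder called NAME.
make_changeset() {
   local name="$1" src="$2" out="$3"
   local staging

   if [ ! -e "$src/autorun.sh" ]; then
      echo "missing autorun.sh in $src" >&2
      return 1
   fi

   staging=`mktemp -d`
   mkdir -p "$out"
   cp -Rp "$src" "$staging/$name"                  # copy content as NAME/
   tar czf "$out/$name.tgz" -C "$staging" "$name"  # archive NAME/ only
   rm -rf "$staging"
}

# Demonstration with a throwaway folder standing in for the real content.
DEMO=`mktemp -d`
mkdir -p "$DEMO/src/myrailsapp"
echo "echo deploying" > "$DEMO/src/autorun.sh"

make_changeset myrailsapp01 "$DEMO/src" "$DEMO/out"

# List the archive to confirm that everything sits under myrailsapp01/
tar tzf "$DEMO/out/myrailsapp01.tgz"
```

The resulting .tgz file can then be uploaded to the changesets bucket with the s3sync utilities.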

Suppose that you later need to make a change to myrailsapp. You have at least two options: You can recreate the file myrailsapp01.tgz with the new content, or you can make a patch by using the diff command. The -r option makes diff descend into subfolders, and -N includes files that exist in only one of the two trees:

# diff -ruN myrailsapp-previous myrailsapp-current > myrailsapp-patch.diff

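Then, pack the patch into the file myrailsapp02.tgz together with the following autorun.sh script:

## autorun.sh
##
## Apply patch to myrailsapp. The patch file sits next to this script,
## so remember its full path before changing folders.

PATCH_FILE="$PWD/myrailsapp-patch.diff"

cd /var/railsapps/myrailsapp

## -p1 strips the leading myrailsapp-previous/ or myrailsapp-current/
## folder that diff recorded in each file name.
patch -u -p1 < "$PATCH_FILE"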

If you upload myrailsapp02.tgz to the Amazon S3 bucket, the next time the Amazon EC2 instance is started, the modified /etc/rc.local script will download both compressed .tar files, myrailsapp01.tgz and myrailsapp02.tgz, unpacking them in alphanumerical order and processing the autorun.sh script in each archive.

Conclusions

You might prefer to recreate the compressed .tar file with the changed content rather than use diff files. Using the modified /etc/rc.local script and your Amazon S3 bucket, you can put anything you want into your compressed .tar files, and the autorun.sh script gives you unlimited flexibility in deploying anything to your AMI instance. These files don't need to hold "content" in the usual sense at all.

When you combine the /etc/rc.local script with the concepts shown in the article "Using Parameterized Launches to Customize Your AMIs," you can specify at launch which Amazon S3 bucket your AMI instance should use to get its changesets.
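
As a rough illustration of that combination, the bucket name could be read from the instance's user data instead of being hard-coded. The sketch below assumes that the user data is plain text containing a line of the form changesets_bucket=NAME; that key name and the bucket_from_user_data function are hypothetical conventions invented for this example. The fallback value is the bucket name used earlier in this article.

```shell
#!/bin/bash
# Sketch only: the "changesets_bucket=" key is an assumed convention,
# and bucket_from_user_data is a hypothetical helper.

# Reads user data on stdin and prints the bucket name, or the default
# given as $1 when no changesets_bucket= line is present.
bucket_from_user_data() {
   local default="$1"
   local value
   value=`sed -n 's/^changesets_bucket=//p' | head -n 1`

   if [ -n "$value" ]; then
      echo "$value"
   else
      echo "$default"
   fi
}

# On a running instance, the user data would typically come from the
# EC2 metadata service, for example:
#   curl -s http://169.254.169.254/latest/user-data | bucket_from_user_data pjc_changesets
# Here a literal string stands in for it so that the sketch runs anywhere.
S3_CHANGESETS_BUCKET=`printf 'role=web\nchangesets_bucket=staging_changesets\n' | bucket_from_user_data pjc_changesets`

echo "$S3_CHANGESETS_BUCKET"
```

With a helper like this in /etc/rc.local, the same AMI could serve several deployments, each launched with user data that names a different changesets bucket.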

I hope the concepts in this example have sparked some ideas about how to simplify change deployment to your AMIs, increasing bandwidth and storage efficiency.

PJ Cabrera is a freelance software developer specializing in Ruby on Rails e-commerce and content management systems development. PJ's interests include Ruby on Rails and open-source scripting languages and frameworks, agile development practices, mesh networks, compute clouds, XML parsing and processing technologies, microformats for more semantic web content, and research into innovative uses of Bayesian filtering and symbolic processing for improved information retrieval, question answering, text categorization, and extraction. You can reach him at pjcabrera at pobox dot com, and read his weblog at pjtrix.com/blawg/



Related Documents
Type: Sample Code s3sync and s3cmd in Ruby

Discussion

Sriram Natarajan
Posts: 1
Registered: 6/19/08
Patching AMI Instances with Updates on Amazon S3
Posted: Jun 19, 2008 12:50 AM PDT
 

Just a thought... if someone is using an OpenSolaris-based AMI, they can simply leverage the ZFS file system (the default within OpenSolaris): take a snapshot and roll back to it if necessary.

Create a location where you want to host the site, then take a snapshot (the snapshot name before-update is just an example):
pfexec /usr/sbin/zfs create rpool/railapps
pfexec /usr/sbin/zfs set mountpoint=/var/rails rpool/railapps
pfexec /usr/sbin/zfs snapshot rpool/railapps@before-update

To roll back:
pfexec /usr/sbin/zfs rollback rpool/railapps@before-update


PJ Cabrera
Posts: 1
Registered: 11/12/07
Re: Patching AMI Instances with Updates on Amazon S3
Posted: Jun 23, 2008 2:40 PM PDT   in response to: Sriram Natarajan
 

Hi Sriram, thanks for your comment.

Your assertion about OpenSolaris is correct, but only if your AMI instance is using persistent storage. Persistent storage is not the default configuration for most instances and is still in limited beta availability.

Even with persistent storage, I hope you're taking advantage of the inexpensive S3 cloud to store backups of important data and code. We all back up our data, right? ;-)


Reviews

Wildcard problems, Jun 27, 2008 6:18 AM
Reviewer: mckenfra
Great tutorial, but the following does not work for me:

s3sync.rb $S3_CHANGESETS_BUCKET:/*.tgz /tmp/changesets/

When I run this with my bucket, it doesn't return anything, because it doesn't interpret the wildcard. Any idea why this works for you and not me?

Distinguishing OS and Frameworks from Application, Sep 10, 2008 8:18 AM
Reviewer: "sublime1"
As I understand it, this methodology is mainly for changing the parts of the system you're in control of -- your application, like Rails. This is what I would call a software release, and maybe using a patch approach is a good way of handling this. But it seems like reinventing the wheel. Or maybe I am not understanding what your objective is.

But is this really the best approach for managing the stack of applications (e.g. OS fixes/updates, plus Apache, MySQL, postfix, java, etc.) that are maintained by a third party and installed as packages? Before you save your customized AMI the first time, you might use yum or apt-get to make sure the AMI is up to date, then save it to an S3 bucket. But what if, a month later, a new version of Apache is available, or a security patch to postfix or Linux comes out?

Sure, you could run yum or apt-get every time you fire up an instance, but that's not a fast process, and gets longer over time, and would need to be repeated for each AMI (in which case, you would be better off just running the update from within each AMI instead of reloading). But this creates a chance for instances to become out of sync.

So I am probably just missing the point on this, but it might be worth clarifying the use case of your solution.

Tom

Great idea and writeup! But..., Aug 23, 2009 6:48 PM
Reviewer: Paul Ryan
...the article seems a bit out of date, as a crucial line of the init script is (now) inaccurate. The learning curve to figure out what tweaks to make was steep (for me), but I took the hill in the end.

The central issue is the one pointed out by the reviewer mckenfra (gulp) a year ago: S3 doesn't accept wildcards in specifying which files to pull out of S3. Of course this need is key to the concept, so a workaround was needed, and that turned out to be to use a tricky regex with the --exclude parameter with s3sync.rb. So that line became

/usr/local/s3sync/s3sync.rb -v --exclude=".*\.(?!tgz$)" $S3_CHANGESETS_BUCKET: /tmp/changesets/

Also, on my Fedora8 install, the line

/usr/bin/bash autorun.sh

was adjusted to

/bin/bash autorun.sh

Another important item of note which the article didn't make so clear is that when you tar up your autorun.sh files into a .tgz archive, be sure to place the autorun.sh into a subdirectory before the tar'ring. Script jockeys looking at the code will see this straight off, but for those who aren't, like me, it took work to make sense of it.

Finally, I definitely recommend reading about s3sync *first* in the document linked by the author, so you get a sense of what that tool does before implementing this solution. One thing I had to do on my Fedora8 platform for the sake of s3sync was to install Ruby. Thankfully that's exceedingly easy with the command:

yum install ruby

HTH. The solution is working great now! I hope this feedback will turn others onto it.