By PJ Cabrera, freelance software developer
Change is Constant
When using the Amazon EC2 service, you typically create an AMI to
go with the specific service you are trying to host. You pick
your favorite distribution, load it with the software and content
for your service, and save everything onto an AMI stored on Amazon S3.
However, most content and software change over time.
Clients want their sites updated, and software needs to be modified
to support new features--usually in a long cycle with no real
perceived end. Sites and software stop being updated only when they
are no longer important, and nobody runs a venture with the goal of becoming unimportant.
Change is the only constant in the life of the systems we create.
Let's be honest: sometimes you don't even need a change request.
You may update the operating system, utilities, or configuration
to protect against known or perceived vulnerabilities. Or, you might upgrade
or add new development libraries to use a new feature in your favorite development
environment.
The Cost of Change
Regenerating AMIs for each content change, however small, seems
to be an immutable law of using Amazon EC2. If you don't
regenerate your AMI, your changes will be lost if the instance
gets shut down! However, every change incurs CPU, bandwidth, and storage
costs to transfer the changed files, generate
the new image, and store it on Amazon S3. And just to be safe, you keep
the old AMI around in case you need to roll back the system to its previous
state.
Let's assume, for the sake of argument, that the minimum size of an
AMI with operating system, web server, development environment of
choice, libraries, system utilities, and content is 600 MB.
If you have several systems running on Amazon EC2, each with
their own tailored AMI, you need to generate and store another 600 MB AMI
each time you make a system change, no matter how
small the change. This requirement is wasteful in the extreme.
If you decided to build a new house each time you wanted to
rearrange the furniture or put a new painting on the wall, the
cost in materials, time, and effort would compel you to limit
changes to those that were absolutely necessary. Fortunately,
you have alternatives.
Minimizing the Cost
Rather than create and store a new AMI for each and every
change to the systems you maintain, you can create generic AMIs that contain
most of your software, then store outside the AMI the content
that is most likely to change. This approach minimizes the amount of
storage that you must churn through to store a small change. Rather than
update the whole AMI for content or software changes, you update only
the storage area outside the AMI. You need to regenerate the AMI, which will typically be
much larger than your content- and site-specific software, only when
the base system needs to be updated, such as in the case of operating system patch releases.
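To put rough numbers on the difference, here is a minimal sketch of the arithmetic. Only the 600 MB AMI size comes from the example above; the number of systems, the number of changes, and the 5 MB changeset size are made-up figures for illustration.

```shell
#!/bin/bash
# Rough storage-churn comparison. All figures are hypothetical except the
# 600 MB AMI size, which is this article's working assumption.

# churn_mb SYSTEMS CHANGES SIZE_MB
# Prints the total MB uploaded and stored when every change on every
# system produces one artifact of SIZE_MB.
churn_mb() {
    echo $(( $1 * $2 * $3 ))
}

# Example: 3 systems, 10 changes each.
# Full AMI per change (600 MB):        churn_mb 3 10 600
# Changeset per change (assume 5 MB):  churn_mb 3 10 5
```

With these assumed figures, regenerating the AMI every time churns through 18000 MB, while storing the changes outside the AMI churns through 150 MB.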
Using Amazon S3 for storage in Amazon EC2 deployments is free of transfer costs.
This makes Amazon S3 the ideal place to store changing content. I'll also
show you how to automate the use of Amazon S3 for your software changes so that your AMI
will update itself at startup, with content from an Amazon S3 bucket.
As the article
"Using Parameterized Launches to Customize Your AMIs" explains, the last
startup script to run on the Fedora Core Linux distribution is
called /etc/rc.local. You can modify this script in your AMIs to
download, from a specific Amazon S3 bucket, the sets of changes to
perform on your system and to apply them to an instance.
The changesets can consist of any kinds of files. The example in this article relies
on a script called autorun.sh, tucked inside a compressed .tar
file, along with any other files you want to include. You have all
the flexibility you need to manage your system changes. The
important part is storing changes in Amazon S3, downloading them at
instance startup, and running the script to apply the content to the
running instance.
Implementing the System
To keep things simple, you'll use the s3sync utilities introduced
in the article
"Using Amazon S3 from Amazon EC2 with Ruby." These utilities
simplify uploading to and downloading from protected Amazon S3 buckets.
Install the s3sync utilities to the /usr/local/s3sync folder,
and make sure that this folder is added to the system path:
## Edit /etc/profile and add /usr/local/s3sync to the $PATH
# vi /etc/profile
# cd /usr/local
# wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
# tar xzf s3sync.tar.gz
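If you would rather script the /etc/profile edit than make it by hand in vi, something like the following might do. This is a sketch, not part of the s3sync package; the function name add_path_line is hypothetical.

```shell
# add_path_line PROFILE_FILE DIR
# Appends an export line for DIR to PROFILE_FILE unless an identical
# line is already present, so running it twice is harmless.
add_path_line() {
    local profile="$1" dir="$2"
    local line="export PATH=\$PATH:$dir"
    grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
}

# Usage (as root): add_path_line /etc/profile /usr/local/s3sync
```

The check for an existing line matters if you run the setup steps more than once while building the AMI.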
Besides installing s3sync, you'll need to modify /etc/rc.local.
In this example, you'll hard code the name of the Amazon S3
bucket and Amazon authentication keys, but you could use this system
along with
parameterized launches to make the script much more flexible.
## Copy the following to the end of your /etc/rc.local file
# replace xxxx with your AWS access key
# (exported so that s3sync.rb, which runs as a child process, can read it)
export AWS_ACCESS_KEY_ID=xxxx
# replace yyyy with your AWS secret
export AWS_SECRET_ACCESS_KEY=yyyy
# change this to your own S3 bucket name
S3_CHANGESETS_BUCKET=pjc_changesets
mkdir -p /tmp/changesets
cd /tmp/changesets/
## This downloads all compressed .tar files stored in the Amazon S3 bucket.
## To keep things simple, this script does not recurse into folders.
s3sync.rb $S3_CHANGESETS_BUCKET:/*.tgz /tmp/changesets/
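for file in *.tgz
do
    # skip the literal pattern when no .tgz files were downloaded
    [ -e "$file" ] || continue
    tar xzpf "$file"
    # each archive is expected to unpack into a folder named after the file
    dirname=`echo "$file" | sed -e 's/\.tgz$//'`
    if [ -d "$dirname" ]; then
        pushd "$dirname"
        if [ -e autorun.sh ]; then
            /bin/bash autorun.sh
        fi
        popd
    fi
done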
The script downloads all files with the extension .tgz into the folder
/tmp/changesets/. Then the script loops over the .tgz files, in
alphanumeric order, extracting each one. If an archive unpacks into a folder
named after the file (that is, the file name without .tgz), the script
runs the autorun.sh file inside that folder. Really simple. All the magic is in
autorun.sh and in the other content that you've stored in the
compressed .tar file.
For example, let's say you have an Amazon EC2 instance serving up a Rails
application, and you want to pull fixes from Amazon S3.
To start, create a compressed .tar file called myrailsapp01.tgz that
contains the complete Rails application and the following script to
deploy the application:
## autorun.sh
##
## Deploy the contents of the compressed .tar file. The Rails application
## is in a folder named myrailsapp. You need to move it to /var/railsapps
## and make a symlink of the public folder on the web server document
## root. Then, move a web server configuration file that sets up the
## virtual host.
mv myrailsapp /var/railsapps/
ln -s /var/railsapps/myrailsapp/public /var/www/myrailsapp
mv myrailsapp-virtualhost.conf /etc/httpd/conf.d/
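Because the /etc/rc.local loop looks for autorun.sh in a folder named after the archive, the archive needs a single top-level folder with the same name as the file. One way to package a changeset that way is sketched below. The function name make_changeset and the staging approach are assumptions for illustration, not something the s3sync utilities provide.

```shell
# make_changeset NAME SRC_DIR
# Builds NAME.tgz in the current directory. The archive contains a single
# top-level folder called NAME holding a copy of everything in SRC_DIR.
# Refuses to build an archive when SRC_DIR has no autorun.sh.
make_changeset() {
    local name="$1" src="$2" out="$PWD" stage
    [ -f "$src/autorun.sh" ] || { echo "no autorun.sh in $src" >&2; return 1; }
    stage=$(mktemp -d) || return 1
    mkdir "$stage/$name" &&
    cp -Rp "$src"/. "$stage/$name"/ &&
    tar czf "$out/$name.tgz" -C "$stage" "$name"
    local status=$?
    rm -rf "$stage"
    return $status
}

# Usage: put myrailsapp, myrailsapp-virtualhost.conf, and autorun.sh in one
# folder (here called build, a hypothetical name), then run:
#   make_changeset myrailsapp01 build
```

The resulting myrailsapp01.tgz can then be uploaded to your changesets bucket with the s3sync utilities.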
Suppose that you later need to make a change to myrailsapp.
You have at least two options: You can recreate the file myrailsapp01.tgz
with the new content, or you can make a patch by using the diff -ur command
(the -r option makes diff recurse into the application's subfolders):
# diff -ur myrailsapp-previous myrailsapp-current > myrailsapp-patch.diff
Then, pack the patch into the file myrailsapp02.tgz, along with the
following autorun.sh script:
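## autorun.sh
##
## Apply patch to myrailsapp. The patch file sits next to this script,
## so remember that location before changing folders.
PATCH_DIR=`pwd`
cd /var/railsapps/myrailsapp
## -p1 strips the leading myrailsapp-previous/ folder from each path
patch -u -p1 < "$PATCH_DIR/myrailsapp-patch.diff"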
If you upload myrailsapp02.tgz to the Amazon S3 bucket, the next time the Amazon EC2
instance is started, the modified /etc/rc.local script will download
both compressed .tar files, myrailsapp01.tgz and myrailsapp02.tgz,
unpacking them in alphanumerical order and processing the autorun.sh
script in each archive.
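Since the processing order depends entirely on file names, it may help to generate the names with a fixed-width sequence number. Without padding, a file such as myrailsapp10.tgz sorts before myrailsapp2.tgz. The helper below is a hypothetical sketch of one naming scheme, assuming you never need more than 99 changesets per prefix.

```shell
# changeset_name PREFIX NUMBER
# Prints a changeset file name with a two-digit, zero-padded sequence
# number. Pass NUMBER as a plain integer (2, not 02).
changeset_name() {
    printf '%s%02d.tgz\n' "$1" "$2"
}

# Usage: changeset_name myrailsapp 2
```

With this scheme, the second changeset is named myrailsapp02.tgz and is applied before myrailsapp10.tgz.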
Conclusions
You might prefer to recreate the compressed .tar file with the
changed content rather than use diff files. Using the modified /etc/rc.local script and
your Amazon S3 bucket, you can put anything you want into your compressed .tar
files, and the autorun.sh script gives you unlimited flexibility in
deploying anything to your AMI instance. These
files don't need to hold "content" in the usual sense at all.
When you combine the /etc/rc.local script with the concepts shown
in the article
"Using Parameterized Launches to Customize Your AMIs," you can
specify at launch which Amazon S3 bucket your AMI instance should use to
get its changesets.
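As a rough sketch of that combination, the helper below picks the bucket name out of launch data. It assumes the launch data is plain text with one KEY=value pair per line, which is a convention chosen for this illustration and not necessarily the one the referenced article uses. The function name bucket_from_userdata is hypothetical.

```shell
# bucket_from_userdata USER_DATA DEFAULT
# Prints the value of the first S3_CHANGESETS_BUCKET=... line in USER_DATA,
# or DEFAULT when no such line exists.
bucket_from_userdata() {
    local value
    value=$(printf '%s\n' "$1" | sed -n 's/^S3_CHANGESETS_BUCKET=//p' | head -n 1)
    echo "${value:-$2}"
}

# In /etc/rc.local this might be used as follows (assumes curl is installed
# on the AMI and that the instance was launched with user data):
#   USER_DATA=`curl -s http://169.254.169.254/latest/user-data`
#   S3_CHANGESETS_BUCKET=`bucket_from_userdata "$USER_DATA" pjc_changesets`
```

Falling back to a default bucket keeps the instance usable when it is launched without any parameters.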
I hope the concepts in this example have sparked some ideas about how
to simplify change deployment to your AMIs while improving bandwidth and
storage efficiency.
PJ Cabrera is a freelance software developer specializing in Ruby on
Rails e-commerce and content management systems development. PJ's
interests include Ruby on Rails and open-source scripting languages and
frameworks, agile development practices, mesh networks, compute clouds,
XML parsing and processing technologies, microformats for more semantic
web content, and research into innovative uses of Bayesian filtering and
symbolic processing for improved information retrieval, question
answering, text categorization, and extraction. You can reach him at
pjcabrera at pobox dot com,
and read his weblog at pjtrix.com/blawg/