Building a Small Business Backup System Using Amazon S3


Thomas Myer puts Amazon S3 to work as a simple, inexpensive, efficient backup solution for small businesses.

AWS Products Used: Amazon S3
Language(s): PHP
Date Published: 2008-01-17

By Thomas Myer, owner of Austin-based Triple Dog Dare Media (http://www.tripledogs.com)

The Solution to Small Office Backup Needs: Amazon S3

When you run a small business, you're usually running too fast and hard to think about data backups. Then something terrible happens (a hardware failure, or an employee or contractor who gets sloppy and kills a month's worth of work) and you realize that you had better slow down and figure out a few things. If you're lucky, something less disruptive has moved you to consider backup solutions; for example, maybe you landed a big contract in which the client demands some kind of backup system.

If you're like me, you're not exactly an expert in backup systems. All you know is that you need to have something in place in case of emergency. So you stand there looking out over the yawning precipice of possible solutions. There are USB and FireWire external hard drives, network-attached storage machines, Linux machines you can repurpose as backup machines, thumb drives, and hosting solutions to which you can rsync files. From a distance, they all look pretty much alike, except for maybe price point and storage capacity.

One thing's for sure, though: If you're going to play in this league, you have to shell out some bucks. In some cases, a lot of bucks. Sometimes the money doesn't come due right away, but eventually it flows out. In other words, your total cost of ownership can be pretty high. We're talking about the cost of running power (and backup batteries) to your hardware; paying fees to a hosting solution, if you go that route; buying or leasing, and then maintaining, equipment; and paying for the guy to come fix your hardware that crashed.

And, unless you're using a hosting solution, you still need to figure in the need for an off-site backup in case of fire, break-in, or other catastrophe. Isn't it amazing how deep this can of worms goes once you pop the top? But it's all essential if you want to get a good night's sleep and have secure data.

The question at this point is do you want to grow this thing organically, like a lot of business owners have? Or do you want to go a simpler route? And if you've already grown your own Rube Goldberg backup contraption, can you wipe the slate clean and start over with something simpler, cleaner, and more efficient?

The answer to the last two questions is: of course! In this article I'm going to focus on a very affordable and secure backup system: Amazon Simple Storage Service (Amazon S3), part of Amazon Web Services (AWS).

Because our scope here is limited, I'm going to focus on the simplest use case: building a simple PHP script that targets a single directory of files for backup. By using this modular approach, you can individually target different directories and run backups more efficiently. The assumption I'm working on is that you'll be using these directories to deposit archive (.zip or .tar) files and then transporting those files to Amazon S3. (We can debate the relative worthiness of this approach at another time in another venue.)

What Is Amazon S3?

Unless you've been living in a cave for the past decade, you know that Amazon.com is one of the world's leading Internet retailers. The company started out selling books on the web, and moved on to sell music, movies, power tools, and much more.

Amazon isn't just about the front end, though; it also has some pretty exciting things going on in terms of back-end power. The most significant for the purposes of this article is access to all the tools that make up AWS. These web services are easy to use and relatively inexpensive, and they provide developers and business owners with some innovative fixes for common headaches, such as backup solutions.

This article focuses on Amazon S3. Amazon S3 offers a hosted storage solution for just pennies on the gigabyte (GB). Using Amazon S3, you can store data objects up to 5 GB in size in containers called buckets, and then retrieve that data at a later date.

Why use Amazon S3? Instead of spending money on servers or external hard drives, paying money to connect those machines to the network (or paying co-location fees), and then paying more money to secure, maintain, and update those servers, you can pay a fraction of that money for a high-bandwidth, high-reliability service provided by Amazon.

How To Sign Up

Before you can use any of the AWS products, you'll need to sign up on the AWS web site. The process takes 5 to 10 minutes maximum, and requires you to fill out a simple form and register a credit card to pay for services.

Simply go to http://aws.amazon.com to sign up. Remember that to use any of this code, you have to have an Access Key ID and a Secret Access Key from Amazon—you'll receive those in your e-mail inbox as soon as you complete your registration.

Of Buckets and Objects

When you start working with Amazon S3, you'll have to organize your work into buckets and objects. A bucket is where you store all your data. Bucket names must be unique across all of Amazon S3, not just within your own account, so be as specific as you like when you name your buckets. Think of buckets as domain names. In fact, when using the REST API, you can think of buckets as subdomains of s3.amazonaws.com:

http://mybucketname.s3.amazonaws.com

Objects are the files and other data packets you place in your buckets. Objects can be named anything you want—they don't have to be unique across buckets. For example, you might have an object backups/backup.1.tar or an object index.html. To access these objects using the REST API, the address would look something like this:

http://mybucketname.s3.amazonaws.com/backups/backup.1.tar
http://mybucketname.s3.amazonaws.com/index.html
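
To make the addressing scheme concrete, here is a minimal sketch that builds such an address from a bucket name and an object key. The function name objectUrl() is hypothetical (it is not part of any S3 library), and the sketch only constructs the string; it does not check that the bucket or object exists.

```php
<?php
// Hypothetical helper, for illustration only: build the REST-style
// address of an object from a bucket name and an object key.
function objectUrl($bucket, $key)
{
    // Encode each path segment separately so that "/" in the key stays a
    // path separator while spaces and other special characters are escaped.
    $segments = array_map('rawurlencode', explode('/', $key));
    return 'http://' . $bucket . '.s3.amazonaws.com/' . implode('/', $segments);
}

echo objectUrl('mybucketname', 'backups/backup.1.tar'), "\n";
echo objectUrl('mybucketname', 'index.html'), "\n";
```

Encoding segment by segment is a design choice: encoding the whole key at once would also escape the "/" characters.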

Furthermore, every bucket you create has an associated access control policy that governs who can create, delete, and list objects within a bucket. In this article, we won't go into too much detail on access control.

The final important thing to note about buckets is that fees for object storage and network data transfer are billed to a bucket's owner. Thus, you can think of buckets as billable line items, and act accordingly.

Objects consist of a key (think of this as a filename), key-value metadata, and the object payload (the actual contents of the file itself, whether it is image, text, video, or other data).

One of the interesting things you can do with object keys is assign prefixes to them and then break up those prefixes with a common delimiter. For example, say you had the following set of objects:

Backups/November/1.tar
Backups/November/2.tar
Backups/December/1.tar
Backups/December/2.tar

You could split on the "/" delimiter and then list all the objects with the prefix "Backups/December/".
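
Amazon S3 can do this filtering for you when you list a bucket through the REST API, which accepts prefix and delimiter parameters. To show what the idea means, the following sketch applies the same logic locally to a plain array of keys. The function names keysWithPrefix() and commonPrefixes() are made up for this illustration and are not part of any library.

```php
<?php
// Illustration only: emulate prefix/delimiter listing on a local array of keys.

// Return the keys that start with the given prefix.
function keysWithPrefix(array $keys, $prefix)
{
    $matches = array();
    foreach ($keys as $key) {
        if (strncmp($key, $prefix, strlen($prefix)) === 0) {
            $matches[] = $key;
        }
    }
    return $matches;
}

// Return the distinct "folders" found directly below the prefix, that is,
// everything up to and including the next delimiter after the prefix.
function commonPrefixes(array $keys, $prefix, $delimiter)
{
    $groups = array();
    foreach (keysWithPrefix($keys, $prefix) as $key) {
        $rest = (string) substr($key, strlen($prefix));
        $pos = strpos($rest, $delimiter);
        if ($pos !== false) {
            $groups[$prefix . substr($rest, 0, $pos + strlen($delimiter))] = true;
        }
    }
    return array_keys($groups);
}

$keys = array(
    'Backups/November/1.tar',
    'Backups/November/2.tar',
    'Backups/December/1.tar',
    'Backups/December/2.tar',
);

print_r(keysWithPrefix($keys, 'Backups/December/'));
print_r(commonPrefixes($keys, 'Backups/', '/'));
```

The same convention works in the other direction: to store a file "inside a folder," you would simply put the prefix in front of the file name when you choose the object key.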

Creating Your PHP Backup Script

Now that we have covered some background, you can create a PHP backup script. The intention here is to create a simple script that will back up a targeted directory of files to Amazon S3.

To simplify a few things in the examples in this article, I use the Amazon S3 PHP class written by Donovan Schonknecht. The class is available at http://undesigned.org.za/2007/10/22/amazon-s3-php-class and requires PHP 5 and CURL.

Why don't I use the better known (and excellent) Amazon S3 class written by Geoffrey P. Gaudreault? Because that class requires that you read the file data into a variable before sending it over. This approach is fine for interactive file uploads but is incredibly memory-intensive for the kinds of files I deal with in my own backup routine. It took only two or three large files (anything over 8 megabytes, or MB, on a typical server) to create a memory problem. If you have larger files, the memory problem just happens faster, and you have to keep raising PHP's maximum memory limit.
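
The memory difference can be illustrated without Amazon S3 at all. The sketch below copies a file to an already-open stream in fixed-size chunks, so only one chunk is held in memory at a time, no matter how large the file is. The function name copyInChunks() and the 8 KB chunk size are assumptions for this example; the sketch does not show how either S3 class actually transmits data.

```php
<?php
// Illustration only: move a file through PHP one chunk at a time instead of
// loading it into a single variable with something like file_get_contents().
function copyInChunks($sourcePath, $destination, $chunkSize = 8192)
{
    if (!is_readable($sourcePath)) {
        return false;
    }
    $source = fopen($sourcePath, 'rb');
    if ($source === false) {
        return false;
    }

    $total = 0;
    while (!feof($source)) {
        // At most $chunkSize bytes are held in memory here.
        $chunk = fread($source, $chunkSize);
        if ($chunk === false) {
            fclose($source);
            return false;
        }
        $total += (int) fwrite($destination, $chunk);
    }
    fclose($source);

    // Number of bytes written to the destination stream.
    return $total;
}

// Example use (paths are placeholders):
// $out = fopen('/tmp/copy-of-archive.zip', 'wb');
// copyInChunks('/path/to/archive.zip', $out);
// fclose($out);
```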

The next question is, why use Donovan's class at all? Why not create your own functions to work with Amazon S3? Because Donovan has already done all the hard work of abstracting signature creation and transferring objects. The goal of this article is to show you how to create a simple interface to Amazon S3, so using an existing class makes a lot of sense.

Because I believe in showing you real-life examples, in this article I will guide you through the creation of a backup script for a particular directory—one of mine, in reality, that contains various podcasts, Zip archives, and downloadable documents. File sizes in this directory range from 100 kilobytes (KB) to around 60 MB, well within the comfort zone of Amazon S3's 5 GB limit.

Let's take a quick look at the code, and then I'll walk you through it.

<?php
// load the S3 class (the file name is case-sensitive on most servers)
include('S3.php');

// make sure we are getting the right number of
// arguments from the command line
if (count($_SERVER['argv']) < 3) {
  die("You must provide a target directory and bucket name!\nusage: backups.php <target_dir> <bucket_name>\n");
}

// take second argument and use it as our targetdir variable
$targetdir = $_SERVER['argv'][1];

// take third argument and use it as our bucketname variable
$bucket = $_SERVER['argv'][2];

// assign to constants
define('BUCKET', $bucket);
define('DIR', $targetdir);

// switch directories to target directory
if (!chdir(DIR)) {
  die("Can't change to directory " . DIR . "\n");
}

// instantiate S3 class with your Access Key ID and Secret Access Key
$s3 = new S3('change-this', 'change-this-too');

// try to create bucket
$okay = $s3->putBucket(BUCKET, S3::ACL_PUBLIC_READ);

if ($okay) {
  echo "Created bucket " . BUCKET . "\n";
} else {
  die("Can't create bucket " . BUCKET . "\n");
}

// iterate through files in the directory
if ($handle = opendir('.')) {
  while (false !== ($filename = readdir($handle))) {
    // skip the "." and ".." directory entries
    if ($filename != "." && $filename != "..") {
      if ($s3->putObjectFile($filename, BUCKET, basename($filename), S3::ACL_PUBLIC_READ)) {
        echo "File copied: " . basename($filename) . "\n";
      } else {
        echo "*** Failed to copy: " . basename($filename) . "\n";
      }
    }
  }
  closedir($handle);
}

?>

The first thing you do is include the S3 class file and check the number of incoming arguments from the command line. Next, you use the $_SERVER['argv'] array (which contains the incoming command-line arguments) to grab the second and third arguments. Remember that the name of the PHP script is always the first argument (indexed at 0!). In this case, the command line to run this PHP script will look like this:

/usr/bin/php -q backups.php /path/to/my/directory name_of_bucket

For good measure, after you have captured those two command-line arguments, you define two constants based on those arguments: one called BUCKET, which contains the name of your Amazon S3 bucket, and another called DIR, which contains the name of the directory you want to back up.

As soon as you have defined those two elements, you run a chdir() to change your current working directory to the target directory. That way you can run all your commands without having to figure out the path for files that need to be processed.

Next, instantiate a new Amazon S3 object and create a new bucket using the putBucket() method. Notice that you run a check to make sure that you've really created the bucket (or are reusing one with this name that you already own). If you can't create or use the bucket, you immediately die(), because there's no point in continuing the script.

The rest of the script is pretty simple. You iterate over the target directory, ignoring the entries named "." and "..". As you process each file, you use the putObjectFile() method to send the file over to Amazon S3. The arguments for this method are the path of the local file, the bucket name, the key to store the object under (here, the file's base name), and an Access Control List (ACL) setting. In this example, you set public-read permissions, which means anyone who knows the address can download the file; for backups of sensitive data you would more likely set it to private. You could also set it to public-read-write.

Every time you post a file, you send a message back to the console confirming that the file was indeed copied. If a copy fails, you don't want to die(), because you want to continue copying files, but it is good to have a record of which files didn't copy.
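
If you want that record in a form you can act on later (for a retry or a summary line, say), one option is to collect the failed names in an array. The sketch below is only an illustration: backUpFiles() is a hypothetical helper, and the upload step is passed in as a callback so that the logic can be tried without an Amazon S3 account. Closures require PHP 5.3 or later.

```php
<?php
// Hypothetical helper: try to upload every file, keep going on failure,
// and return the list of files that did not copy.
function backUpFiles(array $filenames, $upload)
{
    $failed = array();
    foreach ($filenames as $filename) {
        if (call_user_func($upload, $filename)) {
            echo "File copied: " . basename($filename) . "\n";
        } else {
            echo "*** Failed to copy: " . basename($filename) . "\n";
            $failed[] = $filename;
        }
    }
    return $failed;
}

// Stand-in uploader for demonstration: pretends that b.zip fails.
$fakeUpload = function ($filename) {
    return $filename !== 'b.zip';
};

$failed = backUpFiles(array('a.zip', 'b.zip', 'c.zip'), $fakeUpload);
echo count($failed) . " file(s) failed\n";

// In the real script the callback would wrap the S3 call, for example:
// function ($f) use ($s3) {
//     return $s3->putObjectFile($f, BUCKET, basename($f), S3::ACL_PUBLIC_READ);
// }
```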

Listing Objects in a Bucket

Now that you've stored your data in Amazon S3, you need to write a quick script that will list all the objects in a bucket. The script for that process is even simpler.

<?php
include('S3.php');
if (count($_SERVER['argv']) < 2){
	die("You must provide a bucket name!\nusage: listing.php <bucket_name>\n");
}
$bucket = $_SERVER['argv'][1];
define('BUCKET',$bucket);
$s3 = new S3('change-this', 'change-this-too');

$RESPONSE = $s3->getBucket(BUCKET);
foreach( $RESPONSE as $key => $list ) { 
  echo $key."\n";
}
?>

Once again, you use include() to pull in the s3.php file, and you define a constant to hold the name of your bucket (which you are getting as a command-line argument). After you instantiate your object, you use the getBucket() method to pull out all the objects in the bucket.

After that, it's a simple matter of iterating to get a simple listing of object keys in a particular bucket. The output might look like this:

0001.mp3
0002.mp3 
0003.mp3 
0004.mp3 

Of course, it could be useful to display further information, like the file size, in which case you can read the size element of each entry in the array that getBucket() returns. In the following example, you convert the byte count to KB, MB, or GB to provide a more condensed view of the data:

<?php

include('S3.php');
if (count($_SERVER['argv']) < 2){
	die("You must provide a bucket name!\nusage: listing.php <bucket_name>\n");
}
$bucket = $_SERVER['argv'][1];
define('BUCKET',$bucket);

$s3 = new S3('change-this', 'change-this-too');

$RESPONSE = $s3->getBucket(BUCKET);
foreach( $RESPONSE as $key => $list ) {

  // test the largest unit first so that every branch can be reached
  if ($list['size'] >= 1024 * 1024 * 1024){
    $size = intval($list['size'] / (1024 * 1024 * 1024)) . " GB";
  }elseif ($list['size'] >= 1024 * 1024){
    $size = intval($list['size'] / (1024 * 1024)) . " MB";
  }elseif ($list['size'] >= 1024){
    $size = intval($list['size'] / 1024) . " KB";
  }else{
    $size = $list['size'] . " bytes";
  }
  echo $key." ($size)\n";
}
?>

The output of the new PHP code might be:

0001.mp3 (53 MB)
0002.mp3 (21 MB)
0003.mp3 (16 MB)
0004.mp3 (11 MB)
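
If you need the same conversion in more than one script, you could wrap it in a small function. The following is a minimal sketch; formatSize() is a hypothetical name, and like the listing above it truncates to whole units rather than rounding.

```php
<?php
// Hypothetical helper: turn a byte count into a short human-readable string.
// The largest unit is tested first so that every branch can be reached.
function formatSize($bytes)
{
    if ($bytes >= 1024 * 1024 * 1024) {
        return intval($bytes / (1024 * 1024 * 1024)) . " GB";
    } elseif ($bytes >= 1024 * 1024) {
        return intval($bytes / (1024 * 1024)) . " MB";
    } elseif ($bytes >= 1024) {
        return intval($bytes / 1024) . " KB";
    }
    return $bytes . " bytes";
}

echo formatSize(512), "\n";
echo formatSize(55574528), "\n";
```

Inside the foreach loop you would then write something like echo $key . " (" . formatSize($list['size']) . ")\n";.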

If you need more precision, you could just print out the number of bytes without all the processing:

<?php

include('S3.php');
if (count($_SERVER['argv']) < 2){
	die("You must provide a bucket name!\nusage: listing.php <bucket_name>\n");
}
$bucket = $_SERVER['argv'][1];
define('BUCKET',$bucket);

$s3 = new S3('change-this', 'change-this-too');

$RESPONSE = $s3->getBucket(BUCKET);
foreach( $RESPONSE as $key => $list ) { 
	
  $size = $list['size'];
  echo $key." ($size)\n";
}
?>

Deleting All Objects in a Bucket

Occasionally, you might need to delete all objects in a bucket. The following code does just that. As you can see, the process involves using the getBucket() method to build an array of objects in a given bucket. Then, you just iterate through the array using the key for the array (which corresponds to the key of any given object) and pass that key to the deleteObject() method.

At the end of the process, simply use getBucket() to list all the keys in a bucket and get a count, then display that count to confirm that a bucket is empty.

<?php
include 'S3.php';
if (count($_SERVER['argv']) < 2){
	die("You must provide a bucket name!\nusage: " . $_SERVER['argv'][0] . " <bucket_name>\n");
}
$bucket = $_SERVER['argv'][1];
define('BUCKET',$bucket);

$s3 = new S3('change-this', 'change-this-too');


$RESPONSE = $s3->getBucket(BUCKET);

if (count($RESPONSE)){
  foreach( $RESPONSE as $key => $list ) { 
    if ($s3->deleteObject(BUCKET, $key)) {
      echo "Deleted: $key\n";
    }else{
      echo "*** Couldn't delete $key!\n";
    }		
  }
}

$RESPONSE = $s3->getBucket(BUCKET);
echo "Files in bucket ".BUCKET.": ".count($RESPONSE) . "\n";

?> 

What if you just want to delete one file, files that match a regular expression, or files of a certain age? It would be simple enough to build that into this script. We can cover this and other advanced topics in a future article.
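
As a rough idea of what that could look like, the sketch below picks out keys from a listing array before anything is deleted. It assumes the listing has the shape used in the scripts above (object key as the array key, with a size element) and, as a further assumption that you should check against your copy of the S3 class, that each entry also carries a time element holding a Unix timestamp. The function names are hypothetical.

```php
<?php
// Illustration only: choose which keys to delete from a bucket listing.

// Keys whose name matches a regular expression.
function keysMatching(array $listing, $pattern)
{
    $selected = array();
    foreach ($listing as $key => $info) {
        if (preg_match($pattern, (string) $key) === 1) {
            $selected[] = $key;
        }
    }
    return $selected;
}

// Keys last modified before the given Unix timestamp.
// ASSUMPTION: each entry has a 'time' element; entries without one are skipped.
function keysOlderThan(array $listing, $cutoff)
{
    $selected = array();
    foreach ($listing as $key => $info) {
        if (isset($info['time']) && $info['time'] < $cutoff) {
            $selected[] = $key;
        }
    }
    return $selected;
}

// Sample data standing in for the result of getBucket().
$listing = array(
    'old.tar'  => array('size' => 100, 'time' => 1199145600), // 2008-01-01 UTC
    'new.tar'  => array('size' => 200, 'time' => 1209600000), // 2008-05-01 UTC
    'song.mp3' => array('size' => 300, 'time' => 1199145600),
);

print_r(keysMatching($listing, '/\.tar$/'));
print_r(keysOlderThan($listing, 1204329600)); // before 2008-03-01 UTC
```

Each selected key could then be passed to deleteObject() exactly as in the loop above.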

Multithreading the Backup Process

Now that you've built a simple modular script that tackles one directory at a time, the question becomes: what if you need to back up a large number of directories? Can this script be run in a multithreaded fashion?

Well, PHP doesn't support multithreaded operations. However, you can easily build a system that takes full advantage of available resources on your Linux machine by running several copies of the backup script as separate processes and letting the operating system run them side by side. Here's what you need to make this work:

  1. A shell script that can monitor how many backup scripts are running at the moment and fire off new processes as needed.
  2. A simple listing of target directories that can be fed to the backup script.
  3. A backup script (already written above!)

First, build a file called targets.txt. This file will contain, one line at a time, your command lines, complete with arguments:

/usr/bin/php -q backups.php /target/1 tdog_bk1
/usr/bin/php -q backups.php /target/2 tdog_bk2
/usr/bin/php -q backups.php /target/3 tdog_bk3
/usr/bin/php -q backups.php /target/4 tdog_bk4
/usr/bin/php -q backups.php /target/5 tdog_bk5
/usr/bin/php -q backups.php /target/6 tdog_bk6
/usr/bin/php -q backups.php /target/7 tdog_bk7

Next, build a simple Perl script that loads this file and then keeps up to three of these backup instances running at any one time. You can adjust this limit to suit your own needs; a little experimentation on my MacBook Pro running MAMP suggests that you can run six processes at any given time with a load of .40. Call this script watcher.pl:

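#!/usr/bin/perl -w
use strict;

# maximum number of backup processes allowed to run at once
my $max_running = 3;

open(IN, "targets.txt") or die "Can't open targets.txt: $!\n";
my @lines = <IN>;
close IN;

foreach my $line (@lines){

	# skip comments and blank lines
	next if $line =~ /^#/;
	next if $line =~ /^\s*$/;
	chomp($line);
	print "Status: processing " . $line . "\n";
	processLine($line);
}


sub processLine {
	my $text = shift;

	# wait until fewer than $max_running backups show up in the process list
	while (1){
		my $ctr = `ps aux | grep 'backups.php' | grep -v 'grep' | wc -l`;
		$ctr =~ s/\s+//g;
		print $ctr . "\n";
		last if $ctr < $max_running;
		sleep 60*1;
	}

	# run the command in the background
	system($text . " &");
}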

If you don't know Perl, don't worry. This script simply opens the targets.txt file you created, then saves the lines of that file into an array. Next, the script loops through that array of lines, ignoring any line that begins with #, and runs a subroutine called processLine() on the current line from the array.

The processLine() routine runs a UNIX command line that counts how many times backups.php shows up in the ps list. If the limit on simultaneous backups has been reached, the script sleeps for one minute and then checks again. Otherwise, it runs the line fed to the routine, appending the & character so that the process runs in the background.

In other words, all you have to do is set up the targets and fire off the Perl script, and it will continue running instances of your backup script until the work is done. Because the underlying PHP script is echoing status lines, your console will tell you which bucket is being worked on as the processes fire off.
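
If you would rather stay in PHP for part of this, the target-file handling described above (one command per line, lines beginning with # skipped) is easy to reproduce. The sketch below covers only that reading step, not the process monitoring; readTargets() is a hypothetical name.

```php
<?php
// Hypothetical helper: read the command lines from a targets file,
// skipping blank lines and lines that start with "#".
function readTargets($path)
{
    if (!is_readable($path)) {
        return array();
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array();
    }

    $commands = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $commands[] = $line;
    }
    return $commands;
}

// Example use: print what would be run, without running anything.
foreach (readTargets('targets.txt') as $command) {
    echo "Would run: " . $command . "\n";
}
```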

Conclusion

In this article, you learned how to create a backup script that copies the contents of a targeted directory to an Amazon S3 bucket. You also created some infrastructure around that script to allow for multiple backups at a time.

In the next article in this series, we'll tackle a few advanced topics, such as directory recursion, timestamping (backing up only files that have changed since the last backup), and more.

Learning More About AWS

This article highlights a few aspects of working with AWS. Here are a few more resources available to PHP developers to help you learn more.

Common Resources on AWS

  • AWS web site - Learn more about each web service on the AWS web site
  • Developer Connection web site - The community web site for AWS developers includes forums on AWS, a Solutions Catalog for examples of what your peers have built, and more
  • Resource Center - Part of the Developer Connection web site, the Resource Center has links to tutorials, code samples, technical documentation, and other resources for building your application on AWS

About the Author

Thomas Myer is owner of Austin-based Triple Dog Dare Media (http://www.tripledogs.com). He is a Web developer, author of Lead Generation on the Web (O'Reilly) and No Nonsense XML Web Development with PHP (SitePoint), a consultant, and speaker. He is currently hard at work on a CodeIgniter book for WROX.



Related Documents
Type: Sample Code Amazon S3 Backup Solution (PHP/Perl)

Discussion


crewze
Posts: 8
Registered: 3/5/08
Re: Building a Small Business Backup System Using Amazon S3
Posted: Apr 11, 2008 8:46 AM PDT   in response to: hareem
 

You need to put your aws key and secret key in place of change-this (aws key) and change-this-too (secret key).

$s3 = new S3('change-this', 'change-this-too');

so it would look like:

$s3 = new S3('1234567890', '0987654321');

Look at example.php that is included with S3.php for more info.

hareem
Posts: 169
Registered: 8/11/07
Re: Building a Small Business Backup System Using Amazon S3
Posted: Jun 6, 2008 9:30 PM PDT   in response to: crewze
 

Thanks for the info.
Could you please tell me how I can store my data in multiple folders?

Basically, what happens now is that the data is stored right in the bucket. The trouble I have is that I want to accomplish the following but have no idea how to go about it.

EXAMPLE SCENARIO:

upload to bucketA/datafoldername/objectname

What parameter do I need to use in order to do that?

Regards
Hareem.


D. Kavanagh
RealName(TM)

Posts: 2,712
Registered: 5/25/06
Re: Building a Small Business Backup System Using Amazon S3
Posted: Jun 8, 2008 6:02 AM PDT   in response to: hareem
 

Hareem,
You simply include the "/" character in your object key, so the object key for your scenario would be "datafoldername/objectname".
Then, if you use S3Fox to browse it, it will also give you the folder view.

David


hareem
Posts: 169
Registered: 8/11/07
Re: Building a Small Business Backup System Using Amazon S3
Posted: Jun 12, 2008 9:48 AM PDT   in response to: D. Kavanagh
 

I tried it the way you mentioned, but the files are still being stored together.

It's weird. I specified bucket/docfolder/subfolder = hdata/test/123/

this is the response that i got back.

Created bucket hdata/test/123/


but my files are stored inside hdata and not hdata/test/123

please advise.


D. Kavanagh
RealName(TM)

Posts: 2,712
Registered: 5/25/06
Re: Building a Small Business Backup System Using Amazon S3
Posted: Jun 13, 2008 11:14 AM PDT   in response to: hareem
 

I'm not sure what client you are using, but using jets3t, I've written applications that store files as you are describing. Make sure the bucket name you specify is distinct from the key name. So, create a bucket with the hdata name, then put the object named "test/123" into that bucket.

David



Reviews

Building a Small Business Backup System Using Amazon S3, Apr 11, 2008 8:50 AM
Reviewer: crewze
There is an error in these examples. The line: include('s3.php'); should be: include('S3.php'); Note the capital "S".

What the non-technical?, Jun 21, 2008 10:07 AM
Reviewer: Diveares
I am trying to use the service as a backup for a small business. The community appears all technical and I feel like I stumbled into the wrong playground. Can S3 be used for me to drag and drop a photo collection retaining my current file structure?

Very useful, Aug 3, 2008 10:18 AM
Reviewer: M. Randrup
Yes, downloadable source code would have made this a much quicker process to understand. BUT IT WORKS and I wish that I could possibly encourage the author to continue the series and write on timestamp comparison and on directory recursion.

Add "For Linux/Unix Admins" to the title, Oct 1, 2008 10:31 AM
Reviewer: Doug Sharp
I came across this article while researching Amazon S3 and other storage cloud services. This article gave me a concise real-world scenario where I could use S3. The title should be revised to state that it's for Unix administrators that already understand PHP, scripts, etc. For that audience, I think it's excellent. I haven't tried to run the code, but then I think the code here is for descriptive purposes - not a copy and paste full application.

What a cool intro to Amazon S3, Feb 26, 2009 10:57 AM
Reviewer: Rajiv Banerjee
Thomas - thank you. This is a well written introduction to Amazon S3. The writeup has been kept at a fairly high level - so it will not scare away the newbie. It has enough information for you to be successful in kicking off your S3 journey, and getting deeper into the details. A big thank you also to Donovan Schonknecht for his Amazon S3 class at http://undesigned.org.za/2007/10/22/amazon-s3-php-class. Good work guys!