By Francis Shanahan
Editor's Note -- This article is an excerpt from the book, Amazon.com Mashups, also by Francis Shanahan.
Imagine that you are an entrepreneur in the early days of 2000. You’ve just finished writing up your latest business model on the back of a napkin at a local restaurant. The plan calls for a photo-oriented Web site that allows users to upload their precious family photos to your servers for a small fee.
Of course, before you go live you’ll need to create a capacity plan with the expected usage volumes, and then map that to your year-over-year projected growth. You’ll then need to procure hardware and rack space from your favorite hardware vendor. With that complete, you’ll commence setup of the environment and build out of the hardware cage. Don’t forget to include provision for a disaster-recovery site, and the business contingency volumes that may never be used.
This is the approach many companies took in the Internet heyday. I know, because I architected and facilitated many such ventures. Countless multimillion-dollar environments were procured and set up, only to have the underlying business model prove shaky. The problem was that the barrier to entry was so high that only the luckiest companies actually got the venture capital required to procure the necessary infrastructure and begin building out their dreams.
Amazon has changed all of that with the launch of its pay-as-you-go Amazon Simple Storage Service (Amazon S3). Amazon S3 is a storage service that allows anyone to purchase industrial-quality storage space on an as-needed basis. You pay only for what you use.
This has major repercussions in terms of the business plans that it enables. A student working out of a dorm room can build a professional-quality photo storage site without requiring millions of dollars from venture capitalists up front. Budding media entrepreneurs can share out content without a highly available redundant disaster-recovery site. The possibilities are endless.
In this article, you will build a simple generic Web site that utilizes the Amazon S3 to securely store any type of file you see fit. The resultant application can be used as a basis on which to build your empire, or at least store your business plan on something better than a napkin.
This article uses the Amazon S3 to store files on the Internet. You don’t really know where or how these files are stored, only that they are stored securely and without fear of being lost.
Understanding the Architecture
The sample application uses SOAP over HTTP to communicate with the S3 service. The architecture of the sample application is depicted in the following figure:

How It Works
- The default.aspx page is presented to the user. It contains a form that exposes several pieces of functionality.
- The user creates a bucket using the first set of controls on the sample page. This bucket can be used to store objects, also known as files.
- The user's request is forwarded to the Amazon S3 server as a SOAP request. A bucket is created on Amazon.
- The user locates a file on a local computer and clicks Submit. The file is converted into a stream and uploaded to the sample application's Web server.
- The Web server then takes this stream and passes it on to the Amazon S3 server using SOAP over HTTP. The Amazon S3 server stores this stream as a file in its secure storage. The file is now associated with the bucket created earlier.
With this sample application, the user can also delete a file or the bucket, list the available buckets, or list the contents of a particular bucket.
The following figure displays the completed application. As you can see, this is a generic utility-type application. The code is written in such a way as to clearly articulate the steps involved. Feel free to take this code and build your own applications on top of it.

The next section describes Amazon S3 in terms of registration requirements and key concepts.
Registering for S3 Access
Since S3 is a pay-per-use service, you will need to sign up for explicit access to the service to use it. If you are already an Amazon customer, you can use the billing and credit card information that Amazon already has on file. This makes signup a one-minute procedure.
You will be billed only for the storage space you use and the duration for which you use it. As someone who does this for a living, I can tell you that Amazon’s rates are highly competitive, and will beat any dedicated hardware provider’s estimates by a number of orders of magnitude.
Key Concepts
Amazon S3 uses a number of key concepts to organize data storage on its servers.
Buckets
Every object stored on Amazon S3 is placed in a bucket. A bucket is simply a way to group objects together and aggregate them for the purposes of usage tracking.
Bucket names have global scope. So, if a user creates a bucket named “mybucket”, no one else can create a bucket of the same name.
Objects
Objects represent the files that actually get stored on the platform. These can be files of any type, and are typically associated with a set of metadata. The object consists of the metadata and the file itself.
Objects can be created or deleted, and associated with a defined set of permissions. Only users or groups with the appropriate level of permissions can access a given object.
Every object in Amazon S3 is assigned a key, which is analogous to a filename, and is what uniquely identifies the object within your bucket.
Try It Out - Setting Up the Project
To build the generic Amazon S3 application, follow these steps:
- Create a new project and add a class to the project named S3Helper.cs. This class should reside in the App_Code directory.
- Add a function to this class named GetTimeStamp. This function will be implemented in the next section.
- Add a Web reference to the Amazon S3 SOAP service. The WSDL endpoint for the service is located here: http://s3.amazonaws.com/doc/2006-03-01/AmazonS3.wsdl
- Create a page named default.aspx and design it to look like the following figure. The remainder of this article walks you through the specifics of each page element.

Required Parameters
To make any call to Amazon S3, you must supply a number of standard items:
- The first is your AWS Access Key ID. This is assigned when you register as a developer with AWS.
- The second item is the current timestamp in Greenwich Mean Time (GMT). If the time you specify varies from the time on the Amazon servers by more than 15 minutes, the operation will be declined. This is for security reasons to ensure no one intercepts your request and attempts to replay it later.
- The third item you need to specify is probably the most interesting. This is a string that acts as a signature. The signature is a keyed cryptographic hash of a piece of data comprising the operation you are calling and the timestamp sent as item 2, computed using a special secret key.
These are standard items that are included in every call. The secret key is similar to the AWS Access Key ID and is provided when you explicitly sign up for Amazon S3 access.
You must never disclose the secret key to anyone. Amazon will never ask for it, so if you are asked to disclose it, you know the requester is a fraud.
Authenticating with S3
The folks at Amazon have really done their homework when it comes to authentication. Amazon S3 authenticates each and every method call. There is no notion of a session when working with S3. As a result, every call requires the same set of authentication parameters to be included, in addition to whatever other information the particular operation needs.
The TimeStamp
There are two main instances when a timestamp is required:
- The first is for inclusion in a typical method call, in which case, the timestamp is included as a DateTime object.
- The second is in calculating the signature digest.
To create a valid TimeStamp value as a DateTime object, you must convert the current time to Coordinated Universal Time (UTC), also known as Greenwich Mean Time. The next section shows you how to do this.
Calculating the TimeStamp
To calculate a timestamp for use in an Amazon S3 call, the first step is to obtain the current time. This value is then used in subsequent calls. It’s important to store this value because time marches on, and you will not be able to recalculate the same timestamp if you throw away the one you are working with.
I have provided a static helper class named S3Helper in the sample code that returns the correctly formatted timestamp. The code looks like this:
    /// <summary>
    /// Returns a new DateTime object set to the provided time
    /// but with precision limited to milliseconds.
    /// </summary>
    /// <param name="myTime"></param>
    /// <returns></returns>
    public static DateTime GetTimeStamp(DateTime myTime)
    {
        DateTime myUniversalTime = myTime.ToUniversalTime();
        DateTime myNewTime = new DateTime(myUniversalTime.Year,
            myUniversalTime.Month, myUniversalTime.Day,
            myUniversalTime.Hour, myUniversalTime.Minute,
            myUniversalTime.Second, myUniversalTime.Millisecond);
        return myNewTime;
    }
This code accepts a DateTime as a parameter. This is the timestamp. That timestamp value is then converted into UTC, which for this purpose is equivalent to GMT.
The precision of the DateTime is limited to milliseconds to conform to the expected format. Too much precision will break the Amazon API.
The second time helper function formats the same TimeStamp value as a string. This is used later in computing the message signature. The code looks like this:
    /// <summary>
    /// Formats the provided time as a string limited to millisecond precision
    /// </summary>
    /// <param name="myTime"></param>
    /// <returns></returns>
    public static string FormatTimeStamp(DateTime myTime)
    {
        DateTime myUniversalTime = myTime.ToUniversalTime();
        return myUniversalTime.ToString("yyyy-MM-dd\\THH:mm:ss.fff\\Z",
            System.Globalization.CultureInfo.InvariantCulture);
    }
You should add both of these functions to the S3Helper class.
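As a quick sanity check, the following sketch repeats the two helpers in a standalone class and runs them against a fixed instant. The class name TimeStampSketch and the Demo method are illustrative only and are not part of the sample download.

```csharp
using System;
using System.Globalization;

public static class TimeStampSketch
{
    // Same logic as GetTimeStamp above: convert to UTC, then rebuild the
    // value from its parts so anything finer than a millisecond is dropped.
    public static DateTime GetTimeStamp(DateTime myTime)
    {
        DateTime utc = myTime.ToUniversalTime();
        return new DateTime(utc.Year, utc.Month, utc.Day,
            utc.Hour, utc.Minute, utc.Second, utc.Millisecond);
    }

    // Same logic as FormatTimeStamp above.
    public static string FormatTimeStamp(DateTime myTime)
    {
        return myTime.ToUniversalTime().ToString(
            "yyyy-MM-dd\\THH:mm:ss.fff\\Z", CultureInfo.InvariantCulture);
    }

    public static void Demo()
    {
        // A fixed UTC instant carrying extra ticks beyond the millisecond.
        DateTime sample = new DateTime(2006, 3, 1, 12, 0, 0, 123,
            DateTimeKind.Utc).AddTicks(4567);

        // No sub-millisecond ticks survive the rebuild.
        Console.WriteLine(GetTimeStamp(sample).Ticks % TimeSpan.TicksPerMillisecond); // 0

        // The string form is what later feeds the signature.
        Console.WriteLine(FormatTimeStamp(sample)); // 2006-03-01T12:00:00.123Z
    }
}
```

Note that both helpers should be given the same original DateTime. The value returned by GetTimeStamp has an unspecified kind, which ToUniversalTime treats as local time, so passing it to FormatTimeStamp would apply the UTC conversion a second time.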
Calculating the Signature
The signature is the third component in the Amazon authentication scheme. The signature proves knowledge of the user’s secret key (not the AWSAccessKeyId). This is a special key that is known only to the registered developer and should never be disclosed to anyone. By creating a digest with this key as input, Amazon can verify that the operation request came from a valid source, and has not been tampered with.
What Is a Digest?
A digest (or message digest) is a string value produced by applying a mathematical algorithm to a piece of content. The algorithm is such that if even a single character in the content is modified or changed, the resulting digest value will be affected, yielding a brand new digest value. A message digest is sometimes known as a hash value.
There are many algorithms available to calculate message digests. Amazon uses the HMACSHA1 algorithm, which is implemented in .NET using the System.Security.Cryptography namespace.
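To make the "single character" property concrete, here is a minimal sketch that computes HMACSHA1 digests of two strings differing in one letter. The class name DigestSketch, the key, and the sample strings are all placeholders chosen for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DigestSketch
{
    // Returns the Base64-encoded HMACSHA1 digest of text under the given key.
    public static string Digest(string key, string text)
    {
        using (HMACSHA1 hmac = new HMACSHA1(Encoding.UTF8.GetBytes(key)))
        {
            byte[] digest = hmac.ComputeHash(Encoding.UTF8.GetBytes(text));
            return Convert.ToBase64String(digest);
        }
    }

    public static void Demo()
    {
        string key = "not-a-real-secret"; // placeholder key, for illustration only

        string a = Digest(key, "store my business plan");
        string b = Digest(key, "store my business plam"); // last letter changed

        Console.WriteLine(a == b);   // False: the digests differ
        Console.WriteLine(a.Length); // 28: a 20-byte digest encoded as Base64
    }
}
```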
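Add the following using statements to your code in S3Helper.cs (System.Text supplies the Encoding and UTF8Encoding classes used by the signature code):
    using System.Security.Cryptography;
    using System.Text;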
Amazon expects the digest to be created based on a concatenated string of the following data elements:
- The string “AmazonS3”
- The operation you are invoking (for example, “PutObjectInline”)
- The string representation of the TimeStamp included in the call
HMACSHA1 constructs a hash-based message authentication code (HMAC) using a piece of data and a key input. The algorithm mixes the key with the data. It then applies a hash function to obtain a digest. The result is mixed with the key input again, and the result is hashed one more time. The result is a secure hash code that can be used to determine whether the data has been tampered with during transport.
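The mix, hash, mix again, hash again sequence can be written out by hand. The sketch below follows the standard HMAC construction (RFC 2104) with SHA-1 and is meant only to illustrate what the framework's HMACSHA1 class does internally; in real code you would simply use HMACSHA1. The class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

public static class HmacSketch
{
    // Hand-rolled HMAC-SHA1, for illustration only.
    public static byte[] ManualHmacSha1(byte[] key, byte[] data)
    {
        const int blockSize = 64; // SHA-1 processes input in 64-byte blocks

        using (SHA1 sha1 = SHA1.Create())
        {
            // Keys longer than one block are hashed first.
            if (key.Length > blockSize)
            {
                key = sha1.ComputeHash(key);
            }

            // Pad the key with zeros up to the block size.
            byte[] padded = new byte[blockSize];
            Array.Copy(key, padded, key.Length);

            // First mix: key XOR inner pad, followed by the data.
            byte[] inner = new byte[blockSize + data.Length];
            // Second mix: key XOR outer pad, followed by the inner digest (20 bytes).
            byte[] outer = new byte[blockSize + 20];
            for (int i = 0; i < blockSize; i++)
            {
                inner[i] = (byte)(padded[i] ^ 0x36);
                outer[i] = (byte)(padded[i] ^ 0x5c);
            }
            Array.Copy(data, 0, inner, blockSize, data.Length);

            // Hash once, mix with the key again, then hash one more time.
            byte[] innerDigest = sha1.ComputeHash(inner);
            Array.Copy(innerDigest, 0, outer, blockSize, innerDigest.Length);
            return sha1.ComputeHash(outer);
        }
    }

    // Compares the hand-rolled result with the framework's HMACSHA1.
    public static bool MatchesFramework(byte[] key, byte[] data)
    {
        using (HMACSHA1 hmac = new HMACSHA1(key))
        {
            return Convert.ToBase64String(hmac.ComputeHash(data))
                == Convert.ToBase64String(ManualHmacSha1(key, data));
        }
    }
}
```

Because the secret key is mixed in at both stages, someone who sees the data and the digest but not the key cannot produce a valid digest for altered data.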
Add the following code, which creates an HMACSHA1 hash or digest of the given data elements:
    public static string GetSignature(string mySecretAccessKeyId, string strOperation, DateTime myTime)
    {
        Encoding myEncoding = new UTF8Encoding();
        // Create the source string which is used to create the digest
        string mySource = "AmazonS3" + strOperation + FormatTimeStamp(myTime);
        // Create a new Cryptography class using the
        // Secret Access Key as the key
        HMACSHA1 myCrypto = new HMACSHA1(myEncoding.GetBytes(mySecretAccessKeyId));
        // Convert the source string to an array of characters
        char[] mySourceArray = mySource.ToCharArray();
        // Convert the source to a UTF8 encoded array of bytes
        byte[] myUTF8Bytes = myEncoding.GetBytes(mySourceArray);
        // Calculate the digest
        byte[] strDigest = myCrypto.ComputeHash(myUTF8Bytes);
        // Return the digest as a Base64-encoded string
        return Convert.ToBase64String(strDigest);
    }
Now, you have generic Signature and TimeStamp functions that can be used with all the subsequent Amazon S3 operations.
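To see why the timestamp and the signature travel together, it can help to look at the check from the receiving side. The following is a hypothetical sketch of how a service could validate a request using the rules described in this article: recompute the signature from the shared secret key, and reject timestamps more than 15 minutes away from its own clock. It is not Amazon's actual implementation, and the names SignatureCheckSketch, Sign, and IsValid are invented for illustration.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class SignatureCheckSketch
{
    // The 15-minute window described under Required Parameters.
    static readonly TimeSpan AllowedSkew = TimeSpan.FromMinutes(15);

    // Mirrors GetSignature: HMACSHA1 over "AmazonS3" + operation + timestamp.
    public static string Sign(string secretKey, string operation, DateTime timeUtc)
    {
        string stamp = timeUtc.ToUniversalTime().ToString(
            "yyyy-MM-dd\\THH:mm:ss.fff\\Z", CultureInfo.InvariantCulture);
        using (HMACSHA1 hmac = new HMACSHA1(Encoding.UTF8.GetBytes(secretKey)))
        {
            byte[] digest = hmac.ComputeHash(
                Encoding.UTF8.GetBytes("AmazonS3" + operation + stamp));
            return Convert.ToBase64String(digest);
        }
    }

    // Hypothetical receiver-side check (an assumption, not documented behavior).
    public static bool IsValid(string secretKey, string operation,
        DateTime claimedUtc, string signature, DateTime nowUtc)
    {
        // A stale or far-future timestamp is rejected, which limits replay.
        if ((nowUtc - claimedUtc).Duration() > AllowedSkew)
        {
            return false;
        }
        // Only a caller who knows the secret key can produce a matching value.
        return Sign(secretKey, operation, claimedUtc) == signature;
    }
}
```

A signature captured for one operation fails the check for any other operation or timestamp, which is why a fresh signature is computed for every call in the sections that follow.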
Working with Buckets
As you would expect, buckets support a number of operations such as Create, Delete, and so on. This next section describes these operations and how to invoke them.
Creating a Bucket
The CreateBucket operation, as the name implies, creates a new bucket of a specific name.
The key takeaway when talking about buckets is that bucket names are not scoped per user. If I create a bucket named MyBucket, other users will not be able to create a bucket of the same name.
If a user named Fred has created a bucket named MyBucket, then any subsequent calls to CreateBucket by Fred using MyBucket as the bucket name will be successful. However, all other users will experience an error that says, “Bucket name already exists.”
If successful, the CreateBucket operation returns NULL.
Try It Out - Adding a Bucket
To create a bucket using the Amazon S3 SOAP API, follow these steps:
- Add a new textbox to default.aspx named txtBucketName.
- Add a new button named cmdCreateBucket. Add the following code to its click event:
    protected void cmdCreateBucket_Click(object sender, EventArgs e)
    {
        AmazonS3 myS3 = new AmazonS3();
        DateTime myTime = DateTime.Now;
        try
        {
            CreateBucketResult myCreateResult = myS3.CreateBucket(txtBucketName.Text, null,
                myAWSAccessKeyId,
                S3Helper.GetTimeStamp(myTime),
                true,
                S3Helper.GetSignature(mySecretAccessKeyId, "CreateBucket", myTime));
            MyPrint("Bucket successfully created.");
        }
        catch (Exception ex)
        {
            MyPrint("CreateBucket Error: " + ex.Message);
        }
    }
Creating a bucket is easy, but the returned information does little to indicate that anything has actually happened. You will want to quickly move on to the next section, which shows you how to display your list of buckets.
Listing Your Buckets
Once you have created some buckets, the next step will be to list them. Add the following code to the event handler for a new button named cmdListBuckets:
    protected void cmdListBuckets_Click(object sender, EventArgs e)
    {
        AmazonS3 myS3 = new AmazonS3();
        DateTime myTime = DateTime.Now;
        // Lists all buckets under this user
        ListAllMyBucketsResult myBuckets = myS3.ListAllMyBuckets(myAWSAccessKeyId,
            S3Helper.GetTimeStamp(myTime),
            true,
            S3Helper.GetSignature(mySecretAccessKeyId, "ListAllMyBuckets", myTime));
        lblBuckets.Text = "<b>My Buckets</b><br/>";
        foreach (ListAllMyBucketsEntry b in myBuckets.Buckets)
        {
            lblBuckets.Text += (b.Name + ", created " + b.CreationDate + "<br/>");
        }
    }
This code invokes ListAllMyBuckets and iterates through the results. Each result is returned as a ListAllMyBucketsEntry.
Now, you can create and list your buckets. The next step is to be able to delete the buckets associated with your account.
Deleting a Bucket
The DeleteBucket operation deletes a bucket. You can only delete your own buckets, and each bucket must be empty in order to delete it. That means you have to first delete all the files within a bucket before it can be removed.
DeleteBucket expects to be passed the name of the bucket to be deleted. Here’s the code that goes in the Delete button’s event handler:
    protected void cmdDeleteBucket_Click(object sender, EventArgs e)
    {
        AmazonS3 myS3 = new AmazonS3();
        DateTime myTime = DateTime.Now;
        try
        {
            Status myDeleteResult = myS3.DeleteBucket(txtBucketName.Text,
                myAWSAccessKeyId,
                S3Helper.GetTimeStamp(myTime),
                true,
                S3Helper.GetSignature(mySecretAccessKeyId,
                    "DeleteBucket", myTime),
                null);
            MyPrint("Bucket successfully deleted.");
        }
        catch (Exception ex)
        {
            MyPrint("DeleteBucket Error: " + ex.Message);
        }
    }
This code simply deletes whatever bucket is named in txtBucketName. You can modify this as you see fit (for example, to delete a given bucket returned by ListAllMyBuckets).
Deleting a bucket is final. Once it is gone, it is gone, and there is no undelete operation. Of course, you can re-create a bucket of the same name, if needed.
Uploading Objects
With the basic bucket housekeeping implemented, it’s time to finally upload some files. The sample application uses an ASP.NET FileUpload control to obtain the file from the browser.
Add a FileUpload control to the page and name it myFileUpload.
<asp:FileUpload ID="myFileUpload" runat="server" />
Add a new button named cmdSubmit. This will handle the actual file upload to your Web server. Here's the code for the event handler of the Submit button.
    protected void cmdSubmit_Click(object sender, EventArgs e)
    {
        if (myFileUpload.HasFile)
        {
            MyPrint("Server received " + myFileUpload.FileName);
            MyPrint("Attempting to save to S3");
            AmazonS3 myS3 = new AmazonS3();
            DateTime myTime = DateTime.Now;
            // Create a signature for this operation
            string strMySignature = S3Helper.GetSignature(
                mySecretAccessKeyId,
                "PutObjectInline",
                myTime);
            // Create a new Access grant for anonymous users.
            Grant myGrant = new Grant();
            Grant[] myGrants = new Grant[1];
            // Setup Access control, allow Read access to all
            Group myGroup = new Group();
            myGroup.URI = "http://acs.amazonaws.com/groups/global/AllUsers";
            myGrant.Grantee = myGroup;
            myGrant.Permission = Permission.READ;
            myGrants[0] = myGrant;
            // Setup some metadata to indicate the content type
            MetadataEntry myContentType = new MetadataEntry();
            myContentType.Name = "ContentType";
            myContentType.Value = myFileUpload.PostedFile.ContentType;
            MetadataEntry[] myMetaData = new MetadataEntry[1];
            myMetaData[0] = myContentType;
            // Finally upload the object
            PutObjectResult myResult = myS3.PutObjectInline(
                txtBucketName.Text,
                txtKey.Text,
                myMetaData,
                myFileUpload.FileBytes,
                myFileUpload.FileBytes.Length,
                myGrants,
                StorageClass.STANDARD,
                true,
                myAWSAccessKeyId,
                S3Helper.GetTimeStamp(myTime),
                true,
                strMySignature, null
                );
            // Print out the results.
            if (myResult != null) MyPrint("ETag: " + myResult.ETag);
        }
    }
I am specifying the content type using a MetadataEntry. This allows me to store the file, which is essentially a stream of bytes, along with the actual MIME content type that those bytes represent.
When you upload a file, you assign it a key name. This is analogous to a filename. If a file already exists with the given key name, that file will be overwritten.
A successful file upload returns an ETag for the file. The ETag is a hash of the content of the file received by Amazon. The hashing algorithm used is Message Digest 5 (MD5).
For an added level of assurance, you can compute your own hash of the file content and compare it against the ETag. If the hash values match (as they should), you can be assured the file content was not tampered with or corrupted during transmission.
If the hash values don’t match, it implies that something happened during transmission. You can conclude that the file was not accurately stored, and should be re-transmitted.
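A minimal sketch of that comparison follows. It assumes the ETag arrives as a hexadecimal MD5 string that may be wrapped in double quotes; check the value you actually receive before relying on that. The class and method names are illustrative and not part of the sample download.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ETagSketch
{
    // Computes the MD5 hash of the content as a lowercase hex string.
    public static string Md5Hex(byte[] content)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Compares a locally computed MD5 with the returned ETag.
    // Assumption: the ETag is hex, optionally surrounded by double quotes.
    public static bool MatchesETag(byte[] content, string eTag)
    {
        string expected = eTag.Trim('"');
        return string.Equals(Md5Hex(content), expected,
            StringComparison.OrdinalIgnoreCase);
    }

    public static void Demo()
    {
        byte[] content = Encoding.UTF8.GetBytes("hello");
        Console.WriteLine(Md5Hex(content)); // 5d41402abc4b2a76b9719d911017c592
        Console.WriteLine(MatchesETag(content, "\"5d41402abc4b2a76b9719d911017c592\"")); // True
    }
}
```

In the upload handler, a check along the lines of ETagSketch.MatchesETag(myFileUpload.FileBytes, myResult.ETag) could decide whether to report success or retry the upload.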
Permissions
Every object or bucket stored on the Amazon S3 servers needs a set of grants or permissions. Only users with the required privileges can access a given bucket or object.
The sample code illustrates how to assign permissions on an object, but the process is similar for buckets. To create permissions on an object, you start by creating a new Amazon S3 Grant object.
    // Create a new Access grant for anonymous users.
    Grant myGrant = new Grant();
    Grant[] myGrants = new Grant[1];
Next, specify the properties of the Grant object. A grant can apply to a group of users, or an individual user. In this example, you want the grant to apply to all anonymous users, so you must create a new Group.
    // Setup Access control, allow Read access to all
    Group myGroup = new Group();
    myGroup.URI = "http://acs.amazonaws.com/groups/global/AllUsers";
Then, assign this group to the Grant.
    myGrant.Grantee = myGroup;
Next, specify the level of permission (in this case READ access).
    myGrant.Permission = Permission.READ;
Finally, assign this Grant to the Grants array.
    myGrants[0] = myGrant;
The Grants array is supplied to the PutObjectInline call and the Grants contained therein are assigned to the objects uploaded in the method call.
This example uses a group, but you can also tie permissions to an individual user. By specifying the user’s Amazon email address, you ensure that only that user is granted the associated permissions.
Amazon S3 uses a set of predefined groups for grants. The following URL refers to the set of All Users, including anonymous users:
http://acs.amazonaws.com/groups/global/AllUsers
You must use only the predefined groups at this time, and cannot currently define your own groups. The following table lists the predefined groups available.
Group URL | Description
http://acs.amazonaws.com/groups/global/AllUsers | All users, whether authenticated or otherwise.
http://acs.amazonaws.com/groups/global/AuthenticatedUsers | Only users who have registered and authenticated with the Amazon S3 service will be part of this group.
When you assign the READ permission to the AllUsers group, which includes non-authenticated users, any file you upload is available at the following URL:
https://s3.amazonaws.com/<Bucket Name>/<Key Name>
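Key names can contain spaces and other characters that are not safe in a URL, so it is worth escaping them when you build a link from the pattern above. The helper below is a small sketch of one way to do that; S3UrlSketch and BuildObjectUrl are hypothetical names, and the choice to escape each slash-separated segment of the key separately is an assumption made so that keys containing "/" keep their path-like appearance.

```csharp
using System;

public static class S3UrlSketch
{
    // Builds https://s3.amazonaws.com/<Bucket Name>/<Key Name> with the
    // bucket and each segment of the key percent-encoded.
    public static string BuildObjectUrl(string bucketName, string keyName)
    {
        string[] segments = keyName.Split('/');
        for (int i = 0; i < segments.Length; i++)
        {
            segments[i] = Uri.EscapeDataString(segments[i]);
        }
        return "https://s3.amazonaws.com/"
            + Uri.EscapeDataString(bucketName)
            + "/" + string.Join("/", segments);
    }

    public static void Demo()
    {
        Console.WriteLine(BuildObjectUrl("mybucket", "photos/my cat.jpg"));
        // https://s3.amazonaws.com/mybucket/photos/my%20cat.jpg
    }
}
```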
Listing a Bucket’s Contents
With a file uploaded, the next thing you’ll want to do is list the contents of a bucket to confirm that the file was stored successfully. Add the following code to the event handler for a new button named cmdShowContents:
    protected void cmdShowContents_Click(object sender, EventArgs e)
    {
        AmazonS3 myS3 = new AmazonS3();
        DateTime myTime = DateTime.Now;
        string strMySignature = S3Helper.GetSignature(
            mySecretAccessKeyId,
            "ListBucket",
            myTime);
        ListBucketResult myResults = myS3.ListBucket(
            this.txtBucketName.Text,
            "",
            "",
            0,
            false,
            "|",
            myAWSAccessKeyId,
            S3Helper.GetTimeStamp(myTime),
            true,
            strMySignature,
            null);
        // Iterate through the bucket contents
        if (myResults.Contents != null)
        {
            lblResults.Text = "<table>";
            foreach (ListEntry myEntry in myResults.Contents)
            {
                // Build the public URL once and quote it in the HTML attributes
                string strUrl = "https://s3.amazonaws.com/" + txtBucketName.Text + "/" + myEntry.Key;
                lblResults.Text += "<tr><td>";
                lblResults.Text += "<img src=\"" + strUrl + "\" width=\"100\"><br/>";
                lblResults.Text += "<a href=\"" + strUrl + "\" target=\"_blank\">" + myEntry.Key + "</a>";
                lblResults.Text += "</td></tr>";
            }
            lblResults.Text += "</table>";
        }
        else
        {
            MyPrint("Bucket is Empty");
        }
    }
In this sample code, the results are iterated through, and a link to each item is provided. Since I assume the content is an image, I build an image tag referencing each bucket item. The following figure displays the output.

Deleting Objects
To delete a file, you need to specify the bucket in which that object resides, along with the name of the object to delete. This information is supplied along with the standard Amazon S3 required parameters.
The following listing shows the code for the DeleteFile button:
    protected void cmdDeleteFile_Click(object sender, EventArgs e)
    {
        AmazonS3 myS3 = new AmazonS3();
        DateTime myTime = DateTime.Now;
        string strMySignature = S3Helper.GetSignature(
            mySecretAccessKeyId,
            "DeleteObject",
            myTime);
        Status myResults = myS3.DeleteObject(
            this.txtBucketName.Text,
            this.txtKey.Text,
            myAWSAccessKeyId,
            S3Helper.GetTimeStamp(myTime),
            true,
            strMySignature,
            null);
        MyPrint("Delete successful: " + myResults.Code + ", " + myResults.Description);
    }
The Delete operation is unforgiving. The operation will not prompt you with an “Are you sure?” message, and once a file is deleted from storage, it cannot be retrieved. I personally prefer this approach, but use it with caution.
Summary
In this article you have done the following:
- Implemented HMACSHA1 message signing.
- Built a generic Amazon S3 application.
- Created a utility class that can be re-used with other Amazon services.
The sample code with this article, attached below, can easily be enhanced to support a wide range of storage scenarios, including file sharing, large email attachment storage, collaborative editing, and so on.