Thread: Content-encoding: gzip for text files




Replies: 16 - Pages: 2 - Last Post: Nov 23, 2008 3:56 PM by: Robert Coup
michalfrackowiak

Posts: 17
Registered: 11/17/08
Content-encoding: gzip for text files
Posted: Nov 18, 2008 11:26 AM PST

Hi,

CloudFront was extremely easy to implement (the complete integration took us 30 minutes!), but when we tested performance we discovered that it does not support compression (or we do not know how to enable it). As a result, our web pages performed worse than expected: UI responsiveness was actually worse than without CF, even though latency was great.

Would it be possible to enable gzip-compressed content (via Content-encoding: gzip) for "text" file types, e.g. "application/x-javascript", "text/plain", "text/html", "text/css", "text/javascript", "application/xhtml+xml" and similar?

Many thanks for another great service! It looks like the best CDN for small and mid-sized businesses. I believe, however, that content compression is crucial for many of us. We would switch immediately once this option is available.

Michal Frackowiak
http://michalfrackowiak.com

Message was edited by: michalfrackowiak

Tal@AWS

Posts: 252
Registered: 7/1/08
Re: Content-encoding: gzip for text files
Posted: Nov 18, 2008 11:32 AM PST   in response to: michalfrackowiak

Thank you for the feedback. We do not support gzip or deflate compression in our initial release. This is a capability we are considering for a future release.

A private beta customer posted the following work around as what he does:

Upload 2 versions into the CDN bucket (e.g. /resources-build1234-gzip/ and /resources-build1234-nogzip/), pre-gzipping and adding the Content-Encoding header as part of the add-to-S3 process (only for text content, not images) for the -gzip version.

Use Accept-Encoding and the User-Agent string to determine whether to deliver CDN resources gzipped or not as part of the web request, and return URLs to the appropriate version via a template tag.

Strictly speaking, it's bad, since a user could take the gzip version of a JavaScript/CSS URL and request it in a non-gzip browser, but that's unlikely. It also means caching every HTML page twice (gzip/no-gzip).
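A minimal Ruby sketch of the "template tag" step, assuming a hypothetical asset host (cdn.example.com) and hypothetical helper names (accepts_gzip?, asset_url). It only looks at Accept-Encoding and leaves out the User-Agent check mentioned above. Unlike a plain substring test, it treats "gzip;q=0" as a refusal, since a q-value of 0 means "not acceptable" in HTTP.

```ruby
# Hypothetical asset host; the "resources-<build>-gzip/-nogzip" prefixes
# follow the example in the post above.
ASSET_HOST = 'http://cdn.example.com'

# True if Accept-Encoding lists gzip (or x-gzip) with a non-zero q-value.
# A plain substring test would also accept "gzip;q=0", which forbids gzip.
def accepts_gzip?(accept_encoding)
  return false if accept_encoding.nil?

  accept_encoding.split(',').any? do |entry|
    coding, *params = entry.split(';').map(&:strip)
    next false unless %w[gzip x-gzip].include?(coding.to_s.downcase)

    q = params.find { |param| param.downcase.start_with?('q=') }
    q.nil? || q.split('=', 2).last.to_f > 0
  end
end

# The "template tag": build the URL of an asset for the current request.
def asset_url(path, build, accept_encoding)
  variant = accepts_gzip?(accept_encoding) ? 'gzip' : 'nogzip'
  "#{ASSET_HOST}/resources-#{build}-#{variant}/#{path}"
end

puts asset_url('js/app.js', 'build1234', 'gzip, deflate')
# prints http://cdn.example.com/resources-build1234-gzip/js/app.js
```

Because the choice is made per request, the rendered HTML differs between gzip and non-gzip clients, which is why every page ends up cached twice.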

Regards,

The CloudFront team.



michalfrackowiak

Posts: 17
Registered: 11/17/08
Re: Content-encoding: gzip for text files
Posted: Nov 18, 2008 12:13 PM PST   in response to: Tal@AWS

Thank you, this is a very nice solution. Not optimal, but I guess good enough to use CloudFront.

I will try to implement this and possibly post my findings and experiences.

Michal

wayneadmob

Posts: 3
Registered: 11/18/08
Re: Content-encoding: gzip for text files
Posted: Nov 18, 2008 3:59 PM PST   in response to: michalfrackowiak

I've tried this and it doesn't seem to work. CloudFront must be serving files in a completely different manner than S3: CF seems to modify headers that S3 leaves alone (namely the Content-Length header). Here's an example:

These two files should be the same (the CF distro is pointing to the same S3 bucket).
http://admob-static.s3.amazonaws.com/iphone/iadmob.js
http://d25m14jh7al8zt.cloudfront.net/iphone/iadmob.js

I uploaded the files to S3 with these headers:
'Content-Encoding' => 'gzip',
'Content-Length' => strio.string.length,
'Content-Type' => 'application/x-javascript',
'Expires' => 'Fri, 16 Nov 2018 22:09:29 GMT'
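The upload above builds the body in a StringIO and takes Content-Length from it. A minimal sketch of that step, with a hypothetical helper name (gzip_with_headers): the asset is compressed in memory and Content-Length is taken from the compressed bytes with bytesize, since the header counts bytes rather than characters.

```ruby
require 'zlib'
require 'stringio'

# Compress a text asset in memory and build the matching upload headers.
def gzip_with_headers(source, content_type, expires)
  strio = StringIO.new(String.new)      # String.new is a binary (ASCII-8BIT) buffer
  gz = Zlib::GzipWriter.new(strio, Zlib::BEST_COMPRESSION)
  gz.write(source)
  gz.finish                             # writes the gzip trailer, leaves strio open
  body = strio.string
  headers = {
    'Content-Encoding' => 'gzip',
    'Content-Length'   => body.bytesize.to_s, # size of the compressed body
    'Content-Type'     => content_type,
    'Expires'          => expires
  }
  [body, headers]
end

js = "function hello() { return 'hi'; }\n" * 200
body, headers = gzip_with_headers(js, 'application/x-javascript',
                                  'Fri, 16 Nov 2018 22:09:29 GMT')
puts "#{js.bytesize} bytes -> #{headers['Content-Length']} bytes gzipped"
```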

Running curl -D, here are the headers I get:
S3:
HTTP/1.1 200 OK
x-amz-id-2: 3iArtFabXKFPO0JftBcHhfK9wtH13UYqMvtkuC8jl95QveWivfvILFi9H2xA8sLj
x-amz-request-id: 345001477AE6BB3C
Date: Tue, 18 Nov 2008 23:56:48 GMT
Content-Encoding: gzip
Expires: Fri, 16 Nov 2018 22:09:29 GMT
Last-Modified: Tue, 18 Nov 2008 23:48:26 GMT
ETag: "8441229af719c03174981cdb7d0fec30"
Content-Type: application/x-javascript
Content-Length: 4453
Server: AmazonS3


CloudFront:
HTTP/1.0 200 OK
x-amz-id-2: 2oOIINf92CeurE7QkTz6hXTSpcoF0eN5ZI5NDWY1ghCGHK/VedvlDUzv14MZ3OEm
x-amz-request-id: 1050B02399ED198C
Date: Tue, 18 Nov 2008 22:11:14 GMT
x-amz-meta-s3fox-filesize: 11384
x-amz-meta-s3fox-modifiedtime: 1225498118000
Cache-Control: max-age=315360000
Content-Encoding: gzip
Expires: Fri, 16 Nov 2018 22:09:29 GMT
Last-Modified: Tue, 18 Nov 2008 22:10:56 GMT
ETag: "60cd922c776ff889ce56ae11bd1758cc"
Content-Type: application/x-javascript
Content-Length: 11384
Server: AmazonS3
Age: 6351
X-Cache: Hit from cloudfront
Via: 1.0 3385d16e8aeaf70ee27cd12b252c5d04.cloudfront.net:11180 (CloudFront), 1.0 6bd993f817487f73a24be7b46249ee50.cloudfront.net:11180 (CloudFront)
Connection: close


Needless to say the CloudFront version does not work.
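Comparing the two dumps above, ETag, Last-Modified and Content-Length all differ, and the CloudFront response carries Age: 6351 and a Content-Length equal to its x-amz-meta-s3fox-filesize of 11384. That suggests the edge was still serving an earlier upload rather than rewriting headers. A small sketch that flags such a mismatch between two curl -D dumps (parse_headers and version_mismatches are hypothetical helper names; the dumps are abbreviated from the post):

```ruby
# Parse a "Name: value" dump as printed by curl -D; the status line is skipped.
def parse_headers(dump)
  dump.each_line.with_object({}) do |line, headers|
    name, value = line.split(':', 2)
    next if value.nil?
    headers[name.strip.downcase] = value.strip
  end
end

# Headers that identify which stored version of the object was served.
VERSION_KEYS = %w[etag last-modified content-length].freeze

def version_mismatches(a, b)
  VERSION_KEYS.reject { |k| a[k] == b[k] }
end

s3 = parse_headers(<<~S3)
  HTTP/1.1 200 OK
  Last-Modified: Tue, 18 Nov 2008 23:48:26 GMT
  ETag: "8441229af719c03174981cdb7d0fec30"
  Content-Length: 4453
S3

cf = parse_headers(<<~CF)
  HTTP/1.0 200 OK
  Last-Modified: Tue, 18 Nov 2008 22:10:56 GMT
  ETag: "60cd922c776ff889ce56ae11bd1758cc"
  Content-Length: 11384
  Age: 6351
CF

p version_mismatches(s3, cf)   # prints ["etag", "last-modified", "content-length"]
```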

Message was edited by: wayneadmob

Yejun Yang


Posts: 173
Registered: 1/22/08
Re: Content-encoding: gzip for text files
Posted: Nov 18, 2008 4:15 PM PST   in response to: wayneadmob

Both of your links work in my browser.
Response headers:

x-amz-id-2: SoGk2UREN+dd9W38gpemRkUUKQT8qPrSHH8yB96USQucVwY1VByBP/n9Jr+TCriu
x-amz-request-id: 9C1549EAA32FD762
Date: Wed, 19 Nov 2008 00:02:20 GMT
Content-Encoding: gzip
Expires: Fri, 16 Nov 2018 22:09:29 GMT
Last-Modified: Tue, 18 Nov 2008 23:48:26 GMT
Etag: "8441229af719c03174981cdb7d0fec30"
Content-Type: application/x-javascript
Content-Length: 4453
Server: AmazonS3
Age: 708
X-Cache: Hit from cloudfront
Via: 1.0 df950d0dbb20199f7993dcff4a01f80b.cloudfront.net:11180 (CloudFront)


Message was edited by: "aaronyy"

wayneadmob

Posts: 3
Registered: 11/18/08
Re: Content-encoding: gzip for text files
Posted: Nov 18, 2008 6:18 PM PST   in response to: Yejun Yang

Odd, it does indeed work now, and with the correct headers. Is there some sort of replication delay I'm not aware of that pushes files from S3 out to the CDN?

michalfrackowiak

Posts: 17
Registered: 11/17/08
Re: Content-encoding: gzip for text files
Posted: Nov 19, 2008 1:20 AM PST   in response to: wayneadmob

I tested gzipped content yesterday and it works, but:
1. You have to gzip the files yourself. CF serves the object under whatever name you upload it with, so you probably want to keep the original file name: "gzip myfile.js; mv myfile.js.gz myfile.js".
2. When uploading to S3 you must add the "Content-Encoding:gzip" header.
3. Since this is gzipped content, CF servers will be unable to determine its MIME type, so you should provide the Content-Type header, e.g. "Content-Type:application/x-javascript".

The complete set of commands to upload myfile.js using s3sync is:

gzip myfile.js
mv myfile.js.gz myfile.js
s3cmd.rb -s put mybucket:myfile.js myfile.js x-amz-acl:public-read Content-Encoding:gzip Content-Type:application/x-javascript

(we have those commands in a nice upload script for all our static content)
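For an upload script like the one mentioned above, the gzip-and-rename step can be done in Ruby as well. A minimal sketch with hypothetical helper names (gzip_in_place, gzipped?): the file keeps its original name, and a file that already starts with the gzip magic bytes is left alone, so re-running the script does not compress twice.

```ruby
require 'zlib'

# Gzip files start with the magic bytes 1f 8b.
def gzipped?(path)
  File.open(path, 'rb') { |f| f.read(2) } == "\x1f\x8b".b
end

# Same effect as "gzip myfile.js; mv myfile.js.gz myfile.js": the file keeps
# its original name (and therefore its S3 key) but now holds gzip data.
def gzip_in_place(path)
  return path if gzipped?(path)   # already compressed, nothing to do

  data = File.binread(path)
  tmp = "#{path}.gz.tmp"
  Zlib::GzipWriter.open(tmp, Zlib::BEST_COMPRESSION) { |gz| gz.write(data) }
  File.rename(tmp, path)          # replace the original only after a full write
  path
end
```

After gzip_in_place('myfile.js'), the s3cmd.rb put command above uploads the file unchanged, still with the Content-Encoding and Content-Type headers.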

If I understand the caching mechanism correctly, adding the Expires header is a good idea too, so append to the line above:
"Expires:Sun, 23 Nov 2008 21:11:04 GMT"
(use any date more than 24 hours in the future; but if you use Expires, you should re-upload your content periodically to keep the Expires header up to date).
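A small sketch for generating the Expires value in an upload script instead of typing it by hand (expiry_headers is a hypothetical name). Time#httpdate produces the HTTP-date format in GMT and fills in the weekday itself. It also emits Cache-Control: max-age, which is relative to the time of the response, so unlike an absolute Expires date it does not go stale between uploads.

```ruby
require 'time'

# Expires (absolute HTTP-date) and Cache-Control (relative max-age)
# for an object that should be cached for `days` days.
def expiry_headers(days, now = Time.now)
  seconds = days * 24 * 60 * 60
  {
    'Expires'       => (now + seconds).httpdate,
    'Cache-Control' => "max-age=#{seconds}"
  }
end

h = expiry_headers(5, Time.utc(2008, 11, 18, 21, 11, 4))
puts h['Expires']         # prints Sun, 23 Nov 2008 21:11:04 GMT
puts h['Cache-Control']   # prints max-age=432000
```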

@wayneadmob: If I understand correctly, CF nodes pull content from S3 only when they need to, e.g. when their local copy expires or does not exist. Uploading a new file version to S3 does not automatically distribute it to the CF nodes. This is why you should use the Expires and Cache-Control headers wisely. If you are experimenting with various file versions, my advice is to upload each version under a new object name in S3.
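One way to get a new object name automatically is to derive it from the file's contents. A minimal sketch, assuming a hypothetical naming scheme (8 hex characters of the MD5 digest before the extension) and a hypothetical helper name (versioned_key). Pages must then link to the new name, e.g. through the template tag discussed earlier in the thread.

```ruby
require 'digest/md5'

# "css/site.css" + content -> "css/site.<8 hex chars>.css".
# Changed content gives a new key, so CF nodes fetch it fresh from S3.
def versioned_key(key, content)
  digest = Digest::MD5.hexdigest(content)[0, 8]
  ext = File.extname(key)
  "#{key.delete_suffix(ext)}.#{digest}#{ext}"
end

puts versioned_key('css/site.css', '')   # prints css/site.d41d8cd9.css
```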

Hope this helps

Michal Frackowiak
http://michalfrackowiak.com

Allen

Posts: 5,320
Registered: 3/19/07
Re: Content-encoding: gzip for text files
Posted: Nov 19, 2008 3:02 AM PST   in response to: michalfrackowiak

> CF servers will be unable to determine its mime type

I don't believe the CF servers determine the mime type, but at the moment, this is not documented anywhere.


michalfrackowiak

Posts: 17
Registered: 11/17/08
Re: Content-encoding: gzip for text files
Posted: Nov 19, 2008 6:03 AM PST   in response to: Allen

> I don't believe the CF servers determine the mime type, but at the moment, this is not documented anywhere.

My mistake. You are right. One should _always_ set Content-Type for uploaded files to avoid problems with browsers.
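Since the Content-Type has to be supplied explicitly, an upload script needs some mapping from file to type. A minimal sketch with a hypothetical, deliberately small table (CONTENT_TYPES, content_type_for); unknown extensions fall back to application/octet-stream.

```ruby
# Hypothetical extension -> Content-Type table for an upload script.
CONTENT_TYPES = {
  '.js'   => 'application/x-javascript',
  '.css'  => 'text/css',
  '.html' => 'text/html',
  '.txt'  => 'text/plain',
  '.png'  => 'image/png',
  '.gif'  => 'image/gif',
  '.jpg'  => 'image/jpeg'
}.freeze

def content_type_for(path)
  CONTENT_TYPES.fetch(File.extname(path).downcase, 'application/octet-stream')
end

puts content_type_for('js/myfile.js')   # prints application/x-javascript
```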

Message was edited by: michalfrackowiak

michalfrackowiak

Posts: 17
Registered: 11/17/08
Successfully deployed!
Posted: Nov 19, 2008 7:45 AM PST   in response to: michalfrackowiak

Hi,

just to let you know: we have successfully deployed CloudFront on our production servers at http://www.wikidot.com , with support for gzipped content.

Here is a very brief description of our structure:

1. All static content (CSS, images, JavaScript and support files) was served from static.wikidot.com, and we wanted to make this content available through CF.
2. The problem was the lack of support for auto-negotiated gzipped (deflated) content on CF servers and the lack of SSL support, so we decided to create static-l.wikidot.com to serve non-gzipped versions of the files without CF. This traffic makes up only 10%-20% of the overall traffic to our static files, so we can live with it for now.
3. If a client supports gzip encoding (we check whether the browser sends Accept-Encoding: gzip) and the page is requested over HTTP, static content is served from static.wikidot.com, which now points (via CNAME) to a CF distribution.
4. Otherwise, it is served from static-l.wikidot.com, using the same protocol as the original request.
5. So the decision about which server to use for static content is made by the application that renders the HTML pages. Links in the HTML page point either to static.wikidot.com or to static-l.wikidot.com.
6. We have created automated scripts that push the content from static.wikidot.com to S3 bucket with some processing in between, as described in one of my posts above.
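The "processing in between" in step 6 includes deciding which files get compressed. A minimal sketch of that decision, using the text MIME types listed in the first post of this thread; images are already compressed and are uploaded as-is. GZIP_TYPES and upload_headers are hypothetical names, and x-amz-acl:public-read is taken from the s3cmd.rb command earlier in the thread.

```ruby
# Text types from the first post that benefit from gzip.
GZIP_TYPES = %w[
  application/x-javascript text/javascript text/plain
  text/html text/css application/xhtml+xml
].freeze

# Ignore parameters such as "; charset=utf-8" when matching.
def compress?(content_type)
  GZIP_TYPES.include?(content_type.split(';').first.strip.downcase)
end

def upload_headers(content_type, cache_seconds)
  headers = {
    'Content-Type'  => content_type,
    'Cache-Control' => "max-age=#{cache_seconds}",
    'x-amz-acl'     => 'public-read'
  }
  headers['Content-Encoding'] = 'gzip' if compress?(content_type)
  headers
end

p upload_headers('text/css', 86_400)['Content-Encoding']   # prints "gzip"
```

Files for which compress? is true would be gzipped before upload; everything else keeps its original bytes and gets no Content-Encoding header.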

Although the idea is quite simple, the tricky part was to combine it with our rendering engines (we are using A LOT of Memcached and similar stuff because of our large traffic).

It works, and it works really nicely. We deployed this experimentally today and, just to be safe, keep a low DNS TTL on the static.wikidot.com domain so that we can quickly revert to the previous setup. No problems so far.

Needless to say, loading times for our web pages are radically lower when we access the service from Europe and Asia (our primary servers are in USA).

Thanks to everyone for help! Looks like CloudFront is a game-changer for us!

Michal Frackowiak
http://michalfrackowiak.com
BTW: feel free to contact me if you need assistance with setting up CloudFront

wayneadmob

Posts: 3
Registered: 11/18/08
Re: Content-encoding: gzip for text files
Posted: Nov 19, 2008 10:57 AM PST   in response to: michalfrackowiak

@michalfrackowiak Are you saying that CF obeys cache control headers when pulling from S3? That seems like a double-edged sword...

Does anybody know of a way to send a command to CF to force it to re-pull all content from S3?

Yejun Yang


Posts: 173
Registered: 1/22/08
Re: Content-encoding: gzip for text files
Posted: Nov 19, 2008 5:31 PM PST   in response to: wayneadmob

You can't. Just use a new name for your file.

michalfrackowiak

Posts: 17
Registered: 11/17/08
Re: Content-encoding: gzip for text files
Posted: Nov 20, 2008 2:05 AM PST   in response to: Yejun Yang

This is already documented in the Developer's Guide:
http://docs.amazonwebservices.com/AmazonCloudFront/2008-06-30/DeveloperGuide/index.html?Expiration.html

Apart from that, we have seen the following header from CF, which suggests that content is being resynced from the original S3 bucket (could someone confirm what it really means?):

X-Cache: RefreshHit from cloudfront

Sometimes it is:
X-Cache: Miss from cloudfront
Which I guess is when the item is not in the CF node.
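A hedged sketch for logging X-Cache results while testing. The meanings below follow CloudFront's later documentation of its edge result types, where RefreshHit means the object was in the edge cache but had expired, so the edge checked with the origin that it was still current. Whether the 2008 service behaved exactly this way is an assumption. X_CACHE_MEANINGS and x_cache_result are hypothetical names.

```ruby
# Assumed meanings of the first word of the X-Cache header.
X_CACHE_MEANINGS = {
  'Hit'        => 'served from the edge cache',
  'RefreshHit' => 'cached copy had expired; edge revalidated it with the origin',
  'Miss'       => 'not in the edge cache; fetched from the origin'
}.freeze

def x_cache_result(header)
  result = header.to_s.split(' ').first
  X_CACHE_MEANINGS.fetch(result, 'unknown')
end

puts x_cache_result('Miss from cloudfront')
# prints not in the edge cache; fetched from the origin
```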

Michal Frackowiak
http://michalfrackowiak.com

msolnit

Posts: 15
Registered: 5/13/08
Re: Successfully deployed!
Posted: Nov 20, 2008 12:15 PM PST   in response to: michalfrackowiak

Michal,

Thank you very much for this detailed description.  Could you provide some more specifics about where the decision to use static vs. static-l is made?  Is it inside the server presentation code, when the first request is made?  Is it stored in a session variable somewhere?

Sincerely,
Matt


michalfrackowiak

Posts: 17
Registered: 11/17/08
Re: Successfully deployed!
Posted: Nov 21, 2008 12:25 AM PST   in response to: msolnit

It is "per-request" in the presentation layer. A piece of code might look like this (in PHP):

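$proto = 'http';
$static = 'static';
/* HTTPS may be unset for plain HTTP; some servers (e.g. IIS) set it to 'off'. */
if (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') {
	/* Page is requested over HTTPS; CF does not support it. */
	$proto = 'https';
	$static = 'static-l';
}
/* The header may be missing entirely, e.g. for some bots. */
$acceptEncoding = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
if (strpos($acceptEncoding, 'gzip') === false) {
	/* Gzip encoding is not supported: serve files from the non-gzipped server. */
	$static = 'static-l';
}

$staticServer = $proto . '://' . $static . '.wikidot.com';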


Later we use this $staticServer to build links to the static content.
In real life it is a bit more complicated because of our caching, so we actually run this piece of logic during post-rendering processing. But the idea is exactly the same.

Of course we could also use CF for non-gzipped content, but this traffic is so small (and mostly bots, not humans) we do not bother.

Hope this helps

Michal Frackowiak
http://michalfrackowiak.com

Message was edited by: michalfrackowiak

