5 Tips on Amazon S3 hosting optimization

If you use Amazon S3 (or CloudFront) as a web hosting provider, here are some essential tips that you should know about. They focus on front-end optimization: reducing S3 bandwidth, cutting unnecessary requests to S3, spotting abnormal requests, and optimizing file transfers.

Before moving ahead, you need an S3 file manager. There are several good ones, such as S3Cmd, S3Hub, S3Fox and CloudBerry Explorer (Windows only). S3Hub (or CloudBerry Explorer) is suggested because it implements the features mentioned here.

Tip #1: Are your S3 files being misused by others? Enable S3 access logging.

Amazon S3’s bandwidth rates are inexpensive and you pay for what you use. The problem is that if other websites link to your S3-hosted content (images, MP3s, Flash videos, etc.), you also have to pay for the bandwidth used by those sites.

Unlike Apache web servers, where you can easily block hotlinking from other websites, Amazon S3 doesn’t offer such a mechanism. What you can do is enable logging for your S3 buckets. Amazon will record all client requests in log files that you can parse later. If you notice that a file is heavily referenced by other websites, you can rename the file on S3, or just delete it if you’re OK with that.

To implement it, create a new S3 bucket for the logs, then right-click the name of the bucket you want to monitor, choose “Logging”, and specify the bucket that will store the log files.
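Once logs accumulate, a short script can point out which files other sites are linking to. The following is a minimal sketch, assuming the space-separated S3 server access log layout (bucket owner, bucket, time, IP, requester, request ID, operation, key, request line, status, error code, bytes sent, object size, two timings, referrer, user agent). The function name and the own_hosts parameter are illustrative, not part of any AWS tool.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Leading fields of an S3 server access log record, up to the user agent.
LOG_LINE = re.compile(
    r'^\S+ (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) \S+ \S+ '
    r'(?P<operation>\S+) (?P<key>\S+) (?:"[^"]*"|-) (?P<status>\S+) \S+ '
    r'(?P<bytes_sent>\S+) \S+ \S+ \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytes_by_external_referrer(lines, own_hosts):
    """Sum bytes sent per (key, referring host) for hosts not in own_hosts."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["operation"] != "REST.GET.OBJECT":
            continue  # unparseable line, or not an object download
        host = urlparse(match["referrer"]).hostname
        if host is None or host in own_hosts:
            continue  # no referrer, or a request from our own pages
        sent = match["bytes_sent"]
        totals[(match["key"], host)] += int(sent) if sent.isdigit() else 0
    return totals
```

Calling `bytes_by_external_referrer(open("access.log"), {"www.mysite.example"}).most_common(10)` would list the ten most expensive (file, referring site) pairs; the log file name and host name here are placeholders.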

Tip #2: Set Expiry Headers for Static Files

The cost of GET requests is small (just 1¢ per 10,000 requests), but it can quickly add up if you have a popular site or if your design uses many images. It is important to add an Expires or Cache-Control HTTP header to static content on your site, such as images and CSS/JS files.

The gist is that web browsers store objects in their cache, and the Expires header in the HTTP response tells the browser how long an object may stay there. If you set the Expires date some time in the future, the client browser won’t request the object again until then.

To set an Expires header, right-click the S3 object, open its properties, choose HTTP headers and add a new header. Call it “Expires” and set an expiration date in HTTP date format, like “Fri, 23 Apr 2021 10:18:36 GMT”.
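The date has to be in the HTTP date format, which is easy to get wrong by hand. Here is a small sketch that builds both header values with the Python standard library; the helper name cache_headers and the 30-day lifetime are only illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(now, days):
    """Return Expires and Cache-Control values for a lifetime of `days`."""
    expires = now + timedelta(days=days)
    return {
        # usegmt=True requires a UTC datetime and emits the "GMT" suffix.
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={days * 24 * 60 * 60}",
    }


start = datetime(2021, 3, 24, 10, 18, 36, tzinfo=timezone.utc)
headers = cache_headers(start, days=30)
print(headers["Expires"])        # Fri, 23 Apr 2021 10:18:36 GMT
print(headers["Cache-Control"])  # public, max-age=2592000
```

In practice you would pass `datetime.now(timezone.utc)` and paste the resulting values into the file manager’s HTTP headers dialog.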

Besides saving money, your site will also load faster because the visitor’s browser will make fewer connections and requests than before.

If you are worried about setting cache headers for JavaScript and CSS files because they may change frequently (especially in the midst of a site redesign), just append a version number after the file name, as Ruby on Rails applications commonly do.

When main.png on S3 is updated, you just need to change the version number, and visitors’ browsers will make a fresh GET request to Amazon S3 for the latest version of the file.
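One way to avoid bumping the version by hand is to derive it from the file’s content, so it changes exactly when the file does. The sketch below assumes the version is appended as a bare query string; the function name and the 8-character fingerprint length are arbitrary choices for illustration.

```python
import hashlib


def versioned_url(base_url, content):
    """Append a short content fingerprint as the version query string."""
    version = hashlib.sha256(content).hexdigest()[:8]
    return f"{base_url}?{version}"
```

For example, `versioned_url("http://bucketname.s3.amazonaws.com/main.png", open("main.png", "rb").read())` yields a URL that stays the same until main.png changes.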

Tip #3: Use Amazon S3 with a reasonable Domain Name

When you create a new bucket on Amazon S3 and set file access to public, Amazon provides public URLs in the following two forms:
http://bucketname.s3.amazonaws.com/filename
http://s3.amazonaws.com/bucketname/filename

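It’s suggested to use the first URL, since it makes Tip #4 below possible: crawlers look for robots.txt at the root of the host name, and only the first form puts your bucket’s files there.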
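If existing pages already use the second form, the links can be rewritten mechanically. This is a rough sketch that only handles the plain s3.amazonaws.com host shown above; the function name is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def to_virtual_hosted(url):
    """Rewrite http://s3.amazonaws.com/bucket/key to http://bucket.s3.amazonaws.com/key."""
    parts = urlsplit(url)
    if parts.netloc != "s3.amazonaws.com":
        return url  # already bucket-style, or not an S3 URL we recognize
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        return url  # no bucket in the path, nothing to rewrite
    host = f"{bucket}.s3.amazonaws.com"
    return urlunsplit((parts.scheme, host, "/" + key, parts.query, parts.fragment))


print(to_virtual_hosted("http://s3.amazonaws.com/bucketname/filename"))
# http://bucketname.s3.amazonaws.com/filename
```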

Tip #4: Block search engine crawlers on your Amazon S3

To prevent robots from indexing files stored in your Amazon S3 buckets, create a robots.txt file in the root directory with the content:

User-agent: *
Disallow: /

Make sure that you set the ACL (access permissions) of robots.txt to public, otherwise spiders won’t be able to read it.
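You can check that the rules behave as intended before uploading the file. Below is a minimal sketch using Python’s standard robots.txt parser; the helper name is_crawlable and the “Googlebot” user agent are just examples.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "Googlebot",
                   "http://bucketname.s3.amazonaws.com/filename"))  # False
```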

Tip #5: Other Optimizations for File Transfers
Torrent Delivery for large files

If you are planning to distribute large files on the web (like a software installer or a database dump) via Amazon S3, it makes sense to use BitTorrent with S3 so that you don’t necessarily have to pay for all the download bandwidth. The starting point for a BitTorrent download is a .torrent file, and you can quickly get one for any S3 object by adding “?torrent” to the original web URL, like this:

http://bucketname.s3.amazonaws.com/software-installer.zip?torrent

Create Time Limited Links

By default, all public files in your S3 account are available for download until you delete them or change their permissions. However, if you are running a contest on your site where you give away a PDF ebook or an MP3 ringtone to your visitors, it doesn’t make sense to keep those files available on S3 beyond the duration of the contest. You should therefore consider creating “signed URLs” for such temporary S3 files. These are time-limited URLs that are valid for a specific period and stop working afterwards (S3 answers an expired link with a 403 Forbidden error). Keep the file itself private, since a signed URL only restricts access to objects that are not publicly readable.

Right-click a file in the S3 bucket, choose Web URL and then set an Expiry Time. Click Generate to create a “signed URL”.
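For the curious, this is roughly what the file manager does behind the Generate button. The sketch below assumes the legacy query-string authentication scheme (Signature Version 2), in which an HMAC-SHA1 over the request description is appended to the URL; it is not necessarily what any particular tool uses. Current AWS SDKs use Signature Version 4 instead (for example boto3’s generate_presigned_url), and newer regions accept only that, so treat this as an illustration of the idea. The function name and the credentials in the usage note are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_url_v2(bucket, key, access_key, secret_key, expires_at):
    """Build a time-limited GET URL with legacy query-string authentication.

    expires_at is a Unix timestamp (seconds since the epoch).
    """
    path = quote(key, safe="/")
    # Verb, empty Content-MD5, empty Content-Type, expiry, then the resource.
    string_to_sign = f"GET\n\n\n{expires_at}\n/{bucket}/{path}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return (
        f"http://{bucket}.s3.amazonaws.com/{path}"
        f"?AWSAccessKeyId={access_key}&Expires={expires_at}&Signature={signature}"
    )
```

A link valid for one hour would be built with `sign_url_v2("bucketname", "ebook.pdf", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY", int(time.time()) + 3600)` after `import time`.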
