When you store web files on Amazon S3, this question comes up sooner or later:
How can I keep the files in my production and development environments exactly the same? (In other words, you want to remove the unused files from the dev environment.)
In a normal working environment, rsync is a nice solution, and its --delete parameter does what we expect. Is there a way to keep two Amazon S3 buckets synced just as smoothly?
Here we’ll show a working solution with s3cmd. Other Amazon S3 clients may have similar features, but we will not explain them here.
To use the bucket sync feature, you need s3cmd version 1.0.0 or later. Earlier versions do not support bucket-to-bucket syncing, and you may see this error:
root@demo:~# s3cmd sync --delete-removed s3://admon/brands s3://admon-dev/
ERROR: Parameter problem: Invalid source/destination: 's3://admon/brands' 's3://admon-dev/'
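One way to check the installed version before trying a bucket-to-bucket sync is sketched below. It assumes that `s3cmd --version` prints a line ending in the version number (for example "s3cmd version 1.0.0") and that your `sort` supports `-V` (GNU coreutils does). The helper name `version_ge` is made up for this example.

```shell
#!/bin/sh
# version_ge A B: succeed if version A is at least version B.
# Relies on `sort -V` (version sort), which is an assumption about your system.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only run the check when s3cmd is actually installed.
if command -v s3cmd >/dev/null 2>&1; then
    # Assumed output format: "s3cmd version X.Y.Z"; take the last field.
    installed=$(s3cmd --version | awk 'NR==1 {print $NF}')
    if version_ge "$installed" "1.0.0"; then
        echo "s3cmd $installed should support bucket-to-bucket sync"
    else
        echo "s3cmd $installed is too old; update to 1.0.0 or later"
    fi
fi
```

If the version string on your system has a different shape, adjust the awk expression accordingly.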
Once s3cmd is updated, the command is as simple as the rsync equivalent:
s3cmd sync --delete-removed --acl-public s3://admon/brands s3://admon-dev/
The --delete-removed option has the same meaning as rsync's --delete parameter, and --acl-public is an extra parameter that makes your files publicly accessible. If you want to keep your pictures private, just remove --acl-public.
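Because --delete-removed deletes objects from the destination bucket, it may be worth previewing the sync first. The sketch below wraps the command in a small function that adds s3cmd's --dry-run option on request. The function name `sync_buckets` and the `S3CMD` override variable are inventions of this example, and you should verify that --dry-run behaves as expected for bucket-to-bucket sync on your s3cmd version.

```shell
#!/bin/sh
# Command to invoke. Overriding it (e.g. S3CMD=echo) prints the arguments
# instead of contacting S3; this variable is an assumption of this sketch.
S3CMD=${S3CMD:-s3cmd}

# sync_buckets SRC DST [preview]
# With "preview" as the third argument, --dry-run is added so s3cmd only
# reports what it would do. Add --acl-public yourself if the files
# should be publicly accessible.
sync_buckets() {
    src=$1
    dst=$2
    if [ "${3:-}" = "preview" ]; then
        "$S3CMD" sync --dry-run --delete-removed "$src" "$dst"
    else
        "$S3CMD" sync --delete-removed "$src" "$dst"
    fi
}

# Example usage (not executed here):
#   sync_buckets s3://admon/brands s3://admon-dev/ preview
#   sync_buckets s3://admon/brands s3://admon-dev/
```

Running the preview first and reading the list of deletions is a cheap safeguard against pointing the sync at the wrong destination.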
Note: If you have an extremely large number of pictures (say, more than 100k), this command takes a significant amount of memory. In my test, it used 2.7 GB of resident memory when the bucket contained 500k objects (pictures and directories).