I have used Amazon S3 for quite a long time, and my S3 bucket may contain tens of thousands of files. What is the easiest and quickest way to list all the files in the bucket?
Well, with boto you can do exactly that. Boto is a Python package that provides interfaces to Amazon Web Services. With the help of boto, we only need a short Python script to query the S3 bucket.
First of all, make sure you have installed boto, which can be done either with easy_install or with “pip install boto”:
root@local:~/s3-list# pip install boto
Downloading/unpacking boto
  Downloading boto-2.26.1.tar.gz (6.5Mb): 6.5Mb downloaded
  Running setup.py egg_info for package boto
    warning: no files found matching 'boto/mturk/test/*.doctest'
    warning: no files found matching 'boto/mturk/test/.gitignore'
Installing collected packages: boto
  Running setup.py install for boto
    warning: no files found matching 'boto/mturk/test/*.doctest'
    warning: no files found matching 'boto/mturk/test/.gitignore'
    changing mode of build/scripts-2.7/sdbadmin from 644 to 755
    changing mode of build/scripts-2.7/elbadmin from 644 to 755
    ... ...
    changing mode of /usr/local/bin/taskadmin to 755
    changing mode of /usr/local/bin/cq to 755
    changing mode of /usr/local/bin/cwutil to 755
    changing mode of /usr/local/bin/sdbadmin to 755
    changing mode of /usr/local/bin/fetch_file to 755
    changing mode of /usr/local/bin/cfadmin to 755
    changing mode of /usr/local/bin/s3put to 755
    changing mode of /usr/local/bin/instance_events to 755
    changing mode of /usr/local/bin/asadmin to 755
Successfully installed boto
Cleaning up...
Then create a script like the one below, and name it list-s3.py,
from boto.s3.connection import S3Connection

# Connect with your AWS credentials and open the bucket.
conn = S3Connection('access-key', 'secret-access-key')
bucket = conn.get_bucket('bucket')

# bucket.list() is a lazy iterator: boto fetches the key listing
# page by page behind the scenes, so it never loads all keys at once.
for key in bucket.list():
    print key.name.encode('utf-8')
Now all your files can be listed by,
$ python list-s3.py > results.txt
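If you only need the keys under a particular “folder”, boto's bucket.list() also accepts a prefix argument, so you don't have to walk the whole bucket. A minimal sketch of that variation (the helper name, credentials, and prefix here are made-up examples, not part of the script above):

```python
def list_keys(bucket, prefix=''):
    """Yield key names under the given prefix, one at a time."""
    # bucket.list(prefix=...) is still a lazy, paginated iterator,
    # so memory use stays flat no matter how many keys match.
    for key in bucket.list(prefix=prefix):
        yield key.name

# Example usage (placeholders, same shape as list-s3.py):
# from boto.s3.connection import S3Connection
# conn = S3Connection('access-key', 'secret-access-key')
# bucket = conn.get_bucket('bucket')
# for name in list_keys(bucket, prefix='logs/2014/'):
#     print name
```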
This script won’t use much memory, and it takes only minutes to list 2.5 million file names.
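With millions of names in results.txt, a natural follow-up is to see how they break down by top-level “folder”. A small sketch of that (the helper name and the sample key names are hypothetical; only results.txt comes from the steps above):

```python
from collections import Counter

def count_by_prefix(names):
    """Count key names by their top-level prefix (text before the first '/')."""
    counts = Counter()
    for name in names:
        # Keys without a '/' live at the root of the bucket.
        prefix = name.split('/', 1)[0] if '/' in name else '(root)'
        counts[prefix] += 1
    return counts

# Example usage, reading the listing produced above:
# with open('results.txt') as f:
#     for prefix, n in count_by_prefix(line.strip() for line in f).most_common():
#         print prefix, n
```

Because the function consumes names one at a time, it stays just as memory-friendly as the listing script itself.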