Amazon Simple Storage Service (S3) is an object storage service that offers a scalable, reliable, and secure way to store data. You can use S3 to store various data, including images, videos, documents, and application data.
When working with S3 buckets, one common task is listing the contents of a bucket, which can include objects like files, folders, and other data. In this guide, you’ll learn how to accomplish this task using Boto3, the AWS SDK for Python. Whether you need to list all objects, filter by file type, or apply regular expressions, Boto3 provides the tools to interact with S3 programmatically.
Prerequisites
- Install the Boto3 library using pip install boto3
- AWS security credentials (AWS Access Key ID and Secret Access Key)
- Security credentials configured through the aws configure command
- Read access to the bucket from which you need to list the objects
List Contents of An S3 Bucket Using Boto3 Client
The boto3.client() function creates a low-level service client using the default session.
To list the contents of an S3 bucket using a Boto3 client:
- Create an S3 client using boto3.client('s3')
- Invoke the list_objects_v2() method on the s3_client object and pass the desired bucket name
- If you have read access, the list_objects_v2() method returns up to 1000 objects from the bucket in a single call. If you need to get more than 1000 objects, use the pagination method explained in the next section.
- Read the response as a dictionary. The objects are available under the key called Contents, which is absent when the bucket is empty.
Code
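The following code demonstrates how to list objects from the bucket using Boto3.
import boto3
s3_client = boto3.client('s3')
# A single call returns at most 1000 objects
response = s3_client.list_objects_v2(Bucket='mrcloudgurudemo')
# 'Contents' is missing when the bucket is empty, so fall back to an empty list
for obj in response.get('Contents', []):
    print(obj['Key'])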
Output
csv_files/
csv_files/business-financial-data-june-2023-quarter-csv.csv
hello (1).txt
hello (2).txt
test_boto3.txt
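Each entry under Contents is itself a dictionary that carries more than the key. As a small sketch, assuming a dictionary in the shape returned by list_objects_v2(), the hypothetical helper below (summarize_objects is not part of Boto3) collects the Key and Size fields. The sample response is hand-built and its sizes are made up, so the snippet can be tried without touching AWS.

```python
def summarize_objects(response):
    """Return (key, size_in_bytes) pairs from a list_objects_v2() response.

    summarize_objects is a hypothetical helper for illustration, not a Boto3 API.
    """
    # 'Contents' is absent when no objects were returned
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]


# Hand-built sample in the shape of a list_objects_v2() response (sizes are made up)
sample_response = {
    "KeyCount": 2,
    "Contents": [
        {"Key": "csv_files/", "Size": 0},
        {"Key": "test_boto3.txt", "Size": 12},
    ],
}

for key, size in summarize_objects(sample_response):
    print(f"{key}: {size} bytes")
```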
The list_objects_v2() method in Boto3 allows you to filter the results of the list operation by specifying a number of criteria.
Here are some of the most common filters:
- StartAfter (string) – StartAfter is where you want Amazon S3 to start listing from. Amazon S3 starts listing after this specified key. StartAfter can be any key in the bucket.
- MaxKeys (integer) – Sets the maximum number of keys returned in the response. By default, the action returns up to 1,000 key names. The response might contain fewer keys but will never contain more.
To use these filters, pass them as keyword arguments to the list_objects_v2() method, for example MaxKeys=100.
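As a minimal sketch of how these parameters are passed, the hypothetical helper below (list_keys_after is an illustrative name, not a Boto3 function) forwards StartAfter and MaxKeys as keyword arguments. It takes the client as an argument, assuming it was created with boto3.client('s3').

```python
def list_keys_after(s3_client, bucket_name, start_after, max_keys=1000):
    """Return up to max_keys keys that are listed after start_after.

    list_keys_after is a hypothetical helper; s3_client is expected to be
    a client created with boto3.client('s3').
    """
    response = s3_client.list_objects_v2(
        Bucket=bucket_name,
        StartAfter=start_after,  # listing begins after this key
        MaxKeys=max_keys,        # upper bound on keys in this response
    )
    # 'Contents' is absent when nothing matches
    return [obj["Key"] for obj in response.get("Contents", [])]
```

Calling list_keys_after(boto3.client('s3'), 'mrcloudgurudemo', 'hello (1).txt', max_keys=2) would ask for at most the two keys that follow hello (1).txt in the listing.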
List More than 1000 Contents From An S3 Bucket Using Paginator in Boto3
To get more than 1000 objects from an S3 bucket, use a paginator for the list_objects_v2() method.
- Get the paginator for the list_objects_v2() method using get_paginator()
- Configure the paginator using PaginationConfig and define the PageSize. This is the number of objects returned on each page
- Iterate over the response and print the objects on each page.
Code
The following code demonstrates how to get more than 1000 objects from the S3 bucket using Boto3.
import boto3
s3_client = boto3.client("s3")
paginator = s3_client.get_paginator("list_objects_v2")
# PageSize is 3 here only so that this small bucket spans several pages
response = paginator.paginate(Bucket="mrcloudgurudemo", PaginationConfig={"PageSize": 3})
for page in response:
    # 'Contents' is missing on an empty page, so default to an empty list
    files = page.get("Contents", [])
    for file in files:
        print(f"file_name: {file['Key']}")
    print('\nGetting next page..\n')
print('No further objects found.')
Output
The output of each page is printed during each iteration.
file_name: csv_files/
file_name: csv_files/business-financial-data-june-2023-quarter-csv.csv
file_name: hello (1).txt
Getting next page..
file_name: hello (2).txt
file_name: test_boto3.txt
Getting next page..
No further objects found.
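For context, a paginator automates a loop that can also be written by hand. The sketch below, assuming s3_client is a client from boto3.client('s3'), reads the IsTruncated and NextContinuationToken fields of each response and passes the token back as ContinuationToken on the next call. The helper name iter_all_keys is hypothetical.

```python
def iter_all_keys(s3_client, bucket_name, page_size=1000):
    """Yield every key in the bucket, one list_objects_v2() call per page.

    iter_all_keys is a hypothetical helper for illustration.
    """
    kwargs = {"Bucket": bucket_name, "MaxKeys": page_size}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            yield obj["Key"]
        # IsTruncated is True while more keys remain to be fetched
        if not response.get("IsTruncated"):
            break
        # Ask the next call to resume where this one stopped
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

In everyday code the paginator is usually the simpler choice. The loop is shown only to make the mechanism visible.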
List Contents of A Specific Directory of An S3 Bucket Using Boto3
To list the contents of a specific directory of an S3 bucket using the Boto3 client:
- Use the list_objects_v2() method
- Pass the bucket name using the Bucket parameter and the specific directory as a prefix using the Prefix parameter
- This method will return only the objects whose keys start with that prefix
Code
The following code demonstrates how to get the objects from the csv_files directory of the bucket mrcloudgurudemo.
import boto3
s3 = boto3.client('s3')
bucket_name = 'mrcloudgurudemo'
prefix = 'csv_files/'
response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
for obj in response.get('Contents', []):
    print(obj['Key'])
Output
The objects available under the csv_files directory are listed.
csv_files/
csv_files/business-financial-data-june-2023-quarter-csv.csv
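S3 has no real directories. A "directory" is simply a shared key prefix. As a hedged sketch, passing Delimiter='/' together with Prefix asks S3 to group deeper keys under CommonPrefixes, which separates the objects directly under a prefix from its subfolders. The helper name list_directory is hypothetical, and it reads only the first page of results (up to 1000 entries).

```python
def list_directory(s3_client, bucket_name, prefix):
    """Return (files, subfolders) found directly under prefix.

    list_directory is a hypothetical helper; s3_client is expected to be
    a client created with boto3.client('s3').
    """
    response = s3_client.list_objects_v2(
        Bucket=bucket_name,
        Prefix=prefix,
        Delimiter="/",  # group keys that contain a further '/' after the prefix
    )
    # Objects directly under the prefix
    files = [obj["Key"] for obj in response.get("Contents", [])]
    # Each grouped "subfolder" is reported as {"Prefix": "..."}
    subfolders = [cp["Prefix"] for cp in response.get("CommonPrefixes", [])]
    return files, subfolders
```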
List Specific File Types From a Bucket using the Boto3 Client
To list specific file types from a bucket using the Boto3 client, check whether each object key ends with the desired extension. The list_objects_v2() method has no parameter that filters by extension, so the check happens on the client side after the keys are returned.
Code
import boto3
s3 = boto3.client('s3')
bucket_name = 'mrcloudgurudemo'
response = s3.list_objects_v2(Bucket=bucket_name)
for obj in response.get('Contents', []):
    key = obj['Key']
    # Include the dot so that a key such as 'notes_txt' is not matched
    if key.endswith('.txt'):
        print(key)
Output
Only the .txt files from the bucket are displayed.
hello (1).txt
hello (2).txt
test_boto3.txt
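The same check extends to several extensions at once, since str.endswith() accepts a tuple. The following sketch is a plain Python helper (filter_keys_by_extension is a hypothetical name) that works on any list of keys. Lowercasing the key first is an assumption made here so that .TXT and .txt are treated alike.

```python
def filter_keys_by_extension(keys, extensions):
    """Return the keys that end with one of the given extensions."""
    # str.endswith() accepts a tuple of candidate suffixes
    suffixes = tuple(ext.lower() for ext in extensions)
    return [key for key in keys if key.lower().endswith(suffixes)]


# Keys taken from the example bucket in this guide
keys = [
    "csv_files/",
    "csv_files/business-financial-data-june-2023-quarter-csv.csv",
    "hello (1).txt",
    "hello (2).txt",
    "test_boto3.txt",
]
print(filter_keys_by_extension(keys, [".txt"]))
print(filter_keys_by_extension(keys, [".csv", ".txt"]))
```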
List Files From Directory Matching A Regular Expression in S3 Bucket Using Boto3
To list files from a directory matching a regular expression, you must check if the object key matches the desired expression.
To learn more about regular expressions, read the syntax guide.
To list files from a directory matching a regular expression:
- Get all the objects from the desired bucket using bucket.objects.all()
- Iterate over the list of objects
- During each iteration, check if the object.key matches the regular expression using the re.search() method
Code
The following code demonstrates how to get the objects whose key contains a digit using the regular expression search.
import re
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('mrcloudgurudemo')
# Raw string, so the backslash reaches the regex engine unchanged
pattern = r"\d"
for obj in my_bucket.objects.all():
    if re.search(pattern, obj.key):
        print(obj.key)
Output
csv_files/business-financial-data-june-2023-quarter-csv.csv
hello (1).txt
hello (2).txt
test_boto3.txt
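Because re.search() is applied to the whole key, a digit in a directory name also counts as a match. As a sketch of one way to restrict the match to the file name, the hypothetical helper below (keys_with_matching_name is an illustrative name) compiles the pattern once and tests only the part of each key after the last slash. The key reports_2023/summary.txt is made up for the example.

```python
import posixpath
import re


def keys_with_matching_name(keys, pattern):
    """Return keys whose file name (the part after the last '/') matches pattern."""
    regex = re.compile(pattern)  # compile once, reuse for every key
    return [key for key in keys if regex.search(posixpath.basename(key))]


keys = [
    "csv_files/",
    "csv_files/business-financial-data-june-2023-quarter-csv.csv",
    "hello (1).txt",
    "reports_2023/summary.txt",  # made-up key: digit only in the directory part
]
# r"\d" is a raw string, so the backslash reaches the regex engine unchanged
print(keys_with_matching_name(keys, r"\d"))
```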
List Contents of An S3 Bucket Using Boto3 Resource
The Boto3 Resource represents an object-oriented interface to AWS services.
The AWS Python SDK team does not intend to add new features to the resources interface in boto3. Existing interfaces will continue to operate during boto3’s lifecycle. You can use the Boto3 client interface explained above to interact with the service.
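To list the contents of an S3 bucket using the Boto3 resource:
- Create a resource representation for the S3 service using boto3.resource('s3')
- Create the bucket object for the desired bucket using its name
- Iterate over the objects returned by bucket.objects.all() and access the key of each object.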
Code
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('mrcloudgurudemo')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
Output
csv_files/
csv_files/business-financial-data-june-2023-quarter-csv.csv
hello (1).txt
hello (2).txt
test_boto3.txt
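The bucket resource can also narrow the listing on the server side. As a sketch, assuming the bucket argument is a Bucket object created with s3.Bucket() as in the code above, objects.filter(Prefix=...) returns only the objects under a prefix. The wrapper name keys_under_prefix is hypothetical.

```python
def keys_under_prefix(bucket, prefix):
    """Return the keys of all objects in bucket whose key starts with prefix.

    bucket is expected to be a Boto3 Bucket resource, e.g. s3.Bucket('name').
    keys_under_prefix is a hypothetical helper for illustration.
    """
    # objects.filter() sends Prefix to S3 and fetches further pages as needed
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
```

For the bucket used in this guide, the call would look like keys_under_prefix(my_bucket, 'csv_files/').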
Conclusion
In this article, you learned how to list the contents of an S3 bucket using the Boto3 library. You also learned how to filter the results of the list operation, list more than 1000 objects from a bucket, list the contents of a specific directory, and list particular file types from a bucket.