How to List Contents of An S3 Bucket Using Boto3 – Definitive Guide

by Vikram Aruchamy

Amazon Simple Storage Service (S3) is an object storage service that offers a scalable, reliable, and secure way to store data. You can use S3 to store various data, including images, videos, documents, and application data.

When working with S3 buckets, one common task is listing the contents of a bucket, which can include objects like files, folders, and other data. In this guide, you’ll learn how to accomplish this task using Boto3, the AWS SDK for Python. Whether you need to list all objects, filter by file type, or apply regular expressions, Boto3 provides the tools to interact with S3 programmatically.

Prerequisites

  • Install the Boto3 library using pip install boto3
  • AWS security credentials (AWS Access Key ID and Secret Access Key)
  • Security credentials configured through the aws configure command
  • Read access to the bucket from which you need to list the objects

List Contents of An S3 Bucket Using Boto3 Client

boto3.client() creates a low-level service client using the default session.

To list the contents of an S3 bucket using a Boto3 client:

  • Create an S3 client using boto3.client('s3')
  • Call the list_objects_v2() method on the s3_client object and pass the bucket name
  • If you have read access, list_objects_v2() returns up to 1,000 objects per call. To get more than 1,000 objects, use the pagination method explained in the next section.
  • The response is a dictionary; the objects are listed under the Contents key, which you can iterate over

Code

The following code demonstrates how to list objects from the bucket using Boto3.

import boto3

s3_client = boto3.client('s3')

objects = s3_client.list_objects_v2(Bucket='mrcloudgurudemo')

# 'Contents' is missing from the response when the bucket is empty
for obj in objects.get('Contents', []):
    print(obj['Key'])

Output

    csv_files/
    csv_files/business-financial-data-june-2023-quarter-csv.csv
    hello (1).txt
    hello (2).txt
    test_boto3.txt
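
S3 leaves the Contents key out of the response when a bucket has no objects, and it sets IsTruncated to True when more keys remain beyond the first page. The following is a minimal sketch that wraps both checks. The helper name list_first_page is hypothetical, and s3_client is assumed to be the object returned by boto3.client('s3').

```python
def list_first_page(s3_client, bucket_name):
    """Return (keys, truncated) for the first page of a bucket listing.

    list_first_page is a hypothetical helper; s3_client is assumed to be
    the object returned by boto3.client("s3").
    """
    response = s3_client.list_objects_v2(Bucket=bucket_name)
    # S3 omits "Contents" entirely when no keys match, so default to [].
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    # "IsTruncated" is True when more keys remain beyond this page.
    return keys, response.get("IsTruncated", False)
```

With a real client, this could be called as keys, truncated = list_first_page(boto3.client('s3'), 'mrcloudgurudemo'). A truncated value of True signals that pagination is needed.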

The list_objects_v2() method in Boto3 allows you to filter the results of the list operation by specifying a number of criteria.

Here are some of the most common filters:

  • StartAfter (string) – The key after which Amazon S3 starts listing. StartAfter can be any key in the bucket.
  • MaxKeys (integer) – Sets the maximum number of keys returned in the response. By default, the action returns up to 1,000 key names. The response might contain fewer keys but will never contain more.

To use these filters, pass them as keyword arguments to the list_objects_v2() method, alongside Bucket.
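
As a minimal sketch, StartAfter and MaxKeys can be supplied as ordinary keyword arguments. The helper name list_keys_after and its default of 100 keys are assumptions made for illustration; s3_client is assumed to come from boto3.client('s3').

```python
def list_keys_after(s3_client, bucket_name, start_after, max_keys=100):
    """Return up to max_keys keys that come after start_after.

    list_keys_after is a hypothetical helper; s3_client is assumed to be
    the object returned by boto3.client("s3").
    """
    response = s3_client.list_objects_v2(
        Bucket=bucket_name,
        StartAfter=start_after,  # listing begins after this key
        MaxKeys=max_keys,        # upper bound on keys in this response
    )
    return [obj["Key"] for obj in response.get("Contents", [])]
```

For example, list_keys_after(boto3.client('s3'), 'mrcloudgurudemo', 'hello (1).txt', max_keys=2) would request at most two keys that sort after hello (1).txt.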

List More than 1000 Contents From An S3 Bucket Using Paginator in Boto3

To list more than 1,000 objects from an S3 bucket, use a paginator for the list_objects_v2 operation.

  • Get the paginator for list_objects_v2 using the client's get_paginator() method
  • Configure the paginator using PaginationConfig and define the PageSize. This is the number of objects returned on each page
  • Iterate over the pages in the response and print the objects on each page.

Code

The following code demonstrates how to get more than 1000 objects from the S3 bucket using Boto3.

import boto3 

s3_client = boto3.client("s3")

paginator = s3_client.get_paginator("list_objects_v2")

response = paginator.paginate(Bucket="mrcloudgurudemo", PaginationConfig={"PageSize": 3})

for page in response:
    files = page.get("Contents", [])  # "Contents" is absent when a page has no objects
    for file in files:
        print(f"file_name: {file['Key']}")

    print('\nGetting next page..\n')
print('No further objects found.')

Output

The output of each page is printed during each iteration.

    file_name: csv_files/
    file_name: csv_files/business-financial-data-june-2023-quarter-csv.csv
    file_name: hello (1).txt

    Getting next page..

    file_name: hello (2).txt
    file_name: test_boto3.txt

    Getting next page..

    No further objects found.
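
For readers curious about what pagination involves, the same result can be reached by following continuation tokens by hand: each truncated response carries a NextContinuationToken, which is passed back as ContinuationToken on the next call. This is a sketch of that loop, not a replacement for the paginator; the generator name iter_all_keys is hypothetical.

```python
def iter_all_keys(s3_client, bucket_name):
    """Yield every key in the bucket by following continuation tokens.

    iter_all_keys is a hypothetical helper; s3_client is assumed to be
    the object returned by boto3.client("s3").
    """
    kwargs = {"Bucket": bucket_name}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            yield obj["Key"]
        # Stop once S3 reports that no further keys remain.
        if not response.get("IsTruncated"):
            break
        # Pass the token back to resume where the previous page ended.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Because it is a generator, keys are fetched one page at a time as the caller iterates, for example for key in iter_all_keys(boto3.client('s3'), 'mrcloudgurudemo').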

List Contents of A Specific Directory of An S3 Bucket Using Boto3

To list the contents of a specific directory of an S3 bucket using the Boto3 client:

  • Use the list_objects_v2() method
  • Pass the bucket name as Bucket and the directory path as Prefix
  • The method returns only the objects whose keys begin with that prefix

Code

The following code demonstrates how to get the objects from the csv_files directory of the bucket mrcloudgurudemo.

import boto3

s3 = boto3.client('s3')

bucket_name = 'mrcloudgurudemo'
prefix = 'csv_files/'

response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)

for obj in response.get('Contents', []):
    print(obj['Key'])

Output

The objects available under the csv_files directory are listed.

    csv_files/
    csv_files/business-financial-data-june-2023-quarter-csv.csv
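
A prefix alone returns every key below the directory, including keys in nested directories. Adding Delimiter='/' asks S3 to group deeper keys under CommonPrefixes, which gives a view of one level only. The sketch below assumes a hypothetical helper named list_directory and reads only the first page of results (up to 1,000 keys).

```python
def list_directory(s3_client, bucket_name, prefix=""):
    """Return (subdirectories, files) found directly under prefix.

    list_directory is a hypothetical helper; s3_client is assumed to be
    the object returned by boto3.client("s3"). Only the first page of
    results is read.
    """
    response = s3_client.list_objects_v2(
        Bucket=bucket_name, Prefix=prefix, Delimiter="/"
    )
    # Keys that contain another "/" after the prefix are rolled up here.
    subdirs = [cp["Prefix"] for cp in response.get("CommonPrefixes", [])]
    # Skip the placeholder key that is identical to the prefix itself.
    files = [
        obj["Key"] for obj in response.get("Contents", []) if obj["Key"] != prefix
    ]
    return subdirs, files
```

For example, subdirs, files = list_directory(boto3.client('s3'), 'mrcloudgurudemo', 'csv_files/') would separate nested directories from the files stored directly in csv_files/.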

List Specific File Types From a Bucket using the Boto3 Client

To list specific file types from a bucket using the Boto3 client, check whether each object's key ends with the desired extension. S3 offers no server-side filter for file extensions, so the check happens on the client after listing.

Code

import boto3

s3 = boto3.client('s3')

bucket_name = 'mrcloudgurudemo'

response = s3.list_objects_v2(Bucket=bucket_name)

for obj in response.get('Contents', []):
    key = obj['Key']
    if key.endswith('.txt'):  # include the dot so keys like 'mytxt' do not match
        print(key)

Output

Only the .txt files from the bucket are displayed.

    hello (1).txt
    hello (2).txt
    test_boto3.txt
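
The example above inspects a single list_objects_v2() response, which holds at most 1,000 keys. A possible extension, sketched here under the hypothetical name keys_with_suffix, combines the extension check with the paginator and accepts several extensions at once. Lower-casing the key is a design choice of this sketch, not something S3 does.

```python
def keys_with_suffix(s3_client, bucket_name, suffixes):
    """Yield keys ending with any of the given lower-case suffixes.

    keys_with_suffix is a hypothetical helper; s3_client is assumed to be
    the object returned by boto3.client("s3").
    """
    # Accept a single string as well as a list or tuple of strings.
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            # Lower-case the key so ".TXT" and ".txt" are treated alike.
            if obj["Key"].lower().endswith(suffixes):
                yield obj["Key"]
```

For example, list(keys_with_suffix(boto3.client('s3'), 'mrcloudgurudemo', ['.txt', '.csv'])) would collect both text and CSV files from every page.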

List Files From Directory Matching A Regular Expression in S3 Bucket Using Boto3

To list files matching a regular expression, check whether each object key matches the desired expression. S3 does not evaluate regular expressions on the server, so the matching happens on the client.

To learn more about regular expressions, read the syntax guide.

To list the matching files:

  • Get all the objects from the desired bucket using bucket.objects.all()
  • Iterate over the list of objects
  • During each iteration, check if the object.key matches the regular expression using the re.search() method

Code

The following code demonstrates how to get the objects whose keys contain a digit, using a regular expression search.

import re 
import boto3

s3 = boto3.resource('s3')

my_bucket = s3.Bucket('mrcloudgurudemo')

# Raw string so the backslash in \d reaches the regex engine unchanged
pattern = r"\d"

for obj in my_bucket.objects.all():
    if re.search(pattern, obj.key):
        print(obj.key)

Output

    csv_files/business-financial-data-june-2023-quarter-csv.csv
    hello (1).txt
    hello (2).txt
    test_boto3.txt
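
Note that re.search() runs against the whole key, so a digit in a directory name would also count as a match. The sketch below, using a hypothetical helper named filter_keys, matches only the file name portion (the text after the last slash) and compiles the pattern once. It works on a plain list of keys, so it can be tried without AWS access; the sample keys are taken from the outputs in this article.

```python
import re


def filter_keys(keys, pattern):
    """Return the keys whose file name (text after the last "/") matches.

    filter_keys is a hypothetical helper that works on any iterable of
    key strings, for example keys collected from a bucket listing.
    """
    compiled = re.compile(pattern)  # compile once, reuse for every key
    matched = []
    for key in keys:
        file_name = key.rsplit("/", 1)[-1]
        if compiled.search(file_name):
            matched.append(key)
    return matched


sample_keys = [
    "csv_files/",
    "csv_files/business-financial-data-june-2023-quarter-csv.csv",
    "hello (1).txt",
    "test_boto3.txt",
]
# Raw string so that \d and the escaped parentheses reach the regex engine.
print(filter_keys(sample_keys, r"^hello \(\d+\)\.txt$"))
# prints ['hello (1).txt']
```

With the resource interface, the keys could be supplied as (obj.key for obj in my_bucket.objects.all()).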

List Contents of An S3 Bucket Using Boto3 Resource

The Boto3 Resource represents an object-oriented interface to AWS services.

The AWS Python SDK team does not intend to add new features to the resources interface in boto3. Existing interfaces will continue to operate during boto3’s lifecycle. For new code, you can use the Boto3 client interface explained above to interact with the service.

To list the contents of a bucket using the resource interface:

  • Create a resource representation for the S3 service
  • Create the bucket object for the desired bucket using its name
  • Call objects.all() on the bucket, iterate over the returned objects, and access each object's key.

Code

import boto3

s3 = boto3.resource('s3')

my_bucket = s3.Bucket('mrcloudgurudemo')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

Output

    csv_files/
    csv_files/business-financial-data-june-2023-quarter-csv.csv
    hello (1).txt
    hello (2).txt
    test_boto3.txt
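
The resource interface can also restrict the listing to a directory: the objects collection has a filter() method that accepts the same Prefix parameter as list_objects_v2(). The following is a minimal sketch; the helper name keys_under_prefix is hypothetical, and bucket is assumed to be a Bucket object such as my_bucket above.

```python
def keys_under_prefix(bucket, prefix):
    """Return the keys in bucket that start with prefix.

    keys_under_prefix is a hypothetical helper; bucket is assumed to be a
    boto3 Bucket resource, for example
    boto3.resource("s3").Bucket("mrcloudgurudemo").
    """
    # objects.filter() forwards Prefix to the underlying listing call, and
    # the collection fetches further pages as the iteration proceeds.
    return [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
```

For example, keys_under_prefix(my_bucket, 'csv_files/') would return only the keys stored under csv_files/.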

Conclusion

In this article, you learned how to list the contents of an S3 bucket using the Boto3 library. You also learned how to filter the results of the list operation, list more than 1000 objects from a bucket, list the contents of a specific directory, and list particular file types from a bucket.
