
Reading File Content From AWS S3 Bucket Using Boto3 – Definitive Guide

by Vikram Aruchamy

Amazon S3 (Simple Storage Service) is AWS’s versatile and highly scalable cloud storage solution. It’s a critical component of many AWS workflows, storing data for various purposes. Understanding how to read file content from S3 is essential for extracting, analysing, and processing data stored in the cloud, enabling seamless integration with applications and data-driven decision-making.

This tutorial teaches you how to read file content from AWS S3 using Boto3.

Prerequisites

  • Read access to the object
  • Security credentials: the AWS Access Key ID and the Secret Access Key

You can create security credentials in the AWS console by clicking your profile name at the top right corner, selecting the Security credentials menu, and choosing the Create access key option under the Access keys section.

Setting Up Boto3 And Configuring Credentials

Use the following command to install Boto3 if you haven't installed it already.

pip install boto3

Once Boto3 is installed, you have the option to set up your credentials in your system using the aws configure command.

aws configure

It prompts you for the Access Key ID and the Secret Access Key and makes them available globally on your system.
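For reference, aws configure typically stores these values in the ~/.aws/credentials file in a format similar to the following (placeholder values shown here):

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY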

If you prefer not to configure credentials globally, an alternative is to specify them when creating a Boto3 session within your program and then generate a client object from that session instance, as demonstrated below. Keep in mind, however, that hardcoding security credentials directly into your program is not a best practice.

import boto3

session = boto3.Session(
    aws_access_key_id='Your Access Key ID',
    aws_secret_access_key='Your Secret Access Key'
)

s3_client = session.client('s3')
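To keep credentials out of your source code without configuring them globally, one option is to read them from environment variables. The following is a minimal sketch; note that Boto3 also picks up the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables automatically, in which case the explicit arguments below are unnecessary.

import os
import boto3

# Read credentials from environment variables instead of hardcoding them
session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']
)

s3_client = session.client('s3')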

Reading A File From An S3 Bucket Using Boto3 Client

The Boto3 client creates a low-level service client using the default session.

To read a file from an S3 bucket using the Boto3 client,

  • Create a client object that represents the S3 service
  • Invoke the get_object() method, passing the bucket name and the key name
  • Read the response body using response['Body'].read()
  • Printing the result shows the byte string representation of the file content.

Code

import boto3

bucket_name = 'mrcloudgurudemo'

object_name = 'test_boto3.txt'

# Create a low-level S3 client using the default session
s3_client = boto3.client('s3')

# Fetch the object and read its body as a byte string
response = s3_client.get_object(Bucket=bucket_name, Key=object_name)

file_content = response['Body'].read()

print(file_content)

Output

    b'This file is a text file used for demonstrating Boto3. \n\nThis file contains multiple lines. \n\nBoto3 is a Python library to interact with the AWS service using Python. '

Decoding the Bytes String to Normal String

To convert the byte string representation to a normal string, you can use the decode() method and pass the utf-8 encoding. In most cases, text files are encoded using utf-8, so you can use the same encoding to decode them.

The following code demonstrates how to decode the byte string returned while reading the file into a normal string.

import boto3

bucket_name = 'mrcloudgurudemo'

object_name = 'test_boto3.txt'

s3_client = boto3.client('s3')

response = s3_client.get_object(Bucket=bucket_name, Key=object_name)

file_content = response['Body'].read().decode('utf-8')

print(file_content)

Output

    This file is a text file used for demonstrating Boto3. 

    This file contains multiple lines. 

    Boto3 is a Python library to interact with the AWS service using Python. 

Reading A File Line By Line From An S3 Bucket Using Boto3 Client

To read the file line by line from the S3 bucket using the Boto3 client,

  • Read the file using the get_object() method as explained in the previous section
  • Split the file content using \n so that each line is stored separately in a list
  • Iterate over the list and print each line

This may be useful in cases where you need to read content on specific lines or read files partially.

Code

import boto3

bucket_name = 'mrcloudgurudemo'

object_name = 'test_boto3.txt'

s3_client = boto3.client('s3')

response = s3_client.get_object(Bucket=bucket_name, Key=object_name)

file_content = response['Body'].read().decode('utf-8')

for line in file_content.split('\n'):
    print(line)

Output

    This file is a text file used for demonstrating Boto3. 

    This file contains multiple lines. 

    Boto3 is a Python library to interact with the AWS service using Python. 
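For larger files, you may not want to load the entire object into memory before splitting it. As a lightweight alternative, the following sketch streams the object line by line instead, assuming the StreamingBody returned by get_object() supports iter_lines() (available in recent botocore versions).

import boto3

bucket_name = 'mrcloudgurudemo'

object_name = 'test_boto3.txt'

s3_client = boto3.client('s3')

response = s3_client.get_object(Bucket=bucket_name, Key=object_name)

# Stream the body and decode each line as it arrives instead of reading the whole object
for line in response['Body'].iter_lines():
    print(line.decode('utf-8'))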

Reading A File From An S3 Bucket Using Boto3 Resource

The Boto3 Resource represents an object-oriented interface to AWS services.

Note that the AWS Python SDK team does not intend to add new features to the resources interface in Boto3. Existing interfaces will continue to work for the remainder of Boto3's lifecycle, but the client interface is recommended for new development.

To read a file from an S3 bucket using the Boto3 resource object,

  • Create a resource representation of the S3 service using the default Boto3 session
  • Create an object representation using s3_resource.Object(), passing the desired bucket name and the object key
  • Invoke the get() method of the object and read the response body using response['Body'].read()
  • It returns the byte string representation of the content; to decode it to a normal string, use the decode('utf-8') method
  • The result is the content in normal string format

Code

import boto3

bucket_name = 'mrcloudgurudemo'

object_name = 'test_boto3.txt'

# Create a resource representation of the S3 service using the default session
s3_resource = boto3.resource('s3')

# Create an object representation for the desired bucket and key
obj = s3_resource.Object(bucket_name, object_name)

# Read the object body and decode it to a normal string
file_content = obj.get()['Body'].read().decode('utf-8')

print(file_content)

Output

    This file is a text file used for demonstrating Boto3. 

    This file contains multiple lines. 

    Boto3 is a Python library to interact with the AWS service using Python. 
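The resource object also exposes metadata about the object as attributes. For example, the following small sketch (using the same bucket and key) checks the object's size and last-modified timestamp without reading the body.

import boto3

s3_resource = boto3.resource('s3')

obj = s3_resource.Object('mrcloudgurudemo', 'test_boto3.txt')

# These attributes are loaded lazily via a HeadObject call on first access
print(obj.content_length)   # Size of the object in bytes
print(obj.last_modified)    # Timestamp of the last modification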

Exception Handling

Exception handling is crucial in a program to gracefully handle and recover from unexpected errors, ensuring the program continues running smoothly and providing better user experiences.

While reading an object from an S3 bucket, there are chances that the bucket or the object doesn’t exist. Hence, you need to handle these exception scenarios to ensure the smooth running of your program.

Code

The following code demonstrates handling the NoSuchBucket and NoSuchKey error codes, along with missing or incomplete credential errors.

import boto3
import botocore.exceptions

try:
    bucket_name = 'mrcloudgurudemo'

    object_name = 'test_boto3a.txt'

    s3_client = boto3.client('s3')

    response = s3_client.get_object(Bucket=bucket_name, Key=object_name)

    if response['ResponseMetadata']['HTTPStatusCode'] == 200:
        file_content = response['Body'].read().decode('utf-8')
        print(file_content)

except botocore.exceptions.ClientError as e:

    # Service-side errors carry an error code in the response
    if e.response['Error']['Code'] == 'NoSuchBucket':
        print(f"{bucket_name} ->  No Such Bucket Exists : {e}")

    elif e.response['Error']['Code'] == 'NoSuchKey':
        print(f"{object_name} ->  No such key exists : {e}")

    else:
        print(f"An unexpected error occurred: {e}")

# Missing or incomplete credentials raise their own exception types
except (botocore.exceptions.NoCredentialsError, botocore.exceptions.PartialCredentialsError) as e:
    print(f"Credentials error: {e}")

except Exception as e:
    print(f"An unexpected error occurred: {e}")

Output

    test_boto3a.txt ->  No such key exists : An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
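Alternatively, if you only need to know whether an object exists before reading it, you can issue a lightweight head_object() call first. This is a minimal sketch; head_object() fetches only the object's metadata and raises a ClientError with a 404 error code when the key is missing.

import boto3
import botocore.exceptions

s3_client = boto3.client('s3')

try:
    # head_object fetches only the object's metadata, not its body
    s3_client.head_object(Bucket='mrcloudgurudemo', Key='test_boto3a.txt')
    print('Object exists')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == '404':
        print('Object does not exist')
    else:
        raise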

Conclusion

In this definitive guide, you’ve learned how to harness the power of Boto3 to seamlessly access and retrieve file content from Amazon S3, AWS’s versatile and highly scalable cloud storage service.

Whether you need to extract, analyse, or process data stored in the cloud, mastering these techniques is vital for seamless integration with your applications and data-driven decision-making. By understanding the essential concepts, setting up credentials, and leveraging Boto3’s capabilities, you can now interact with AWS S3 and unlock its full potential for your projects.

