Case Study: Generating a Report Repository

Introduction to AWS Boto in Python

Maksim Pecherskiy

Data Engineer

Final product

The steps

Prepare the data
  • Download files for the month from the raw data bucket
  • Concatenate them into one DataFrame
  • Create an aggregated DataFrame

The steps

Create the report
  • Write the DataFrame to CSV and HTML
  • Generate a Bokeh plot, save as HTML

The steps

Upload report to shareable website
  • Create gid-reports bucket
  • Upload all three files for the month to S3
  • Generate an index.html file that lists all the files
  • Get the website URL!

Raw data bucket

  • Private files
  • Daily CSVs of requests from the App
  • Raw data

gid-requests bucket

Read raw data files

import boto3
import pandas as pd
s3 = boto3.client('s3')  # assumes AWS credentials are already configured

# Create list to hold our DataFrames
df_list = []

# Request the list of CSVs from S3 with prefix; get response contents
response = s3.list_objects(Bucket='gid-requests', Prefix='2019_jan')
request_files = response['Contents']
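`list_objects` returns at most 1,000 keys per call, and when nothing matches the prefix the response has no `Contents` key, so `response['Contents']` raises a `KeyError` for an empty month. A minimal sketch of a more defensive listing, assuming the same `s3` client; it swaps in boto3's `list_objects_v2` paginator, and the helper name `list_month_keys` is hypothetical:

```python
def list_month_keys(s3, bucket, prefix):
    """Return every object key under `prefix`, following pagination.

    `s3` is assumed to be a boto3 S3 client; list_month_keys is a
    hypothetical helper, not part of boto3.
    """
    paginator = s3.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page for a prefix with no matches carries no 'Contents' key
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys
```

Called as `list_month_keys(s3, 'gid-requests', '2019_jan')`, it returns plain key strings, so the download loop would use each key directly instead of `file['Key']`.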

Read raw data files

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])

    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])

    # Append DataFrame to list
    df_list.append(obj_df)

Read raw data files

# Concatenate all the DataFrames in the list
df = pd.concat(df_list)

# Preview the DataFrame
df.head()

DataFrame preview
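The list, loop, and concat steps above can be folded into one reusable function. A minimal sketch, assuming the same `s3` client and a list of key strings; `load_csvs` is a hypothetical name. Passing `ignore_index=True` to `pd.concat` gives the combined DataFrame a fresh 0..n-1 index instead of repeating each file's own row numbers:

```python
import pandas as pd

def load_csvs(s3, bucket, keys):
    """Read each CSV object into a DataFrame and stack them into one.

    `s3` is assumed to be a boto3 S3 client; load_csvs is a hypothetical helper.
    """
    frames = []
    for key in keys:
        obj = s3.get_object(Bucket=bucket, Key=key)
        # obj['Body'] is a file-like stream, so pandas can read it directly
        frames.append(pd.read_csv(obj['Body']))
    # ignore_index=True renumbers rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so a month with no files needs to be handled before calling it.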

Create aggregated reports

  • Perform some aggregation on df
  • Write the result to CSV: df.to_csv('jan_final_report.csv')
  • Write the result to HTML: df.to_html('jan_final_report.html')
  • Save the Bokeh chart as jan_final_chart.html

Aggregated reports
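The slides leave the aggregation itself open. As one possibility, a sketch that counts rows per category and writes both report files; the column name `service_name` used below and the helper `write_report` are hypothetical, so substitute whatever column your requests are grouped by:

```python
import os
import pandas as pd

def write_report(df, column, out_dir, month='jan'):
    """Count rows per value of `column` and save the result as CSV and HTML.

    Hypothetical helper; returns the aggregated DataFrame.
    """
    agg = (df.groupby(column)
             .size()                      # number of rows per group
             .reset_index(name='count')   # Series -> DataFrame with a 'count' column
             .sort_values('count', ascending=False))
    agg.to_csv(os.path.join(out_dir, f'{month}_final_report.csv'), index=False)
    agg.to_html(os.path.join(out_dir, f'{month}_final_report.html'), index=False)
    return agg
```

For example, `write_report(df, 'service_name', '.')` would produce `./jan_final_report.csv` and `./jan_final_report.html`, the file names the upload steps expect. The Bokeh chart is a separate step and is not covered by this sketch.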

Report bucket

  • Bucket website
  • Publicly accessible
  • Aggregated data and HTML reports

Report Bucket
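The bucket and its website setting can also be created from boto3. A minimal sketch, assuming the bucket lives in `us-east-1` (in other regions `create_bucket` also needs a `CreateBucketConfiguration` argument); the helper name is hypothetical. On newer accounts, the bucket's Block Public Access and Object Ownership settings may also need changing before `public-read` ACLs are accepted:

```python
def setup_report_bucket(s3, bucket):
    """Create `bucket` and serve index.html as its website index document.

    `s3` is assumed to be a boto3 S3 client in us-east-1;
    setup_report_bucket is a hypothetical helper.
    """
    s3.create_bucket(Bucket=bucket)
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={
            # Requests for the site root return this object
            'IndexDocument': {'Suffix': 'index.html'},
        },
    )
```

With this configuration in place, the `index.html` uploaded to the bucket root later in the lesson becomes the page shown at the bucket's website address.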

Upload Aggregated CSV

# Upload aggregated CSV to S3
s3.upload_file(Filename='./jan_final_report.csv',
               Key='2019/jan/final_report.csv',
               Bucket='gid-reports',
               ExtraArgs={'ACL': 'public-read'})

Upload HTML Table

# Upload HTML table to S3
s3.upload_file(Filename='./jan_final_report.html',
               Key='2019/jan/final_report.html',
               Bucket='gid-reports',
               ExtraArgs={
                   'ContentType': 'text/html',
                   'ACL': 'public-read'})

Upload HTML Chart

# Upload aggregated chart to S3
s3.upload_file(Filename='./jan_final_chart.html',
               Key='2019/jan/final_chart.html',
               Bucket='gid-reports',
               ExtraArgs={
                   'ContentType': 'text/html',
                   'ACL': 'public-read'})
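The three uploads differ only in file name, key, and content type, so they can be driven by a loop. A sketch assuming the same `s3` client; the helper `upload_reports` and the `CONTENT_TYPES` mapping are hypothetical. Only `.html` files get an explicit `ContentType`, which mirrors the calls above, where the CSV upload sets none:

```python
import os

# Content type per file extension; extensions not listed get no ContentType,
# matching the CSV upload above.
CONTENT_TYPES = {'.html': 'text/html'}

def upload_reports(s3, bucket, month_prefix, files):
    """Upload local report files under `month_prefix` with public-read ACLs.

    `files` maps local paths to key names, e.g.
    {'./jan_final_report.csv': 'final_report.csv'}.
    Hypothetical helper; returns the list of uploaded keys.
    """
    keys = []
    for filename, name in files.items():
        extra_args = {'ACL': 'public-read'}
        content_type = CONTENT_TYPES.get(os.path.splitext(filename)[1])
        if content_type:
            extra_args['ContentType'] = content_type
        key = f'{month_prefix}/{name}'
        s3.upload_file(Filename=filename, Key=key, Bucket=bucket,
                       ExtraArgs=extra_args)
        keys.append(key)
    return keys
```

Calling it with the prefix `'2019/jan'` and the three local files produces the same keys and extra arguments as the three separate calls.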

Uploaded reports

Create index.html

# List the gid-reports bucket objects starting with 2019/
r = s3.list_objects(Bucket='gid-reports', Prefix='2019/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(r['Contents'])

# Create a column "Link" that contains website URL + key
base_url = "https://gid-reports."
objects_df['Link'] = base_url + objects_df['Key']

Create index.html

# Write DataFrame to html
objects_df.to_html('report_listing.html', 
                   columns=['Link', 'LastModified', 'Size'],
                   render_links=True)

Write DF to html
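The listing step can be checked without S3 by feeding it records shaped like `r['Contents']`. A sketch, assuming `base_url` already ends with `/`; `build_listing_html` is a hypothetical name. With `render_links=True`, pandas turns each URL in the `Link` column into an `<a href>` anchor, and calling `to_html` without a path returns the HTML as a string:

```python
import pandas as pd

def build_listing_html(contents, base_url):
    """Return an HTML table linking to each object in a list_objects 'Contents' list.

    build_listing_html is hypothetical; base_url is assumed to end with '/'.
    """
    objects_df = pd.DataFrame(contents)
    objects_df['Link'] = base_url + objects_df['Key']
    # No path argument, so to_html returns the table as a string
    return objects_df.to_html(columns=['Link', 'LastModified', 'Size'],
                              render_links=True, index=False)
```

Writing the returned string to `report_listing.html` gives the same file the upload step expects.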

Upload index.html

# Upload the file to gid-reports bucket root.
s3.upload_file(
  Filename='./report_listing.html', 
  Key='index.html', 
  Bucket='gid-reports',
  ExtraArgs={
    'ContentType': 'text/html', 
    'ACL': 'public-read'
  })

Get the URL of the index!

Bucket website URL *


"http://gid-reports.index.html"
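Public object URLs can also be assembled from the bucket name and key. A minimal sketch, assuming the virtual-hosted-style REST endpoint `https://<bucket>.s3.<region>.amazonaws.com/<key>` and a default region of `us-east-1`; the helper `object_url` is hypothetical, and the bucket's static-website endpoint uses a different host name:

```python
from urllib.parse import quote

def object_url(bucket, key, region='us-east-1'):
    """Build a virtual-hosted-style URL for an S3 object (hypothetical helper)."""
    # Percent-encode the key but keep '/' so prefixes stay readable
    return f'https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe="/")}'
```

Encoding the key matters once file names contain spaces or other reserved characters; plain keys like `2019/jan/final_report.csv` pass through unchanged.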

Let's tweak!
