Introduction to AWS Boto in Python
Maksim Pecherskiy
Data Engineer
df = pd.read_csv('https://gid-staging.s3.amazonaws.com/potholes.csv')
Download File
s3.download_file(
    Filename='potholes_local.csv',
    Bucket='gid-staging',
    Key='2019/potholes_private.csv')
Read From Disk
pd.read_csv('./potholes_local.csv')
Use '.get_object()'
obj = s3.get_object(Bucket='gid-requests', Key='2019/potholes.csv')
print(obj)
Get the object
obj = s3.get_object(
    Bucket='gid-requests',
    Key='2019/potholes.csv')
Read StreamingBody into Pandas
pd.read_csv(obj['Body'])
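The 'Body' entry of the `get_object()` response is a StreamingBody, a file-like object that `pd.read_csv()` can consume directly. A minimal local sketch of that behavior, using an in-memory bytes buffer in place of the real StreamingBody (the CSV contents here are made up):

```python
import io

import pandas as pd

# Simulate the StreamingBody returned by s3.get_object() with an
# in-memory bytes buffer (this CSV content is illustrative only).
fake_body = io.BytesIO(b"pothole_id,street\n1,Main St\n2,Oak Ave\n")

# pd.read_csv() accepts any file-like object, so obj['Body'] works the same way.
df = pd.read_csv(fake_body)
print(df.shape)  # (2, 2)
```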
Example
https://?AWSAccessKeyId=12345&Signature=rBmnrwutb6VkJ9hE8Uub%2BBYA9mY%3D&Expires=1557624801
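The presigned URL carries the access key, signature, and expiration time as query parameters. A small sketch that inspects them with the standard library's `urllib.parse` (the host and object key here are hypothetical; the query values are the ones from the example above):

```python
from urllib.parse import parse_qs, urlparse

# Example presigned URL; the bucket host and key are illustrative.
share_url = ('https://gid-requests.s3.amazonaws.com/potholes.csv'
             '?AWSAccessKeyId=12345'
             '&Signature=rBmnrwutb6VkJ9hE8Uub%2BBYA9mY%3D'
             '&Expires=1557624801')

# Split the query string into its components.
params = parse_qs(urlparse(share_url).query)

# 'Expires' is a Unix timestamp after which the URL stops working.
print(params['Expires'][0])  # 1557624801
```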
Upload a file
s3.upload_file(
    Filename='./potholes.csv',
    Key='potholes.csv',
    Bucket='gid-requests')
Generate Presigned URL
share_url = s3.generate_presigned_url(
    ClientMethod='get_object',
    ExpiresIn=3600,
    Params={'Bucket': 'gid-requests', 'Key': 'potholes.csv'}
)
Open in Pandas
pd.read_csv(share_url)
# Create list to hold our DataFrames
df_list = []

# Request the list of csv files from S3 with a prefix
response = s3.list_objects(Bucket='gid-requests', Prefix='2019/')

# Get response contents
request_files = response['Contents']

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])

    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])

    # Append DataFrame to list
    df_list.append(obj_df)

# Concatenate all the DataFrames in the list
df = pd.concat(df_list)

# Preview the DataFrame
df.head()
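The read-collect-concatenate pattern above can be exercised without S3 at all. A self-contained sketch using in-memory CSV buffers in place of the `get_object()` bodies (the data is made up for illustration):

```python
import io

import pandas as pd

# Stand-ins for the CSV bodies that s3.get_object() would return
# (contents are illustrative only).
csv_bodies = [
    io.StringIO("pothole_id,street\n1,Main St\n2,Oak Ave\n"),
    io.StringIO("pothole_id,street\n3,Elm St\n"),
]

# Same pattern as above: read each file, collect, then concatenate.
df_list = [pd.read_csv(body) for body in csv_bodies]
df = pd.concat(df_list, ignore_index=True)
print(len(df))  # 3
```

Passing `ignore_index=True` renumbers the combined rows 0..n-1 instead of keeping each file's original index.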
Download then open
s3.download_file()
Open directly
s3.get_object()
Generate presigned URL
s3.generate_presigned_url()
Generate using .format()
'https://{bucket}.s3.amazonaws.com/{key}'
Generate using .generate_presigned_url()
'https://?AWSAccessKeyId=12345&Signature=rBmnrwutb6VkJ9hE8Uub%2BBYA9mY%3D&Expires=1557624801'
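For a public object, the first option is just string formatting of the standard virtual-hosted-style S3 URL; no boto3 call is needed. A minimal sketch using the bucket and key from earlier slides:

```python
# Virtual-hosted-style S3 URL pattern; only works for publicly readable objects.
url = 'https://{}.s3.amazonaws.com/{}'.format('gid-staging', 'potholes.csv')
print(url)  # https://gid-staging.s3.amazonaws.com/potholes.csv
```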