Streaming data case study
Streaming Data with AWS Kinesis and Lambda
Maksim Pecherskiy
Data Engineer
This chapter
Send incoming data to Firehose
Store data
Visualize data
Set alerts in real-time
Monitor the stream
Meet a set of requirements
Requirements
Tweets must include #sandiego hashtag
Tweets must come in real time
Tweets must come enriched with sentiment
Visualize last 15 minutes of data
Notify manager if more than 3 negative tweets arrive in a 5-minute interval
The stream should minimize data loss due to downtime
Data must persist to be analyzed later
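The alerting requirement above can be sketched as a sliding-window counter. This is only an illustration of the rule itself (more than 3 negative tweets within 5 minutes), not of whichever AWS service evaluates it in the course. The class name `NegativeTweetMonitor` and the `"NEGATIVE"` label are assumptions, and tweets are assumed to arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # interval taken from the requirement
THRESHOLD = 3                  # alert when the count is strictly greater


class NegativeTweetMonitor:
    """Hypothetical helper that tracks negative tweets in a sliding window."""

    def __init__(self, window=WINDOW, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._timestamps = deque()  # arrival times of recent negative tweets

    def observe(self, timestamp, sentiment):
        """Record one tweet; return True if the manager should be notified."""
        if sentiment == "NEGATIVE":
            self._timestamps.append(timestamp)
        # Drop negative tweets that have fallen out of the window.
        cutoff = timestamp - self.window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

Feeding four negative tweets one minute apart makes `observe` return True on the fourth; a later tweet outside the window resets the count.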
Tweets come in real-time
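A minimal sketch of pushing tweets to Firehose as they arrive, keeping only those with the required hashtag. It assumes `client` is a boto3 Firehose client (for example `boto3.client("firehose")`), that each tweet is a dict with a `text` key, and that the delivery stream name is supplied by the caller; none of these names come from the slides.

```python
import json
import re

HASHTAG = "#sandiego"  # from the requirements

# Match the hashtag as a whole token, ignoring case (so "#sandiegozoo" is rejected).
_HASHTAG_RE = re.compile(r"(?<!\w)" + re.escape(HASHTAG) + r"(?!\w)", re.IGNORECASE)


def has_hashtag(text):
    """Return True if the tweet text contains the required hashtag."""
    return _HASHTAG_RE.search(text) is not None


def send_tweet(client, stream_name, tweet):
    """Send one tweet dict to Firehose; return True if it was sent."""
    if not has_hashtag(tweet.get("text", "")):
        return False
    # Newline-delimit records so they can be split apart downstream.
    payload = json.dumps(tweet) + "\n"
    client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": payload.encode("utf-8")},
    )
    return True
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise with a stand-in object before pointing it at a real stream.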
Enriched with sentiment
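One way to enrich records in flight is a Firehose data-transformation Lambda, which receives base64-encoded records and must return each one with its `recordId`, a `result`, and re-encoded `data`. The sketch below follows that contract; the sentiment scorer is passed in as `detect_sentiment`, a hypothetical callable, because the slides do not say which service produces the sentiment. The `text` and `sentiment` field names are also assumptions.

```python
import base64
import json


def enrich_records(event, detect_sentiment):
    """Add a 'sentiment' field to every record in a Firehose transformation event."""
    output = []
    for record in event["records"]:
        # Firehose delivers each record's payload base64-encoded.
        tweet = json.loads(base64.b64decode(record["data"]))
        tweet["sentiment"] = detect_sentiment(tweet["text"])
        data = (json.dumps(tweet) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must echo the incoming ID
            "result": "Ok",
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}
```

In a deployed Lambda, the handler could call `enrich_records(event, scorer)`, where `scorer` might wrap Amazon Comprehend's `detect_sentiment(Text=..., LanguageCode="en")` and return its `Sentiment` value; that choice is an assumption, not something the slides state.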
Data must persist for later analysis
Visualize last 15 minutes
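Assuming the tweets end up in Elasticsearch, the 15-minute view boils down to a range filter on a time field, which is roughly what a dashboard's time picker applies. The sketch below builds such a query body; the field name `timestamp` and the function name are assumptions.

```python
def last_minutes_query(minutes=15, time_field="timestamp"):
    """Build an Elasticsearch query body for documents from the last N minutes."""
    return {
        # "now-15m" is Elasticsearch date math: the current time minus 15 minutes.
        "query": {"range": {time_field: {"gte": f"now-{minutes}m", "lte": "now"}}},
        # Newest tweets first.
        "sort": [{time_field: {"order": "desc"}}],
    }
```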
Redshift vs Elasticsearch
Redshift
Designed for storing clean tables of data
Schema is defined in database
SQL for queries
Works great with BI tools like Tableau
Elasticsearch
Schemaless - good for logs and text
Schema (mapping) is inferred as documents are indexed
Uses its own JSON-based query language (Query DSL)
Has its own UI - Kibana
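To make the difference in query languages concrete, here is the same question ("how many negative tweets are stored?") phrased both ways. The table name `tweets`, the field `sentiment`, and the value `NEGATIVE` are assumptions for illustration only.

```python
# Redshift: plain SQL against a table with a predefined schema.
REDSHIFT_SQL = """
SELECT COUNT(*) AS negative_tweets
FROM tweets
WHERE sentiment = 'NEGATIVE';
"""

# Elasticsearch: a JSON query body, which could be sent to an index's
# _count endpoint to get the number of matching documents.
ELASTICSEARCH_QUERY = {
    "query": {"match": {"sentiment": "NEGATIVE"}},
}
```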
Let's practice!