Streaming data case study

Streaming Data with AWS Kinesis and Lambda

Maksim Pecherskiy

Data Engineer

This chapter

  • Send incoming data to Firehose
  • Store data
  • Visualize data
  • Set real-time alerts
  • Monitor the stream
  • Meet a set of requirements
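
As a rough illustration of the first bullet, the sketch below serializes one tweet and sends it to a Firehose delivery stream with boto3's `put_record`. The stream name `tweets-stream` and the helper names are assumptions made for this example, not names from the course.

```python
import json

STREAM_NAME = "tweets-stream"  # hypothetical delivery stream name


def encode_record(tweet: dict) -> dict:
    """Serialize one tweet as a newline-terminated JSON line for Firehose."""
    line = json.dumps(tweet, ensure_ascii=False) + "\n"
    return {"Data": line.encode("utf-8")}


def send_tweet(tweet: dict, client=None) -> dict:
    """Send a single tweet to the delivery stream and return the response."""
    if client is None:
        # Imported lazily so encode_record can be used without AWS installed.
        import boto3
        client = boto3.client("firehose")
    return client.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record=encode_record(tweet),
    )
```

The trailing newline keeps records separable once Firehose concatenates them into a single file at the destination.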

Requirements

  • Tweets must include #sandiego hashtag
  • Tweets must come in real time
  • Tweets must come enriched with sentiment
  • Visualize last 15 minutes of data
  • Notify manager if more than 3 negative tweets arrive in a 5-minute interval
  • The stream should minimize data loss due to downtime
  • Data must persist to be analyzed later
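
Two of the requirements above, the hashtag filter and the negative-tweet alert, can be sketched in plain Python. This is only an illustration of the logic, not the pipeline used in the course: the `NEGATIVE` label, the class name, and the assumption that tweets arrive in timestamp order are all choices made for this example.

```python
import re
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 3  # notify when the count is strictly greater than this

# Match #sandiego as a whole hashtag, ignoring case (not #sandiegozoo).
HASHTAG_RE = re.compile(r"(?<!\w)#sandiego\b", re.IGNORECASE)


def has_hashtag(text: str) -> bool:
    """Return True if the tweet text contains the #sandiego hashtag."""
    return HASHTAG_RE.search(text) is not None


class NegativeTweetAlert:
    """Track negative tweets seen within the last five minutes."""

    def __init__(self):
        self._times = deque()

    def observe(self, when: datetime, sentiment: str) -> bool:
        """Record one tweet; return True if the manager should be notified."""
        if sentiment == "NEGATIVE":  # assumed sentiment label
            self._times.append(when)
        # Drop timestamps that have fallen out of the window.
        while self._times and when - self._times[0] > WINDOW:
            self._times.popleft()
        return len(self._times) > THRESHOLD
```

In a managed setup the same rule would more likely live in a stream analytics query or a monitoring alarm; the class above only makes the counting rule concrete.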

Tweets come in real-time

Enriched with sentiment
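
One way to add sentiment is a Lambda function attached to the delivery stream as a data transformation. The sketch below follows the Firehose transformation contract (base64 `data` in, `recordId`/`result`/`data` out). Using Amazon Comprehend for the sentiment call, and the `text` and `sentiment` field names, are assumptions for this example rather than details taken from the slide.

```python
import base64
import json


def comprehend_sentiment(text: str) -> str:
    """Assumed enrichment step: ask Amazon Comprehend for a sentiment label."""
    import boto3  # imported lazily; only needed when running against AWS
    client = boto3.client("comprehend")
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    return response["Sentiment"]


def enrich_records(event: dict, get_sentiment=comprehend_sentiment) -> dict:
    """Decode each record, add a sentiment field, and re-encode it."""
    output = []
    for record in event["records"]:
        tweet = json.loads(base64.b64decode(record["data"]))
        tweet["sentiment"] = get_sentiment(tweet["text"])
        data = (json.dumps(tweet) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must echo the incoming id
            "result": "Ok",
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}


def lambda_handler(event, context):
    """Entry point Lambda would call for each batch from Firehose."""
    return enrich_records(event)
```

Passing the sentiment function as a parameter makes it possible to exercise the transformation locally with a stub instead of a live AWS call.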

Data must persist for later analysis

Visualize last 15 minutes
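
If the tweets are indexed in Elasticsearch, a "last 15 minutes" view comes down to a range filter on a timestamp field. The sketch below builds such a query body using Elasticsearch date math. The field name `created_at` is an assumption for this example, and a Kibana dashboard would normally express the same window through its time picker.

```python
def last_minutes_query(minutes: int = 15, time_field: str = "created_at") -> dict:
    """Build a query body for documents from the last `minutes` minutes."""
    return {
        "query": {
            "range": {
                # "now-15m" is Elasticsearch date math for 15 minutes ago.
                time_field: {"gte": f"now-{minutes}m", "lte": "now"}
            }
        },
        # Newest tweets first.
        "sort": [{time_field: {"order": "desc"}}],
    }
```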

Redshift vs Elasticsearch

Redshift

  • Designed for storing clean tables of data
  • Schema is defined in database
  • SQL for queries
  • Works great with BI tools like Tableau

Elasticsearch

  • Schemaless - good for logs and text
  • Schema is inferred from incoming documents (dynamic mapping)
  • Uses its own language for queries
  • Has its own UI - Kibana
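
To make the query-language difference concrete, here is the same question, "how many tweets per sentiment?", written for each store. The table name `tweets`, the column `sentiment`, and the `sentiment.keyword` subfield (what Elasticsearch's default dynamic mapping creates for string values) are assumptions for this example.

```python
import json

# Redshift: standard SQL against a table with a predefined schema.
REDSHIFT_SQL = """
SELECT sentiment, COUNT(*) AS tweet_count
FROM tweets
GROUP BY sentiment;
"""

# Elasticsearch: a JSON query DSL body with a terms aggregation.
ELASTICSEARCH_BODY = {
    "size": 0,  # return only the aggregation, no individual documents
    "aggs": {
        "by_sentiment": {"terms": {"field": "sentiment.keyword"}}
    },
}

if __name__ == "__main__":
    print(REDSHIFT_SQL.strip())
    print(json.dumps(ELASTICSEARCH_BODY, indent=2))
```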

Let's practice!
