Data engineering and big data

Understanding Data Engineering

Hadrien Lacroix

Content Developer at DataCamp

About the course

  • Conceptual course
  • No coding involved
  • Objectives
    • Be able to communicate with data engineers
    • Build a solid foundation for learning more

Chapter 1

What is data engineering?

  1. Data engineering and big data
  2. Data engineers vs. data scientists
  3. Data pipelines

Chapter 2

How data storage works

  1. Structured vs. unstructured data
  2. SQL
  3. Data warehouses and data lakes

Chapter 3

How to move and process data

  1. Processing data
  2. Scheduling data
  3. Parallel computing
  4. Cloud computing

Spotflix logo

Data workflow

  1. Data collection and storage
  2. Data preparation
  3. Exploration and visualization
  4. Experimentation and prediction

Data engineers

Diagram: the data workflow, with the data collection and storage step circled

Data engineers deliver:

  • the correct data
  • in the right form
  • to the right people
  • as efficiently as possible

A data engineer's responsibilities

  • Ingest data from different sources
  • Optimize databases for analysis
  • Remove corrupted data
  • Develop, construct, test and maintain data architectures
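
As an illustration only, two of the responsibilities above (ingesting data from different sources and removing corrupted data) might be sketched as follows. The two inline sources, the field names `user_id` and `minutes_listened`, and the helper names `ingest`, `is_valid`, and `clean` are assumptions made for this example and are not part of the course.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two different sources.
CSV_SOURCE = "user_id,minutes_listened\n1,42\n2,not_a_number\n3,17\n"
JSON_SOURCE = (
    '[{"user_id": 4, "minutes_listened": 8},'
    ' {"user_id": null, "minutes_listened": 5}]'
)


def ingest(csv_text, json_text):
    """Combine records from both sources into one list of dicts."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    records.extend(json.loads(json_text))
    return records


def is_valid(record):
    """Treat a record as usable only if both fields parse as integers."""
    try:
        int(record["user_id"])
        int(record["minutes_listened"])
    except (KeyError, TypeError, ValueError):
        return False
    return True


def clean(records):
    """Drop corrupted records and normalize the rest to integers."""
    return [
        {
            "user_id": int(r["user_id"]),
            "minutes_listened": int(r["minutes_listened"]),
        }
        for r in records
        if is_valid(r)
    ]


cleaned = clean(ingest(CSV_SOURCE, JSON_SOURCE))
print(cleaned)
```

Here the CSV row with a non-numeric value and the JSON record with a missing user ID are both discarded, so only three records reach the analysis stage.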

Data engineers and big data

  • Big data is becoming the norm, so data engineers are increasingly needed
  • Big data:
    • So large that you have to think about how to deal with its size
    • Too large for traditional methods to keep working
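
One way to picture why size changes the approach: loading every record into memory at once stops working, while processing records one at a time does not depend on the total size. The sketch below is a minimal illustration under assumed names (`read_events`, `total_minutes`) and made-up records; it does not represent any particular big data tool.

```python
def read_events(n):
    """Hypothetical event source: yields one record at a time
    instead of building the full list in memory."""
    for i in range(n):
        yield {"user_id": i % 1000, "minutes_listened": i % 60}


def total_minutes(events):
    """Compute a running total while holding a single record at a time."""
    total = 0
    for event in events:
        total += event["minutes_listened"]
    return total


total = total_minutes(read_events(1_000_000))
print(total)
```

Because `read_events` is a generator, memory use stays roughly constant whether it yields a thousand records or a billion; only the running time grows.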

Big data growth

  • Sensors and devices
  • Social media
  • Enterprise data
  • VoIP (voice communication, multimedia sessions)

graph showing big data growth

Source: Data Age 2025, Seagate, November 2018

The five Vs

  • Volume (how much?)
  • Variety (what kind?)
  • Velocity (how frequent?)
  • Veracity (how accurate?)
  • Value (how useful?)

Summary

  • What's waiting for you
  • How data flows through an organization
  • When a data engineer intervenes
  • What their responsibilities are
  • How data engineering relates to big data

Let's practice!
