Evaluating modern data architecture solutions

Understanding Modern Data Architecture

Miller Trujillo

Senior Software Engineer

Ingestion

Solution proposed with ingestion highlighted

  • Unpredictable patterns
  • What if we pull the data?
    • Expose the files
    • Network file system
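
The pull approach above can be sketched as a small polling step over a shared mount. This is only an illustration under stated assumptions: the directory path, the function name `pull_new_files`, and tracking seen files by name are all hypothetical and not part of the proposed solution.

```python
from pathlib import Path
from typing import List, Set


def pull_new_files(shared_dir: Path, seen: Set[str]) -> List[Path]:
    """Return files in shared_dir that were not seen before, and remember them.

    shared_dir stands in for a network file system mount (an assumption).
    """
    new_files = []
    for path in sorted(shared_dir.iterdir()):
        # Skip sub-directories and anything already picked up on an earlier poll.
        if path.is_file() and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

Calling `pull_new_files` on a schedule with the same `seen` set returns only the files that appeared since the previous call, which is one way to cope with unpredictable arrival patterns.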

Storage

Solution proposed with storage highlighted

  • Cloud storage is:

    • Cheaper than a data warehouse or databases
    • Flexible, and exposes the required APIs
  • Is BigQuery still an option?

    • Cheap enough
    • Not feasible due to loading limitations
  • Life-cycle policies to reduce costs even further
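
As a rough sketch of what a life-cycle policy might look like, the dictionary below follows the general rule/action/condition layout of a Cloud Storage lifecycle configuration. The thresholds (30 and 365 days) and the `NEARLINE` target class are illustrative assumptions, and `action_for_age` is a hypothetical helper that only approximates how rules are evaluated.

```python
import json

# Illustrative thresholds only; they are not taken from the proposed solution.
LIFECYCLE_CONFIG = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},
        },
    ]
}


def action_for_age(age_days, config=LIFECYCLE_CONFIG):
    """Simplified model: return the action type of the matching rule with the
    largest age threshold, or None when no rule matches."""
    matching = [r for r in config["rule"] if age_days >= r["condition"]["age"]]
    if not matching:
        return None
    oldest = max(matching, key=lambda r: r["condition"]["age"])
    return oldest["action"]["type"]


if __name__ == "__main__":
    # Serialize the policy so it could be saved to a file.
    print(json.dumps(LIFECYCLE_CONFIG, indent=2))
```

Moving older objects to a colder class and eventually deleting them is how such a policy would reduce storage costs without manual cleanup.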


Processing

Solution proposed with streaming processing highlighted

  • Dataflow, Dataproc (Spark), or even Data Fusion
  • Unpredictable arrival patterns
  • Process data as soon as it arrives
  • Simplicity
  • Temporal data
  • Automate cleaning with life-cycle policies
  • No schema maintenance needed
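
To make "process data as soon as it arrives" concrete, here is a minimal sketch that handles one record at a time. It assumes newline-delimited JSON input and a hypothetical `id` field, and it uses a plain dictionary as a stand-in for the downstream store; a real pipeline would run on Dataflow, Dataproc, or Data Fusion rather than this loop.

```python
import json
from typing import Dict, Iterable


def process_event(raw_line: str, store: Dict[str, dict]) -> None:
    """Parse one arriving record and upsert it into the store by its id."""
    record = json.loads(raw_line)
    key = str(record["id"])  # "id" is an assumed field name
    # Merge fields as they come; no fixed schema is declared up front.
    store.setdefault(key, {}).update(record)


def run_stream(lines: Iterable[str], store: Dict[str, dict]) -> int:
    """Process records one by one as they arrive; return how many were handled."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        process_event(line, store)
        count += 1
    return count
```

Because each record is merged on arrival, the job does not need to know when or in what order the data shows up.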

Processing: The model scores

Solution proposed with batch processing highlighted

  • Complex to keep track of everything
  • Easier to maintain
  • The previous job can write to the NoSQL DB, and this job complements the data
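
The idea of a batch job complementing records written earlier can be sketched as follows. The dictionary again stands in for the NoSQL DB, and the function name and the `model_score` field are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def complement_with_scores(store: Dict[str, dict], scores: Dict[str, float]) -> List[str]:
    """Attach model scores to records that already exist in the store.

    Returns the ids that had a score but no existing record, so the caller
    can decide how to handle them.
    """
    missing = []
    for record_id, score in scores.items():
        if record_id in store:
            store[record_id]["model_score"] = score
        else:
            missing.append(record_id)
    return missing
```

For example, with `store = {"1": {"v": 2}}` and `scores = {"1": 0.9, "2": 0.4}`, the record for `"1"` gains a `model_score` field and `"2"` is reported as missing.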

Serving the data

Solution proposed with batch processing highlighted

  • BigQuery for analytical purposes
  • NoSQL DB => Easier scalability & flexibility

Some other details

  • Governance, orchestration, security, among others
  • Further refine the platform and requirements
  • Enable better management
  • Not one size fits all!

Everything is about trade-offs


Let's practice!
