Data Platforms

Framing and Approaches

Framing

Stevey’s Google Platforms Rant

Start with a platform, then use it for everything…

Data architecture vs backend architecture

I’ve found it useful to push as much as you can out of the backend into the data platform.

A reference guide for fintech & small-data engineering

Data Systems Tend Towards Production

There are many reasons to worry about non-data team members writing logic in SQL and making dbt PRs. What I can guarantee — this logic will be written, and if the data team gatekeeps, it will be written outside of their visibility. If a data team can educate and encourage contributions to their codebase, they invite code to be written where it most belongs.

DataOps Principles: How Startups Do Data The Right Way

If it’s someone’s job to handle all data requests by writing a new SQL query or downloading data from external systems, your team is headed in the wrong direction.

Start with manual processes — like SQL queries or API pulls — to understand the problem space. Next, automate the repetitive parts and start manually monitoring that automation. Finally, automate the actions taken to correct issues found via monitoring and manually check performance metrics.

Frequent small changes over infrequent large changes

Functional Data Engineering — a modern paradigm for batch data processing

A Beginner’s Guide to Data Engineering

Getting started: the 3 stages of data infrastructure

My Philosophy on Alerting

Research quality data and research quality databases

4 Pillars of Analytics

  • Acquire, Process, Surface, Act.

Choose Boring Technology

Whom the Gods Would Destroy, They First Give Real-time Analytics

Scaling Data and Self-Serve Analytics

Idempotence Now Prevents Pain Later

Whoops, the numbers are wrong! Scaling Data Quality @ Netflix (talk)

Why most analytics efforts fail: A step by step process to fix the root causes of most event analytics mistakes

Emerging Architectures for Modern Data Infrastructure

Approaches

Gardening Platforms

A roughly 200-slide, flip-book-style presentation on how to garden platforms.

Testing Statistical Software

Dynamic Data Testing

Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend

Consider SQL when writing your next processing pipeline

We’re seeing a general move towards expressing pipelines in plain SQL.
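
As a rough illustration of what "a pipeline in plain SQL" can look like, here is a minimal sketch in which the whole transformation is a SQL script and Python only supplies the connection. It is not taken from the linked post: the table and column names (raw_orders, daily_revenue, order_date, amount) are hypothetical, and SQLite stands in for whatever warehouse you actually use.

```python
import sqlite3

# The entire transformation lives in SQL. Table and column names are
# hypothetical, picked only for this illustration.
TRANSFORM_SQL = """
DROP TABLE IF EXISTS daily_revenue;
CREATE TABLE daily_revenue AS
SELECT order_date,
       SUM(amount) AS revenue,
       COUNT(*)    AS orders
FROM raw_orders
GROUP BY order_date;
"""


def run_pipeline(conn):
    """Rebuild daily_revenue from raw_orders and return its rows."""
    # Drop-and-recreate means a rerun yields the same table, not duplicates.
    conn.executescript(TRANSFORM_SQL)
    query = (
        "SELECT order_date, revenue, orders "
        "FROM daily_revenue ORDER BY order_date"
    )
    return conn.execute(query).fetchall()


def demo():
    """Load a few sample rows, then run the pipeline twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    run_pipeline(conn)
    # The second run returns the same result as the first.
    return run_pipeline(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because the step drops and recreates its output table, running it again is safe, which also touches on the idempotence theme listed earlier. A scheduler or a tool such as dbt would normally own the orchestration that Python handles here.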

What are the steps / tools in setting up a modern, SaaS-based BI infrastructure?

It’s Time for Open Source Analytics

Beyond Interactive: Notebook Innovation at Netflix

Part 2: Scheduling Notebooks at Netflix

Software Engineering for Machine Learning: A Case Study

The ultimate guide to Google Sheets as a reliable data source

Cookiecutter Data Science


Data Products

Framing and Approaches

Framing

One Model to Rule Them All

This post discusses our obsession with finding the best model and emphasizes what we should do instead: Take a step back and see the bigger picture…

How to deliver on Machine Learning projects: A guide to the ML Engineering Loop

Approaches

Your Client Engagement Program Isn’t Doing What You Think It Is

Understanding Latent Style

Modeling the unseen: How Instacart uses Machine Learning to spot lost demand in its fulfillment chain

Learning Market Dynamics for Optimal Pricing

Building Lyft’s Marketing Automation Platform

Data Science and the Art of Producing Entertainment at Netflix

Are Data Catalogs Curing the Symptom or the Disease?