Data Platforms
Framing and Approaches
Framing
Stevey’s Google Platforms Rant
Start with a platform, then use it for everything…
Data architecture vs backend architecture
I’ve found it useful to push as much as you can out of the backend into the data platform.
A reference guide for fintech & small-data engineering
Data Systems Tend Towards Production
There are many reasons to worry about non-data team members writing logic in SQL and making dbt PRs. What I can guarantee — this logic will be written, and if the data team gatekeeps, it will be written outside of their visibility. If a data team can educate and encourage contributions to their codebase, they invite code to be written where it most belongs.
DataOps Principles: How Startups Do Data The Right Way
If it’s someone’s job to handle all data requests by writing a new SQL query or by downloading data from external systems, your team is headed in the wrong direction.
Start with manual processes — like SQL queries or API pulls — to understand the problem space. Next, automate the repetitive parts and start manually monitoring that automation. Finally, automate the actions taken to correct issues found via monitoring and manually check performance metrics.
Frequent small changes over infrequent large changes
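A minimal sketch of the manual, then automated, then self-correcting progression described above, assuming a SQLite database. The table name, the row-count threshold, and `reload_fn` are hypothetical and do not come from the linked article.

```python
import sqlite3


def row_count(conn, table):
    # Stage 1: the query someone used to run by hand.
    # The table name is interpolated, so it must come from trusted code.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def check_table(conn, table, min_rows):
    # Stage 2: the check runs automatically; a person still reads the result.
    count = row_count(conn, table)
    return {"table": table, "rows": count, "ok": count >= min_rows}


def check_and_repair(conn, table, min_rows, reload_fn):
    # Stage 3: the corrective action is automated as well.
    # reload_fn is a hypothetical callback that re-pulls the data.
    result = check_table(conn, table, min_rows)
    repaired = False
    if not result["ok"]:
        reload_fn(conn)
        result = check_table(conn, table, min_rows)
        repaired = True
    result["repaired"] = repaired
    return result
```

Each stage wraps the previous one, so a team can stop at whichever level of automation the problem currently justifies.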
Functional Data Engineering — a modern paradigm for batch data processing
A Beginner’s Guide to Data Engineering
Getting started: the 3 stages of data infrastructure
Research quality data and research quality databases
Acquire, Process, Surface, Act.
Start with a platform, and then use it for everything…
Whom the Gods Would Destroy, They First Give Real-time Analytics
Scaling Data and Self-Serve Analytics
Idempotence Now Prevents Pain Later
Whoops, the numbers are wrong! Scaling Data Quality @ Netflix (talk)
Emerging Architectures for Modern Data Infrastructure
Approaches
A roughly 200-slide, flip-book-style presentation about how to garden platforms.
Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend
Consider SQL when writing your next processing pipeline
We’re seeing a general move towards expressing pipelines in plain SQL.
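As an illustration of a pipeline step expressed in plain SQL, here is a small sketch that uses Python's built-in `sqlite3` module only as the runner. The table names `raw_payments` and `daily_totals` and the sample rows are hypothetical. The step drops and rebuilds its output, so rerunning it gives the same result.

```python
import sqlite3

# The whole transformation lives in SQL; Python only executes it.
TRANSFORM_SQL = """
DROP TABLE IF EXISTS daily_totals;
CREATE TABLE daily_totals AS
SELECT day,
       SUM(amount_cents) AS total_cents,
       COUNT(*)          AS n_payments
FROM raw_payments
GROUP BY day;
"""


def run_pipeline(conn):
    # Rebuild the output table from scratch, then return its rows.
    conn.executescript(TRANSFORM_SQL)
    return conn.execute(
        "SELECT day, total_cents, n_payments FROM daily_totals ORDER BY day"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_payments (day TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_payments VALUES (?, ?)",
    [("2024-01-01", 500), ("2024-01-01", 250), ("2024-01-02", 100)],
)

first = run_pipeline(conn)
second = run_pipeline(conn)  # rerunning the step does not duplicate rows
```

Keeping the logic in SQL means the same statement can be reviewed by analysts and moved to another runner or warehouse, with dialect adjustments where needed.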
What are the steps / tools in setting up a modern, SaaS-based BI infrastructure?
It’s Time for Open Source Analytics
Beyond Interactive: Notebook Innovation at Netflix
Part 2: Scheduling Notebooks at Netflix
Software Engineering for Machine Learning: A Case Study
The ultimate guide to Google Sheets as a reliable data source
Data Products
Framing and Approaches
Framing
This post discusses our obsession with finding the best model and emphasizes what we should do instead: Take a step back and see the bigger picture…
How to deliver on Machine Learning projects: A guide to the ML Engineering Loop
Approaches
Your Client Engagement Program Isn’t Doing What You Think It Is
Learning Market Dynamics for Optimal Pricing
Building Lyft’s Marketing Automation Platform
Data Science and the Art of Producing Entertainment at Netflix