Framing
Stevey’s Google Platforms Rant
Start with a platform, then use it for everything…
Data architecture vs backend architecture
I’ve found it useful to push as much as you can out of the backend into the data platform.
A reference guide for fintech & small-data engineering
In my time leading engineering teams at Square and Gusto I’ve found that this big-data approach to software engineering is a poor fit for many product companies. Rather, product scalability problems lie along a different axis: sprawling domains and the massive schemas that implement them.
Acquire, Process, Surface, Act.
Research quality data and research quality databases
Creating research quality data is the way that you refine and structure data to make it conducive to doing science.
Data Systems Tend Towards Production
There are many reasons to worry about non-data team members writing logic in SQL and making dbt PRs. What I can guarantee — this logic will be written, and if the data team gatekeeps, it will be written outside of their visibility. If a data team can educate and encourage contributions to their codebase, they invite code to be written where it most belongs.
I chose “boring technology” as the pithy SEO-friendly title for this content, and I regret it most days. It’s kind of distracting. “Boring sounds bad, why is he saying it’s good?” Et cetera. It’s a real shitshow.
But what I’m aiming for there is not technology that’s “boring” the way CSPAN is boring. I mean that it’s boring in the sense that it’s well understood. It’s bad, but you know why it’s bad. You can list all of the main ways it will let you down.
DataOps Principles: How Startups Do Data The Right Way
If it’s someone’s job to handle all data requests by writing a new SQL query or downloading data from external systems, your team is headed in the wrong direction.
High-quality applications, however, must enable ownership. They must be cognizant of their customers’ ever-expanding needs and plan accordingly so that customers can solve their own problems.
Towards an understanding of technical debt
Scientific debt is when a team takes shortcuts in data analysis, experimental practices, and monitoring that could have long-term negative consequences.
Building
The principles of tidy data provide a standard way to organize data values within a dataset.
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
Scaling Data and Self-Serve Analytics
Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend
Idempotence Now Prevents Pain Later
On the limits of incrementality
Joel Grus’ I Don’t Like Notebooks (talk)
Whom the Gods Would Destroy, They First Give Real-time Analytics
Every few months, I try to talk someone down from building a real-time product analytics system. When I’m lucky, I can get to them early.
I Will Fucking Dropkick You If You Use That Spreadsheet
The ultimate guide to Google Sheets as a reliable data source
Review
Strengthening Products and Teams with Technical Design Reviews
How to review an analytics pull request
The Art of Giving and Receiving Code Reviews (Gracefully)
How to Do Code Reviews Like a Human
Unlearning toxic behaviors in a code review culture
Google Engineering Practices Documentation
Managing
A ~200-slide, flip-book-style presentation about how to garden platforms.
The Log: What every software engineer should know about real-time data’s unifying abstraction