Framing

Stevey’s Google Platforms Rant

Start with a platform, then use it for everything…

Data architecture vs backend architecture

I’ve found it useful to push as much as you can out of the backend into the data platform.

A reference guide for fintech & small-data engineering

In my time leading engineering teams at Square and Gusto I’ve found that this big-data approach to software engineering is a poor fit for many product companies. Rather, product scalability problems are along a different axis: Sprawling domains and massive schemas implementing those domains.

4 Pillars of Analytics

Acquire, Process, Surface, Act.

Research quality data and research quality databases

Creating research quality data is the way that you refine and structure data to make it conducive to doing science.

Data Systems Tend Towards Production

There are many reasons to worry about non-data team members writing logic in SQL and making dbt PRs. What I can guarantee — this logic will be written, and if the data team gatekeeps, it will be written outside of their visibility. If a data team can educate and encourage contributions to their codebase, they invite code to be written where it most belongs.

Choose Boring Technology

I chose “boring technology” as the pithy SEO-friendly title for this content, and I regret it most days. It’s kind of distracting. “Boring sounds bad, why is he saying it’s good?” Et cetera. It’s a real shitshow.

But what I’m aiming for there is not technology that’s “boring” the way CSPAN is boring. I mean that it’s boring in the sense that it’s well understood. It’s bad, but you know why it’s bad. You can list all of the main ways it will let you down.

DataOps Principles: How Startups Do Data The Right Way

If it’s someone’s job to handle all data requests by writing a new SQL query or by downloading data from external systems, your team is headed in the wrong direction.

Redefining Ownership

High-quality applications, however, must enable ownership. They must be cognizant of their customers’ ever-expanding needs and plan accordingly so that customers can solve their own problems.

Towards an understanding of technical debt

Scientific Debt

Scientific debt is when a team takes shortcuts in data analysis, experimental practices, and monitoring that could have long-term negative consequences.

Building

Tidy Data

The principles of tidy data provide a standard way to organize data values within a dataset.
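
As a rough illustration of that idea (not taken from the linked paper), the sketch below reshapes a hypothetical "wide" sales table into tidy form, where each variable is a column and each observation is a row. The table contents, the field names, and the to_tidy helper are all assumptions made up for this example.

```python
# Hypothetical "wide" table: one row per region, one column per year.
# The year values are hidden in the column names, which is what makes it untidy.
wide_rows = [
    {"region": "north", "2022": 10, "2023": 12},
    {"region": "south", "2022": 7, "2023": 9},
]


def to_tidy(rows, id_field, var_name, value_name):
    """Reshape wide rows so that each output row holds exactly one observation."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is repeated on every output row instead
            tidy.append({id_field: row[id_field], var_name: key, value_name: value})
    return tidy


# After reshaping: one row per (region, year) pair, with columns region, year, sales.
tidy_rows = to_tidy(wide_rows, id_field="region", var_name="year", value_name="sales")
for r in tidy_rows:
    print(r)
```

In the tidy version, filtering by year or aggregating sales no longer requires knowing which columns happen to be years.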

The Quartz guide to bad data

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

Scaling Data and Self-Serve Analytics

Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend

Idempotence Now Prevents Pain Later

On the limits of incrementality

Joel Grus’ I Don’t Like Notebooks (talk)

Whom the Gods Would Destroy, They First Give Real-time Analytics

Every few months, I try to talk someone down from building a real-time product analytics system. When I’m lucky, I can get to them early.

I Will Fucking Dropkick You If You Use That Spreadsheet

The ultimate guide to Google Sheets as a reliable data source

The DataOps Cookbook

Making Wrong Code Look Wrong

Review

Strengthening Products and Teams with Technical Design Reviews

How to review an analytics pull request

The Art of Giving and Receiving Code Reviews (Gracefully)

How to Do Code Reviews Like a Human

Unlearning toxic behaviors in a code review culture

Google Engineering Practices Documentation

Managing

Gardening Platforms

A ~200-slide, flip-book-style presentation about how to garden platforms.

My Philosophy on Alerting

The Log: What every software engineer should know about real-time data’s unifying abstraction

The Data Quality Flywheel

Testing Statistical Software

Dynamic Data Testing

Are Data Catalogs Curing the Symptom or the Disease?