Data Analysis

Framing and Approaches

Framing

Tukey, Design Thinking, and Better Questions

In my view, the most useful thing a data scientist can do is to devote serious effort towards improving the quality and sharpness of the question being asked.

Oversimplify

Trustworthy Data Analysis

It’s entirely possible to trust an analysis but not believe the final conclusions.

20 Questions to Ask Prior to Starting Data Analysis

Data-Informed Product Building

Our goal is to give you an understanding of how a product evolves from infancy to maturity; a holistic sense of the product metric ecosystem of growth, engagement and monetization; a framework to define goals for your company; and a toolkit you can use to analyze your product’s performance against those goals.

Model Tuning and the Bias-Variance Tradeoff

Uncertainty + Visualization, Explained

Uncertainty + Visualization, Explained (Part 2: Continuous Encodings)

10 Reads for Data Scientists Getting Started with Business Models

Approaches

Conversion rates – you are (most likely) computing them wrong

Modeling conversion rates and saving millions of dollars using Kaplan-Meier and gamma distributions

You’re all calculating churn rates wrong

Markov Chains Explained Visually

A Markov chain tells you the probability of hopping, or “transitioning,” from one state to any other state.
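
As a rough illustration of that idea, here is a minimal sketch of a two-state chain in plain Python. The state names and the probabilities are made up for the example and do not come from the linked article. Each row of the table gives the chance of moving from the current state to each possible next state, so every row sums to 1.

```python
# Hypothetical two-state chain; states and probabilities are illustrative only.
STATES = ["playing", "sleeping"]

# TRANSITIONS[current][nxt] = probability of moving from current to nxt in one step.
TRANSITIONS = {
    "playing": {"playing": 0.7, "sleeping": 0.3},
    "sleeping": {"playing": 0.4, "sleeping": 0.6},
}


def step(distribution):
    """Advance a probability distribution over states by one step."""
    return {
        nxt: sum(distribution[cur] * TRANSITIONS[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


def n_step(start_state, n):
    """Distribution over states after n steps, starting from a known state."""
    dist = {s: 1.0 if s == start_state else 0.0 for s in STATES}
    for _ in range(n):
        dist = step(dist)
    return dist


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, n_step("playing", n))
```

Calling `n_step("playing", 1)` simply reads off the "playing" row of the table. Larger values of `n` apply the same table repeatedly, which is all a Markov chain needs because the next state depends only on the current one.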

Forecasting at scale

ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus

Using Causal Inference to Improve the Uber User Experience

The Power User Curve: The best way to understand your most engaged users

Forecasting: Principles and Practice (book)

What to do when your metrics dip


Experimentation

Framing, Approaches, Significance Pitfalls

Framing

The Engineering Problem of A/B Testing

Leaky Abstractions In Online Experimentation Platforms

Common statistical tests are linear models (or: how to teach stats)

How Not To Run an A/B Test

North Star or sign post metrics: which should one optimize?

Misadventures in experiments for growth

The Agony and Ecstasy of Building with Data

Against A/B Tests

Is Bayesian A/B Testing Immune to Peeking? Not Exactly

The Little Handbook of Statistical Practice

Approaches

Guidelines for A/B Testing

How Etsy Handles Peeking in A/B Testing

AB Testing 101: What I wish I knew about AB testing when I started my career

Suffering from a Non-inferiority Complex?

Analyzing Experiment Outcomes: Beyond Average Treatment Effects

Mediation Modeling at Uber: Understanding Why Product Changes Work (and Don’t Work)

Mediation modeling goes beyond simple cause and effect relationships in an attempt to understand what underlying mechanisms led to a result.

Experimentation & Measurement for Search Engine Optimization

Statistical Significance Pitfalls

The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time

Our key point here is that it is possible to have multiple potential comparisons, in the sense of a data analysis whose details are highly contingent on data, without the researcher performing any conscious procedure of fishing or examining multiple p-values.

Statistical Paradises and Paradoxes in Big Data (pdf)

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

Why Most Published Research Findings Are False

The Control Group is Out of Control

Beware The Man of One Study

Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.