Cleaning / Tidying / Munging / Wrangling

Tidy Data

The principles of tidy data provide a standard way to organize data values within a dataset.
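
As a rough illustration of the idea, the sketch below reshapes a small "wide" table into tidy form with pandas, so that each variable has its own column and each observation its own row. The table, its column names, and its values are made up for the example and do not come from the linked paper.

```python
import pandas as pd

# Hypothetical "wide" table: one column per year, so the variable "year"
# lives in the column headers instead of in a column of its own.
wide = pd.DataFrame(
    {
        "country": ["A", "B"],
        "1999": [10, 20],
        "2000": [15, 25],
    }
)

# Melt the year columns into rows: one row per (country, year) observation.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")

# The former column headers are strings; store year as an integer variable.
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

The tidy form is longer, but filtering, grouping, and plotting by year now work on an ordinary column rather than on column names.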

The Quartz guide to bad data

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

The Log: What every software engineer should know about real-time data’s unifying abstraction


Opinionated Python

Minimally Sufficient Pandas

In this article, I will offer an opinionated perspective on how to best use the Pandas library for data analysis. My objective is to argue that only a small subset of the library is sufficient to complete nearly all of the data analysis tasks that one will encounter. This minimally sufficient subset of the library will benefit both beginners and professionals using Pandas.

The Little Book of Python Anti-Patterns

What’s the future of the pandas library?

Fast Pandas: A Benchmarked Pandas Cheat Sheet

Loop Better: A Deeper Look at Iteration in Python

A Visual Intro to NumPy and Data Representation

Learn a new pandas trick every day


Jupyter

Bringing the best out of Jupyter Notebooks for Data Science

Reproducible Data Analysis in Jupyter

Jupyter Docker Stacks

Joel Grus’s I Don’t Like Notebooks (slides)


Visualization

How to learn D3.js

Fundamentals of Data Visualization (book)


Git

Flight rules for Git

A guide for astronauts (now, programmers using Git) about what to do when things go wrong.

Learn git concepts, not commands

Git Immersion

A guided tour that walks through the fundamentals of Git, inspired by the premise that to know a thing is to do it.

How to Write a Git Commit Message


SQL

Learning SQL 201: Optimizing Queries, Regardless of Platform

PostgreSQL: Don’t Do This

Analyzing 89 Responses to a SQL Screener Question for a Senior Data Analyst Position

SQL Window Functions to Pass a Data Analytics Interview

The Most Underutilized Function in SQL (md5)


Data Build Tool (dbt)

Build Your Data Analytics Like An Engineer (podcast)

dbt Best Practice Guide

On the limits of incrementality

State of testing in dbt


Data Warehouses

How we configure Snowflake

How Compatible are Redshift and Snowflake’s SQL Syntaxes?

The R.A.G (Redshift Analyst Guide)


Looker

How to Design Your Looker Explores

Is Looker the Right Business Intelligence Tool for My Company?

How do you decide what to model in dbt vs LookML?