Recent Posts

Big ideas from the 2023 Causal Data Science Meeting

Highlights and links to select talks

Industry information management for causal inference

Proactive collection of data to comply with or confront assumptions

Crosspost: The Art of Abstraction in ETL

Rounding out my three-part ETL series from Airbyte’s developer blog

The Art of Abstraction in ETL: Dodging Data Extraction Errors

Cross-post from guest post on Airbyte’s developer blog

Goin' to Carolina in my mind (or on my hard drive)

Out-of-memory processing of North Carolina’s voter file with DuckDB and Apache Arrow


Scaling Personalized Volunteer Emails

An overview of the data stack used to automate over 50,000 personalized emails to voter turnout volunteers using BigQuery, dbt, Census, and MailChimp

Causal Design Patterns

An overview of basic research design patterns in causal inference, modern extensions, and data management strategies to set up a causal inference initiative for success

Evaluation without Experimentation

An introduction to inverse propensity of treatment weighting for program evaluation with applications to Two Million Texans’ relational organizing campaign during the 2022 midterms

Taking Flight with Shiny: a Modules-First Approach

An argument for the individual and organization-wide benefits of teaching new developers Shiny with a modules-first paradigm

Data (error) Generating Process

Interrogating the data generating process to devise better data quality tests




dbt package bringing dplyr semantics to SQL


R package for managing controlled vocabularies

satRday Chicago Conference Organizer

Speaker & Sponsor lead for 2019 and 2020


Hackathon-in-a-box templates for custom Rmd and ggplot2 themes


R package providing project management interface to GitHub


97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts

Contributed six chapters on topics ranging from data design and development to validation and democratization

R Markdown Cookbook

This cookbook contains tips and tricks to help you get the most out of R Markdown. Topics include the automated generation of content (diagrams, text), customizing format (Pandoc, HTML, and LaTeX templates), workflow improvements (modularizing child documents, cross-referencing code chunks, chunk caching), modifying rendering behavior with hooks, and using alternative language engines.