Part 1: Introduction to Difference-in-Differences (DiD): A Guide for Data Scientists
Learn how to use DiD to estimate causal effects when A/B testing isn’t possible — with a focus on intuition, assumptions, and real-world applications.
👋 Hey! This is Manisha Arora from PrepVector. Welcome to the Tech Growth Series, a newsletter that aims to bridge the gap between academic knowledge and practical aspects of data science. My goal is to simplify complicated data concepts, share my perspective on the latest trends, and pass along lessons from building and leading data teams.
About the Authors:
Manisha Arora: Manisha is a Data Science Lead at Google Ads, where she leads the Measurement & Incrementality vertical across Search, YouTube, and Shopping. She has 12+ years of experience in enabling data-driven decision-making for product growth.
Banani Mohapatra: Banani is a seasoned data science product leader with over 12 years of experience across e-commerce, payments, and real estate domains. She currently leads a data science team for Walmart’s subscription product, driving growth while supporting fraud prevention and payment optimization. She is known for translating complex data into strategic solutions that accelerate business outcomes.
Many causal questions in product and policy analysis boil down to this: What would have happened in the absence of the intervention? When randomized experiments aren't feasible — due to ethics, cost, or rollout constraints — data scientists turn to quasi-experimental methods to estimate counterfactuals. One of the most widely used and theoretically grounded approaches in this space is Difference-in-Differences (DiD).
In our ongoing series on causal inference with Banani Mohapatra, we’ve explored foundational techniques rooted in the Rubin Causal Model - specifically, Propensity Score Matching (PSM) and Inverse Propensity Weighting (IPW). While these methods are powerful for adjusting for observed differences between treated and untreated individuals, they are best suited for ‘static interventions’ with clearly defined treatment and control groups.
However, in real-world scenarios:
Interventions may roll out gradually over time (e.g., feature releases by geography or user segment), which makes it hard to pinpoint a single “before” and “after” or apply traditional matching.
Sometimes, a proper untreated control group doesn’t exist. For example, if a company rolls out a change to all users, there’s no obvious group to compare to. In such cases, simple pre-post analysis is tempting - but unreliable, because it doesn’t account for broader trends or seasonality that might also affect outcomes.
This is where Difference-in-Differences (DiD) comes into play.
Intuition Behind Difference-in-Differences (DiD)
Difference-in-Differences (DiD) is a foundational causal inference method used to estimate the impact of an intervention by comparing changes over time between a treated group and a control group.
DiD compares the pre-post change in outcomes for a treated group to the pre-post change for a comparison group that did not receive the treatment. Unlike simple before-and-after comparisons, which are prone to bias from underlying trends or seasonality, DiD introduces a temporal dimension - helping isolate the effect of the intervention itself.
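To make the double difference concrete, here is a minimal sketch in Python. All data and column names below are made up for illustration: compute the average outcome in each of the four cells - treated/control by pre/post - and take the difference of the two pre-post changes.

```python
import pandas as pd

# Toy unit-period data: one row per unit and period.
# All names and numbers are made up for illustration.
df = pd.DataFrame({
    "unit":    ["a", "a", "b", "b", "c", "c", "d", "d"],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],   # 1 = treated group
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],   # 1 = period after the intervention
    "outcome": [10.0, 14.0, 11.0, 15.5, 9.0, 10.5, 10.0, 11.0],
})

# Average outcome in each of the four cells: (treated/control) x (pre/post).
means = df.groupby(["treated", "post"])["outcome"].mean()

# (treated post - treated pre) - (control post - control pre)
did = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
print(f"DiD estimate: {did:.2f}")  # 3.00 for this toy data
```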
This method is especially relevant in tech settings where features, policies, or experiments are gradually rolled out, or applied to a subset of users or regions.
Some real-world scenarios include:
A new onboarding experience launched in one country
A pricing experiment enabled for a specific user segment
A policy change impacting only a subset of advertisers
In these cases, teams often default to naive pre-post comparisons, which lack a valid counterfactual and can lead to over- or underestimation of impact.
While it shares the same goal as PSM or IPW - controlling for bias and estimating treatment effects - DiD excels in scenarios involving time-based interventions.
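In practice, the same quantity is usually estimated with a regression that interacts a treated indicator with a post indicator; the coefficient on the interaction term is the DiD estimate. Here is a minimal sketch with statsmodels, reusing the toy DataFrame from the snippet above (purely illustrative):

```python
import statsmodels.formula.api as smf

# df is the toy unit-period DataFrame from the earlier sketch
# (columns: outcome, treated, post, unit).
model = smf.ols("outcome ~ treated + post + treated:post", data=df).fit()

# The interaction coefficient is the DiD estimate (3.0 for the toy data above).
print(model.params["treated:post"])

# In real panels with many periods, you would typically cluster standard errors
# by unit, e.g. .fit(cov_type="cluster", cov_kwds={"groups": df["unit"]}),
# to address serial correlation.
```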
Evolution of DiD
The DiD methodology has its roots in econometrics and has evolved through a series of influential studies that laid the groundwork for how it is used today in real-world scenarios.
Image created by author
The journey of DiD began in 1985, when Ashenfelter & Card first used it to assess the impact of job training programs on earnings, marking one of the earliest examples of applying DiD in non-randomized labor studies. This was followed by a landmark study in 1994 by Card & Krueger, who analyzed the effect of minimum wage increases using neighboring states as treatment and control groups. Their work popularized DiD for evaluating public policy and challenging economic assumptions.
In 2004, Bertrand, Duflo & Mullainathan highlighted key issues with DiD when applied to time-series data, such as serial correlation, and recommended more robust error handling practices. By the 2010s, DiD had become mainstream in sectors like healthcare, education, and marketing, especially useful for evaluating policy rollouts and large-scale interventions.
Goodman-Bacon (2018) advanced the field by analyzing how bias can arise in DiD models with staggered treatment timing. Then in 2020, Callaway & Sant’Anna introduced a flexible, non-parametric estimator to better handle heterogeneous treatment effects in these staggered settings.
From 2021 onward, DiD has been integrated with machine learning tools like EconML and CausalML to enhance causal analysis in complex business environments. Today, it plays a crucial role in evaluating GenAI rollouts, fraud detection, and experimentation where traditional A/B testing isn’t feasible.
Why Does DiD Matter?
Now that we understand the intuition and evolution of DiD, let’s explore why this method truly matters for real-world decision-making.
When A/B Tests Aren’t Feasible
DiD serves as a robust quasi-experimental alternative when RCTs aren’t viable - such as during global feature launches. By leveraging temporal and cross-sectional variation, DiD helps estimate counterfactual outcomes without explicit randomization, making it invaluable for high-stakes decisions.
Controls for Time-Based Biases
Unlike naive before-after comparisons, DiD accounts for background trends, seasonality, and macroeconomic shifts - leading to more reliable estimates.
Supports Real-World Rollouts
Many product launches happen in phases (by region, user cohort, or platform). DiD is well-suited to analyze such staggered interventions.
Quantifies Business Impact Beyond Conversion
DiD can estimate lift in metrics like revenue per user, engagement, and retention - critical for assessing true business value, not just clicks or sign-ups.
Enhances Experimentation Strategy
DiD complements other methods like PSM and IPW, enabling teams to extract causal insights from observational data when experimentation bandwidth is limited.
Assumptions of DiD
Before applying the DID method, it’s essential to understand its core assumptions. These ensure that any observed effects are truly due to the intervention, not unrelated trends or external factors. Skipping this step can lead to misleading conclusions. Let’s briefly review the key assumptions that make DiD results valid and trustworthy.
Parallel Trends Assumption
This is the most important idea behind DiD.
In the absence of treatment, the treated and control groups would have experienced the same average change in outcomes over time.
In other words, any difference we see after the treatment can be credited to the treatment itself, because before the treatment, both groups were moving similarly. This assumption helps ensure that we are measuring the real impact of the intervention, not just random differences between the groups.
Imagine a new feature like free shipping is launched in a group of stores in March, while another set of similar stores doesn’t receive the feature until later. If average daily revenue in both sets of stores was increasing at the same rate before March, we can assume they were on a “parallel trend.” Any post-March revenue difference can then reasonably be attributed to the feature rollout, assuming other conditions are met.
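A quick way to probe this assumption before trusting the estimate is to plot pre-period outcomes for both groups and check that the lines move roughly in parallel. Below is a minimal sketch on simulated store-day data; all names, dates, and numbers are illustrative, not from a real rollout.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical store-day panel: two months of pre-rollout data for
# a treated group and a control group (all values are simulated).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", "2024-02-28", freq="D")
df_daily = pd.DataFrame({
    "date": np.tile(dates, 2),
    "treated": np.repeat([1, 0], len(dates)),
})
# Both groups share the same upward trend (plus noise); a level gap is fine.
df_daily["revenue"] = (
    100
    + 20 * df_daily["treated"]
    + 0.5 * (df_daily["date"] - dates[0]).dt.days
    + rng.normal(0, 3, len(df_daily))
)

# Roughly parallel lines in the pre-period support the parallel trends
# assumption; clearly diverging pre-trends are a warning sign.
(
    df_daily.groupby(["date", "treated"])["revenue"]
    .mean()
    .unstack("treated")
    .rename(columns={0: "control", 1: "treated"})
    .plot(title="Pre-rollout daily revenue by group")
)
plt.ylabel("Average revenue")
plt.show()
```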
No Simultaneous Interventions
DiD assumes no other major changes affect only the treated group during the study period.
For instance, if a new recommendation feature is launched in certain cities, but those cities also receive a marketing push or app redesign at the same time, it’s hard to isolate the feature’s true impact. Overlapping changes can confound results, so the treatment should be the primary difference during the analysis window.
Stable Unit Treatment Value Assumption (SUTVA)
SUTVA assumes that the treatment status of one unit doesn’t affect the outcome of another - this means no spillover or interference across units. For instance, if a store in one city receives a pricing change, that change shouldn’t influence consumer behavior in a nearby untreated city.
If spillovers/network effects occur, the control group is no longer a valid comparison, violating the integrity of the DiD design.
Common Impact of External Factors
DiD assumes that external changes over time - like seasonal trends, economic shifts, or product-wide outages - affect both the treated and control groups in a similar way.
If an unexpected event (e.g., a system-wide performance bug) affects only the region where the new feature is launched, it becomes difficult to separate the feature’s impact from that event. For example, if the treated group experiences a major mobile app crash during the rollout while the control group does not, this asymmetry may bias the results. Ensuring comparable exposure to time-based shocks is key to a clean analysis.
Consistency and Correct Grouping
Accurate classification of who received the treatment and when is fundamental to a valid DiD analysis.
If users or regions are misclassified - either marked as treated when they were not, or if the timing of the intervention is incorrectly logged - the estimated impact may be misleading. This becomes especially challenging in staggered rollouts, where a feature like free shipping is enabled across different stores or zip codes at different times. For instance, if one store starts offering free shipping in March but is mistakenly labeled as treated starting in February, the model may attribute unrelated revenue changes to the feature, distorting the true effect. Defining clear pre- and post-treatment periods, and consistently tracking rollout timelines, is essential for maintaining analytical integrity.
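One way to reduce this risk in staggered rollouts is to derive each unit’s treated and post flags directly from the logged rollout date rather than hard-coding them. A minimal sketch, with hypothetical store names and dates:

```python
import pandas as pd

# Hypothetical rollout log: when each store actually turned on free shipping
# (NaT = never treated). Names and dates are illustrative.
rollout = pd.DataFrame({
    "store": ["s1", "s2", "s3"],
    "free_shipping_start": pd.to_datetime(["2024-03-01", "2024-04-15", None]),
})

# Hypothetical store-day outcomes.
panel = pd.DataFrame({
    "store": ["s1", "s2", "s3"] * 2,
    "date": pd.to_datetime(["2024-02-01"] * 3 + ["2024-05-01"] * 3),
    "revenue": [100, 90, 95, 130, 92, 97],
})

# Derive treatment flags from the logged rollout dates instead of hand-coding them.
panel = panel.merge(rollout, on="store", how="left")
panel["treated"] = panel["free_shipping_start"].notna().astype(int)
panel["post"] = (panel["date"] >= panel["free_shipping_start"]).astype(int)  # False for NaT

print(panel[["store", "date", "treated", "post"]])
```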
No Anticipation Effects
This assumption posits that units do not alter their behavior in anticipation of the treatment.
For instance, if users begin purchasing more right before a free shipping policy is implemented (expecting the benefit), the observed effect post-launch might be underestimated. Testing for anticipation through pre-trend analysis and excluding data immediately before treatment can help mitigate this issue.
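A lightweight safeguard, sketched below, is to drop a short buffer window just before each unit’s launch and check whether the estimate moves. This reuses the hypothetical panel with per-store rollout dates from the previous snippet; the 14-day window is an arbitrary illustration.

```python
import pandas as pd

# panel: hypothetical store-day data with "date" and a per-store
# "free_shipping_start" column (as in the earlier grouping sketch).
buffer = pd.Timedelta(days=14)  # illustrative anticipation window

in_buffer = (
    panel["free_shipping_start"].notna()
    & (panel["date"] >= panel["free_shipping_start"] - buffer)
    & (panel["date"] < panel["free_shipping_start"])
)
panel_no_anticipation = panel[~in_buffer]

# Re-estimate DiD on panel_no_anticipation; a large shift in the estimate
# suggests anticipation effects in the window just before launch.
```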
In the next blog, we’ll dive into the methodology of DiD and bring its core assumptions and intuition to life through a practical case study.
Upcoming Courses:
Master Product Sense and AB Testing, and learn to use statistical methods to drive product growth. I focus on building a problem-solving mindset and applying data-driven strategies - including A/B Testing, ML, and Causal Inference - to drive product growth.
AI/ML Projects for Data Professionals
Gain hands-on experience and build a portfolio of industry AI/ML projects. Scope ML projects, get stakeholder buy-in, and execute the workflow from data exploration to model deployment. You will learn to use coding best practices to solve end-to-end AI/ML projects to showcase to employers or clients.
Not sure which course aligns with your goals? Send me a message on LinkedIn with your background and aspirations, and I'll help you find the best fit for your journey.