Olesia Badashova
Demystifying the Applications of Causal Inference in Industry

By the end of this article, you will be familiar with the applications of Causal Inference in industry. While numerous books and articles delve into the theory of Causal Inference, this article spotlights its real-world applications, especially those related to gaining additional insights from well-designed experiments. The list is not exhaustive but rather a reflection of my experience, with most examples drawn from the ride-hailing industry.

Instrumental Variables in A/B tests

Instrumental variable (IV) analysis is a fundamental tool in econometrics that helps control for unobserved confounders when estimating a causal relationship.

Assumptions

The key assumptions for using IVs are relevance (the instrument is strongly correlated with the treatment, the endogenous independent variable) and exogeneity (the instrument affects the dependent variable only through the treatment). In other words, an instrumental variable should affect treatment assignment without directly impacting the outcome variable.

Algorithm

A common method for estimating the causal effect of the treatment variable is the Two-Stage Least Squares (2SLS).

The IV2SLS class from the Python package statsmodels can be used for this.
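The two stages can also be sketched by hand on simulated data. Everything in the snippet below is made up for illustration: an unobserved confounder u biases the naive regression of the outcome on the treatment, while the randomly assigned instrument z recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data: z is a randomly assigned instrument, u is an
# unobserved confounder affecting both treatment and outcome.
z = rng.binomial(1, 0.5, n).astype(float)
u = rng.normal(size=n)
t = 0.5 * z + u + rng.normal(scale=0.5, size=n)  # endogenous treatment
y = 2.0 * t + u + rng.normal(scale=0.5, size=n)  # true effect of t is 2.0

def ols_slope(x, y):
    """Slope of an OLS regression of y on x, with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Stage 1: regress the treatment on the instrument, keep fitted values.
b0, b1 = np.linalg.lstsq(np.column_stack([np.ones(n), z]), t, rcond=None)[0]
t_hat = b0 + b1 * z

# Stage 2: regress the outcome on the fitted treatment.
naive_effect = ols_slope(t, y)   # biased upward by the confounder u
iv_effect = ols_slope(t_hat, y)  # close to the true effect, 2.0
```

IV2SLS produces the same point estimate and, unlike this manual version, reports correct standard errors; the snippet is only meant to make the two stages explicit.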

Examples

Propensity Score Matching

This method finds its use when random assignment is not viable. It simulates a randomized experiment by ensuring comparability between treatment and control groups.

Assumptions

The key assumption is conditional independence, or ignorability: given the propensity score, treatment assignment is independent of the potential outcomes. This requires that all relevant confounding variables are included in the propensity score model and that no latent variables affect treatment assignment. Accurate matching also requires a large sample size.

Algorithm

For the simplest model, Scikit-Learn will do the job.
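A minimal sketch of that simplest setup, on simulated data with made-up names and coefficients: a logistic regression estimates the propensity scores, and each treated unit is matched to the nearest control on that score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical observational data: x holds observed confounders that
# drive both treatment uptake and the outcome; the true effect is 1.0.
x = rng.normal(size=(n, 2))
p_treat = 1.0 / (1.0 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))
t = rng.binomial(1, p_treat)
y = 1.0 * t + 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)

# Step 1: estimate propensity scores from the observed confounders.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Step 2: match each treated unit to its nearest control on the score.
treated = np.flatnonzero(t == 1)
control = np.flatnonzero(t == 0)
nn = NearestNeighbors(n_neighbors=1).fit(ps[control, None])
matched = control[nn.kneighbors(ps[treated, None])[1].ravel()]

# Step 3: average outcome difference over matched pairs (the ATT).
naive_diff = y[treated].mean() - y[control].mean()  # confounded
att = (y[treated] - y[matched]).mean()              # close to 1.0
```

Before trusting such an estimate, it is standard practice to check covariate balance between the matched groups, for example via standardized mean differences.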

Examples

Difference-in-Differences

DiD is often the go-to method when randomized experiments are unfeasible, unethical, or too costly. Data is collected pre- and post-treatment from both the treated and control groups, and the treatment effect is estimated as the difference in the average change in outcome over time between the two groups.

Assumptions

The DiD method rests on the parallel-trends assumption: without treatment, the average outcomes for the treated and control groups would have followed the same trend over time. Another crucial assumption is that exposure is exogenous, i.e. factors related to the outcome do not influence treatment assignment.

Algorithm
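In its simplest form the estimate is literally a difference of two differences, which can be computed directly (or, equivalently, as the coefficient on the group × period interaction in an OLS regression). A minimal sketch on simulated data, with all numbers made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000

# Hypothetical panel: group = 1 for treated units, period = 1 post-launch.
group = rng.binomial(1, 0.5, n)
period = rng.binomial(1, 0.5, n)
# Outcome: baseline group gap + common time trend + true effect of 3.0.
y = (5.0 + 2.0 * group + 1.5 * period
     + 3.0 * group * period + rng.normal(size=n))

def cell_mean(g, p):
    """Average outcome in one group-period cell."""
    return y[(group == g) & (period == p)].mean()

# DiD: change over time in the treated group minus change in controls.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
```

The subtraction absorbs both the baseline gap between groups and the common time trend, so did isolates the treatment effect only when the parallel-trends assumption holds.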

Examples

Want to dive deeper?

Each method has its own assumptions and is best suited to different scenarios. As with all statistical methods, it’s crucial to understand these assumptions and carefully check whether they hold in your particular situation. In practice, it’s often beneficial to apply several methods and see whether they give consistent results, which increases confidence in your findings.

The methods presented in this article are fairly basic and have more advanced versions, so it’s worth exploring them further. The field of causal inference is relatively young and evolving fast. You can follow its progress through conferences such as KDD and CLeaR.

There are also great resources for diving deeper into the applications of CausalML.
