Evaluating causal inference methods in a scientifically thorough way is a cumbersome and error-prone task. To foster good scientific practice, JustCause provides a framework that makes it easy to:
- evaluate your method using common data sets like IHDP, IBM ACIC, and others;
- create synthetic data sets with a generic but standardized approach;
- benchmark your method against several baseline and state-of-the-art methods (see the sketch after this list).
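Because the learners share the same `fit`/`predict_ite` interface (used in the quickstart below), benchmarking several methods on identical replications reduces to a short loop. A minimal sketch, assuming that `TLearner` is available in `justcause.learners` next to `SLearner` and accepts a base regressor in the same way:

```python
>>> from justcause.data.sets import load_ihdp
>>> from justcause.learners import SLearner, TLearner  # TLearner assumed to mirror SLearner's API
>>> from justcause.metrics import pehe_score
>>> from justcause.evaluation import calc_scores
>>> from sklearn.linear_model import LinearRegression

>>> rep = next(iter(load_ihdp(select_rep=[0])))  # a single IHDP replication
>>> for learner in (SLearner(LinearRegression()), TLearner(LinearRegression())):
...     learner.fit(rep.np.X, rep.np.t, rep.np.y)
...     pred_ite = learner.predict_ite(rep.np.X, rep.np.t, rep.np.y)
...     print(type(learner).__name__, calc_scores(rep.np.ite, pred_ite, [pehe_score]))
```

Since both methods see exactly the same data and metric, the comparison stays fair; any other learner with the same interface can be swapped in to extend the benchmark.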
Our cause is to develop a framework that allows you to compare methods for causal inference in a fair and just way. JustCause is a work in progress and new contributors are always welcome.
The reasons for creating a library like JustCause are laid out in the thesis
A Systematic Review of Machine Learning Estimators for Causal Effects
by Maximilian Franz. Therein, it is shown that many publications on causality:
- lack reproducibility,
- use different versions of what is seemingly the same data set,
- fail to state that some theoretical assumptions are not met by the data set,
- omit several state-of-the-art methods from their comparison.
A more standardized approach, as offered by JustCause, addresses these issues.
Install JustCause with:
```console
pip install justcause
```
but consider using conda to set up an isolated environment beforehand. This can be done with:

```console
conda env create -f environment.yaml
conda activate justcause
```
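Either way, a quick import verifies that the installation succeeded:

```python
>>> import justcause  # raises ImportError if the installation failed
```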
For a minimal example, we are going to load the IHDP (Infant Health and Development Program) data set, do a train/test split, apply a basic learner to each replication, and display some metrics:
```python
>>> from justcause.data.sets import load_ihdp
>>> from justcause.learners import SLearner
>>> from justcause.learners.propensity import estimate_propensities
>>> from justcause.metrics import pehe_score, mean_absolute
>>> from justcause.evaluation import calc_scores
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd

>>> replications = load_ihdp(select_rep=[0, 1, 2])
>>> slearner = SLearner(LinearRegression())
>>> metrics = [pehe_score, mean_absolute]
>>> scores = []

>>> for rep in replications:
...     train, test = train_test_split(rep, train_size=0.8)
...     p = estimate_propensities(train.np.X, train.np.t)
...     slearner.fit(train.np.X, train.np.t, train.np.y, weights=1/p)
...     pred_ite = slearner.predict_ite(test.np.X, test.np.t, test.np.y)
...     scores.append(calc_scores(test.np.ite, pred_ite, metrics))

>>> pd.DataFrame(scores)
   pehe_score  mean_absolute
0    0.998388       0.149710
1    0.790441       0.119423
2    0.894113       0.151275
```
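Each entry of `scores` holds the metric values of one replication, so a pandas aggregation summarizes performance across replications (a sketch; the exact numbers depend on the random train/test splits):

```python
>>> pd.DataFrame(scores).agg(["mean", "std"])  # per-metric mean and spread across replications
```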
- Best Practices
- Contributions & Help
- Module Reference