This is a growing database of data sets for testing causal discovery algorithms. The goal is to distinguish cause from effect. We searched for data sets with known ground truth; however, we do not guarantee that every ground truth provided is correct. The data files are plain-text (.txt) files, each containing two variables: one is the cause and the other is the effect. Every example comes with a description file that states the ground truth and explains how the data were derived.
Note that the first column is not always the cause and the second column is not always the effect; the true direction for each pair is given in a meta-data file. Please see the README for further explanations. If you want to compute an overall performance score, we also suggest weighting factors for some pairs that are very similar to each other, so that groups of near-identical pairs do not dominate the result.
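The following is a minimal sketch of how the meta-data and the suggested weights might be used when evaluating a method. It assumes, hypothetically, that each meta-data line lists, separated by whitespace, the pair id, the first and last (1-based) column of the cause, the first and last column of the effect, and the weight; please check the README for the actual layout. All function and class names are illustrative. The score computed here is a weighted accuracy: the total weight of correctly decided pairs divided by the total weight of all pairs.

```python
from dataclasses import dataclass


@dataclass
class PairMeta:
    pair_id: str
    cause_cols: tuple      # (first, last) 1-based column range of the cause
    effect_cols: tuple     # (first, last) 1-based column range of the effect
    weight: float


def parse_meta_line(line: str) -> PairMeta:
    # ASSUMED layout: id cause_first cause_last effect_first effect_last weight
    fields = line.split()
    c1, c2, e1, e2 = (int(x) for x in fields[1:5])
    return PairMeta(fields[0], (c1, c2), (e1, e2), float(fields[5]))


def read_pair(lines):
    # Parse whitespace-separated numeric rows of a two-variable data file.
    return [tuple(float(v) for v in ln.split()) for ln in lines if ln.strip()]


def split_cause_effect(rows, meta: PairMeta):
    # Use the meta-data, not the column order, to decide which column is the cause.
    c = meta.cause_cols[0] - 1
    e = meta.effect_cols[0] - 1
    return [r[c] for r in rows], [r[e] for r in rows]


def true_direction(meta: PairMeta) -> str:
    # "->" if column 1 causes column 2, "<-" if column 2 causes column 1.
    return "->" if meta.cause_cols[0] < meta.effect_cols[0] else "<-"


def weighted_accuracy(metas, predictions) -> float:
    # predictions maps pair_id -> "->" or "<-"; missing pairs count as wrong.
    total = sum(m.weight for m in metas)
    correct = sum(m.weight for m in metas
                  if predictions.get(m.pair_id) == true_direction(m))
    return correct / total


if __name__ == "__main__":
    # Made-up meta-data lines and predictions, for illustration only.
    metas = [parse_meta_line(s) for s in
             ["0001 1 1 2 2 1", "0002 2 2 1 1 0.5", "0003 1 1 2 2 0.5"]]
    preds = {"0001": "->", "0002": "->", "0003": "->"}
    print(weighted_accuracy(metas, preds))
```

With these toy inputs, pair 0002 is misclassified, so the score is 1.5 / 2.0 = 0.75 rather than the unweighted 2/3.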
To get all data files at once, download the complete data set as a zip file.
When you use this data set in a publication, please cite the following paper (its supplement also contains much more detailed information about this data set):
J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, B. Schoelkopf:
"Distinguishing cause from effect using observational data: methods and benchmarks",
Journal of Machine Learning Research 17(32):1-102, 2016
Note: pair0001 to pair0041 were taken from the UCI Machine Learning Repository, so if you use these data sets, please also refer to the UCI webpage, which states the repository's citation policy.
If you have any comments, questions or suggestions for additional data sets, please contact Dominik Janzing.
To enable comparison of results, we release a new version of this database from time to time. Below is a list of all past releases; the differences between releases are documented in the CHANGELOG file.