Database with cause-effect pairs


This is a growing database of data sets for testing causal detection algorithms. The goal is to distinguish cause from effect. We collected data sets with known ground truth; however, we cannot guarantee that every ground truth provided is correct. The data files are .txt files, each containing two variables: one is the cause and the other is the effect. Every example has a description file that states the ground truth and explains how the data were obtained.
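A minimal Python sketch for reading one pair file follows. It assumes the values are stored as whitespace-separated numeric columns with one observation per row; the helper names `parse_pair_lines`, `load_pair` and `columns` are hypothetical and not part of the database. It does not decide which column is the cause; that has to come from the accompanying metadata.

```python
from typing import Iterable, List


def parse_pair_lines(lines: Iterable[str]) -> List[List[float]]:
    """Parse whitespace-separated numeric columns (ASSUMED file layout)."""
    rows = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines, e.g. a trailing newline
        rows.append([float(value) for value in fields])
    return rows


def load_pair(path: str) -> List[List[float]]:
    """Read one pair file such as pair0001.txt (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return parse_pair_lines(fh)


def columns(rows: List[List[float]]) -> List[List[float]]:
    """Transpose row-wise observations into one list per column."""
    return [list(col) for col in zip(*rows)]
```

Usage: `x, y = columns(load_pair("pair0001.txt"))` yields the two variables in file order, which is not necessarily cause then effect.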

Note that the first column is not always the cause, nor the second always the effect; the causal direction is given in a meta-data file. Please see the README for further explanations. For some pairs that are very similar to each other, we also suggest weighting factors to use when calculating overall performance.
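The sketch below shows one way to combine the meta-data file and the suggested weights into a weighted accuracy score. It ASSUMES each meta-data line reads `pair_id cause_first cause_last effect_first effect_last weight` with 1-based, inclusive column indices; check this against the README before relying on it. The names `PairMeta`, `parse_meta_line`, `first_column_is_cause` and `weighted_accuracy` are hypothetical, and scoring as "sum of weights of correct decisions divided by sum of weights of decided pairs" is one reasonable choice, not necessarily the one the maintainers prescribe.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PairMeta:
    pair_id: str
    cause_cols: range   # 0-based column indices of the cause
    effect_cols: range  # 0-based column indices of the effect
    weight: float


def parse_meta_line(line: str) -> PairMeta:
    # ASSUMED layout: id, cause first/last, effect first/last (1-based, inclusive), weight
    pid, c0, c1, e0, e1, w = line.split()
    return PairMeta(pid, range(int(c0) - 1, int(c1)), range(int(e0) - 1, int(e1)), float(w))


def first_column_is_cause(meta: PairMeta) -> bool:
    """True if the cause block comes before the effect block in the file."""
    return meta.cause_cols.start < meta.effect_cols.start


def weighted_accuracy(decisions: Dict[str, bool], metas: Iterable[PairMeta]) -> float:
    """decisions maps pair_id -> True if a method infers 'first column causes second'."""
    total = 0.0
    correct = 0.0
    for meta in metas:
        if meta.pair_id not in decisions:
            continue  # pairs without a decision do not count
        total += meta.weight
        if decisions[meta.pair_id] == first_column_is_cause(meta):
            correct += meta.weight
    return correct / total if total > 0 else float("nan")
```

Down-weighting near-duplicate pairs this way keeps a group of very similar examples from dominating the overall score.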

To get all data files at once, download the complete data set as a zip file.

Note: pair0001 to pair0041 were taken from the UCI Machine Learning Repository, so if you use these data sets, please refer to their webpage, where you will find their citation policy.

If you have any comments, questions or suggestions for additional data sets, please contact Jakob Zscheischler.

Older versions

To enable comparison of results, we release a versioned snapshot of this database from time to time. Below is a list of all past releases; the differences between releases are documented in the CHANGELOG file.