Longitudinal health data for machine learning research and education
These realistic synthetic datasets can be downloaded freely, for example to develop offline reinforcement learning algorithms.


Realistic open data created using the latest advances in generative models
Read the research paper on the use of Generative Adversarial Networks (GANs) to create synthetic datasets and their evaluation in terms of accuracy and disclosure risk.
Download the datasets
Select a dataset to download or view documentation.

Sepsis
This dataset comprises vital signs, lab tests, administered fluid boluses and vasopressors for 2,164 patients with sepsis in the intensive care unit.

Acute Hypotension
This dataset comprises vital signs, lab tests, administered fluid boluses and vasopressors for 3,910 patients with acute hypotension in the intensive care unit.

Antiretroviral Therapy in HIV
This dataset comprises viral loads, CD4 counts, and drug regimen information for 8,916 patients with Human Immunodeficiency Virus (HIV).
Dear viewers, please consider Version 2.0 of our synthetic ART for HIV dataset. The latest version is much more realistic than the prior version, especially regarding class imbalance and utility for training RL algorithms. Interested viewers can also refer to our arXiv...
Datasets Update: Acute Hypotension
Our synthetic dataset for acute hypotension is now available on PhysioNet. You may access our PhysioNet project through the following link: https://www.physionet.org/content/synthetic-mimic-iii-health-gym/1.0.0/
Edited 14th-March-2022
CBDRH, the University of New South Wales
Nicholas Kuo
Datasets Update: Sepsis
Our synthetic dataset for sepsis is now available on PhysioNet. You may access our PhysioNet project through the following link: https://www.physionet.org/content/synthetic-mimic-iii-health-gym/1.0.0/
Edited 14th-March-2022
CBDRH, the University of New South Wales
Nicholas Kuo

Be part of the community
Here is how to reach out.
Join our community on GitHub or contact us.