Longitudinal health data for machine learning research and education
These realistic synthetic datasets are free to download and can be used, for example, to develop offline reinforcement learning algorithms.


Realistic open data created using the latest advances in generative models
Read the research paper on the use of Generative Adversarial Networks (GANs) to create synthetic datasets and their evaluation in terms of accuracy and disclosure risk.
Download the datasets
Select a dataset to download or view documentation.

Sepsis
This dataset comprises vital signs, lab tests, administered fluid boluses and vasopressors for 2,164 patients with sepsis in the intensive care unit.

Acute Hypotension
This dataset comprises vital signs, lab tests, administered fluid boluses and vasopressors for 3,910 patients with acute hypotension in the intensive care unit.

Antiretroviral Therapy in HIV
This dataset comprises viral loads, CD4 counts, and drug regimen information for 8,916 patients with Human Immunodeficiency Virus (HIV).
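
Because each dataset is longitudinal (repeated measurements per patient), a common first step for offline reinforcement learning is to group rows by patient and pair consecutive time steps into (state, action, next state) transitions. The sketch below illustrates this on a tiny inline table. The column names (`patient_id`, `timestep`, `heart_rate`, `fluid_bolus`) and the toy values are assumptions made for illustration only, not the actual schema of the downloads, so check each dataset's documentation for the real variable names.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a downloaded file. Column names and values are
# hypothetical and do not reflect the real dataset schema.
TOY_CSV = """patient_id,timestep,heart_rate,fluid_bolus
1,0,88,0
1,1,92,1
1,2,90,0
2,0,101,1
2,1,97,0
"""


def load_trajectories(handle, id_col, time_col):
    """Group rows by patient and order each patient's rows by time step."""
    trajectories = defaultdict(list)
    for row in csv.DictReader(handle):
        trajectories[row[id_col]].append(row)
    for rows in trajectories.values():
        rows.sort(key=lambda r: int(r[time_col]))
    return dict(trajectories)


def to_transitions(rows, state_cols, action_cols):
    """Pair each time step with the next one: (state, action, next_state)."""
    transitions = []
    for current, following in zip(rows, rows[1:]):
        state = tuple(float(current[c]) for c in state_cols)
        action = tuple(float(current[c]) for c in action_cols)
        next_state = tuple(float(following[c]) for c in state_cols)
        transitions.append((state, action, next_state))
    return transitions


trajectories = load_trajectories(io.StringIO(TOY_CSV), "patient_id", "timestep")
transitions = [
    t
    for rows in trajectories.values()
    for t in to_transitions(rows, ["heart_rate"], ["fluid_bolus"])
]
print(len(trajectories), "patients,", len(transitions), "transitions")
```

To try this on a real download, replace `io.StringIO(TOY_CSV)` with an open file handle, for example `open(path, newline="")`, and pass the column names listed in that dataset's documentation.
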

Datasets Update: Acute Hypotension
Our synthetic dataset for acute hypotension is now available on PhysioNet. You may access our PhysioNet project through the following link: https://www.physionet.org/content/synthetic-mimic-iii-health-gym/1.0.0/
Edited 14 March 2022. CBDRH, the University of New South Wales. Nicholas Kuo.

Datasets Update: Sepsis
Our synthetic dataset for sepsis is now available on PhysioNet. You may access our PhysioNet project through the following link: https://www.physionet.org/content/synthetic-mimic-iii-health-gym/1.0.0/
Edited 14 March 2022. CBDRH, the University of New South Wales. Nicholas Kuo.

Health Gym: Medical Datasets for Training Machine Learning Algorithms
In the past decade, machine learning algorithms have continued to advance the state of the art in disciplines including computer vision (CV) and natural language processing (NLP). Three factors have played major...

Be part of the community
Here is how to reach out.
Join our community on GitHub or contact us.