Health Gym: Medical Datasets for Training Machine Learning Algorithms
Over the past decade, machine learning algorithms have continued to advance the state of the art in disciplines such as computer vision (CV) and natural language processing (NLP). Three factors have played major roles in supporting this continual development: theoretical advances in network architecture, improvements in computer hardware, and the increasing accessibility of benchmark datasets.
Data is, however, hard to come by in the healthcare domain due to privacy concerns. To address this, the Health Gym initiative aims to introduce highly realistic synthetic medical datasets that can be freely distributed.
Three datasets will be released with the first version of the Health Gym project, covering severe hypotension, sepsis, and HIV. These datasets will contain sufficient information for reinforcement learning practitioners to create intelligent agents for the management of illness and for the optimisation of regimen selection.
We hope that the Health Gym will enable the community to develop more powerful models for healthcare, and hasten the integration of machine learning in the medical domain.
Edited 1st-Feb-2022
CBDRH, the University of New South Wales
Nicholas Kuo