# Quickstart

Get started by using the Python API to download the Health Gym datasets, implementing your first offline reinforcement learning algorithm in the tutorial, or giving us feedback on how we can improve.

- Setup
- Tutorial
- Support

## Setup

Health Gym data can be downloaded either directly from this website or using the Python API.

Note: This section is still in progress; an API will be available soon.

Install the Health Gym Python package with pip by running the following in your terminal:

```
pip install healthgym
```

To download and access the data from within Python:

```
import healthgym as hg

hypotension_data = hg.datasets.Hypotension(root='path/to/data/', download=True)
```

All datasets support the following parameters.

```
root (string)
# Root directory where the dataset exists, or where it will be saved if download is set to True.
download (bool, optional)
# If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
```
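The `root` and `download` semantics described above can be pictured with a small sketch. This is not the `healthgym` implementation (the API is still in progress): `ensure_dataset`, `fetch`, and the file name `hypotension.pkl` are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path


def ensure_dataset(root, filename, download=False, fetch=None):
    """Return the path to `filename` under `root`, fetching it only if missing.

    Hypothetical helper, not part of healthgym: `fetch` stands in for
    whatever function actually retrieves the file's bytes.
    """
    path = Path(root) / filename
    if path.exists():
        return path  # already downloaded, so nothing is fetched again
    if not download:
        raise FileNotFoundError(f"{path} not found; pass download=True to fetch it")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return path


calls = []


def fake_fetch():
    # Stand-in for a real download; records each call so we can count them.
    calls.append(1)
    return b"placeholder bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_dataset(tmp, "hypotension.pkl", download=True, fetch=fake_fetch)
    second = ensure_dataset(tmp, "hypotension.pkl", download=True, fetch=fake_fetch)
    same_path = first == second

print(len(calls))  # the second call finds the file on disk and skips the fetch
```

Checking for the file before fetching is what makes repeated calls with `download=True` cheap.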

## Tutorial

This tutorial illustrates a simple reinforcement learning approach to optimise the management of acutely hypotensive patients in the intensive care unit (ICU). The complete Jupyter notebook can be found on GitHub.

Let’s start with the necessary imports:

```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cross_decomposition import PLSCanonical
from sklearn.cluster import KMeans
```

```
df = pd.read_pickle('data_fake.pkl')
# Each patient contributes 48 hourly rows
num_patients = len(df) // 48
# Divide state and action variables
df_state = df.drop(['fluid_boluses', 'vasopressors'], axis='columns')
df_action = df[['fluid_boluses', 'vasopressors']]
```

```
# Dummify action columns (fluid_boluses, vasopressors)
df_action = pd.get_dummies(df_action, prefix='fluid_boluses', columns=['fluid_boluses'])
df_action = pd.get_dummies(df_action, prefix='vasopressors', columns=['vasopressors'])
# Partial least squares regression
plsca = PLSCanonical(n_components=5)
X = df_state.astype(float).values
Y = df_action.astype(float).values
# Standardise every column to zero mean and unit variance
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
Y_norm = (Y - Y.mean(axis=0)) / Y.std(axis=0)
X_canonical, Y_canonical = plsca.fit_transform(X_norm, Y_norm)
```
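The normalisation step above divides each column by its standard deviation, which is undefined for a column that never varies. The sketch below shows the same z-scoring in plain Python with a guard for that case. Whether any column in the real data is constant is an assumption, not something the tutorial states, and `standardise_columns` is a name introduced here for illustration.

```python
from statistics import fmean, pstdev


def standardise_columns(rows):
    """Z-score each column of a list of equal-length rows.

    Uses the population standard deviation, matching NumPy's default.
    Columns with zero spread are mapped to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy rows: a varying measurement, a 0/1 dummy, and a dummy that is always 0.
rows = [
    [60.0, 1.0, 0.0],
    [70.0, 0.0, 0.0],
    [80.0, 1.0, 0.0],
]
scaled = standardise_columns(rows)
for row in scaled:
    print(row)
```

After scaling, each varying column has mean zero, and the constant column is all zeros rather than `nan`.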

```
# Cluster the canonical state scores into discrete states
num_states = 100
km = KMeans(n_clusters=num_states, n_init=10, random_state=123, verbose=True)
state_number = km.fit(X_canonical).labels_
```

```
df['hour'] = np.tile(range(48), num_patients)
# Map each action level to an integer index 0..3
df['fluid_boluses'] = df['fluid_boluses'].replace({'0': 0, '250': 1, '500': 2, '1000': 3})
df['vasopressors'] = df['vasopressors'].replace({'0.0': 0, '1e-06': 1, '8.4': 2, '20.28': 3})
# Combine the two 4-level actions into one of 16 action numbers
df['action_number'] = 4*df['fluid_boluses'] + df['vasopressors']
df['state_number'] = state_number
# State at the next hour (t + 1)
df['state_number_tp1'] = df['state_number'].shift(-1)
```
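The combined action number above is a base-4 encoding of the two action levels, so it can be reversed with integer division. A minimal sketch follows; `encode_action` and `decode_action` are illustrative helpers, not part of the tutorial notebook.

```python
VASOPRESSOR_LEVELS = 4  # vasopressor level takes values 0..3
FLUID_LEVELS = 4        # fluid bolus level takes values 0..3


def encode_action(fluid_level, vasopressor_level):
    """Same formula as the tutorial: 4 * fluid_boluses + vasopressors."""
    return VASOPRESSOR_LEVELS * fluid_level + vasopressor_level


def decode_action(action_number):
    """Recover (fluid_level, vasopressor_level) from an action number."""
    return divmod(action_number, VASOPRESSOR_LEVELS)


all_actions = [
    encode_action(f, v)
    for f in range(FLUID_LEVELS)
    for v in range(VASOPRESSOR_LEVELS)
]
print(all_actions)
print(decode_action(11))
```

Every pair of levels maps to a distinct number between 0 and 15, which is why the Q-table later in the tutorial has 16 action columns.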

```
def reward_function(row):
    if row.MAP > 65:  # 0 at >65
        reward = 0
    elif row.MAP > 60:  # -0.05 at 60 and 0 at 65
        reward = -0.05 * (65 - row.MAP) / 5
    elif row.MAP > 55:  # -0.15 at 55 and -0.05 at 60
        reward = -0.10 * (60 - row.MAP) / 5 - 0.05
    else:  # -1 at 40 and -0.15 at 55
        reward = -0.85 * (55 - row.MAP) / 15 - 0.15
    # No penalty when urine output is adequate and MAP is above 55
    if row.urine > 30 and row.MAP > 55:
        reward = 0
    return reward

df['reward'] = df.apply(reward_function, axis=1)
# shift one up so that reward is in the same row of action
df['reward'] = df['reward'].shift(-1)
# ignore last hour since we don't observe the reward of action
# (.copy() avoids pandas' SettingWithCopyWarning on the assignment below)
df = df[df['hour'] < 47].copy()
df['state_number_tp1'] = df['state_number_tp1'].astype(int)
```
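To check that the piecewise-linear reward above joins up at its breakpoints, the same formula can be evaluated on plain numbers. The standalone function below, `map_reward`, is a restatement for illustration and takes MAP and urine output as arguments instead of a DataFrame row.

```python
def map_reward(mean_arterial_pressure, urine):
    """Reward from the tutorial, restated for scalar inputs."""
    m = mean_arterial_pressure
    if m > 65:
        reward = 0.0
    elif m > 60:  # rises linearly from -0.05 at 60 to 0 at 65
        reward = -0.05 * (65 - m) / 5
    elif m > 55:  # rises linearly from -0.15 at 55 to -0.05 at 60
        reward = -0.10 * (60 - m) / 5 - 0.05
    else:  # rises linearly from -1 at 40 to -0.15 at 55
        reward = -0.85 * (55 - m) / 15 - 0.15
    # The penalty is waived when urine > 30 and MAP > 55
    if urine > 30 and m > 55:
        reward = 0.0
    return reward


# Evaluate at the breakpoints with low urine output so the waiver is inactive.
for m in (70, 65, 62.5, 60, 55, 40):
    print(m, round(map_reward(m, urine=0), 4))

# With urine > 30 the penalty disappears only while MAP stays above 55.
print(round(map_reward(58, urine=40), 4), round(map_reward(50, urine=40), 4))
```

The segments meet at 65, 60, and 55, so the reward has no jumps as MAP changes, apart from the urine-based waiver.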

```
# Q learning
Q = np.full((100, 16), np.nan, dtype='float')  # 100 states, 16 actions
# Set to 0 if state-action combination has actually been observed in the data
# (iterrows() can upcast integers to floats, so cast before indexing)
for index, row in df.iterrows():
    Q[int(row['state_number']), int(row['action_number'])] = 0
num_iterations = 100
step_size = 0.1
diff_tracker = np.zeros((num_iterations, 1))
for Q_iter in range(num_iterations):
    Q_old = Q.copy()
    for index, row in df.iterrows():
        s, a = int(row['state_number']), int(row['action_number'])
        s_next = int(row['state_number_tp1'])
        # Discount factor gamma = 1
        Q[s, a] += step_size * (row['reward'] + np.nanmax(Q[s_next, :]) - Q[s, a])
    # Mean absolute change in Q over this sweep, to monitor convergence
    diff_tracker[Q_iter] = np.nanmean(np.abs(Q - Q_old))
    print([Q_iter, diff_tracker[Q_iter]])
```
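The update rule above is easier to follow on a dataset small enough to solve by hand. The sketch below repeats the same sweep over three made-up transitions, using a dictionary in place of the NaN-filled array. One assumption differs from the tutorial and is made only for this toy: a next state with no observed actions is given value 0.

```python
# Each transition is (state, action, reward, next_state); toy values only.
transitions = [
    (0, 0, -1.0, 1),
    (0, 1, -2.0, 2),
    (1, 0, -0.5, 2),
]

# Only state-action pairs seen in the data get a Q-value.
Q = {(s, a): 0.0 for s, a, _, _ in transitions}


def best_value(Q, state):
    """Largest Q-value among observed actions; 0.0 if none were observed.

    Returning 0.0 for unseen states is an assumption of this sketch.
    """
    values = [q for (s, _), q in Q.items() if s == state]
    return max(values) if values else 0.0


step_size = 0.1
for sweep in range(500):
    for s, a, r, s_next in transitions:
        target = r + best_value(Q, s_next)  # discount factor gamma = 1
        Q[(s, a)] += step_size * (target - Q[(s, a)])

for key in sorted(Q):
    print(key, round(Q[key], 4))
```

Working backwards by hand gives Q(1, 0) = -0.5, Q(0, 0) = -1 + (-0.5) = -1.5, and Q(0, 1) = -2, which is what the sweeps converge to. The greedy action in state 0 is therefore action 0.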

```
# Evaluate
Q_RL = 0
Q_clinician = 0
Q_random = 0
for index, row in df.iterrows():
    if row['hour'] == 0:
        s, a = int(row['state_number']), int(row['action_number'])
        # RL policy: best observed action in the initial state
        Q_RL += np.nanmax(Q[s, :])
        # Clinician policy: the action actually taken
        Q_clinician += Q[s, a]
        # random policy
        h = Q[s, :]
        h = h[~np.isnan(h)]
        Q_random += h[np.random.choice(h.shape[0], 1)][0]
Q_RL = Q_RL / num_patients
Q_clinician = Q_clinician / num_patients
Q_random = Q_random / num_patients
sns.scatterplot(x=['RL', 'Clinician', 'Random'], y=[Q_RL, Q_clinician, Q_random], marker='s', s=100)
plt.ylabel('Expected policy value')
plt.show()
```
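The three averages computed above can be reproduced on a toy Q-table to see how they relate. The numbers below are made up. Unlike the tutorial, which samples one random action per patient, this sketch uses the exact expectation of a uniform random policy over observed actions so that the output is deterministic.

```python
# Toy Q-table keyed by (state, action); only observed pairs are present.
Q = {
    (0, 0): -1.5, (0, 1): -2.0,
    (1, 0): -0.5, (1, 2): -0.1,
}
# (initial state, action the clinician took) for each toy patient.
first_hours = [(0, 1), (1, 0), (0, 0)]


def observed_values(Q, state):
    """Q-values of all actions observed in `state`."""
    return [q for (s, _), q in Q.items() if s == state]


n = len(first_hours)
# Greedy (RL) policy: best observed action in each initial state.
value_rl = sum(max(observed_values(Q, s)) for s, _ in first_hours) / n
# Clinician policy: the action actually taken.
value_clinician = sum(Q[(s, a)] for s, a in first_hours) / n
# Uniform random policy: average over observed actions.
value_random = sum(
    sum(observed_values(Q, s)) / len(observed_values(Q, s))
    for s, _ in first_hours
) / n

print(round(value_rl, 4), round(value_clinician, 4), round(value_random, 4))
```

Because the greedy policy takes the maximum of the same Q-values the other two policies draw from, its value under this in-sample comparison can never be lower than theirs.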

```
# Compare the actions chosen by the learned policy with the clinician's
# (.copy() avoids pandas' SettingWithCopyWarning when adding columns)
df_boxplot_RF = df[['MAP']].copy()
df_boxplot_RF['action_number'] = 0
for index, row in df.iterrows():
    df_boxplot_RF.at[index, 'action_number'] = np.nanargmax(Q[int(row['state_number']), :])
df_boxplot_RF['agent'] = 'RF'
df_boxplot_clinician = df[['MAP', 'action_number']].copy()
df_boxplot_clinician['agent'] = 'Clinician'
df_boxplot = pd.concat([df_boxplot_RF, df_boxplot_clinician], ignore_index=True, sort=False)
# Map each action number back to its fluid bolus level (action_number // 4)
fluid_boluses_dict = {
    **dict.fromkeys([0, 1, 2, 3], '[0, 250)'),
    **dict.fromkeys([4, 5, 6, 7], '[250, 500)'),
    **dict.fromkeys([8, 9, 10, 11], '[500, 1000)'),
    **dict.fromkeys([12, 13, 14, 15], '>= 1000')
}
# Map each action number back to its vasopressor level (action_number % 4)
vasopressors_dict = {
    **dict.fromkeys([0, 4, 8, 12], '0'),
    **dict.fromkeys([1, 5, 9, 13], '(0, 8.4)'),
    **dict.fromkeys([2, 6, 10, 14], '[8.4, 20.28)'),
    **dict.fromkeys([3, 7, 11, 15], '>= 20.28')
}
df_boxplot['fluid_boluses'] = df_boxplot['action_number'].replace(fluid_boluses_dict)
df_boxplot['vasopressors'] = df_boxplot['action_number'].replace(vasopressors_dict)
sns.boxplot(y='MAP', x='fluid_boluses', data=df_boxplot, palette="colorblind", hue='agent')
plt.show()
sns.boxplot(y='MAP', x='vasopressors', data=df_boxplot, palette="colorblind", hue='agent', order=['0', '(0, 8.4)', '[8.4, 20.28)', '>= 20.28'])
plt.show()
```



## Support

We welcome any feedback or suggestions for improving the Health Gym data and software.

### Submit a Pull Request or a New Example

If you are a machine learning expert or have access to longitudinal health care data suitable for reinforcement learning, please consider contributing to the Health Gym project.