Exemplar
Let’s begin an end-to-end modelling exemplar: the whole pipeline in roughly 20 lines of code. In the previous section, we made some grand claims about Hylode addressing:
~ train-deploy split
~ feature provision
~ time-series modelling
~ model management
~ transition to deployment
Let’s see a concrete example of how this looks.
Retrospective training set: Extraction & pre-processing
For the sake of these notebooks, we’re going to consider the problem of predicting whether a patient will be discharged from the ICU within the next 48 hours.
```python
# First off, Hylode immediately gives us the features we need to train a time series model
from hycastle.icu_store.retro import retro_dataset

train_df = retro_dataset('T03')
train_df.shape
```
This single piece of code has done a lot of work behind the scenes.
It has pulled data from EMAP, processed it where necessary to create features, and then cut those features up so that we have one row per hour for each patient, setting us up to make live predictions every hour for each patient on the unit.
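As a quick sanity check of that one-row-per-patient-per-hour shape, something like the following works. Note that `episode_slice_id` and `horizon_dt` are assumed column names for the patient identifier and the hourly timestamp; substitute whatever columns `retro_dataset` actually returns.

```python
# Each patient episode should contribute one row per hour.
# NOTE: the column names below are assumptions, not guaranteed by the API.
train_df.groupby('episode_slice_id')['horizon_dt'].nunique().describe()
```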
Let’s have a look at which features we have:
```python
train_df.columns
```
This is great, but we still have some categorical variables and the like mixed in there. What happens if we want to do some pre-processing?
The answer lies in our lens abstraction. Let’s have a look at one I prepared earlier:
```python
from hycastle.lens.icu import BournvilleICUSitRepLens

lens = BournvilleICUSitRepLens()

X_train = lens.fit_transform(train_df)
X_train.columns
```
We can see we’ve just done some useful things. The lens’s fit_transform method has dealt with missingness in ethnicity, and has broken out each patient’s admission time into separate features, as we think that will improve our model.
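To make that concrete, here is a minimal, purely illustrative sketch of the same kind of pre-processing in plain pandas. It is not the hycastle implementation, and the column names `ethnicity` and `admission_dt` are assumptions used only for illustration.

```python
import pandas as pd

def toy_lens_transform(df: pd.DataFrame) -> pd.DataFrame:
    """Roughly the kind of work a lens does for us (illustrative only)."""
    out = df.copy()
    # Give missing ethnicity an explicit 'Unknown' category.
    if 'ethnicity' in out.columns:
        out['ethnicity'] = out['ethnicity'].fillna('Unknown')
    # Break a (hypothetical) admission timestamp into simpler numeric features.
    if 'admission_dt' in out.columns:
        ts = pd.to_datetime(out['admission_dt'])
        out['admission_hour'] = ts.dt.hour
        out['admission_dayofweek'] = ts.dt.dayofweek
        out = out.drop(columns=['admission_dt'])
    return out
```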
We also define a label:
```python
y_train = train_df['discharged_in_48hr'].astype(int)
```
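Before training, it can be worth a quick look at how balanced that label is. This is plain pandas rather than anything Hylode-specific:

```python
# Proportion of patient-hours labelled as discharged within 48h vs. not.
y_train.value_counts(normalize=True)
```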
Training & storing a model
With this “heavy” lifting done, we should already be in a position to train a model. Let’s have a go:
```python
from sklearn.ensemble import RandomForestClassifier

m = RandomForestClassifier(n_jobs=-1)
m.fit(X_train.values, y_train.values.ravel())
```
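Before we log anything, a rough sanity check of discrimination can be useful. This isn’t part of the original pipeline; it’s a standard scikit-learn cross-validation, and on hourly patient data a grouped or temporal split would be more honest than plain k-fold.

```python
# Quick (optimistic) check of discrimination via 5-fold cross-validation.
from sklearn.model_selection import cross_val_score

cv_auc = cross_val_score(m, X_train.values, y_train.values.ravel(),
                         cv=5, scoring='roc_auc')
print(f'ROC AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}')
```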
Everything seems to be working. Let’s imagine we’re happy with what we’ve done and log the model in our model repo, so it’s primed and ready to deploy…
```python
import os

import mlflow

mlflow_server = os.getenv('HYMIND_REPO_TRACKING_URI')
mlflow.set_tracking_uri(mlflow_server)
```

```python
from datetime import datetime
import random
import string

# Generate a unique experiment name
exp_name = "vignette_0-" + "".join(random.sample(string.ascii_lowercase, k=8)) + str(datetime.now().timestamp())

os.environ["MLFLOW_EXPERIMENT_NAME"] = exp_name
experiment_id = mlflow.create_experiment(exp_name)
experiment_id
```

```python
with mlflow.start_run():
    mlflow.sklearn.log_model(m, 'model')
```
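Optionally, the logging cell above could be extended so the run is self-describing in MLflow. `log_params` and `log_metric` are standard MLflow calls; `cv_auc` refers to the cross-validation sketch earlier, which is an addition of these notes rather than part of the original vignette.

```python
with mlflow.start_run():
    mlflow.log_params(m.get_params())                      # hyper-parameters of the random forest
    mlflow.log_metric('cv_roc_auc', float(cv_auc.mean()))  # from the earlier CV sketch
    mlflow.sklearn.log_model(m, 'model')
```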
This screenshot shows the model landing safely in MLflow (which you should be able to see for yourself if you follow the link here; look for a new experiment at the very bottom of the list on the left-hand side).
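If you would rather confirm this programmatically than through the UI, the standard MLflow client can fetch the experiment we just created:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
# Should print the vignette_0-... experiment name generated above.
print(client.get_experiment(experiment_id).name)
```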
Loading and deploying a model
Now let’s switch hats and imagine we’re trying to deploy the model in silent mode for the patients currently on the ICU. This is now pretty straightforward:
```python
from hycastle.icu_store.live import live_dataset

live_df = live_dataset('T03')
live_df.shape
live_df.columns

X_df = lens.transform(live_df)
```

```python
runs = mlflow.search_runs(experiment_ids=[experiment_id])
run_id = runs.iloc[0].run_id
run_id

logged_model = f'runs:/{run_id}/model'
loaded_model = mlflow.sklearn.load_model(logged_model)
```

```python
predictions = loaded_model.predict_proba(X_df.values)
live_df['prediction'] = predictions[:, 1]
live_df.loc[:, ['bed_code', 'prediction']].head()
```
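To round things off, here is a small sketch of what an hourly scoring job might look like. It is not part of the Hylode API, just the steps above wrapped into a function with an illustrative name.

```python
def score_ward(ward: str = 'T03'):
    """Pull the current ward snapshot, apply the lens, and score every patient."""
    df = live_dataset(ward)
    X = lens.transform(df)
    df['prediction'] = loaded_model.predict_proba(X.values)[:, 1]
    return df.loc[:, ['bed_code', 'prediction']]

score_ward().head()
```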