
Hemorrhage diagnosis

Hemorrhage - never heard that word before? Sadly, hemorrhage - informally referred to as internal bleeding - is the most frequent complication of major surgery and the most frequent cause of trauma death. So if your work is related to medical machine learning, you (sadly) should know this term.

With current medical practice, hemorrhage is very difficult to detect until profound blood loss has already occurred and the patient is in a serious clinical state. This is due to interrelated reflexes of the human body that try to sustain life (and, at the same time, do a good job of hiding obvious hints from clinicians). That’s why my collaborators and I tried to tackle this problem.

To monitor the subjects, we obtained waveform (i.e. sampled at high frequency) vital sign data, such as various blood pressures, oxygen saturations, EKG, or airway pressure. Given this data, we tried to detect whether a subject is bleeding internally or not and, ideally, to detect the onset of the bleeding as early as possible, so that a clinician can intervene.
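To make the task concrete, here is a minimal sketch - not our actual pipeline - of how such multichannel waveform data could be framed as a supervised problem: slice the recording into fixed-length windows and label each window by whether it overlaps the known bleeding period. The sampling rate, window length, and function names below are all hypothetical.

```python
import numpy as np

def make_windows(signals, bleed_onset_s, fs=250, win_s=60, step_s=30):
    """Slice a multichannel vital-sign recording into fixed-length windows.

    signals:       array of shape (n_channels, n_samples), e.g. ABP, SpO2, EKG
    bleed_onset_s: time (in seconds) at which the bleeding starts
    fs:            sampling frequency in Hz (hypothetical value)
    win_s, step_s: window length and hop size in seconds
    """
    win, step = win_s * fs, step_s * fs
    X, y = [], []
    for start in range(0, signals.shape[1] - win + 1, step):
        end = start + win
        X.append(signals[:, start:end])
        # label 1 if the window ends after the bleeding onset, else 0
        y.append(int(end / fs >= bleed_onset_s))
    return np.stack(X), np.array(y)

# toy example: 3 channels, 10 minutes at 250 Hz, bleeding starts at minute 6
signals = np.random.randn(3, 10 * 60 * 250)
X, y = make_windows(signals, bleed_onset_s=6 * 60)
print(X.shape, y.mean())  # (n_windows, n_channels, n_samples), fraction labeled "bleeding"
```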

For modeling this complex type of data, we leveraged and tweaked sequential deep learning models, specifically recurrent (GRU and LSTM) and convolutional (dilated, causal) neural networks. While recurrent neural networks - if built correctly - are the go-to models for sequential learning problems and are capable of capturing very long-range dependencies in the data, convolutional networks have recently been used in various papers, in particular because they are computationally attractive (they can be parallelized along the sequence dimension).
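For illustration, here is a minimal sketch of a dilated, causal convolutional classifier in PyTorch - not the architecture from our paper, just the basic idea: left-padded convolutions keep the output at time t from looking into the future, and exponentially growing dilation enlarges the receptive field. All layer sizes and names are made up.

```python
import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    """One dilated, causal 1-D convolution: the output at time t only
    depends on inputs at times <= t (achieved by left-padding)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # pad on the left only
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                          # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))    # left padding -> causal
        return self.act(self.conv(x))

class DilatedCausalCNN(nn.Module):
    """Stack of causal convolutions with exponentially growing dilation,
    followed by a per-window classification head."""
    def __init__(self, in_ch, hidden=32, n_layers=5):
        super().__init__()
        # in practice one would stack more layers or downsample to cover long windows
        self.blocks = nn.Sequential(*[
            CausalConvBlock(in_ch if i == 0 else hidden, hidden, dilation=2 ** i)
            for i in range(n_layers)
        ])
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, channels, time)
        h = self.blocks(x)
        return self.head(h[:, :, -1])     # logit for "bleeding" per window

model = DilatedCausalCNN(in_ch=3)
logits = model(torch.randn(8, 3, 15000))  # 8 windows of 3 channels
```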

What we initially thought to be a well-defined, straightforward task turned out to be very, very difficult. Why? - Two main reasons: First, we only had 16 (one six) subjects’ worth of data for the bleeding rate we were interested in, i.e. 16 bleeding onsets. Anyone who has worked in deep learning before knows that this sounds nearly impossible to generalize from. Second, we tried to achieve performance results that are operationally relevant, i.e. build a classifier that is useful in a real clinical context. It’s fantastic if your classifier has an AUC > 0.9, but it can still be completely useless if you can’t achieve a good true-positive rate at a small false-alarm rate (and likewise for negatives). This is mandatory, because otherwise clinicians won’t trust your system in daily practice.
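To illustrate the difference between AUC and an operating point, here is a small sketch with made-up classifier scores and a hypothetical 1% false-alarm budget, reading off the true-positive rate achievable at that fixed, small false-positive rate.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
# made-up scores for 1000 non-bleeding and 100 bleeding windows
y_true = np.concatenate([np.zeros(1000), np.ones(100)])
scores = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 100)])

fpr, tpr, _ = roc_curve(y_true, scores)
print("AUC:", roc_auc_score(y_true, scores))

# true-positive rate achievable while keeping false alarms at <= 1%
budget = 0.01
tpr_at_budget = tpr[fpr <= budget].max()
print(f"TPR at FPR <= {budget:.0%}: {tpr_at_budget:.2f}")
```

Even when the AUC looks great, the second number is the one that decides whether an alarm system is tolerable at the bedside.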

So what have we achieved for now? - Well, compared to a very strong baseline from colleagues in our lab who have worked on the problem for many years - a random forest trained on manually extracted statistical and medical features - our models perform better in terms of true-positive rate at a clinically relevant (very small) false-positive rate. However, looking at the true-negative rate in the small false-negative range, our model does poorly. This result is quite unsatisfying, which is why we are still trying to improve it; yet it is promising at the same time: note that our neural networks learned from raw data only, while the baseline required manually extracted features based on domain knowledge. This property is desirable for multiple reasons, in particular for generalizing and transferring the learned models to other populations (e.g. those of other hospitals), and it simplifies the pipeline considerably.

We submitted our results to the ML4H workshop at NeurIPS 2018 and were happy to be accepted! We were selected both to present a poster and to give a spotlight talk in front of an expert audience at this most prestigious ML conference. This spotlight presentation, which was assigned to only 6% of all submissions, was a truly exciting event for me! I received a ton of valuable feedback and ideas on our work. Besides that, I spoke to many researchers in the field and even had the chance to ask Jürgen Schmidhuber, a “superstar” in the community, about some research ideas I have on RNNs - what a fantastic time in Montreal!

All this work would not have been possible without fantastic advisors and collaborators. Working with Artur Dubrawski at Carnegie Mellon University, who was my thesis supervisor during a research visit there when I started the project, and Michael R. Pinsky from the School of Medicine at the University of Pittsburgh is an absolute privilege. I can’t emphasize enough how much their insights shaped the work, and also shaped me as a person at this early stage of my career as a researcher. I am very thankful to collaborate with them!