Multimodal machine learning

Are you aware of the distinction between machine learning and multimodal machine learning and how it applies to healthcare?
The Consumer Technology Association defines machine learning (ML) as an enabling technology of AI that gives systems, through algorithms, data, tools, and techniques, the ability to learn and change without an explicitly programmed mathematical model for mapping input to output. Multimodal machine learning, as described by Tadas Baltrušaitis et al. of Carnegie Mellon University, instead aims to build models that can process and relate information from multiple modalities, including the sounds and languages we hear, the visual messages and objects we see, the textures we feel, the flavors we taste, and the odors we smell. For a patient with knee osteoarthritis, for example, a multimodal model might combine raw radiographic data, clinical exam results, and the patient’s medical history.
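To make the knee osteoarthritis example concrete, here is a minimal Python sketch of one simple way to combine modalities, often called early fusion: the features from each source are joined into a single input vector. The `KneePatient` class, its field names, and every number in it are hypothetical, invented only for illustration. A real system would first have to extract such features from the radiograph, the exam notes, and the medical record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KneePatient:
    # Hypothetical, made-up stand-ins for real patient data.
    xray_features: List[float]     # e.g. numbers derived from a radiograph
    exam_features: List[float]     # e.g. pain score, range of motion
    history_features: List[float]  # e.g. age, prior-injury flag


def unimodal_view(patient: KneePatient) -> List[float]:
    """What a unimodal model sees: the radiograph features only."""
    return list(patient.xray_features)


def fuse_early(patient: KneePatient) -> List[float]:
    """Early fusion: concatenate the features of all three modalities."""
    return (
        patient.xray_features
        + patient.exam_features
        + patient.history_features
    )


patient = KneePatient(
    xray_features=[0.42, 0.17],
    exam_features=[6.0, 110.0],
    history_features=[63.0, 1.0],
)

# The fused vector carries all three sources of information.
print(len(unimodal_view(patient)), len(fuse_early(patient)))
```

Concatenation is only one of several fusion strategies and is shown here because it is the easiest to read. A model trained on the fused vector can draw on the exam and history, while a model trained on `unimodal_view` cannot.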
Catherine Ordun, a senior data scientist at Booz Allen Hamilton, warns US federal health agencies such as the National Institutes of Health, the Centers for Disease Control and Prevention, the Defense Health Agency, and the Food and Drug Administration to be aware of the unimodal thinking common in certain algorithms and to consider the value of a multimodal machine learning approach.
“It is critical to understand that not every AI use case, especially in healthcare, can be whittled down to a single type of data. Healthcare is an extraordinarily complex domain that requires information from many sources. Unimodal algorithms that may have impressive results in a computer lab often flounder when exposed to real-world health data and use cases. This is because unimodal AI is typically limited in its ability to be effective or to “generalize” across a broad range of inputs and applications. Humans, with our innate intelligence, generalize with great ease. We can recognize a cat, for example, regardless of how it may be portrayed in a picture or which breed of cat we are looking at. Conversely, AI algorithms struggle with generalizing because they are typically designed and trained to classify only certain types of data. As health agencies adopt AI for applications such as precision medicine, population health, and outcomes evaluation, they should consider aggregating data from multiple sources, such as time series data from Fitbits, textual data from social media, image data from MRIs and X-rays, and columnar data from lab results. Triangulating multiple modes of data produces better results in the form of improved accuracy, more generalizable algorithms, and better insights.”
Ordun also underlines that trained health professionals today access and review multiple data sources to make their diagnoses or treatment recommendations. If AI is supposed to support them, shouldn’t it take a multimodal approach too?
