Artificial intelligence is increasingly used to integrate and analyze multiple data modalities, such as text, images, audio and video. One challenge slowing advances in multimodal AI, however, is the process of choosing the algorithmic method best aligned to the specific task an AI system needs to perform.
Scientists have developed a unified view of AI methods aimed at systematizing this process. The Journal of Machine Learning Research published the new framework for deriving algorithms, developed by physicists at Emory University.
“We found that many of today’s most successful AI methods boil down to a single, simple idea — compress multiple kinds of data just enough to keep the pieces that truly predict what you need,” says Ilya Nemenman, Emory professor of physics and senior author of the paper. “This gives us a kind of ‘periodic table’ of AI methods. Different methods fall into different cells, based on which information a method’s loss function retains or discards.”
An AI system’s loss function is a mathematical equation that measures the error rate of the model’s predictions. During training of an AI model, the goal is to minimize its loss by adjusting the model’s parameters, using the error rate as a guide for improvement.
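As a minimal sketch of that idea (a generic illustration, not the paper's method): fit a one-parameter toy model by gradient descent, where a mean-squared-error loss function supplies the error rate that guides each parameter update.

```python
# Toy illustration: training minimizes a loss function by adjusting parameters.
# Fit y = w * x by gradient descent on mean squared error.

def mse_loss(w, xs, ys):
    """Mean squared error: average of (prediction - target)^2."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.01, steps=200):
    w = 0.0  # initial parameter guess
    for _ in range(steps):
        # Gradient of the MSE loss with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step downhill: adjust w to reduce the loss
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # underlying relationship: y = 2x
w = train(xs, ys)           # converges near w = 2, driving the loss toward zero
```

The same loop, scaled up to millions of parameters and a task-specific loss function, is what "training" means for the multimodal systems discussed here.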
“People have devised hundreds of different loss functions for multimodal AI systems and some may be better than others, depending on context,” Nemenman says. “We wondered if there was a simpler way than starting from scratch each time you confront a problem in multimodal AI.”
A unifying framework
The researchers developed a unifying mathematical framework for deriving problem-specific loss functions, based on what information to keep and what information to throw away. They dubbed it the Variational Multivariate Information Bottleneck Framework.
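For context, the classical two-variable information bottleneck of Tishby, Pereira and Bialek, which multivariate frameworks of this kind generalize, seeks an encoding $Z$ of an input $X$ that is as compressed as possible while preserving information about a target $Y$:

```latex
\min_{p(z \mid x)} \; I(X;Z) \; - \; \beta \, I(Z;Y)
```

Here $I(\cdot\,;\cdot)$ is mutual information: the first term penalizes how much of $X$ the encoding retains (compression), the second rewards how much it predicts about $Y$, and the parameter $\beta$ sets the tradeoff between them.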
“Our framework is essentially like a control knob,” says co-author Michael Martini, who worked on the project as an Emory postdoctoral fellow and research scientist in Nemenman’s group. “You can ‘dial the knob’ to determine the information to retain to solve a particular problem.”
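A hedged numerical sketch of such a "knob," in the style of variational-bottleneck objectives: a loss that adds a β-weighted compression penalty (here, a KL divergence from a unit Gaussian prior) to a reconstruction error. The function names and the Gaussian form are illustrative assumptions, not the paper's actual loss function.

```python
import math

# Illustrative beta-weighted bottleneck loss: reconstruction error plus
# beta times a compression penalty. The KL-to-unit-Gaussian penalty below
# is an assumed stand-in for the paper's compression terms.

def gaussian_kl(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)

def bottleneck_loss(x, x_recon, mu, log_var, beta):
    """Squared reconstruction error plus beta-weighted compression term.
    Larger beta penalizes retained information more heavily."""
    reconstruction = (x - x_recon) ** 2
    compression = gaussian_kl(mu, log_var)
    return reconstruction + beta * compression

# Dialing the knob: the same encoding costs more as beta grows,
# pushing training toward throwing away more of the input.
low = bottleneck_loss(1.0, 0.9, mu=0.5, log_var=0.0, beta=0.1)
high = bottleneck_loss(1.0, 0.9, mu=0.5, log_var=0.0, beta=10.0)
```

Turning β up forces the model to discard more information, keeping only what best supports the prediction; turning it down lets the model retain richer detail at the cost of weaker compression.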
“Our approach is a generalized, principled one,” adds Eslam Abdelaleem, first author of the paper. Abdelaleem took on the project as an Emory PhD candidate in physics before graduating in May and joining Georgia Tech as a postdoctoral fellow.
“Our goal is to help people to design AI models that are tailored to the problem that they are trying to solve,” he says, “while also allowing them to understand how and why each part of the model is working.”
AI-system developers can use the framework to propose new algorithms, to predict which ones might work, to estimate how much data a particular multimodal algorithm will need, and to anticipate when it might fail.
“Just as important,” Nemenman says, “it may let us design new AI methods that are more accurate, efficient and trustworthy.”
A physics approach
The researchers brought a unique perspective to the problem of optimizing the design process for multimodal AI systems.
“The machine-learning community is focused on achieving accuracy in a system without necessarily understanding why a system is working,” Abdelaleem explains. “As physicists, however, we want to understand how and why something works. So, we focused on finding fundamental, unifying principles to connect different AI methods together.”
Abdelaleem and Martini began this quest — to distill the complexity of various AI methods to their essence — by doing math by hand.
“We spent a lot of time sitting in my office, writing on a whiteboard,” Martini says. “Sometimes I’d be writing on a sheet of paper with Eslam looking over my shoulder.”
The process took years, first working on mathematical foundations, discussing them with Nemenman, trying out equations on a computer, then repeating these steps after running down false trails.
“It was a lot of trial and error and going back to the whiteboard,” Martini says.
Doing science with heart
They vividly recall the day of their eureka moment.
They had come up with a unifying principle that described a tradeoff between compression of data and reconstruction of data. “We tried our model on two test datasets and showed that it was automatically discovering shared, important features between them,” Martini says. “That felt good.”
As Abdelaleem was leaving campus after the exhausting, yet exhilarating, final push leading to the breakthrough, he happened to look at his Samsung Galaxy smart watch. It uses an AI system to track and interpret health data, such as his heart rate. The AI, however, had misinterpreted his racing heart throughout that day.
“My watch said that I had been cycling for three hours,” Abdelaleem says. “That’s how it interpreted the level of excitement I was feeling. I thought, ‘Wow, that’s really something! Apparently, science can have that effect.’”
Applying the framework
The researchers applied their framework to dozens of AI methods to test its efficacy.
“We performed computer demonstrations showing that our general framework works well on test problems with benchmark datasets,” Nemenman says. “We can more easily derive loss functions, which may solve the problems one cares about with smaller amounts of training data.”
The framework also holds the potential to reduce the amount of computational power needed to run an AI system.
“By helping guide the best AI approach, the framework helps avoid encoding features that are not important,” Nemenman says. “The less data required for a system, the less computational power required to run it, making it less environmentally harmful. That may also open the door to frontier experiments for problems that we cannot solve now because there is not enough existing data.”
The researchers hope others will use the generalized framework to tailor new algorithms specific to scientific questions they want to explore.
Meanwhile, they are building on their work to explore the potential of the new framework. They are particularly interested in how the tool may help to detect patterns of biology, leading to insights into processes such as cognitive function.
“I want to understand how your brain simultaneously compresses and processes multiple sources of information,” Abdelaleem says. “Can we develop a method that allows us to see the similarities between a machine-learning model and the human brain? That may help us to better understand both systems.”
