Scalable and Efficient Multi-modal Learning Algorithm with Applications in Healthcare and Science

Duy Nguyen, M.Sc.

Multi-modal learning combines multiple types of data, such as text, images, and audio, to enhance machine learning models and their applications. In particular, integrating large language models (LLMs) into multi-modal learning (e.g., LLaVA) can substantially improve a model's ability to understand and generate rich, context-aware outputs across data modalities. By incorporating several sources of information, these models can achieve a more nuanced understanding of tasks and provide more accurate and comprehensive responses. This research investigates emerging machine learning algorithms that bridge multiple modalities to produce robust, generalizable models. These algorithms include, but are not limited to, scalable optimal transport, mixture of experts, differentiable multi-modal alignment using graph structures, and learning with long-term attention.
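To illustrate one of the listed techniques, here is a small sketch of entropic optimal transport solved with Sinkhorn iterations, used to softly align items from two modalities in a shared embedding space. It is not the method used in this research: the toy embeddings, the cosine cost, the uniform marginals, and the hypothetical `sinkhorn` helper with its `eps` and `n_iters` parameters are all illustrative assumptions.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.1, n_iters=1000):
    """Entropic-regularized optimal transport via Sinkhorn iterations.

    Hypothetical helper for illustration only.
    cost: (n, m) cost matrix between items of modality A and modality B.
    a, b: marginal weights of shapes (n,) and (m,), each summing to 1.
    Returns an (n, m) transport plan whose row sums approach a and
    whose column sums equal b after the final update.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns to marginal b
    return u[:, None] * K * v[None, :]


# Assumed toy data: 4 "image" and 3 "text" embeddings in a shared 8-d space.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(3, 8))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

cost = 1.0 - img @ txt.T             # cosine distance, values in [0, 2]
a = np.full(4, 1 / 4)                # uniform weight on each image
b = np.full(3, 1 / 3)                # uniform weight on each text
P = sinkhorn(cost, a, b)

print(P.round(3))
print("row sums:", P.sum(axis=1).round(3))
print("col sums:", P.sum(axis=0).round(3))
```

Each row of `P` spreads one image's mass across the texts, so the plan acts as a soft cross-modal correspondence. A smaller `eps` gives a sharper, closer-to-exact matching but needs more iterations and can underflow in `exp(-cost / eps)`, which is one reason larger-scale implementations often run the updates in the log domain.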


[1] Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering, AAAI 2023.

[2] LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching, NeurIPS 2023.

[3] Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks, ICML 2024.

