Towards Resource-Efficient Foundation Models via Architectural, Algorithmic, and Data-Level Optimization

Open Master's Thesis, Supervisor: Duy Nguyen, M.Sc.

true" ? copyright : '' }
Description

This thesis aims to explore and develop efficient architectural mechanisms that improve the scalability and resource efficiency of foundation models, with a focus on multi-modal systems such as large language models (LLMs) and the Segment Anything models (SAM and SAM-2). As these models grow in size and complexity, their deployment becomes increasingly constrained by compute and memory budgets.

To address this challenge, the research investigates a set of complementary techniques: quantization [1], which reduces the numerical precision of weights and activations to cut memory usage while preserving accuracy; multi-token prediction [2], which trains multiple output heads to predict several future tokens in parallel, improving inference throughput and robustness; and token merging [3], which dynamically reduces redundancy among input tokens, thereby lowering attention complexity and accelerating inference. These methods will be evaluated and integrated within representative multi-modal foundation architectures to study their effectiveness on real-world vision-language tasks.
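To make the first and third techniques concrete, below is a minimal NumPy sketch of symmetric int8 post-training quantization and a toy token-merging step. The function names, the per-tensor scale, and the adjacent-pair matching scheme are illustrative assumptions for exposition; the cited methods [1, 3] use more sophisticated calibration and bipartite soft matching.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8.

    Illustrative sketch only: real schemes (e.g. Q-VLM [1]) use
    per-channel scales and calibration data, not a single tensor-wide scale.
    """
    scale = np.abs(w).max() / 127.0                      # map largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and scale."""
    return q.astype(np.float32) * scale


def merge_tokens(x, r):
    """Merge the r most similar disjoint adjacent token pairs by averaging.

    Toy stand-in for token merging; methods like [3] match tokens via
    key similarities and preserve spectral properties of the sequence.
    """
    a, b = x[0::2], x[1::2]                              # disjoint (2i, 2i+1) pairs
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    cos = np.einsum('id,id->i', a, b) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8)
    merge_idx = set(np.argsort(-cos)[:r].tolist())       # r most similar pairs
    out = []
    for i in range(n):
        if i in merge_idx:
            out.append((a[i] + b[i]) / 2)                # merge pair by averaging
        else:
            out.extend([a[i], b[i]])
    if len(x) % 2:                                       # keep leftover token
        out.append(x[-1])
    return np.stack(out)


w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)                                 # reconstruction error <= s/2

tokens = np.random.randn(8, 16).astype(np.float32)
merged = merge_tokens(tokens, r=2)                       # 8 tokens -> 6 tokens
```

Merging two pairs shortens the sequence from 8 to 6 tokens, which shrinks the attention matrix quadratically; quantization bounds the per-weight reconstruction error by half the scale.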

The ultimate objective is to design a more compact and efficient foundation model architecture capable of delivering high performance at significantly reduced computational cost, paving the way for broader accessibility and real-time deployment.

References:

[1] "Q-VLM: Post-training Quantization for Large Vision-Language Models." NeurIPS 2024.
[2] Gloeckle, Fabian, et al. "Better & Faster Large Language Models via Multi-token Prediction." arXiv 2024.
[3] Tran, Chau, et al. "Accelerating Transformers with Spectrum-Preserving Token Merging." NeurIPS 2024.
