Multi-label Emotion Classification Using Large Language Models

Bachelor Thesis


Accurately detecting emotions in user-generated text is important for a range of applications, from e-commerce to public well-being. Emotion analysis in natural language processing aims to associate a text with emotions such as anger, fear, joy, surprise, disgust, or sadness. Previous research on textual emotion recognition has mainly focused on single-label emotion classification, which assigns only the dominant emotion to a given expression and therefore ignores the possible coexistence of multiple emotions in a sentence or expression.

Consider the tweet “I am in #shock and #awe about the crazy places my #toddler manages to get his dinner.” from the Affect in Tweets dataset [1]. It mixes positive and negative emotions, and its ground-truth labels are joy, sadness, surprise, and fear.
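To make the multi-label setting concrete, the sketch below encodes the example's gold labels as a multi-hot vector with one bit per emotion class. Where a single-label classifier would have to pick one of the four emotions, the multi-hot target keeps all of them. The sketch assumes the 11-class label set of the SemEval-2018 Task 1 emotion classification (E-c) subtask. The helper names `to_multi_hot` and `from_multi_hot` are hypothetical.

```python
# Assumed label set: the 11 emotion classes of the SemEval-2018 Task 1 E-c subtask.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]


def to_multi_hot(labels: set[str]) -> list[int]:
    """Encode a set of emotion labels as a 0/1 vector aligned with EMOTIONS."""
    unknown = labels - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]


def from_multi_hot(vector: list[int]) -> set[str]:
    """Decode a 0/1 vector back into the set of active emotion labels."""
    return {emotion for emotion, bit in zip(EMOTIONS, vector) if bit}


# Gold labels of the #shock and #awe tweet; four bits are set at once.
gold = {"joy", "sadness", "surprise", "fear"}
vector = to_multi_hot(gold)
print(vector)  # [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
```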

In this thesis, you will use large language models (LLMs) to perform multi-label emotion classification, in which one or more labels are assigned to each input by considering both the input sentence and a label set (i.e., the emotion classes). The goal is to select a subset of emotion classes from the label set as the output. You will provide a comprehensive investigation of the capabilities of LLMs in multi-label emotion classification, assessing how well they perform in both zero-shot and few-shot settings.
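A zero-shot prompt contains only an instruction, the label set, and the input. A few-shot prompt additionally prepends a handful of labeled demonstrations. The following minimal sketch shows one way to build both and to map a model's free-text answer back onto the label set. The prompt wording, the 11-label set, and the function names `build_prompt` and `parse_labels` are illustrative assumptions rather than a prescribed design, and the model call itself is omitted.

```python
import re

# Assumed label set: the 11 emotion classes of the SemEval-2018 Task 1 E-c subtask.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]


def build_prompt(text, label_set, demonstrations=()):
    """Build a zero-shot prompt, or a few-shot prompt if demonstrations are given.

    demonstrations: iterable of (text, gold_label_set) pairs shown before the query.
    The instruction wording is an illustrative assumption.
    """
    header = (
        "Classify the emotions expressed in the tweet. "
        f"Choose one or more labels from: {', '.join(label_set)}.\n"
        "Answer with a comma-separated list of labels.\n\n"
    )
    shots = "".join(
        f"Tweet: {demo_text}\nEmotions: {', '.join(sorted(demo_labels))}\n\n"
        for demo_text, demo_labels in demonstrations
    )
    return f"{header}{shots}Tweet: {text}\nEmotions:"


def parse_labels(output, label_set):
    """Keep every lowercase word in the model output that is a valid label."""
    valid = set(label_set)
    words = re.findall(r"[a-z]+", output.lower())
    return {word for word in words if word in valid}


tweet = ("I am in #shock and #awe about the crazy places "
         "my #toddler manages to get his dinner.")
prompt = build_prompt(tweet, EMOTIONS)  # zero-shot: no demonstrations
print(sorted(parse_labels("Joy, surprise and fear.", EMOTIONS)))  # ['fear', 'joy', 'surprise']
```

Matching whole words makes the parser tolerant of capitalization and connectives such as "and", but it would also count a negated label ("not joy"), so a stricter answer format may be worth comparing.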

The experiments will be conducted on the Affect in Tweets dataset [1], a multi-labeled emotion dataset sourced from Twitter. You will primarily explore how LLMs perform when conducting inference directly on the downstream task without task-specific training. This thesis investigates two models from the Flan family, chosen because they are open source: Flan-T5 (XXL version, 11B) [2] and Flan-UL2 (20B) [3], using their checkpoints hosted on Hugging Face for inference. In addition, two OpenAI models are considered: ChatGPT (gpt-3.5-turbo) and text-davinci-003 (text-003, 175B) [4,5].
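Assessing performance requires comparing predicted and gold label sets. SemEval-2018 Task 1 reported a Jaccard-based multi-label accuracy for its emotion classification subtask, together with micro- and macro-averaged F1. The sketch below implements these three metrics, assuming gold and predicted labels are given as Python sets. Scoring an example with no gold and no predicted labels as a Jaccard index of 1.0, and an undefined per-label F1 as 0.0, are conventions chosen here and may differ from the official scorer. The function names are hypothetical.

```python
# Assumed label set: the 11 emotion classes of the SemEval-2018 Task 1 E-c subtask.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]


def jaccard_accuracy(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Mean per-example Jaccard index |G & P| / |G | P|."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    total = 0.0
    for g, p in zip(gold, pred):
        union = g | p
        # Convention (assumption): no gold and no predicted labels scores 1.0.
        total += (len(g & p) / len(union)) if union else 1.0
    return total / len(gold)


def micro_macro_f1(gold, pred, labels):
    """Return (micro-F1, macro-F1) over the given label set."""
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                tp[label] += 1
            elif label in p:
                fp[label] += 1
            elif label in g:
                fn[label] += 1

    def f1(t, f_pos, f_neg):
        # F1 = 2TP / (2TP + FP + FN); 0.0 when undefined (assumed convention).
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


gold = [{"joy", "sadness", "surprise", "fear"}]
pred = [{"joy", "surprise", "fear"}]  # hypothetical model output
print(jaccard_accuracy(gold, pred))  # 0.75
micro, macro = micro_macro_f1(gold, pred, EMOTIONS)
print(round(micro, 3), round(macro, 3))  # 0.857 0.273
```

On a single example, macro-F1 is dragged down because labels absent from both gold and prediction score 0.0 under this convention. On a full test split, most labels occur at least once, and the effect shrinks.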


  1. Mohammad, Saif, et al. "SemEval-2018 Task 1: Affect in Tweets." Proceedings of the 12th International Workshop on Semantic Evaluation. 2018.
  2. Chung, Hyung Won, et al. "Scaling Instruction-Finetuned Language Models." arXiv preprint arXiv:2210.11416 (2022).
  3. Tay, Yi, et al. "UL2: Unifying Language Learning Paradigms." The Eleventh International Conference on Learning Representations. 2023.
  4. Brown, Tom, et al. "Language Models Are Few-Shot Learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
  5. Ouyang, Long, et al. "Training Language Models to Follow Instructions with Human Feedback." Advances in Neural Information Processing Systems 35 (2022): 27730-27744.

