Our team achieved strong results at MediaEval 2023 in the task of understanding smell references in historical texts and images, a step towards a more comprehensive understanding of historical experiences. We used Multi-Modal Large Language Models (MM-LLMs) trained as olfactory experts. By fine-tuning one such model, Qwen-VL-Chat, on olfactory data, we reached an F1-macro score of 0.7394 in identifying matching smell references between text and images. This score is statistically on par with the current state-of-the-art system.
Research

Automatic Speech Recognition

Speaker Identification / Diarization

Language Recognition

Multimodal Emotion Recognition

Machine Translation

Keyword Spotting

Natural Language Understanding

Intent Detection

Spoken Dialog Systems

Conversational Agents (chatbots)

Text-To-Speech Synthesis

Topic Identification

Text Generation

Trustworthiness Assessment

Wearable Sensing

Multimedia Information Retrieval

Image and Video Processing

Scene Understanding

Multimedia Content Perception

Human Motion Modeling