Our team achieved strong results at MediaEval 2023 in the task of understanding smell references in historical texts and images! Recognizing these references automatically is a step towards a more comprehensive understanding of historical experiences. Our approach uses Multi-Modal Large Language Models (MM-LLMs) trained as olfactory experts. By fine-tuning one such model, Qwen-VL-Chat, on olfactory data, we reached an F1-macro score of 0.7394 in identifying matching smell references between text and images. This score is statistically on par with the current state-of-the-art system.
Research
Automatic Speech Recognition
Speaker Identification / Diarization
Language Recognition
Multimodal Emotion Recognition
Machine Translation
Keyword Spotting
Natural Language Understanding
Intent Detection
Spoken Dialog Systems
Conversational Agents (chatbots)
Text-To-Speech Synthesis
Topic Identification
Text Generation
Trustworthiness Assessment
Wearable Sensing
Multimedia Information Retrieval
Image and Video Processing
Scene Understanding
Multimedia Content Perception
Human Motion Modeling