Our team achieved strong results at MediaEval 2023 in the task of understanding smell references in historical texts and images, a step towards a more comprehensive understanding of historical experiences. We used Multi-Modal Large Language Models (MM-LLMs) trained as olfactory experts. By fine-tuning one such model, Qwen-VL-Chat, on olfactory data, we obtained an F1-macro score of 0.7394 in identifying matching smell references between text and images. This score is statistically on par with the current state-of-the-art system.
Research
![asr](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/person.png)
Automatic Speech Recognition
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/diarization.png)
Speaker Identification / Diarization
![language](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/languages.png)
Language Recognition
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/emotions.png)
Multimodal Emotion Recognition
![translation](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/translating.png)
Machine Translation
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/keyword.png)
Keyword Spotting
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/concept-map.png)
Natural Language Understanding
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/target.png)
Intent Detection
![dialog](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/chat-2.png)
Spoken Dialog Systems
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/10/chatbot.png)
Conversational Agents (chatbots)
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/text-to-speech.png)
Text-To-Speech Synthesis
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/folder.png)
Topic Identification
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/book.png)
Text Generation
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/hands.png)
Trustworthiness Assessment
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/wearable.png)
Wearable Sensing
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/multimedia-3.png)
Multimedia Information Retrieval
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/process.png)
Image and Video Processing
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/cinema.png)
Scene Understanding
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/perception.png)
Multimedia Content Perception
![](https://blogs.upm.es/gthau/wp-content/uploads/sites/776/2021/11/motion-sensor.png)
Human Motion Modeling