Grupo de Tecnología del Habla y Aprendizaje Automático

Speech Technology and Machine Learning Group

Smelling Success: Our Breakthrough in Understanding Smell in Text and Images

Our team has achieved remarkable results in the task of understanding smell references in historical texts and images at MediaEval 2023! This is a crucial step towards a more comprehensive understanding of historical experiences. We leveraged the power of Multi-Modal Large Language Models (MM-LLMs) trained as olfactory experts. By fine-tuning a specific MM-LLM called Qwen-VL-Chat on olfactory data, we achieved an impressive F1-macro score of 0.7394 in identifying matching smell references between text and images. This score is statistically on par with the current state-of-the-art system!
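For readers unfamiliar with the metric, F1-macro is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how often it occurs. The following is a minimal sketch of that computation, not the official MediaEval scorer; the function name and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: 1 = text and image share a smell reference, 0 = they do not
gold = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
score = macro_f1(gold, predicted)
```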

Click here for more details about the MUSTI challenge

RentallGPT: AI Assistant Revolutionizes Green Energy Rentals

THAU's AI virtual assistant, RentallGPT, developed within the "CIRCULAR" EIT Digital project, supports Rentall's services. The chatbot streamlines green energy product rentals by: 1) supporting solar experts with customer inquiries; 2) offering efficient search and analysis of rental information; and 3) providing clear and informative responses, like a comprehensive FAQ. RentallGPT builds on ChatGPT and prompt-based development for faster creation and deployment. Retrieval-Augmented Generation (RAG) enriches RentallGPT's knowledge by retrieving relevant information and supplying it to the model, which helps keep responses accurate and informative. RentallGPT paves the way for AI-powered efficiency and sustainability in the rental service industry.
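As a rough illustration of the RAG idea, the sketch below retrieves the most relevant passages for a question and places them in the prompt that would be sent to the language model. It is not RentallGPT's implementation: the word-overlap retriever, the function names, and the FAQ sentences are all invented for this example, and a production system would more likely use embedding-based search.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base entries, invented for this example
FAQ = [
    "Solar panels can be rented for a minimum of twelve months.",
    "Maintenance visits are included in every rental contract.",
    "Invoices are sent by email at the start of each month.",
]

prompt = build_prompt("How long can I rent solar panels?", FAQ, k=1)
```

The resulting prompt would then be passed to the chat model, which answers from the retrieved context instead of relying only on what it learned during training.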

EIT Digital Program at the UPM

Welcome to THAU

Welcome to the Speech Technology and Machine Learning group website

2nd place in the EmoSPeech Challenge at IberLEF 2024

We're thrilled to announce our team's achievement at IberLEF 2024, where we secured second place in the EmoSPeech task, focused on Multimodal Speech-text Emotion Recognition in Spanish. We developed two strategies: fine-tuning the Qwen-Audio-Chat model with Low-Rank Adaptation (LoRA), and a novel Whisper-Gemma model that combines the Whisper-large-v3 audio encoder with the Gemma Large Language Model (LLM). The Qwen-Audio-Chat model achieved an F1-macro score of 0.8248, and the Whisper-Gemma model scored 0.7904. These results show the effectiveness of parameter-efficient fine-tuning, and of combining robust audio encoders with LLMs, for improving Speech Emotion Recognition (SER) systems. Our work will be presented at IberLEF 2024 on September 24th, 2024, in Valladolid, Spain, as part of SEPLN 2024.
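To give a sense of why LoRA counts as parameter-efficient, the sketch below applies the standard LoRA update, in which a frozen weight matrix W0 is adjusted by a scaled low-rank product (alpha / r) * B A and only A and B are trained. It is a toy pure-Python illustration, not our training code: the matrices, the rank, the scaling value, and the layer size are assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, a, b, alpha):
    """Return W0 + (alpha / r) * B A, where A is r x k and B is d x r."""
    r = len(a)               # rank = number of rows in A
    delta = matmul(b, a)     # (d x r)(r x k) -> d x k, same shape as W0
    scale = alpha / r
    return [
        [w + scale * dw for w, dw in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def trainable_params(d, k, r):
    """Parameters trained by LoRA for one d x k layer: A (r x k) plus B (d x r)."""
    return r * (d + k)


# Toy 2 x 2 layer with a rank-1 adapter (values are illustrative only)
w0 = [[1, 0], [0, 1]]
b = [[1], [2]]    # d x r = 2 x 1
a = [[3, 4]]      # r x k = 1 x 2
adapted = lora_weight(w0, a, b, alpha=2)

# Hypothetical 4096 x 4096 layer: full fine-tuning versus a rank-8 adapter
full = 4096 * 4096
lora = trainable_params(4096, 4096, r=8)
```

In this hypothetical layer, the rank-8 adapter trains 65,536 parameters instead of the roughly 16.8 million in the full matrix.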

Information about the EmoSPeech challenge

Our Team Makes Strides in Emotion Recognition at Odyssey 2024!

The Odyssey 2024 workshop brings together researchers worldwide to push the boundaries of speaker and language recognition technologies, including the complex field of emotion detection in speech. Its emotion recognition challenge focused on analyzing speech recordings from the MSP-Podcast corpus to identify specific emotions. We're proud to announce that our team placed 7th out of 69 participants, a strong showing in this competitive environment. Stay tuned for further updates as we delve deeper into emotion recognition research and explore its potential applications!

More info