ASTOUND

An EC-funded project aimed at improving the social competences of virtual agents through artificial consciousness based on the Attention Schema Theory

28 September 2023
by luisfernandodharo
0 comments

Automatic Detection of Inconsistencies and Hierarchical Topic Classification for Open-Domain Chatbots

Current state-of-the-art (SotA) chatbots can produce high-quality sentences, handle different conversation topics, and sustain longer interactions. Unfortunately, the generated responses depend greatly on the data on which the models were trained, the specific dialogue history and current turn used to guide the response, the internal decoding mechanisms, and the ranking strategies, among other factors. It may therefore happen that, for semantically similar questions asked by users, the chatbot provides different answers, which can be considered a form of hallucination and a source of confusion in long-term interactions.

In this research paper, we propose a novel methodology consisting of two main phases: (a) hierarchical automatic detection of topics and subtopics in dialogue interactions using a zero-shot learning approach, and (b) detection of inconsistent answers using k-means clustering and the Silhouette coefficient.
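As an illustration of phase (a), the sketch below classifies an utterance hierarchically: it first picks the best-matching topic, then the best subtopic within that topic. The taxonomy fragment and descriptor sets are hypothetical, and a simple word-overlap score stands in for the label scores a real zero-shot (e.g. NLI- or embedding-based) classifier would produce; this is only a toy illustration of the two-stage idea, not the paper's implementation.

```python
# Toy sketch of hierarchical topic -> subtopic classification.
# Taxonomy and descriptor sets are hypothetical; the word-overlap score is a
# stand-in for a real zero-shot model's label score.

TAXONOMY = {
    "sports": ["football", "tennis"],
    "travel": ["hotels", "flights"],
}

DESCRIPTORS = {
    "sports": {"match", "team", "play", "game"},
    "travel": {"trip", "hotel", "flight", "visit"},
    "football": {"goal", "match", "league"},
    "tennis": {"racket", "serve", "court"},
    "hotels": {"hotel", "room", "booking"},
    "flights": {"flight", "airport", "plane"},
}

def score(utterance: str, label: str) -> int:
    # Stand-in for a zero-shot label score: count shared words.
    return len(set(utterance.lower().split()) & DESCRIPTORS[label])

def classify(utterance: str) -> tuple:
    # Stage 1: pick the best high-level topic.
    topic = max(TAXONOMY, key=lambda t: score(utterance, t))
    # Stage 2: pick the best subtopic within that topic only.
    subtopic = max(TAXONOMY[topic], key=lambda s: score(utterance, s))
    return topic, subtopic

print(classify("i booked a hotel room for the trip"))  # -> ('travel', 'hotels')
```

Because subtopics are only compared within the winning topic, new (fine-grained) subcategories can be added to one branch of the taxonomy without retraining or affecting the others, which is the scalability property mentioned above.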

To evaluate the efficacy of topic and subtopic detection, we use a subset of the DailyDialog dataset and real dialogue interactions gathered during the Alexa Socialbot Grand Challenge 5 (SGC5). The proposed approach detects up to 18 different topics and 102 subtopics. The experimental results demonstrate the efficacy of the topic detection algorithm, which achieves a weighted F1 score of 0.67 when detecting 13 distinct topics and 0.45 when detecting 18 distinct topics. At the subtopic level, a weighted F1 score of 0.67 was achieved. Moreover, we show how our proposed approach outperforms a larger model trained on specific dialogue data. An advantage of our approach is that it is scalable, allowing the incorporation of new (fine-grained) categories and subcategories that the larger model is not able to recognize.

To detect inconsistencies, we manually generated multiple paraphrased questions and employed several pre-trained SotA chatbot models to generate responses. The algorithm exhibits precise estimation capabilities in determining the number of distinct responses, as evidenced by a mean squared error (MSE) of 3.4 calculated over a set of 109 handcrafted responses (15 sets of original questions plus their paraphrases, passed to 4 small chatbot models). For the 120 questions created with GPT-4 (15 question sets, each consisting of 1 original question and its 7 paraphrases, fed into 4 SotA chatbots), the overall MSE was 3.2. These results show that even LLMs produce inconsistent answers, and our approach is a good proxy for detecting such cases.
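A minimal, self-contained sketch of phase (b): given vector representations of a chatbot's answers to paraphrased questions, run k-means for several values of k and keep the k with the highest mean Silhouette coefficient as the estimate of the number of distinct responses. The toy 2-D "embeddings" and the deterministic first-k initialization are illustrative assumptions, not the paper's exact setup (a real system would embed each answer with a sentence encoder and use a standard k-means initialization).

```python
import math

def kmeans(points, k, iters=100):
    # Deterministic toy init (first k points); real code would use k-means++.
    centroids = [list(points[i]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
               for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def silhouette(points, labels):
    # Mean Silhouette coefficient: s(i) = (b - a) / max(a, b).
    n, total = len(points), 0.0
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton clusters contribute 0
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        others = {}
        for j in range(n):
            if labels[j] != labels[i]:
                others.setdefault(labels[j], []).append(j)
        b = min(sum(math.dist(points[i], points[j]) for j in js) / len(js)
                for js in others.values())
        total += (b - a) / max(a, b)
    return total / n

def estimate_num_responses(embeddings, max_k=4):
    # Pick the k (>= 2) whose clustering has the best mean Silhouette score.
    best_k, best_s = 1, -1.0
    for k in range(2, min(max_k, len(embeddings) - 1) + 1):
        s = silhouette(embeddings, kmeans(embeddings, k))
        if s > best_s:
            best_k, best_s = k, s
    return best_k

# Six answers forming two semantic groups -> two distinct responses.
answers = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(estimate_num_responses(answers))  # -> 2
```

A perfectly consistent chatbot would yield an estimate of 1 cluster per paraphrase set; comparing the estimated against the true number of distinct answers over many sets gives the MSE figures reported above.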

As future work, we will focus on two main aspects: expanding the range of high-level topics and subsequently evaluating the algorithm’s performance in identifying subtopics. In addition, we have incorporated this topic and subtopic classifier into the dialogue manager of the chatbot we used during our participation in the Alexa Socialbot Grand Challenge 5 (SGC5) [1]. Regarding the detection of inconsistent responses, our efforts will be directed towards the development of controllable algorithms and architectures, such as TransferTransfo [2] or CTRL [3], leveraging persona profiles within these frameworks to generate more consistent responses. Furthermore, we will explore mechanisms to incorporate the identified inconsistencies into the automated evaluation of dialogue systems [4, 5], following the recommendations made in [6].

Bibliography

[1]. Estecha-Garitagoitia, Marcos, Mario Rodríguez-Cantelar, Alfredo Garrachón Ruiz, Claudia Garoé Fernández García, Sergio Esteban Romero, Cristina Conforto, Alberto Saiz Fernández, Luis Fernando Fernández Salvador, and Luis Fernando D’Haro. “THAURUS: An Innovative Multimodal Chatbot Based on the Next Generation of Conversational AI.” Alexa Prize SocialBot Grand Challenge 5.

[2]. Wolf, T.; Sanh, V.; Chaumond, J.; Delangue, C. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents. arXiv 2019, arXiv:cs.CL/1901.08149. 

[3]. Keskar, N.S.; McCann, B.; Varshney, L.R.; Xiong, C.; Socher, R. CTRL: A Conditional Transformer Language Model for Controllable Generation. arXiv 2019, arXiv:cs.CL/1909.05858.

[4]. Zhang, C.; Sedoc, J.; D’Haro, L.F.; Banchs, R.; Rudnicky, A. Automatic Evaluation and Moderation of Open-domain Dialogue Systems. arXiv 2021, arXiv:cs.CL/2111.02110.

[5]. Zhang, C.; D’Haro, L.F.; Friedrichs, T.; Li, H. MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation. Proc. AAAI Conf. Artif. Intell. 2022, 36, 11657–11666.

11 September 2023
by fjmirandav
0 comments

ETHICS AND HUMAN RIGHTS MATTER IN ASTOUND PROJECT

Improving social competences of virtual agents through artificial consciousness based on the Attention Schema Theory

On May 15 and 16, 2023, a congress on Public Law and Algorithms was held at the University of Valencia. Professors Celia Fernández Aller and Jesús Salgado Criado were invited to contribute a presentation: An interdisciplinary conversation: how to integrate ethics and people’s rights into ASTOUND project.

https://esdeveniments.uv.es/97884/detail/public-law-and-algorithms-towards-a-legal-framework-for-the-fair-and-non-discriminatory-use-of-ai-w.html

The starting point was that a human rights law approach to algorithmic accountability is crucial. Ethics is relevant, and the rights approach is a complementary and essential framework.

Ethics is currently at the heart of the ASTOUND project. At this stage, the project has reviewed the available multimodal datasets for training and evaluation and has analysed current approaches to dataset curation for bias and toxicity. While designing our chatbot architecture, many discussions around ethical aspects have taken place during the monthly general meetings. In addition, a small ethics group has been organized, and the External Ethical Board has been selected.

The ASTOUND project is based on the assumption that ethics will not create, but rather help avoid, future risks to the project’s success, as it will bring trust. There is also an opportunity to offer guidelines that can contribute to other future projects facing similar challenges.

The most pressing issue is selecting which risks to avoid. During the first phase of the ethical analysis, a thorough list of potential issues was identified: a) fairness (no discrimination against any group of persons); b) dignity (no impersonation, making clear at all times that the person is chatting with a machine); c) autonomy (potential influence or manipulation by the chatbot, such as exercising influence to persuade towards a specific position); d) explainability/interpretability of chatbot responses; e) observability, auditability and monitoring (how the system is planned to be audited and monitored in terms of performance, and which performance variables can be collected); f) safety and controllability (how the system can be controlled in case of performance degradation); g) security/privacy (not using personal data without the knowledge of the data subject); h) transparency, with accountability/responsibility assigned in case of malfunction (legal and moral), errors, inaccuracies or dangerous suggestions; i) long-term impacts of the technology.

Among the tools being used are the Assessment List for Trustworthy Artificial Intelligence (ALTAI), applied to develop procedures to detect, assess and address potential risks (https://futurium.ec.europa.eu/en/european-aialliance/pages/altai-assessment-list-trustworthy-artificial-intelligence), as well as the Ethics By Design and Ethics of Use Approaches for Artificial Intelligence (https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/horizon/guidance/ethics-by-design-and-ethics-of-use-approaches-for-artificial-intelligence_he_en.pdf).

As Nissenbaum says, “we cannot simply align the world with the values and principles we adhered to prior to the advent of technological challenges. Rather, we must grapple with the new demands that changes wrought by the presence and use of information technology have placed on values and moral principles”.1 We must bring attention to the values that are unconsciously built into technology.

Although the ethical framework is crucial, it has limitations: ethics is not compulsory, and its principles are neither universal nor consensual (hundreds of ethical codes are available). The human rights approach offers a complementary vision, based on common principles such as the universality of human rights, participation, transparency, accountability and non-discrimination. AI governance is needed, and legal instruments such as the EU Artificial Intelligence Act can be very useful. The regulator will make clear what the responsibilities of each actor involved in the life of an AI system are. Supervisory bodies will be able to remove from the loop those who seek to make irresponsible use of this technology. The AI Act is based on a system of risk analysis and provides for a series of requirements applicable to high-risk AI systems, in particular for system providers, such as the obligation to draw up an EU declaration of conformity and to affix the CE conformity marking.2 The human rights approach will help adapt to and overcome the limitations of the text.3

General-purpose artificial intelligence (AI) technologies are now included in the AI Act, so the ASTOUND project will have to carefully analyse the legal implications and impacts on human rights of future conscious chatbots.

References:

1 C. Allen, W. Wallach and I. Smit, “Why Machine Ethics?,” in IEEE Intelligent Systems, vol. 21, no. 4, pp. 12-17, July-Aug. 2006, doi: 10.1109/MIS.2006.83.

2 Leonardo Cervera Navas. Por qué hay que abordar la regulación de la inteligencia artificial como la de la aviación comercial. EL PAIS, 25-05-2023.

3 J. Salgado-Criado and C. Fernández-Aller, “A Wide Human-Rights Approach to Artificial Intelligence Regulation in Europe,” in IEEE Technology and Society Magazine, vol. 40, no. 2, pp. 55-65, June 2021, doi: 10.1109/MTS.2021.3056284; Vinodkumar Prabhakaran, Margaret Mitchell, Timnit Gebru, Iason Gabriel. “A Human Rights-Based Approach to Responsible AI”. arXiv:2210.02667 [cs.AI]

6 June 2023
by hhardy
2 Comments

The Science of Consciousness

From the 22nd to the 27th of May 2023, our consortium members Indeep AI and École Normale Supérieure attended The Science of Consciousness conference in Taormina, Italy, as part of our groundbreaking project, ASTOUND.

Our mission at ASTOUND is to push the boundaries of artificial intelligence and lay the groundwork for the development of consciousness within AI systems. Through knowledge-sharing, idea exchange, and collaboration with some of the brightest minds in the scientific community, our team strove to foster innovation and gain deeper insights into the enigmatic nature of consciousness.

Furthermore, our very own Aïda Elamrani of École Normale Supérieure gave a talk at the conference on “To What Extent Can Machines Be Conscious”. The talk discussed how experts disagree on the possibility of mechanically implementing consciousness, with challenges involving phenomenalism, physicalism, computation, and information. A compromise is possible by viewing consciousness as a virtual reality implemented through computational mechanisms, but this raises further research questions regarding different interpretations of information. Achieving machine consciousness depends on the chosen interpretation.

Stay tuned for more updates as we take the lead in making significant progress towards developing consciousness in artificial intelligence.

23 March 2023
by luisfernandodharo
0 comments

Paper on automatic evaluation at IEEE/ACM Trans. on Audio, Speech, and Language Processing

On March 1st, 2023, our paper entitled PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment was published; it describes a state-of-the-art model for the automatic evaluation of dialogue systems at turn level. The proposed model was assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains, achieving high Spearman correlations (+0.47) with the human annotations across all the evaluation datasets. This result is particularly good, as the model exhibits better zero-shot generalization (i.e., good correlations on completely unseen datasets) than existing state-of-the-art models. Moreover, the proposed model can easily adapt to new domains thanks to few-shot transfer learning techniques.

In more detail, the proposed Panel of Experts (PoE) model is a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures the general knowledge of dialogues across domains, while each adapter specializes in one specific domain and serves as a domain expert. The following figure shows the architecture of the network.

System architecture of the Panel of Experts (PoE). A transformer encoder T consists of L layers (T1, T2, … TL). Different colors (blue, red, and green) denote domain-specific adapter modules. Each domain-specific adapter has L − 1 layers, injected between every two consecutive transformer layers. Domain-specific classifiers follow the final transformer layer, TL. T is shared by all the domain-specific modules. Each adapter is trained on a different dataset, and adapters can be added as required or removed after testing the performance of the model on a new dataset.
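To make the adapter idea concrete, here is a minimal NumPy sketch of a residual bottleneck adapter of the kind described above. The dimensions, the zero initialization of the up-projection (so each adapter starts as an identity map around the shared encoder), and the final averaging over experts are illustrative assumptions, not the exact PoE implementation.

```python
import numpy as np

class Adapter:
    """Toy residual bottleneck adapter: h + up(relu(down(h)))."""
    def __init__(self, d_model=8, d_bottleneck=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        # Zero-initialized up-projection: the adapter starts as an identity
        # map, so inserting it does not perturb the shared encoder initially.
        self.w_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, h):
        z = np.maximum(h @ self.w_down, 0.0)  # down-project + ReLU
        return h + z @ self.w_up              # up-project + residual

# One adapter per domain expert, all reading the same shared-encoder states.
hidden = np.ones((3, 8))  # 3 hidden states from the shared encoder T
experts = {"domain_a": Adapter(seed=1), "domain_b": Adapter(seed=2)}
outputs = {name: adapter(hidden) for name, adapter in experts.items()}
# Panel-style aggregation: average the per-expert outputs.
panel_avg = np.mean([o for o in outputs.values()], axis=0)
```

Because only the small down/up projections are domain-specific, adding a new domain expert means training one lightweight adapter on that domain's data while the shared encoder stays frozen.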

In addition, to improve the performance of the system, we also applied four different data augmentation techniques: 1) Syntactic & Semantic Negative Sampling, 2) Back-Translation, 3) Generation From State-of-the-art Dialogue Systems, and 4) Automatic Generation of Adversarial Responses.

Finally, the model generates the final score either as the average over the different adapters or using the single adapter whose training data is closest to the evaluation data. The paper provides tables comparing the proposed model against other state-of-the-art metrics under different settings. The zero- and few-shot capabilities of the model are also evaluated as a function of the percentage of in-domain data used for adapting the model.

This paper is a collaboration between Universidad Politécnica de Madrid (UPM) and the National University of Singapore (NUS). The work leading to these results is also supported by the European Commission through Project ASTOUND (101071191 – HORIZON-EIC-2021-PATHFINDERCHALLENGES-01).

23 March 2023
by luisfernandodharo
Comments Off on ASTOUND meeting on Feb 22, 2023

ASTOUND meeting on Feb 22, 2023

On February 22nd, 2023, we held our third meeting to discuss and brainstorm the design of the AST architecture and how to create a new dataset for training and deploying the proof of concept of our conscious chatbot. There were 20 attendees, coming from all the partners in the project. The meeting started at 13:00 and finished at 17:00.

The agenda started with some administrative topics presented by Dr. Giorgos Kontaxakis (UPM), along with reminders about the project’s next deliverables and milestones. After this, Dr. Fernando Fernández (UPM) presented an interesting proposal for creating the dataset to train the chatbot, which received good comments from the attendees.

Then, Dr. Cristina Becchio (UKE) presented some of the initial experiments for evaluating the consciousness of the chatbot. These initial experiments are relevant since they shed some light on the limitations of current chatbots and on what can be done within the project.

Starting at 15:40, Dr. Aïda Elamrani (ENS) provided an update on the constitution of the Ethics Advisory Board. Finally, Dr. Luis Fernando D’Haro (UPM) presented a more detailed plan for the modules and architecture of the chatbot, based on the AST implementation proposed in January by Dr. Guido Manzi (IAI).

Finally, the consortium agreed to create two topic groups, on Ethics and on the Chatbot, to promote a more dynamic exchange of information and discussion, focused on the tasks most relevant for the coming months of the project.

The next meeting will be held on March 28th, 2023, with the idea of describing progress on the definition of the AST, the dataset creation, and the ethical aspects. Stay tuned for more news about our project.
