Paper on automatic evaluation at IEEE/ACM Trans. on Audio, Speech, and Language Processing

23 March 2023 by luisfernandodharo

On March 1st, 2023, our paper entitled PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment was published. It describes a state-of-the-art model for the turn-level automatic evaluation of dialogue systems. The proposed model was assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains, and it achieves high Spearman correlations (+0.47) with the human annotations across all the evaluation datasets. This result is particularly significant because the model shows better zero-shot generalization (i.e., good correlations on completely unseen datasets) than existing state-of-the-art models. In addition, the model can easily be adapted to new domains using few-shot transfer learning techniques.
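The correlation figure above measures how well the metric's turn-level scores agree with human ratings. As a minimal sketch (our own illustration, not code from the paper), Spearman's rank correlation can be computed in plain Python as the Pearson correlation of the two rank vectors, with tied values sharing their average rank:

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    metric = [0.9, 0.2, 0.6, 0.4]  # hypothetical metric scores for four responses
    human = [5, 1, 4, 2]           # hypothetical human ratings for the same responses
    print(spearman(metric, human))
```

Because only ranks matter, a metric does not need to be on the same scale as the human ratings; it only needs to order the responses the same way.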

System architecture of a Panel of Experts (PoE). A transformer encoder T consists of L layers (T₁, T₂, …, T_L). Different colors (blue, red, and green) denote domain-specific adapter modules. Each domain-specific adapter has L − 1 layers that are injected between every two consecutive transformer layers. There are domain-specific classifiers after the final transformer layer, T_L. T is shared by all the domain-specific modules. Each adapter is trained using a different dataset. Adapters can be added as required, or removed after testing the performance of the model on a new dataset.

In more detail, the proposed Panel of Experts (PoE) model is a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures general knowledge about dialogues across domains, while each adapter specializes in one specific domain and serves as a domain expert. The figure above shows the architecture of the network.
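To make the data flow concrete, here is a toy, dependency-free sketch of a forward pass through such a network. It is not the authors' implementation: the "transformer layers" are stand-ins (tanh of a random linear map), the adapter is assumed to be a common bottleneck design (down-projection, ReLU, up-projection, residual connection), and the class names, domain names, and dimensions are all hypothetical. It only illustrates the structure described in the caption: one shared stack of L layers, L − 1 adapters per domain placed between consecutive layers, and a per-domain classifier head after T_L. Training is omitted.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, add residual."""

    def __init__(self, dim: int, bottleneck: int, rng: random.Random):
        self.down = rand_matrix(bottleneck, dim, rng)
        self.up = rand_matrix(dim, bottleneck, rng)

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]
        return [a + b for a, b in zip(h, matvec(self.up, z))]


class PanelOfExperts:
    """Toy PoE: shared layers T_1..T_L plus one (adapters, head) pair per domain."""

    def __init__(self, num_layers: int, dim: int, rng: random.Random):
        # Stand-ins for the shared transformer layers (here: tanh(W h)).
        self.shared = [rand_matrix(dim, dim, rng, scale=0.5) for _ in range(num_layers)]
        self.dim = dim
        self.rng = rng
        self.experts: Dict[str, Tuple[List[Adapter], Matrix]] = {}

    def add_expert(self, domain: str, bottleneck: int = 2) -> None:
        # L - 1 adapters, one between each pair of consecutive shared layers.
        adapters = [Adapter(self.dim, bottleneck, self.rng) for _ in range(len(self.shared) - 1)]
        head = rand_matrix(1, self.dim, self.rng)  # domain-specific classifier
        self.experts[domain] = (adapters, head)

    def remove_expert(self, domain: str) -> None:
        del self.experts[domain]

    def score(self, domain: str, x: Vector) -> float:
        adapters, head = self.experts[domain]
        h = x
        for i, layer in enumerate(self.shared):
            h = [math.tanh(v) for v in matvec(layer, h)]
            if i < len(adapters):  # adapter i sits between T_{i+1} and T_{i+2}
                h = adapters[i](h)
        logit = matvec(head, h)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # quality score in (0, 1)


if __name__ == "__main__":
    rng = random.Random(0)
    poe = PanelOfExperts(num_layers=4, dim=8, rng=rng)
    for d in ["chitchat", "topical", "persona"]:  # hypothetical domains
        poe.add_expert(d)
    x = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for an encoded (context, response)
    print({d: round(poe.score(d, x), 3) for d in poe.experts})
```

Keeping each expert as a separate (adapters, head) pair is what makes adding or removing a domain cheap in this sketch: the shared layers are never touched.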

To further improve the performance of the system, we also applied four data augmentation techniques: 1) Syntactic & Semantic Negative Sampling, 2) Back-Translation, 3) Generation From State-of-the-art Dialogue Systems, and 4) Automatic Generation of Adversarial Responses.
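The post does not describe how each technique is implemented. Purely as a hypothetical illustration of the first one, the sketch below builds labelled training pairs with two simple kinds of negatives: a syntactic negative that shuffles the words of a valid response, and a semantic negative that borrows a fluent response from a different dialogue so it no longer fits the context. The function names and the specific corruption strategies are our assumptions, not the paper's recipe.

```python
import random
from typing import List, Optional, Tuple

Example = Tuple[str, str, int]  # (context, response, label): 1 = valid, 0 = negative


def syntactic_negative(response: str, rng: random.Random) -> Optional[str]:
    """Assumed strategy: shuffle word order so the response becomes ungrammatical."""
    words = response.split()
    if len(set(words)) < 2:
        return None  # no shuffle can change this response; skip it
    shuffled = words[:]
    while shuffled == words:
        rng.shuffle(shuffled)
    return " ".join(shuffled)


def semantic_negative(dialogues: List[Tuple[str, str]], index: int, rng: random.Random) -> str:
    """Assumed strategy: take a fluent response from another dialogue (off-topic here)."""
    other = rng.choice([i for i in range(len(dialogues)) if i != index])
    return dialogues[other][1]


def augment(dialogues: List[Tuple[str, str]], seed: int = 0) -> List[Example]:
    rng = random.Random(seed)
    examples: List[Example] = []
    for i, (context, response) in enumerate(dialogues):
        examples.append((context, response, 1))
        syn = syntactic_negative(response, rng)
        if syn is not None:
            examples.append((context, syn, 0))
        if len(dialogues) > 1:
            examples.append((context, semantic_negative(dialogues, i, rng), 0))
    return examples


if __name__ == "__main__":
    data = [("How are you?", "I am fine thanks"),
            ("Any plans today?", "Going hiking in the hills")]
    for ex in augment(data):
        print(ex)
```

Negatives of this kind are cheap to generate at scale, which is why they are a natural fit for training a classifier-style evaluator.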

Finally, the model generates the final score either as the average of the scores produced by the different adapters or by using the single adapter whose training data is closest to the evaluation data. The paper provides several tables comparing the proposed model with other state-of-the-art metrics under different settings. The zero-shot and few-shot capabilities of the model are also evaluated by varying the percentage of in-domain data used to adapt the model.
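The two scoring strategies can be sketched as follows. The post does not say how "closest" is measured, so this example assumes, hypothetically, that each domain is summarised by a centroid embedding of its training data and that the evaluation set is compared to those centroids with cosine similarity. Domain names and numbers are made up for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def mean_of_experts(expert_scores: Dict[str, float]) -> float:
    """Strategy 1: average the scores of all domain adapters."""
    return sum(expert_scores.values()) / len(expert_scores)


def closest_expert(eval_centroid: Sequence[float],
                   domain_centroids: Dict[str, Sequence[float]]) -> str:
    """Assumed criterion: domain whose training-data centroid is most cosine-similar."""
    return max(domain_centroids, key=lambda d: cosine(eval_centroid, domain_centroids[d]))


def final_score(expert_scores: Dict[str, float],
                eval_centroid: Optional[Sequence[float]] = None,
                domain_centroids: Optional[Dict[str, Sequence[float]]] = None) -> float:
    """Strategy 2 when domain information is available, otherwise strategy 1."""
    if eval_centroid is None or domain_centroids is None:
        return mean_of_experts(expert_scores)
    return expert_scores[closest_expert(eval_centroid, domain_centroids)]


if __name__ == "__main__":
    scores = {"chitchat": 0.75, "topical": 0.5, "persona": 0.25}  # hypothetical adapter outputs
    centroids = {"chitchat": [1.0, 0.0], "topical": [0.0, 1.0], "persona": [0.7, 0.7]}
    print(final_score(scores))                        # average of all experts
    print(final_score(scores, [0.9, 0.1], centroids))  # score of the closest expert
```

Averaging is a reasonable default for a completely unseen domain, while selecting a single expert makes sense when the evaluation data clearly resembles one of the training domains.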

This paper is a collaboration between Universidad Politécnica de Madrid (UPM) and the National University of Singapore (NUS). The work leading to these results is also supported by the European Commission through Project ASTOUND (101071191 – HORIZON-EIC-2021-PATHFINDERCHALLENGES-01).

ASTOUND

An EC-funded project aimed at improving the social competences of virtual agents through artificial consciousness based on the Attention Schema Theory.

