On March 1st, 2023, our paper entitled "PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment" was published. It describes a state-of-the-art model for the turn-level automatic evaluation of dialogue systems. The proposed model was assessed on 16 dialogue evaluation datasets spanning a wide range of dialogue domains, where it achieves high Spearman correlations (+0.47) with human annotations across all the evaluation datasets. This result is particularly noteworthy because the model exhibits better zero-shot generalization (i.e., good correlations on completely unseen datasets) than existing state-of-the-art models. In addition, the proposed model can easily adapt to new domains thanks to few-shot transfer learning techniques.
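The figure of merit above is Spearman's rank correlation between the scores a metric assigns to responses and the scores humans assign to the same responses. As an illustration only, the following dependency-free Python sketch computes it as the Pearson correlation of the two rank vectors, with tied values receiving their average rank. The scores are made-up numbers, not data from the paper, and the helper names are hypothetical (SciPy's scipy.stats.spearmanr computes the same quantity).

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(metric_scores, human_scores):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical turn-level scores for five responses.
metric = [0.9, 0.2, 0.6, 0.4, 0.8]   # scores from an automatic metric
human = [5, 1, 3, 3, 4]              # human ratings (note the tie at 3)
print(round(spearman(metric, human), 4))  # → 0.9747
```

Because only ranks matter, a metric does not need to be calibrated to the human rating scale; it only needs to order responses the way annotators do.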
In more detail, the proposed Panel of Experts (PoE) model is a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters. The shared encoder captures general dialogue knowledge across domains, while each adapter specializes in one specific domain and serves as a domain expert. The following figure shows the architecture of the network.
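To make the split between the shared encoder and the per-domain experts more concrete, here is a toy, dependency-free Python sketch. It rests on several assumptions not stated above: each expert is a bottleneck adapter (down-projection, ReLU, up-projection, residual connection) followed by a sigmoid scoring head, and the adapter is applied once on top of the encoder output rather than inside every transformer layer. The function shared_encoder is a hypothetical stand-in for the transformer, the weights are random and untrained, and the domain names are illustrative.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def shared_encoder(context, response, hidden):
    """Hypothetical stand-in for the shared transformer encoder.

    Maps a (context, response) pair to a unit-length vector of character counts
    bucketed by code point; a real model would use a pretrained transformer.
    """
    vec = [0.0] * hidden
    for ch in context + " [SEP] " + response:
        vec[ord(ch) % hidden] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class Adapter:
    """One domain expert: assumed bottleneck adapter plus scoring head (untrained)."""

    def __init__(self, hidden, bottleneck, rng):
        self.down = random_matrix(bottleneck, hidden, rng)  # hidden -> bottleneck
        self.up = random_matrix(hidden, bottleneck, rng)    # bottleneck -> hidden
        self.head = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def score(self, h):
        z = [max(0.0, a) for a in matvec(self.down, h)]           # ReLU bottleneck
        adapted = [a + b for a, b in zip(h, matvec(self.up, z))]  # residual connection
        logit = sum(w * a for w, a in zip(self.head, adapted))
        return 1.0 / (1.0 + math.exp(-logit))                     # score in (0, 1)


class PanelOfExperts:
    def __init__(self, domains, hidden=16, bottleneck=4, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.experts = {d: Adapter(hidden, bottleneck, rng) for d in domains}

    def expert_scores(self, context, response):
        # The encoder runs once; every expert reuses the same representation.
        h = shared_encoder(context, response, self.hidden)
        return {d: expert.score(h) for d, expert in self.experts.items()}


poe = PanelOfExperts(["chitchat", "task_oriented", "knowledge_grounded"])
print(poe.expert_scores("Hi, how are you?", "I'm fine, thanks!"))
```

The appeal of this layout is cost: adding a new domain means training only a small adapter (here `hidden * bottleneck * 2 + hidden` parameters) while the large shared encoder stays frozen and is computed once per input.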
In addition, to improve the performance of the system, we also applied four different data augmentation techniques: 1) Syntactic & Semantic Negative Sampling, 2) Back-Translation, 3) Generation From State-of-the-art Dialogue Systems, and 4) Automatic Generation of Adversarial Responses.
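As an illustration of the first technique only, the sketch below builds labeled training pairs from (context, response) dialogues. The specific perturbations are assumptions borrowed from common practice rather than the paper's exact recipe: syntactic negatives shuffle, drop, or repeat words of the original response, and the semantic negative pairs the context with a fluent but unrelated response taken from another dialogue. All function names are hypothetical, and the other three techniques (back-translation, generation from dialogue systems, adversarial responses) are not covered here.

```python
import random


def shuffle_words(response, rng):
    """Syntactic negative: scramble the word order (assumed perturbation)."""
    words = response.split()
    rng.shuffle(words)
    return " ".join(words)


def drop_words(response, rng, p=0.3):
    """Syntactic negative: delete each word with probability p (assumed perturbation)."""
    words = response.split()
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept or words[:1])  # keep at least one word


def repeat_words(response, rng, n=2):
    """Syntactic negative: repeat one random word n extra times (assumes a non-empty response)."""
    words = response.split()
    i = rng.randrange(len(words))
    return " ".join(words[: i + 1] + [words[i]] * n + words[i + 1:])


def semantic_negative(corpus, index, rng):
    """Semantic negative: a response from a different dialogue (needs >= 2 dialogues)."""
    other = rng.choice([j for j in range(len(corpus)) if j != index])
    return corpus[other][1]


def make_training_pairs(corpus, seed=0):
    """Return (context, response, label) triples; 1 = original, 0 = negative sample."""
    rng = random.Random(seed)
    pairs = []
    for i, (context, response) in enumerate(corpus):
        pairs.append((context, response, 1))
        negatives = [f(response, rng) for f in (shuffle_words, drop_words, repeat_words)]
        negatives.append(semantic_negative(corpus, i, rng))
        # Skip perturbations that happen to reproduce the original response.
        pairs.extend((context, neg, 0) for neg in negatives if neg != response)
    return pairs


corpus = [
    ("How was your weekend?", "It was great, I went hiking with friends."),
    ("Can you book a table for two?", "Sure, what time would you like the reservation?"),
    ("Who wrote Hamlet?", "Hamlet was written by William Shakespeare."),
]
for pair in make_training_pairs(corpus):
    print(pair)
```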
Finally, the model produces the final score either by averaging the scores of the different adapters or by using the single adapter whose training data is closest to the evaluation data. The paper provides several tables comparing the proposed model against other state-of-the-art metrics in different settings. The zero-shot and few-shot capabilities of the model are also evaluated as a function of the percentage of in-domain data used to adapt the model.
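The two scoring modes can be sketched in a few lines of Python. How closeness between an adapter's training data and the evaluation data is measured is not described here, so this sketch assumes cosine similarity between the mean embedding (centroid) of each adapter's training data and the mean embedding of the evaluation data. The domain names, centroids, and scores are hypothetical.

```python
import math


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_score(expert_scores):
    """Ensemble mode: average the scores of all experts."""
    return sum(expert_scores.values()) / len(expert_scores)


def closest_expert(domain_centroids, eval_embeddings):
    """Pick the expert whose training-data centroid is most similar (assumed: cosine)."""
    target = mean_vector(eval_embeddings)
    return max(domain_centroids, key=lambda d: cosine(domain_centroids[d], target))


def final_score(expert_scores, domain_centroids=None, eval_embeddings=None):
    """Average all experts, or defer to the closest expert when domain info is given."""
    if domain_centroids is None:
        return average_score(expert_scores)
    return expert_scores[closest_expert(domain_centroids, eval_embeddings)]


expert_scores = {"chitchat": 0.8, "task_oriented": 0.4, "knowledge_grounded": 0.6}
centroids = {"chitchat": [1.0, 0.0], "task_oriented": [0.0, 1.0], "knowledge_grounded": [0.7, 0.7]}
eval_embeddings = [[0.1, 0.9], [0.0, 1.0]]

print(final_score(expert_scores))                              # ensemble average
print(final_score(expert_scores, centroids, eval_embeddings))  # closest expert only
```

Averaging needs no knowledge of the target domain, which suits the zero-shot setting; selecting the closest expert trades that robustness for specialization when the evaluation data clearly resembles one training domain.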
This paper is a collaboration between Universidad Politécnica de Madrid (UPM) and the National University of Singapore (NUS). The work leading to these results is also supported by the European Commission through Project ASTOUND (101071191 – HORIZON-EIC-2021-PATHFINDERCHALLENGES-01).