%0 Conference Proceedings
%T Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
%A Bang, Yejin
%A Yu, Tiezheng
%A Madotto, Andrea
%A Lin, Zhaojiang
%A Diab, Mona
%A Fung, Pascale
%Y Ovalle, Anaelia
%Y Chang, Kai-Wei
%Y Mehrabi, Ninareh
%Y Pruksachatkun, Yada
%Y Galstyan, Aram
%Y Dhamala, Jwala
%Y Verma, Apurv
%Y Cao, Trista
%Y Kumar, Anoop
%Y Gupta, Rahul
%S Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F bang-etal-2023-enabling
%X Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models with the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines by at least 15.56% on the F1-score, including few-shot learning with OPT-175B and existing text augmentation methods. We suggest that using classifiers with explicit human value input improves both inclusivity & explainability in AI.
%R 10.18653/v1/2023.trustnlp-1.27
%U https://aclanthology.org/2023.trustnlp-1.27
%U https://doi.org/10.18653/v1/2023.trustnlp-1.27
%P 311-325