%0 Conference Proceedings
%T Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
%A Bang, Yejin
%A Yu, Tiezheng
%A Madotto, Andrea
%A Lin, Zhaojiang
%A Diab, Mona
%A Fung, Pascale
%Y Ovalle, Anaelia
%Y Chang, Kai-Wei
%Y Mehrabi, Ninareh
%Y Pruksachatkun, Yada
%Y Galstyan, Aram
%Y Dhamala, Jwala
%Y Verma, Apurv
%Y Cao, Trista
%Y Kumar, Anoop
%Y Gupta, Rahul
%S Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F bang-etal-2023-enabling
%X Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models with the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines by at least 15.56% on the F1-score, including few-shot learning with OPT-175B and existing text augmentation methods. We suggest that using classifiers with explicit human value input improves both inclusivity & explainability in AI.
%R 10.18653/v1/2023.trustnlp-1.27
%U https://aclanthology.org/2023.trustnlp-1.27
%U https://doi.org/10.18653/v1/2023.trustnlp-1.27
%P 311-325