Cross-Cultural Similarity Features for Cross-lingual Transfer Learning of Pragmatically Motivated Tasks

This repository contains the data and code for Cross-Cultural Similarity Features for Cross-lingual Transfer Learning of Pragmatically Motivated Tasks, EACL 2021. We proposed pragmatically motivated features that operationalize linguistic concepts such as language context-level and emotion semantics. As explained in the paper, we trained a ranking model based on gradient boosted decision trees to select transfer languages. The code trains and evaluates the ranking model with the proposed pragmatically motivated features. The code is based on this repository.

Dependencies

The following packages are required to run the code.

lang2vec
lightgbm

Data

For the sentiment analysis task, we collected review datasets across 16 languages (details about each dataset can be found in the appendix of the paper). The formatted datasets are in the datasets/sa directory. Note that the languages are identified by their ISO 639-3 codes. The same set of languages was used for the dependency parsing task (Universal Dependencies). Again, the formatted files are located in datasets/dep. The raw zero-shot results used to train the ranking model are in Optimal ranking extraction raw data.xlsx.

How to run

To replicate the experimental results in Tables 2 and 3, simply run

./run.sh

This script runs langrank_train.py and langrank_predict.py. You can specify the task and the feature group of interest through the task and features variables in run.sh. We used LambdaRank to train the model. Note that langrank_predict.py prints the performance on each cross-validation split as well as the averaged performance, in terms of Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG).
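For readers unfamiliar with the two metrics, here is a minimal sketch of how MAP and NDCG can be computed for ranked lists of transfer languages. It is not taken from langrank_predict.py: the function names, the example language codes, the relevance grades, and the use of linear gains in the DCG are all assumptions for illustration.

```python
import math

def average_precision(ranked, relevant):
    """Average precision of one ranked list, given the set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevants):
    """MAP: mean of the average precision over several ranked lists."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)

def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2 discount."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))

def ndcg_at_k(ranked, gain, k):
    """NDCG@k, where `gain` maps an item to its graded relevance."""
    predicted = [gain.get(item, 0) for item in ranked[:k]]
    ideal = sorted(gain.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(predicted) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: a predicted ranking of transfer languages (ISO 639-3).
predicted = ["jpn", "kor", "zho", "deu"]
relevant = {"kor", "zho"}                # assumed gold transfer languages
gain = {"kor": 3, "zho": 2, "jpn": 1}    # assumed graded relevance

print("AP:", average_precision(predicted, relevant))
print("NDCG@3:", ndcg_at_k(predicted, gain, k=3))
```

Averaging these per-list scores over the held-out lists of each cross-validation split, and then over splits, gives per-split and averaged numbers of the kind the script reports.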
