Efficient embeddings

Experiments on efficiently repurposing pre-trained LLMs as embedding models via contrastive learning.

Requirements

Python 3.10 -- 3.11

Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .

Downloading and preprocessing data

The shell scripts for downloading and lightly preprocessing the datasets are located in scripts/data_prep. For example, to download the NLI dataset, run:

./scripts/data_prep/nli.sh

The dataset will be saved as a JSONL file at data/nli/all.jsonl.
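To check the download, it can help to peek at the first record. The sketch below is not part of the repository: the helper names read_jsonl and peek_fields are hypothetical, and it assumes only that each non-empty line of the file is one JSON object. The actual field names depend on the preprocessing script and are not documented here.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek_fields(path):
    """Return the sorted field names of the first record, or [] if the file is empty.

    Assumes each record is a JSON object (a dict after parsing).
    """
    for record in read_jsonl(path):
        return sorted(record)
    return []
```

For example, peek_fields("data/nli/all.jsonl") lists the keys of the first NLI record once the download script has finished.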

The data used for experiments in the paper (the BAAI data set) can be downloaded from https://data.baai.ac.cn/details/BAAI-MTP (login required).

Running training and evaluation

To run training and MTEB evaluation for one of the four methods considered in the paper (full fine-tuning, layer freezing, bias-only tuning, and LoRA), run the corresponding script in experiments/*.py. For instance:

./experiments/train_and_eval_freezing.py

The checkpoints of the model and other data will be saved to the result directory.
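The four methods differ mainly in which parameters are updated during training. The sketch below illustrates that idea only and is not the code used in this repository. The function is_trainable, the parameter naming scheme ("layers.<index>." for transformer blocks, "lora_" for adapter weights), and the num_frozen_layers argument are all assumptions made for this example.

```python
import re


def is_trainable(param_name, method, num_frozen_layers=0):
    """Decide whether a parameter would be updated under a given method.

    Illustrative only: the name patterns below are assumed, not taken
    from the repository.
    """
    if method == "full":
        # Full fine-tuning: every parameter is updated.
        return True
    if method == "bias":
        # Bias-only tuning: update bias terms, keep all weights fixed.
        return param_name == "bias" or param_name.endswith(".bias")
    if method == "freezing":
        # Layer freezing: keep the first num_frozen_layers blocks fixed.
        match = re.search(r"layers\.(\d+)\.", param_name)
        if match is None:
            # Assumption: parameters outside the blocks (e.g. embeddings) stay frozen.
            return False
        return int(match.group(1)) >= num_frozen_layers
    if method == "lora":
        # LoRA: only the low-rank adapter matrices are updated.
        return "lora_" in param_name
    raise ValueError(f"unknown method: {method}")
```

With a hypothetical name such as "layers.3.mlp.fc.weight", the "freezing" method with num_frozen_layers=4 leaves the parameter fixed, while num_frozen_layers=2 trains it.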

The experiments/*.gin files associated with the Python scripts define the experiment configurations (learning rate, weight decay, LoRA rank, etc.). You can modify them as needed.

The file data/mteb/test_tasks_list contains a short list of MTEB tasks intended for debugging. The list used for evaluation in the paper is in data/mteb/broad_tasks_list. (The former list is specified in the config files.)
