Source code for our ACL 2024 paper: MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin.
Click the links below to view our paper and checkpoints.
If you find this work useful, please cite our paper and give us a shining star 🌟
```bibtex
@inproceedings{zhou2024marvel,
  title={MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin},
  author={Zhou, Tianshuo and Mei, Sen and Li, Xinze and Liu, Zhenghao and Xiong, Chenyan and Liu, Zhiyuan and Gu, Yu and Yu, Ge},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics},
  year={2024}
}
```
MARVEL unlocks the multi-modal capability of dense retrieval via a visual module plugin. It encodes queries and multi-modal documents with a unified encoder model to bridge the modality gap between images and texts, and conducts retrieval, modality routing, and result fusion within a unified embedding space.
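To make the plugin idea concrete, here is a minimal, illustrative sketch of how a CLIP vision encoder can be plugged into a T5 encoder so that text and image documents share one embedding space. The class name, checkpoint ids, and first-token pooling are our assumptions for illustration, not this repository's exact implementation:

```python
import torch
import torch.nn as nn
from transformers import T5EncoderModel, CLIPVisionModel

class VisualPluginRetriever(nn.Module):
    """Sketch: CLIP patch embeddings projected into the T5 input space."""

    def __init__(self, t5_name="OpenMatch/t5-ance",
                 clip_name="openai/clip-vit-base-patch32"):
        super().__init__()
        self.t5 = T5EncoderModel.from_pretrained(t5_name)
        self.vision = CLIPVisionModel.from_pretrained(clip_name)
        # Adapter mapping CLIP's hidden size to T5's hidden size.
        self.proj = nn.Linear(self.vision.config.hidden_size,
                              self.t5.config.d_model)

    def encode_text(self, input_ids, attention_mask):
        # Queries and text documents go straight through the T5 encoder.
        out = self.t5(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state[:, 0]  # first-token pooling (illustrative)

    def encode_image_doc(self, pixel_values, input_ids, attention_mask):
        # Visual module plugin: encode image patches with CLIP, project them,
        # and prepend them to the caption's token embeddings.
        patches = self.vision(pixel_values=pixel_values).last_hidden_state
        visual_embeds = self.proj(patches)                        # (B, P, d)
        token_embeds = self.t5.get_input_embeddings()(input_ids)  # (B, T, d)
        inputs_embeds = torch.cat([visual_embeds, token_embeds], dim=1)
        visual_mask = attention_mask.new_ones(visual_embeds.shape[:2])
        mask = torch.cat([visual_mask, attention_mask], dim=1)
        out = self.t5(inputs_embeds=inputs_embeds, attention_mask=mask)
        return out.last_hidden_state[:, 0]
```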
1. Install the following packages with pip or conda in your environment:
```
python==3.7
pytorch
transformers
clip
faiss-cpu==1.7.0
tqdm
numpy
base64
```
Install pytrec_eval from https://github.com/cvangysel/pytrec_eval.
We also provide requirements.txt, which pins the versions of all packages we used; if you have any problems configuring the environment, please refer to that file.
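As a quick sanity check that the environment is configured, the imports below should all succeed (a hypothetical check script, not part of this repository):

```python
# Verify the core dependencies are importable and print their versions.
import torch, transformers, faiss, pytrec_eval, clip

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("faiss:", faiss.__version__)
print("CLIP models:", clip.available_models())
```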
2. Prepare the pretrained CLIP and T5-ANCE
MARVEL is built on the CLIP and T5-ANCE models.
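For reference, both backbones can be fetched from Hugging Face; the checkpoint ids below are our assumptions for illustration (use the checkpoints linked above if they differ):

```python
from transformers import AutoTokenizer, CLIPModel, CLIPProcessor, T5EncoderModel

# Text retriever backbone (assumed checkpoint id).
t5_ance = T5EncoderModel.from_pretrained("OpenMatch/t5-ance")
tokenizer = AutoTokenizer.from_pretrained("OpenMatch/t5-ance")

# Visual module backbone (assumed checkpoint id).
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
```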
- First, use `git clone` to download this project:

```bash
git clone https://github.com/OpenMatch/MARVEL
cd MARVEL
```
- Download link for our WebQA data: WebQA. (❗️Note: for `imgs.tsv`, you need to download the data from this link and run `7z x imgs.7z.001`; a sketch for reading the extracted file follows the directory layout below.)
- Please refer to ClueWeb22-MM to obtain the pretraining data and the retrieval benchmark.
- Place the downloaded datasets in the `data/` folder:

```
data/
├── WebQA/
│   ├── train.json
│   ├── dev.json
│   ├── test.json
│   ├── test_qrels.txt
│   ├── all_docs.json
│   ├── all_imgs.json
│   ├── imgs.tsv
│   └── imgs.lineidx.new
├── ClueWeb22-MM/
│   ├── train.parquet
│   ├── dev.parquet
│   ├── test.parquet
│   ├── test_qrels.txt
│   ├── text.parquet
│   └── image.parquet
└── pretrain/
    ├── train.parquet
    └── dev.parquet
```
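After extraction, `imgs.tsv` stores one base64-encoded image per line and `imgs.lineidx.new` stores the byte offset of each line. The snippet below is a minimal reading sketch under that assumption (the exact column layout may differ; Pillow is used only for decoding):

```python
import base64
from io import BytesIO

from PIL import Image

# Byte offset of each row in imgs.tsv, one offset per line.
with open("data/WebQA/imgs.lineidx.new") as f:
    offsets = [int(line) for line in f]

def load_image(row_idx):
    """Seek directly to one row of imgs.tsv and decode its image."""
    with open("data/WebQA/imgs.tsv") as f:
        f.seek(offsets[row_idx])
        image_id, b64 = f.readline().rstrip("\n").split("\t")
    return image_id, Image.open(BytesIO(base64.b64decode(b64)))

image_id, image = load_image(0)
print(image_id, image.size)
```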
Using the WebQA dataset as an example, we show how to reproduce the results in the MARVEL paper; the same procedure applies to the ClueWeb22-MM dataset. We also provide a checkpoint for each step, so you can skip a step and continue training from the corresponding checkpoint.
- First step: go to the `pretrain` folder and pretrain MARVEL's visual module:

```bash
cd pretrain
bash train.sh
```
- Second step: go to the `DPR` folder and train MARVEL-DPR with in-batch negatives (a loss sketch covering in-batch and hard negatives follows this list):

```bash
cd DPR
bash train_webqa.sh
```
- Third step: use MARVEL-DPR to generate hard negatives for training MARVEL-ANCE:

```bash
bash get_hn_webqa.sh
```
- Final step: go to the `ANCE` folder and train MARVEL-ANCE with the generated hard negatives:

```bash
cd ANCE
bash train_ance_webqa.sh
```
- These experimental results are shown in Table 2 of our paper.
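For reference, both training stages optimize a DPR/ANCE-style contrastive loss; MARVEL-DPR uses only in-batch negatives, while MARVEL-ANCE additionally uses the mined hard negatives. The function below is an illustrative sketch with hypothetical tensor names, not this repository's exact code:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q_embeds, pos_embeds, hard_neg_embeds=None, temperature=1.0):
    """q_embeds, pos_embeds: (B, d); hard_neg_embeds: (M, d) or None.

    Each query is scored against every document in the batch, so the
    positives of the other queries act as in-batch negatives; mined hard
    negatives are simply appended to the candidate pool.
    """
    docs = pos_embeds
    if hard_neg_embeds is not None:
        docs = torch.cat([pos_embeds, hard_neg_embeds], dim=0)
    scores = q_embeds @ docs.t() / temperature        # (B, B [+ M])
    labels = torch.arange(q_embeds.size(0), device=q_embeds.device)
    return F.cross_entropy(scores, labels)
```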
- Go to the `DPR` or `ANCE` folder and evaluate model performance as follows (an illustrative embed-and-search sketch is given after the commands):

```bash
bash gen_embeds.sh
bash retrieval.sh
```
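These scripts implement an embed-then-search pipeline; the sketch below shows the same flow with FAISS and pytrec_eval, under the assumption that embeddings have been dumped to numpy files (the file names and toy qrels are hypothetical):

```python
import faiss
import numpy as np
import pytrec_eval

# Assumed dump format: float32 matrices of document and query embeddings.
doc_embeds = np.load("doc_embeds.npy").astype("float32")    # (N, d)
q_embeds = np.load("query_embeds.npy").astype("float32")    # (Q, d)

# Exact inner-product search, as used for dense retrieval.
index = faiss.IndexFlatIP(doc_embeds.shape[1])
index.add(doc_embeds)
scores, doc_ids = index.search(q_embeds, 100)               # top-100, cf. Rec@100

# Build a TREC-style run and score it; load the real test_qrels.txt in practice.
run = {str(q): {str(d): float(s) for d, s in zip(doc_ids[q], scores[q]) if d != -1}
       for q in range(len(q_embeds))}
qrels = {"0": {"0": 1}}  # toy relevance labels for illustration only
evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {"recip_rank", "ndcg_cut.10", "recall.100"})
print(evaluator.evaluate(run))
```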
The results are shown as follows.
- WebQA
| Setting | Model | MRR@10 | NDCG@10 | Rec@100 |
|---|---|---|---|---|
| Single Modality (Text Only) | BM25 | 53.75 | 49.60 | 80.69 |
| | DPR (Zero-Shot) | 22.72 | 20.06 | 45.43 |
| | CLIP-Text (Zero-Shot) | 18.16 | 16.76 | 39.83 |
| | Anchor-DR (Zero-Shot) | 39.96 | 37.09 | 71.32 |
| | T5-ANCE (Zero-Shot) | 41.57 | 37.92 | 69.33 |
| | BERT-DPR | 42.16 | 39.57 | 77.10 |
| | NQ-DPR | 41.88 | 39.65 | 42.44 |
| | NQ-ANCE | 45.54 | 42.05 | 69.31 |
| Divide-Conquer | VinVL-DPR | 22.11 | 22.92 | 62.82 |
| | CLIP-DPR | 37.35 | 37.56 | 85.53 |
| | BM25 & CLIP-DPR | 42.27 | 41.58 | 87.50 |
| UnivSearch | CLIP (Zero-Shot) | 10.59 | 8.69 | 20.21 |
| | VinVL-DPR | 38.14 | 35.43 | 69.42 |
| | CLIP-DPR | 48.83 | 46.32 | 86.43 |
| | UniVL-DR | 62.40 | 59.32 | 89.42 |
| | MARVEL-DPR | 55.71 | 52.94 | 88.23 |
| | MARVEL-ANCE | 65.15 | 62.95 | 92.40 |
- ClueWeb22-MM
| Setting | Model | MRR@10 | NDCG@10 | Rec@100 |
|---|---|---|---|---|
| Single Modality (Text Only) | BM25 | 40.81 | 46.08 | 78.22 |
| | DPR (Zero-Shot) | 20.59 | 23.24 | 44.93 |
| | CLIP-Text (Zero-Shot) | 30.13 | 33.91 | 59.53 |
| | Anchor-DR (Zero-Shot) | 42.92 | 48.50 | 76.52 |
| | T5-ANCE (Zero-Shot) | 45.65 | 51.71 | 83.23 |
| | BERT-DPR | 38.56 | 44.41 | 80.38 |
| | NQ-DPR | 42.35 | 61.71 | 83.50 |
| | NQ-ANCE | 45.89 | 51.83 | 81.21 |
| Divide-Conquer | VinVL-DPR | 29.97 | 36.13 | 74.56 |
| | CLIP-DPR | 39.54 | 47.16 | 87.25 |
| | BM25 & CLIP-DPR | 41.58 | 48.67 | 83.50 |
| UnivSearch | CLIP (Zero-Shot) | 16.28 | 18.52 | 40.36 |
| | VinVL-DPR | 35.09 | 40.36 | 75.06 |
| | CLIP-DPR | 42.59 | 49.24 | 87.07 |
| | UniVL-DR | 47.99 | 55.41 | 90.46 |
| | MARVEL-DPR | 46.93 | 53.76 | 88.74 |
| | MARVEL-ANCE | 55.19 | 62.83 | 93.16 |
If you have questions, suggestions, or bug reports, please email us at:
zhoutianshuo@stumail.neu.edu.cn or meisen@stumail.neu.edu.cn