MM
[ACL 2024] This is the code repo for our ACL'24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin".
[ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval".
Collection of Composed Image Retrieval (CIR) papers.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
E5-V: Universal Embeddings with Multimodal Large Language Models
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zero-shot Document Ranking with Large Language Models.
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Mixture-of-Experts for Large Vision-Language Models
A lightweight open-source package to fine-tune embedding models.
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representations.
Generative Representational Instruction Tuning
This is the official repository for Retrieval Augmented Visual Question Answering
The code used to train and run inference with the ColPali architecture.
The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
(ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
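Use ChatGPT to summarize the arXiv papers. Accelerate the entire research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and review responses.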
LAVIS - A One-stop Library for Language-Vision Intelligence
[ICLR 2024 Spotlight] Sentence-level Prompts Benefit Composed Image Retrieval
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
The implementation of FINER-MLLM, which was accepted to MM2024.
A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).
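A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, Datafuntalk, technical WeChat official accounts)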
"Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?"