shufangxun
  • Shanghai Jiao Tong University
  • Beijing, China

Starred repositories

Modeling, training, eval, and inference code for OLMo

Python 4,371 432 Updated Sep 7, 2024

Mathematical Visual Instruction Tuning for Multi-modal Large Language Models

85 1 Updated Aug 5, 2024

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

118 Updated Aug 29, 2024

Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 446 14 Updated Sep 7, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 720 35 Updated Sep 7, 2024

research work on multimodal cognitive ai

Python 52 9 Updated Aug 28, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 11,635 816 Updated Aug 31, 2024

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 128 Updated Sep 2, 2024

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 442 17 Updated Aug 16, 2024

Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 626 107 Updated Aug 28, 2024

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,073 186 Updated Aug 20, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 710 34 Updated Aug 20, 2024

Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

Python 16 Updated Aug 2, 2024

Framework to achieve context distillation in LLMs

Python 7 2 Updated Nov 24, 2023

MINT-1T: A one trillion token multimodal interleaved dataset.

722 18 Updated Jul 31, 2024

Implementation of Autoregressive Diffusion in Pytorch

Python 240 3 Updated Jul 30, 2024

Mixture of A Million Experts

Python 29 1 Updated Jul 30, 2024

Scaling Diffusion Transformers with Mixture of Experts

Python 172 7 Updated Sep 2, 2024

Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

74 2 Updated Jul 16, 2024

Transformer implementation for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 49 3 Updated Aug 15, 2024

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 469 16 Updated Aug 30, 2024

A Survey of Image Editing

206 8 Updated Jul 22, 2024

Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Python 430 32 Updated Jul 3, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 629 35 Updated Aug 5, 2024

A RLHF Infrastructure for Vision-Language Models

Python 85 5 Updated Jun 12, 2024

Visual Instruction Tuning for Qwen2 Base Model

Python 13 1 Updated Jun 29, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,185 46 Updated Aug 15, 2024

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Python 199 6 Updated Sep 2, 2024

[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

Python 301 20 Updated Jul 17, 2024
Python 41 2 Updated Apr 12, 2024