Stars
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation me…
LAVIS - A One-stop Library for Language-Vision Intelligence
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
GPT4V-level open-source multi-modal model based on Llama3-8B
✨✨Latest Advances on Multimodal Large Language Models
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
A latent text-to-image diffusion model
High-Resolution Image Synthesis with Latent Diffusion Models
[Arxiv] A Survey on Video Diffusion Models
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Assistant tools for attention visualization in deep learning
Implementation of popular deep learning networks with TensorRT network definition API
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
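Mainly stores materials for the "data mining / machine learning" track of Datawhale team learning.
2019 Agricultural Bank of China Athena Cup Data Mining Competition, university track: Top 2 solution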
Enhance your application with the ability to see and interact with humans using any RGB camera.
(CVPR 2022 Oral) Official implementation: TransRAC
RepNet for mobile (counting repetitions in videos)
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
This is an official implementation of "Video Swin Transformer".