Starred repositories (Anirudh257)

Code for ALBEF: a new vision-language pre-training method

Python 1,462 193 Updated Sep 20, 2022

This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

316 21 Updated Jan 21, 2024

Grounded Language-Image Pre-training

Python 2,089 186 Updated Jan 24, 2024

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook 4,503 601 Updated May 20, 2024

Python 22 1 Updated Jun 22, 2023

a state-of-the-art-level open visual language model | multimodal pre-trained model

Python 5,693 390 Updated May 29, 2024

Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)

Jupyter Notebook 23 Updated Jun 25, 2024

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Python 1,484 206 Updated Apr 9, 2024

Simple implementation of OpenAI CLIP model in PyTorch.

Jupyter Notebook 583 87 Updated Apr 17, 2024

An open source implementation of CLIP.

Python 9,302 926 Updated Jul 23, 2024

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supportin…

Jupyter Notebook 10,607 1,510 Updated Jul 23, 2024

Official code implementation of the paper AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

Python 17 1 Updated Mar 16, 2024

Inference code for Llama models

Python 54,389 9,335 Updated Jul 23, 2024

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Python 2,376 247 Updated Apr 24, 2024

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Jupyter Notebook 181 34 Updated Aug 22, 2022

Mamba SSM architecture

Python 11,863 987 Updated Jul 24, 2024

Notes on the Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)

132 9 Updated Jan 7, 2024

An efficient video loader for deep learning with smart shuffling that's super easy to digest

C++ 1,742 150 Updated Jul 17, 2024

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

Go 79,837 6,097 Updated Jul 24, 2024

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 32,202 5,487 Updated Jul 24, 2024

Multilingual Sentence & Image Embeddings with BERT

Python 14,517 2,403 Updated Jul 22, 2024

🔍 Explore Egocentric Vision: research, data, challenges, real-world apps. Stay updated & contribute to our dynamic repository! Work-in-progress; join us!

59 6 Updated Jul 8, 2024

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Python 31,139 2,325 Updated Jul 24, 2024

A collection of the forefront of Egocentric Human Activity Recognition (HAR) and Action Anticipation through Deep Learning

6 Updated Feb 7, 2024

EILEV: Efficient In-Context Learning in Vision-Language Models for Egocentric Videos

Python 104 9 Updated Jun 13, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 631 48 Updated Mar 25, 2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 2,721 192 Updated Jul 20, 2024

Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.

478 28 Updated Jul 24, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,229 2,452 Updated Jul 15, 2024

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,617 237 Updated Jun 4, 2024