- Former Applied Science Intern @amzn
- Riverside, California
- https://anirudh257.github.io/
Starred repositories
Code for ALBEF: a new vision-language pre-training method
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
a state-of-the-art-level open visual language model | multimodal pre-trained model
Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
Simple implementation of OpenAI CLIP model in PyTorch.
An open source implementation of CLIP.
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supportin…
Official code implementation of the paper AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
Notes on the Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)
An efficient video loader for deep learning with smart shuffling that's super easy to digest
Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Multilingual Sentence & Image Embeddings with BERT
🔍 Explore Egocentric Vision: research, data, challenges, real-world apps. Stay updated & contribute to our dynamic repository! Work-in-progress; join us!
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
A collection of the forefront of Egocentric Human Activity Recognition (HAR) and Action Anticipation through Deep Learning
EILEV: Efficient In-Context Learning in Vision-Language Models for Egocentric Videos
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding