This repository was created as part of Round 3 of Crux Inductions, for which I was given three tasks to implement:

- Task 1: Transformer for named entity recognition.
  - Reading and understanding the Attention Is All You Need paper.
  - Implementing the transformer architecture and using it to perform named entity recognition on the given NER Corpus dataset, including data preprocessing, training the model, and reporting its performance. (A minimal sketch of the attention mechanism at the core of the paper is given after this list.)
- Task 2: CBAM vs. Squeeze-and-Excitation.
  - Reading and understanding the architectures of the Convolutional Block Attention Module (CBAM) and the Squeeze-and-Excitation (SE) network.
  - Implementing and comparing both architectures on a non-vanilla classification task of my choosing. For clarification, a vanilla classification task is one where you have a labelled set of images on which you directly perform single- or multi-class classification; examples of non-vanilla classification tasks are image segmentation, object detection, etc.
  - I chose image segmentation using the Cityscapes dataset. (A sketch of the SE block is given after this list.)
- Task 3: Vision transformers.
  - Understanding the architecture of the Vision Transformer (ViT) and implementing it to perform image classification on a dataset of my choosing. I chose the CIFAR10 dataset for this task. (A sketch of ViT's patch embedding is given after this list.)
  - Understanding the architecture of the Swin Transformer and being able to explain how it differs from previous vision transformers. (A sketch of Swin's shifted-window partitioning is given after this list.)
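For Task 1, the key operation in Attention Is All You Need is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal PyTorch sketch; the function name and tensor shapes are my own choices for illustration and do not necessarily match this repository's code.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (batch, heads, seq_len, d_k).
    mask: optional tensor broadcastable to the score matrix; positions
    where mask == 0 are hidden from attention.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (B, H, L, L)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights
    return weights @ v                       # weighted sum of values
```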
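For Task 2, the Squeeze-and-Excitation block "squeezes" each feature map to a single value by global average pooling, then "excites" the channels through a small bottleneck MLP that produces per-channel weights. A minimal PyTorch sketch follows; the reduction ratio of 16 is the SE paper's default, and the rest is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: rescale each channel of a feature map by a
    learned, input-dependent weight. Illustrative sketch only."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # weights in (0, 1)
        )

    def forward(self, x):                  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))             # squeeze: global average pool -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)    # excitation: per-channel weights
        return x * w                       # recalibrate the feature map
```

CBAM's channel-attention module follows the same idea but pools with both average and max pooling through a shared MLP, and is followed by a separate spatial-attention module.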
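For Task 3, ViT turns an image into a sequence of patch tokens, prepends a learnable [CLS] token, and adds learned position embeddings before a standard transformer encoder. Below is a minimal PyTorch sketch of that embedding step, with a 4x4 patch size suited to CIFAR10's 32x32 images; the embedding dimension of 192 is an assumption, not necessarily what this repository uses.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and linearly project
    each one; a Conv2d with stride equal to its kernel size performs both
    steps at once. Illustrative sketch only."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=192):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

    def forward(self, x):                  # x: (B, 3, 32, 32)
        x = self.proj(x)                   # (B, D, 8, 8)
        x = x.flatten(2).transpose(1, 2)   # (B, 64, D) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)     # prepend the [CLS] token
        return x + self.pos_embed          # add learned position embeddings
```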
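The Swin Transformer differs from ViT in two main ways: it computes self-attention inside small non-overlapping windows rather than over all tokens, making the cost linear instead of quadratic in image size, and it alternates between a regular and a half-window-shifted window grid so that information flows across window boundaries; patch merging between stages then yields a hierarchical feature pyramid. A toy PyTorch sketch of the window partitioning and cyclic shift, with sizes chosen only for illustration:

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows of
    shape (num_windows * B, window_size * window_size, C). Swin computes
    self-attention within each window independently."""
    b, h, w, c = x.shape
    x = x.view(b, h // window_size, window_size,
               w // window_size, window_size, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size ** 2, c)

# In every other block, Swin cyclically shifts the map by half a window
# before partitioning, so tokens near a border can attend across the
# window boundaries used by the previous block.
x = torch.randn(1, 8, 8, 96)                           # toy feature map
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))  # half of a 4x4 window
windows = window_partition(shifted, window_size=4)     # (4, 16, 96)
```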