This project aims to detect football passes (including throw-ins and crosses) and challenges in original Bundesliga matches using a computer vision model. It was inspired by the Kaggle competition DFL - Bundesliga Data Shootout. The project is still a work in progress.
├── data
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks.
│
├── src <- Source code for use in this project.
│ │
│ ├── data <- Code to process data
│ │
│ ├── training <- Code to train models and then use trained models to make
│ │ predictions
│ │
│ └── models <- Pytorch models
│
├── environment.yml <- Requirements for the conda environment. Allows easy installation of poetry and CUDA
│
├── README.md <- The top-level README for developers using this project.
│
├── pyproject.toml <- Project settings and poetry dependencies
│
└── tox.ini <- tox file with settings for running tox
- Install `conda` on your system.
- Create the environment: `conda env create --file environment.yml`. If you already have the `dfl` environment, you have to first remove it: `conda remove --name dfl --all`.
- Activate the environment: `conda activate dfl`
- Install dependencies: `poetry install`

These steps have to be executed only once. To use this environment later, just activate it: `conda activate dfl`
The objective of this project is to create and train a model that can accurately detect one of four specified football events (challenge, throw-in, play, or no event) from a sequence of video frames. Once the model is developed, it can be used to analyze a recording of a football match and identify the exact time when each event occurred.
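Once per-window predictions are available, mapping a detection back to a match time is a simple frame-rate calculation. A minimal sketch (the function name and the 25 fps default are illustrative assumptions, not taken from the project code):

```python
# Sketch: mapping a detected event's frame index back to a match timestamp.
# The function name and the fixed 25 fps are assumptions for illustration.

def frame_to_timestamp(frame_idx: int, fps: float = 25.0) -> str:
    """Convert a frame index into an mm:ss.ms timestamp string."""
    seconds = frame_idx / fps
    minutes = int(seconds // 60)
    return f"{minutes:02d}:{seconds - minutes * 60:06.3f}"

# e.g. an event detected at frame 37_500 of a 25 fps recording
print(frame_to_timestamp(37_500))  # 25:00.000
```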
To prepare the dataset, short clips from Bundesliga matches are extracted based on the given annotations. All the code for this step is located under `src/data/`. Each clip is saved to a separate file, and all labels are saved in a CSV file. To execute the code, run the `src/data/make_dataset.py` script with the following command:
python -m src.data.make_dataset extract --frame-size 960 540 --window-size=32
This script extracts 32 frames around each event and resizes them to 960x540. It also samples clips with no events from the recordings. To view all available options, run the script with the `--help` flag.
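Selecting which frames belong to a clip can be sketched as below; `event_window` is a hypothetical helper, and the boundary clamping is an assumption, not necessarily how `src/data/make_dataset.py` handles it:

```python
# Sketch: choose which frames to extract around an annotated event.
# `event_window` is a hypothetical helper; the real logic lives in
# src/data/make_dataset.py and may differ (e.g. in boundary handling).

def event_window(event_frame: int, window_size: int, total_frames: int) -> range:
    """Return the indices of `window_size` frames centred on `event_frame`,
    shifted so the window stays inside [0, total_frames)."""
    start = event_frame - window_size // 2
    start = max(0, min(start, total_frames - window_size))
    return range(start, start + window_size)

frames = event_window(event_frame=100, window_size=32, total_frames=90_000)
print(list(frames)[:3], len(frames))  # [84, 85, 86] 32
```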
The next step is to split the dataset into train and test sets. This can be achieved with the command below, which splits the `labels.csv` file into `train_labels.csv` and `test_labels.csv`:
python -m src.data.make_dataset split
To view all available options, run the script with the `--help` flag.
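The split itself can be sketched with a deterministic row-level shuffle; the 80/20 ratio and the shuffling strategy here are assumptions for illustration, not the script's actual defaults:

```python
# Sketch: split labels into train and test sets.
# The 80/20 ratio and the plain row-level shuffle are assumptions;
# see `python -m src.data.make_dataset split --help` for the real options.
import random

def split_labels(rows: list, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle rows deterministically and split them into train/test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = [{"clip": f"clip_{i}.mp4", "label": "play"} for i in range(100)]
train, test = split_labels(rows)
print(len(train), len(test))  # 80 20
```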
Two architectures were tested as the classification model:
- ResNet 3D,
- ResNet + LSTM.
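The ResNet + LSTM idea (per-frame CNN features fed to an LSTM over time) can be sketched as follows; a tiny CNN stands in for the ResNet backbone, and all sizes and names are illustrative, not the project's actual hyperparameters:

```python
# Sketch of the ResNet + LSTM idea: a per-frame CNN encoder followed by an
# LSTM over the frame sequence. A tiny CNN stands in for the ResNet backbone;
# all sizes here are illustrative, not the project's actual hyperparameters.
import torch
import torch.nn as nn

class CnnLstmClassifier(nn.Module):
    def __init__(self, num_classes: int = 4, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in for a ResNet backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.encoder(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (last_hidden, _) = self.lstm(feats)   # last_hidden: (1, batch, hidden)
        return self.head(last_hidden[-1])        # (batch, num_classes)

model = CnnLstmClassifier()
logits = model(torch.randn(2, 32, 3, 140, 250))
print(logits.shape)  # torch.Size([2, 4])
```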
Their code can be found in `src/models/models.py`. To simplify training on the prepared dataset, the models were wrapped in a LightningModule. The training scripts can be found in `src/training/train_r3d.py`, and an example of their execution is shown below:
python -m src.training.train_r3d --max-epochs=10 --video-size 140 250 --batch-size=8
This command trains the model for 10 epochs, scales each frame to 140x250, and sets the batch size to 8. To see all available options, use the `--help` flag when running the script.