ALTeGraD 2023 Data Challenge

This repository contains the work of Team "Queniaric" for the ALTeGraD 2023 Data Challenge. The team consists of Hugo Queniat and Simon Queric, both students at Télécom Paris and in the Master MVA organized by ENS Paris-Saclay. The challenge details can be found here.

On the public leaderboard, which covers around 50% of the test set, our best model achieved a score of 0.8993. It is a soft voting classifier that combines the predictions of several models trained on the same data. The models used in the soft voting classifier, which all follow similar attention-based structures, are the following:

model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
ModelGATPerso(model_name, n_in=300, nout=768, nhid=1024, n_heads=8, dropout=0.6)
ModelGATwMLP(model_name, nout=768, nhid=768, n_heads=4, n_in=300, dropout=0.75)
ModelTransformer(model_name, n_in=300, nout=768, nhid=768, n_heads=4, dropout=0.6)
ModelTransformerv2(model_name, n_in=300, nout=768, nhid=100, n_heads=2, dropout=0.75)
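
As a rough illustration of the soft voting idea described above, the sketch below averages the score matrices produced by several models, element by element. It is a minimal sketch under stated assumptions, not the code in `source/voting.py`: the function name `soft_vote`, the optional `weights` argument, and the toy score matrices are all hypothetical, and each model is assumed to output one score per (query, candidate) pair.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def soft_vote(score_matrices: Sequence[Matrix],
              weights: Optional[Sequence[float]] = None) -> Matrix:
    """Average per-model score matrices element-wise (weighted if requested)."""
    if not score_matrices or not score_matrices[0]:
        raise ValueError("need at least one non-empty score matrix")
    if weights is None:
        # Plain soft voting: every model counts equally.
        weights = [1.0] * len(score_matrices)
    if len(weights) != len(score_matrices):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_rows = len(score_matrices[0])
    n_cols = len(score_matrices[0][0])
    for matrix in score_matrices:
        if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
            raise ValueError("all score matrices must share the same shape")

    return [
        [
            sum(w * m[i][j] for w, m in zip(weights, score_matrices)) / total
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Hypothetical scores: two models, 2 queries, 3 candidates each.
scores_a = [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]]
scores_b = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
averaged = soft_vote([scores_a, scores_b])
```

In this toy example the two models disagree on the best candidate for the second query; after averaging, the third candidate comes out slightly ahead, which is the kind of smoothing soft voting is meant to provide.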

As required by the competition, the code is entirely reproducible and the models are trained on the provided data, meaning that you can reproduce our experiments and results on your own machine by following the instructions below.

Installation

Prerequisites

Python (version 3.10)
Pip (package installer for Python)

Instructions

Clone the repository to your local machine

git clone https://github.com/hugo-qea/ALTeGraD-2023-Data-Challenge.git

Navigate to the project directory
```
cd ALTeGraD-2023-Data-Challenge
```
Install the required dependencies
```
pip3 install -r requirements.txt
```
The requirements.txt file contains a list of dependencies and their versions required for this project. The command above installs these dependencies in your Python environment.
Download the competition data from Kaggle.
Extract the downloaded data into the ./data directory of this project.
```
unzip -q altegrad-2023-data-challenge.zip -d ./data
```
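
Before training, it can help to confirm that the setup steps above took effect. The following is a small optional sketch and not part of the repository: the helper name `check_setup` is hypothetical, and it assumes that Python 3.10 or newer is acceptable and that the data lives in `./data` as described above. It does not check for specific file names, since those depend on the Kaggle archive.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def check_setup(data_dir: str = "./data",
                min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    # Assumption: Python 3.10 or newer is fine (the prerequisites list 3.10).
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    data_path = Path(data_dir)
    if not data_path.is_dir():
        problems.append(f"data directory not found: {data_path}")
    elif not any(data_path.iterdir()):
        problems.append(f"data directory is empty: {data_path}")
    return problems


# Run from the project root; prints nothing when everything looks fine.
for problem in check_setup():
    print("setup issue:", problem)
```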

Usage

Now that you have installed the dependencies, you can run any model you have set up using the following command:

python3 source/main.py

To follow the training of the model, you can also run the following command:

tensorboard --logdir=source/runs

The submissions are stored in the ./submissions directory, while a summary of each model, its training, and its saved weights are stored in the ./saves directory. There are also several other scripts you can run. If you just want to create a submission after setting up the file:

python3 source/submit.py

You can also train a soft voting classifier in which all classifiers are trained with the exact same Text Encoder and through the same loops:

python3 source/voting.py

Finally, you can also create a submission by averaging the similarities found by distinct models:

python3 source/submitSoftVoting.py
