This project investigates and compares the question-answering capabilities of BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) models, using the SQuAD 2.0 dataset.
- BERT Model Fine-tuning and Evaluation Notebook: Explore how we fine-tuned the BERT model on the SQuAD dataset and evaluated its performance. View Notebook
- GPT Model Fine-tuning and Evaluation Notebook: See how the GPT model was fine-tuned for the QA task and subsequently evaluated. View Notebook
- George Kaceli
- Nisarg Patel
- Jianguo Lu
We aim to explore the intricacies of BERT and GPT models by fine-tuning them on the SQuAD 2.0 dataset. This exploration covers transformer architectures, the fine-tuning process, and optimization of the models for the QA task. Through this work, we intend to understand the strengths and limitations of each model in the context of natural language understanding and information retrieval.
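For a concrete picture of what this fine-tuning involves, a condensed sketch using the Hugging Face Trainer API is shown below. It is not the exact code from our notebooks; the hyperparameters, the output path `bert-squad2`, and the simplified answer-span labelling are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForQuestionAnswering,
                          TrainingArguments, Trainer, default_data_collator)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

squad = load_dataset("squad_v2")  # SQuAD 2.0 includes unanswerable questions

def preprocess(examples):
    # Tokenize question + context, truncating only the context when it is too long.
    enc = tokenizer(
        examples["question"], examples["context"],
        truncation="only_second", max_length=384,
        padding="max_length", return_offsets_mapping=True,
    )
    start_positions, end_positions = [], []
    for i, offsets in enumerate(enc["offset_mapping"]):
        answer = examples["answers"][i]
        cls_index = 0  # unanswerable questions are labelled with the [CLS] token
        if len(answer["answer_start"]) == 0:
            start_positions.append(cls_index)
            end_positions.append(cls_index)
            continue
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        sequence_ids = enc.sequence_ids(i)
        token_start = token_end = cls_index
        # Find the token span that covers the answer's character span.
        for idx, (s, e) in enumerate(offsets):
            if sequence_ids[idx] != 1:
                continue  # skip question and special tokens
            if s <= start_char < e:
                token_start = idx
            if s < end_char <= e:
                token_end = idx
        # If the answer was truncated out of the window, fall back to [CLS].
        if token_start == cls_index or token_end == cls_index:
            token_start = token_end = cls_index
        start_positions.append(token_start)
        end_positions.append(token_end)
    enc["start_positions"] = start_positions
    enc["end_positions"] = end_positions
    enc.pop("offset_mapping")
    return enc

train_ds = squad["train"].map(preprocess, batched=True,
                              remove_columns=squad["train"].column_names)

args = TrainingArguments(
    output_dir="bert-squad2",           # hypothetical output directory
    learning_rate=3e-5,                 # illustrative hyperparameters, not tuned values
    per_device_train_batch_size=16,
    num_train_epochs=2,
)

Trainer(model=model, args=args, train_dataset=train_ds,
        data_collator=default_data_collator, tokenizer=tokenizer).train()
```

The GPT model requires a different treatment, since a decoder-only model answers by generating text rather than by predicting start and end positions; that setup is covered in the GPT notebook.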
BERT and GPT represent significant advancements in NLP and have set benchmarks across various tasks. By comparing their performance on a common dataset, we seek to gain insights into their behaviour in question-answering scenarios, their error patterns, and the practical applications of each model's strengths.
- A fine-tuned BERT model for question answering (a brief usage sketch follows this list).
- A fine-tuned GPT model for question answering, alongside additional applications.
- A comparative analysis of BERT and GPT performance, as well as the incorporation of the LLaMA model.
- A presentation summarizing the findings and practical insights derived from the comparison of these models.
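As a quick illustration of the first deliverable, an extractive-QA checkpoint can be exercised with the Transformers pipeline API. The checkpoint path below is hypothetical (it reuses the output directory from the fine-tuning sketch above); any SQuAD 2.0-style checkpoint can be substituted.

```python
from transformers import pipeline

# "bert-squad2" is a hypothetical local path from the fine-tuning sketch above.
qa = pipeline("question-answering", model="bert-squad2")

result = qa(
    question="Which dataset is used for fine-tuning?",
    context="This project fine-tunes BERT and GPT on the SQuAD 2.0 dataset.",
    handle_impossible_answer=True,  # allow an empty answer for unanswerable questions
)
print(result["answer"], result["score"])
```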
- Hardware: Standard personal computers for implementation and testing.
- Software: Python 3, PyTorch, TensorFlow, Hugging Face Transformers library, Jupyter Notebook.
- Knowledge: Proficiency in Python and familiarity with NLP concepts and machine learning frameworks.
Our fine-tuning and evaluation processes are documented in the publicly available Colab notebooks linked above.
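One standard way to score SQuAD 2.0 predictions on exact match and F1 is the Hugging Face `evaluate` library; the sketch below uses a placeholder prediction/reference pair purely to show the expected input format, not our actual results.

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Placeholder example; in practice, predictions come from running the
# fine-tuned model over the SQuAD 2.0 validation split.
predictions = [{"id": "example-0",
                "prediction_text": "SQuAD 2.0",
                "no_answer_probability": 0.0}]
references = [{"id": "example-0",
               "answers": {"text": ["SQuAD 2.0"], "answer_start": [42]}}]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])  # both reported on a 0-100 scale
```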
Key project risks include technical complexity, data quality and diversity, time management, and model performance falling short of expectations. Mitigation strategies include leveraging existing libraries, ensuring robust and varied datasets, building buffer time into the schedule, and conducting thorough validation experiments.
- March to April: Dataset preparation, BERT model loading and fine-tuning, evaluation, and prediction testing.
- September: Loading and fine-tuning of the GPT model, including exploratory work with the LLaMA model.
- October: Evaluation of the GPT model on various metrics and prediction testing.
- November: Comparative analysis of BERT and GPT model findings.
- December: Final analysis, presentation preparation, and highlighting further research opportunities.
We encourage contributions from the community.