This project covers the following concepts: topic modelling using LDA; clustering through tf-idf and BoW; dimension reduction through t-SNE and truncated SVD; classification and regression algorithms.

MortadhaMannai/Natural-Language-Processing-Analyzing-GitHub-Pull-Requests

Natural Language Processing: Analyzing GitHub Pull Requests

Context

The dataset github_comments.tsv contains 4000 comments that developer teams published on GitHub pull requests.

Here is an explanation of the table columns:

  • Comment: the comment made by a developer on the pull request.
  • Comment_date: the date on which the comment was published.
  • Is_merged: shows whether the pull request on which the comment was made has been accepted (therefore merged) or rejected.
  • Merged_at: date at which the pull request was merged (if accepted).
  • Request_changes: each comment is labelled 1 if it is a request for a change in the code, and 0 otherwise.
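
As a rough illustration of the table layout, the sketch below reads tab-separated rows with the five columns listed above and keeps only the change requests. The sample rows are made up, and the ISO-style dates and the `True`/`False` encoding of Is_merged are assumptions; the real github_comments.tsv may encode these columns differently. The helper name `load_comments` is hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the real data lives in github_comments.tsv.
SAMPLE_TSV = (
    "Comment\tComment_date\tIs_merged\tMerged_at\tRequest_changes\n"
    "Please rename this variable.\t2019-03-01T10:00:00\tTrue\t2019-03-02T12:00:00\t1\n"
    "Looks good to me.\t2019-03-01T11:00:00\tTrue\t2019-03-02T12:00:00\t0\n"
    "This breaks the build.\t2019-03-05T09:30:00\tFalse\t\t1\n"
)


def load_comments(handle):
    """Read tab-separated rows into dicts and convert the two label columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Request_changes"] = int(row["Request_changes"])
        # Assumption: merged status is stored as text such as "True" or "1".
        row["Is_merged"] = row["Is_merged"].strip().lower() in {"true", "1"}
        rows.append(row)
    return rows


rows = load_comments(io.StringIO(SAMPLE_TSV))
change_requests = [r for r in rows if r["Request_changes"] == 1]
print(len(rows), "comments,", len(change_requests), "change requests")
```

To try it on the real file, the same function could be called with `open("github_comments.tsv", newline="", encoding="utf-8")` in place of the in-memory sample.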

The goal is to dig deeper into the nature of blockers and analyze the requests for change. If possible, try to answer the following questions:

  • What are the most common problems that appear in these comments?
  • Can we cluster the problems by topic/problem type?
  • How long is the resolution time after a change was requested?
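
One possible reading of the last question is the time between a change-request comment and the merge of its pull request. The sketch below computes that under this assumption, using made-up records and ISO-formatted dates; it is not necessarily the definition used in the report, and the function name `resolution_hours` is hypothetical.

```python
from datetime import datetime
from statistics import median

# Hypothetical records that reuse the column names from the dataset description.
records = [
    {"Comment_date": "2019-03-01T10:00:00", "Is_merged": True,
     "Merged_at": "2019-03-02T12:00:00", "Request_changes": 1},
    {"Comment_date": "2019-03-01T11:00:00", "Is_merged": True,
     "Merged_at": "2019-03-02T12:00:00", "Request_changes": 0},
    {"Comment_date": "2019-03-05T09:30:00", "Is_merged": False,
     "Merged_at": "", "Request_changes": 1},
    {"Comment_date": "2019-03-10T08:00:00", "Is_merged": True,
     "Merged_at": "2019-03-10T20:00:00", "Request_changes": 1},
]


def resolution_hours(record):
    """Hours from the comment to the merge, or None if the PR was not merged."""
    if not record["Is_merged"] or not record["Merged_at"]:
        return None
    asked = datetime.fromisoformat(record["Comment_date"])
    merged = datetime.fromisoformat(record["Merged_at"])
    return (merged - asked).total_seconds() / 3600


# Keep only change requests whose pull request was eventually merged.
hours = []
for record in records:
    if record["Request_changes"] != 1:
        continue
    value = resolution_hours(record)
    if value is not None:
        hours.append(value)

print("median resolution time (hours):", median(hours))
```

Rejected pull requests have no Merged_at value, so they are skipped here; whether to treat them as unresolved or to drop them is a design choice that affects the result.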

Content

  • Report.pdf is a PDF report that details my approach.
  • images is a collection of the images that I included in my report.
  • TopicModelling.ipynb is a Jupyter Notebook in which I did my analysis in Python.
  • corpus.pkl, dictionary.gensim, and all files starting with model… are files generated in the notebook that I use to avoid re-running some steps.
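
The idea of saving intermediate results so that expensive steps are not repeated can be sketched with the standard library as follows. This is a generic pattern rather than the notebook's actual code: the helper `load_or_build` and the stand-in `build_corpus` are hypothetical, and a temporary directory is used so the example does not touch the repository's own files.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(path, build):
    """Return the pickled object at `path`; build and save it if the file is absent."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = build()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []


def build_corpus():
    # Stand-in for an expensive preprocessing step (hypothetical data).
    calls.append(1)
    return [[(0, 1), (1, 2)], [(1, 1)]]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "corpus.pkl"
    first = load_or_build(cache, build_corpus)   # builds and writes the file
    second = load_or_build(cache, build_corpus)  # reads the cached file

print("build ran", len(calls), "time(s)")
```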

Theory covered

This project covers the following concepts:

  • Topic Modelling using LDA
  • Clustering through tf-idf and BoW
  • Dimension reduction through t-SNE and truncated SVD
  • Classification and Regression algorithms
