This project aims to collect the KLOC (thousands of lines of code) of each commit in open source projects and store the data in MongoDB using Apache Spark. The initial goal is to get the code working on a smaller open source project and then move on to larger ones.
The ingestion service works as follows:
- Get the username, repository name, and branch name from the frontend.
- Clone the repository into a temporary location, which is deleted once the data has been collected.
- Create a folder for each commit object, named after that commit's SHA.
- Extract the cloc output, commit date, commit SHA, and any other required information.
- Calculate Range*LOC per file for each commit.
- Calculate Defect Density, Spoilage, and Productivity.
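The cloning, commit-listing, and per-commit folder steps could be sketched roughly as below. This is a minimal illustration, not the project's implementation: it assumes the repository lives on GitHub (the URL pattern is an assumption), that `git` is on `PATH`, and all function names (`clone_branch`, `list_commits`, `make_commit_dirs`, `ingest`) are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def clone_branch(username: str, repo: str, branch: str, dest: Path) -> Path:
    """Clone a single branch into dest/<repo>. The GitHub URL pattern is an assumption."""
    url = f"https://github.com/{username}/{repo}.git"
    target = dest / repo
    subprocess.run(
        ["git", "clone", "--branch", branch, "--single-branch", url, str(target)],
        check=True,
    )
    return target


def parse_log(text: str) -> list[tuple[str, str]]:
    """Parse lines of the form '<sha> <ISO-8601 committer date>'."""
    commits = []
    for line in text.splitlines():
        if line.strip():
            sha, date = line.split(" ", 1)
            commits.append((sha, date))
    return commits


def list_commits(repo_dir: Path) -> list[tuple[str, str]]:
    """Return (sha, committer date) pairs for every commit on the checked-out branch."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "log", "--format=%H %cI"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log(result.stdout)


def make_commit_dirs(root: Path, commits: list[tuple[str, str]]) -> list[Path]:
    """Create one folder per commit under root, named by the commit SHA."""
    dirs = []
    for sha, _date in commits:
        d = root / sha
        d.mkdir(parents=True, exist_ok=True)
        dirs.append(d)
    return dirs


def ingest(username: str, repo: str, branch: str, output_root: Path) -> list[Path]:
    """Clone into a temporary directory; the clone is removed when the with-block exits.

    Any per-commit extraction (cloc, dates, ...) would need to run inside the block.
    """
    with tempfile.TemporaryDirectory() as tmp:
        repo_dir = clone_branch(username, repo, branch, Path(tmp))
        commits = list_commits(repo_dir)
        return make_commit_dirs(output_root, commits)
```

Using `tempfile.TemporaryDirectory` as a context manager ties the lifetime of the clone to the block, so the temporary copy is cleaned up even if an exception is raised partway through.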
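For the cloc extraction step, one option is to run `cloc --json` on each checked-out commit and read the code-line counts from its report. The sketch below assumes `cloc` is installed and on `PATH`; the helper names `run_cloc`, `kloc`, and `code_by_language` are hypothetical. The report is a JSON object with a `header` entry, one entry per language, and a `SUM` entry, each language entry carrying `nFiles`, `blank`, `comment`, and `code`.

```python
import json
import subprocess


def run_cloc(path: str) -> dict:
    """Run cloc with JSON output on a checked-out tree (assumes cloc is on PATH)."""
    result = subprocess.run(
        ["cloc", "--json", "--quiet", path],
        check=True,
        capture_output=True,
        text=True,
    )
    # Treat empty output as an empty report.
    return json.loads(result.stdout) if result.stdout.strip() else {}


def kloc(report: dict) -> float:
    """Total code lines (excluding blank and comment lines) in thousands."""
    return report.get("SUM", {}).get("code", 0) / 1000


def code_by_language(report: dict) -> dict[str, int]:
    """Code lines per language, skipping cloc's 'header' and 'SUM' entries."""
    return {
        lang: stats["code"]
        for lang, stats in report.items()
        if lang not in ("header", "SUM")
    }
```

Keeping the per-language breakdown alongside the total makes it possible to store both in the same per-commit record later.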
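The final metrics could be computed from KLOC as in the hedged sketch below. Defect density as defects per KLOC is the conventional definition. This README does not define Spoilage or Productivity, so the formulas used here (rework effort over total effort, and KLOC per person-month) are assumptions, as is the source of the defect counts and effort figures.

```python
def defect_density(defects: int, kloc: float) -> float:
    """Defects per thousand lines of code."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return defects / kloc


def spoilage(rework_hours: float, total_hours: float) -> float:
    """Assumed definition: fraction of total effort spent fixing defects."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return rework_hours / total_hours


def productivity(kloc: float, person_months: float) -> float:
    """Assumed definition: KLOC delivered per person-month of effort."""
    if person_months <= 0:
        raise ValueError("person_months must be positive")
    return kloc / person_months
```

For example, 12 defects in a 4 KLOC codebase gives a defect density of 3.0 defects/KLOC.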