ScrapLogGit2Net

A toolset for mining and visualizing Git repositories with Social Network Analysis. ScrapLogGit2Net allows its users to scrape, model, and visualize social networks based on common source-code file edits for any given Git repository.
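
As a minimal sketch of the underlying idea (not the actual scrapLog.py implementation), the snippet below links two developers whenever they edited the same file. The `file_editors` mapping, the email addresses, and the `co_edit_edges` helper are hypothetical and only serve as illustration.

```python
from itertools import combinations

# Hypothetical input: each file path mapped to the emails of developers who edited it.
file_editors = {
    "nova/api.py": {"alice@redhat.com", "bob@ibm.com"},
    "nova/db.py": {"alice@redhat.com", "bob@ibm.com", "carol@hp.com"},
    "README.md": {"carol@hp.com"},
}


def co_edit_edges(file_editors: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return one undirected edge per pair of developers sharing at least one file."""
    edges: set[tuple[str, str]] = set()
    for editors in file_editors.values():
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) collapse.
        for pair in combinations(sorted(editors), 2):
            edges.add(pair)
    return edges


edges = co_edit_edges(file_editors)
for source, target in sorted(edges):
    print(source, "--", target)
```

A file edited by a single developer contributes no edge, which is why `README.md` leaves no trace in the result.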

The toolset was first developed by Jose Apolinário Teixeira during his doctoral studies, with some guidance from Software Engineering scholars with expertise in mining software repositories. Its distinguishing merit is that it considers both individuals and organizations: it maps developers to organizations using the commit email address and external APIs, such as the REST and GraphQL APIs provided by GitHub.
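
The email-based part of that mapping could look roughly like the sketch below. It assumes that the email domain identifies the organization. The `DOMAIN_TO_ORG` table and the `organization_from_email` function are hypothetical names, and the lookups against the GitHub APIs are left out.

```python
# Hypothetical lookup table from email domain to organization name.
DOMAIN_TO_ORG = {"redhat.com": "Red Hat", "ibm.com": "IBM"}


def organization_from_email(email: str) -> str:
    """Guess a developer's organization from the domain of the commit email."""
    # Take everything after the last "@" and normalize the case.
    domain = email.rsplit("@", 1)[-1].lower()
    # Fall back to the raw domain when no organization is known for it.
    return DOMAIN_TO_ORG.get(domain, domain)


print(organization_from_email("Alice@RedHat.com"))
print(organization_from_email("dev@example.org"))
```

Generic email providers cannot be resolved to an employer this way, which is one reason why complementing the email with external APIs is useful.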

Newer features allow you to:

Transform a network of individuals into a network of organizations/firms. The weighted edge between organizations is the sum of developers that worked together (i.e., co-edited the same source-code files).
Filter developers by email (handy for dealing with bots that commit code).
Use parallel edges (i.e., multiple edges between two nodes) to attribute weight to a cooperative relationship between two developers (e.g., the number of times they co-edited a source-code file).
Visualize collaborations dynamically using NetworkX and Matplotlib.
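
The first two features can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code of transform-nofi-2-nofo-GraphML.py: it assumes that every inter-organizational pair of developers adds 1 to the edge weight and that ties within one organization are dropped. All names and data (`dev_edges`, `dev_org`, `bots`, `org_network`) are hypothetical.

```python
from collections import Counter

# Hypothetical developer-to-developer edges (one per pair that co-edited a file).
dev_edges = [
    ("alice@redhat.com", "bob@ibm.com"),
    ("alice@redhat.com", "carol@ibm.com"),
    ("dave@redhat.com", "erin@redhat.com"),
    ("ci-bot@ibm.com", "alice@redhat.com"),
]

# Hypothetical affiliation of each developer.
dev_org = {
    "alice@redhat.com": "Red Hat",
    "dave@redhat.com": "Red Hat",
    "erin@redhat.com": "Red Hat",
    "bob@ibm.com": "IBM",
    "carol@ibm.com": "IBM",
    "ci-bot@ibm.com": "IBM",
}

# Emails to filter out, e.g. bots that commit code.
bots = {"ci-bot@ibm.com"}


def org_network(
    dev_edges: list[tuple[str, str]],
    dev_org: dict[str, str],
    ignored_emails: set[str],
) -> Counter:
    """Aggregate developer edges into weighted organization edges."""
    weights: Counter = Counter()
    for dev_a, dev_b in dev_edges:
        if dev_a in ignored_emails or dev_b in ignored_emails:
            continue  # skip edges that involve a filtered email
        org_a, org_b = dev_org[dev_a], dev_org[dev_b]
        if org_a == org_b:
            continue  # assumption: intra-organizational ties are dropped
        # Sort the pair so that (A, B) and (B, A) count as the same edge.
        weights[tuple(sorted((org_a, org_b)))] += 1
    return weights


org_edges = org_network(dev_edges, dev_org, bots)
print(dict(org_edges))
```

With the sample data, the two Red Hat/IBM developer pairs collapse into a single organizational edge of weight 2, while the bot edge and the Red Hat internal edge are ignored.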

The code was also recently (Spring 2024) made compliant with the NetworkX data structures and the Python 3.10 runtime, which simplified the original codebase.

For more information, see the publication and related websites:

Teixeira, J., Robles, G., & González-Barahona, J. M. (2015). Lessons learned from applying social network analysis on an industrial Free/Libre/Open Source Software ecosystem. Journal of Internet Services and Applications, 6, 1-27. Available open-access at https://jisajournal.springeropen.com/articles/10.1186/s13174-015-0028-2.
Website http://users.abo.fi/jteixeir/OpenStackSNA/ with the obtained social networks and visualizations included in publications by the author on the OpenStack software ecosystem.
Website http://users.abo.fi/jteixeir/TensorFlowSNA/ with the obtained social networks and visualizations for the TensorFlow open and cooperative project (publication forthcoming).

Problem statement

It is hard to figure out (visualize) who works with whom in complex software projects.

Vision

A world where software co-production analytics put social network visualizations alongside standard quantitative statistical data, all towards improved management and engineering of complex software projects orchestrated on Git.

Contributor Guide

Welcome to the project! We're excited to have you contribute to ScrapLogGit2Net. This guide will help you get started and ensure that your contributions are aligned with our project's standards.

Setting Up Your Development Environment

Fork the repository on GitHub. To fork the ScrapLogGit2Net repository on GitHub, go to https://github.com/jaateixeira/ScrapLogGit2Net/. In the top right corner of the page, you will see a "Fork" button. Click on this button, and GitHub will create a copy of the repository under your GitHub account. This forked repository is now independent of the original repository, allowing you to freely make changes without affecting the original project. You can then clone your forked repository to your local machine, make your changes, and push them back to your fork on GitHub.

Clone your forked repository to your local machine:

git clone https://github.com/your-username/ScrapLogGit2Net.git

Install the dependencies (see dependencies.sh).

Coding Style

We adopt the PEP 8 style guide for writing clean, readable Python code.

ScrapLogGit2Net started as a quick script for scientific research. Quickly obtaining and processing data for research papers was the main goal. This is not a large, clean, object-oriented, test-driven masterpiece. Still, good principles for Python programming apply: (1) follow naming conventions, (2) type-check your function parameters, and (3) be careful with the use of global variables. Variables should have descriptive names in snake_case for readability and consistency. Type hints should be used in function definitions to specify expected input and output types, which makes the code clearer and easier to debug. Access to global variables should be minimized, as it can lead to code that is difficult to understand and maintain. Instead, use function parameters and return values to manage data flow whenever possible, which promotes modularity and reduces side effects.
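
A small sketch of these three principles, using a hypothetical helper that is not part of the codebase: the names are in snake_case, the signature carries type hints, and the data flows through the parameter and the return value rather than through a global variable.

```python
def count_commits_per_author(commit_authors: list[str]) -> dict[str, int]:
    """Count how many commits each author email made."""
    commits_per_author: dict[str, int] = {}
    for author_email in commit_authors:
        # The result is built locally and returned; no global state is touched.
        commits_per_author[author_email] = commits_per_author.get(author_email, 0) + 1
    return commits_per_author


commit_counts = count_commits_per_author(
    ["alice@redhat.com", "bob@ibm.com", "alice@redhat.com"]
)
print(commit_counts)
```

Because the function depends only on its argument, it can be tested in isolation and reused by other scripts in the toolset.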

PEP 8 Coding Style Guide

PEP 8 is the style guide for Python code. It promotes readability and consistency in Python codebases. Following these guidelines will help improve the readability and maintainability of your code.

For the full PEP 8 documentation, please visit the official page: PEP 8 -- Style Guide for Python Code (https://peps.python.org/pep-0008/).

Using flake8 for PEP 8 Compliance

flake8 is a tool that checks your Python code against the PEP 8 style guide. It helps identify and fix stylistic issues in your code.

Setting Up flake8

To install flake8, run the following command:

pip install flake8

To check your Python files for PEP 8 compliance, navigate to your project directory and run:

flake8 your_module.py

Allowed global variables

Please access module-level global variables through the built-in globals() function, which returns the global scope's name table. This signals to other developers that we are dealing with an important global variable that should not be tampered with.
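
A minimal sketch of this convention follows. The `stats` name comes from the table below, but the `n_commits` key and the `record_commit` function are hypothetical.

```python
# Module-level global that keeps statistics of the scraping (hypothetical key).
stats = {"n_commits": 0}


def record_commit() -> None:
    """Increase the commit counter kept in the global stats dictionary."""
    # Going through globals() makes the access to module-level state explicit.
    globals()["stats"]["n_commits"] += 1


record_commit()
record_commit()
print(stats["n_commits"])
```

The explicit `globals()["stats"]` lookup is more verbose than a bare `stats`, and that verbosity is the point: it flags every place where shared state is read or changed.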

TODO Table

File	Variable	Type	Description
scrapLog.py	G_network_Dev2Dev_singleEdges	nx.Graph()	Inter-individual network - edges are unweighted
scrapLog.py	G_network_Dev2Dev_multiEdges	nx.MultiGraph()	Inter-individual network - edges can be weighted
scrapLog.py	stats	Dictionary (immutable keys)	Keeps statistics of the scraping
formatFilterAndViz-nofi-GraphML.py	TODO	TODO	TODO
transform-nofi-2-nofo-GraphML.py	TODO	TODO	TODO
formatFilterAndViz-nofo-GraphML.py	TODO	TODO	TODO

Project Architecture

The ScrapLogGit2Net project leverages several powerful Python libraries to achieve its functionality. This section provides an overview of the key libraries used and how they fit into the project's architecture.

NumPy

NumPy is used for numerical operations, including the creation and manipulation of arrays and matrices. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

Usage: NumPy is typically used for data manipulation, mathematical calculations, and handling large datasets.
Documentation: NumPy Documentation

NetworkX

NetworkX is utilized for creating, manipulating, and studying the structure, dynamics, and functions of complex networks. It allows for the creation of both undirected and directed graphs, along with various algorithms to analyze them.

Usage: NetworkX is used for constructing and analyzing network graphs, which is a core part of the project's functionality.
Documentation: NetworkX Documentation

Matplotlib

Matplotlib is a plotting library used for creating static, interactive, and animated visualizations in Python. It is heavily used for generating plots, charts, and other graphical representations of data.

Usage: Matplotlib is used to visualize data, such as network graphs and other statistical plots.
Documentation: Matplotlib Documentation

Argparse
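
argparse is the command-line parsing module of the Python standard library. The sketch below only illustrates how such a parser is typically declared. The positional argument and the options shown are hypothetical and do not document the actual command-line interface of scrapLog.py or the other scripts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical options, for illustration only."""
    parser = argparse.ArgumentParser(
        description="Hypothetical CLI sketch, not the actual scrapLog.py interface."
    )
    # Hypothetical positional argument: the Git log dump to process.
    parser.add_argument("log_file", help="path to a Git log dump")
    # Hypothetical repeatable option: emails to leave out (e.g., bots).
    parser.add_argument(
        "--ignore-email",
        action="append",
        default=[],
        help="email address to filter out; may be given several times",
    )
    # Hypothetical flag for more detailed output.
    parser.add_argument("--verbose", action="store_true", help="print more details")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["my.log", "--ignore-email", "bot@example.com"])
print(args.log_file, args.ignore_email, args.verbose)
```

Run each script with `--help` to see the options it really supports.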

Rich

Rich is a library for rich text and beautiful formatting in the terminal. It is used to create aesthetically pleasing and user-friendly command-line interfaces with features like progress bars, tables, and syntax highlighting.

Usage: Rich is used to enhance the terminal output, making it more informative and visually appealing, especially for progress indicators and formatted output.
Documentation: Rich Documentation

Rich is used for:

Printing colored text in the console (e.g., debug information)
Printing text in Markdown format for better readability
Printing emojis that reflect the state or mood in the console
Inspect function to help you learn about objects
Colored Logging in integration with Loguru
Good-looking Tables
Progress Bars and Wait Spinners
Better Looking Errors, with colored stack traces

See video tutorial for more information.

Loguru

Loguru is a library designed for simple and effective logging. It simplifies the process of logging by providing an easy-to-use and powerful logging mechanism.

Usage: Loguru is used to handle logging throughout the project, ensuring that logs are informative, easy to read, and useful for debugging.
Documentation: Loguru Documentation

Integration and Workflow

The integration of these libraries follows a well-structured workflow:

Data Handling: NumPy is used to preprocess and handle data efficiently.
Network Construction: NetworkX is used to construct and manipulate network graphs from the data.
Visualization: Matplotlib is used to create visual representations of the network graphs and other data.
User Interface: Rich is used to create an enhanced command-line interface for better user interaction.
Logging: Loguru is used throughout the project to log important information, errors, and debugging details.

By leveraging these libraries, ScrapLogGit2Net achieves a robust, efficient, and user-friendly architecture that simplifies complex data operations, network analysis, visualization, and interaction.

For more detailed guidelines on how to contribute to the project, please refer to the rest of the CONTRIBUTING.md file.

Thank you for your contributions and helping improve ScrapLogGit2Net!

Git Workflow

Fork the Repository

Go to the ScrapLogGit2Net repository on GitHub.
Click the Fork button in the upper right corner of the page. This will create a copy of the repository under your GitHub account.

Clone the Forked Repository

Open your terminal or command prompt.

Clone your forked repository to your local machine:

git clone https://github.com/your-username/ScrapLogGit2Net.git

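Sync your fork with the original (upstream) repository. The upstream remote must be added once before you can pull from it:

git remote add upstream https://github.com/jaateixeira/ScrapLogGit2Net.git
git checkout main
git pull upstream main
git push origin main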

Create a Feature Branch

Create a new branch for your feature/bugfix:

git checkout -b feature-branch

Make your changes and commit them:

git add .
git commit -m "Description of your changes"

Push your branch to GitHub:

git push origin feature-branch

Submitting a Pull Request

Go to your fork on GitHub.
Click on the "New Pull Request" button.
Select the base fork and branch (our repository's main branch) and compare it with your feature-branch.
Create the pull request with a clear and detailed description of your changes.

Code Review Process

Once you submit your pull request, it will be reviewed by the project maintainers. Here’s what to expect:

Initial Review: We will review your code for adherence to the coding standards and overall implementation.
Feedback: You might receive feedback or requests for changes.
Approval: Once your pull request passes review, it will be merged into the main branch.

Please be responsive to feedback and make the necessary changes promptly to expedite the review process.

Contact

Jose Teixeira [email protected]

Easy Hacks

TODO
