ScrapLogGit2Net

A toolset for mining and visualizing Git repositories with Social Network Analysis. ScrapLogGit2Net allows its users to scrape, model, and visualize social networks based on common source-code file edits for any given Git repository.
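To make "social networks based on common source-code file edits" concrete, here is a minimal sketch of the idea. It is not the project's actual scraper: the commit records, the email addresses, and the helper name `co_edit_edges` are hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written commit records: (author email, files touched).
commits = [
    ("alice@example.org", ["core.py", "utils.py"]),
    ("bob@example.com", ["core.py"]),
    ("carol@example.net", ["docs.md"]),
    ("bob@example.com", ["utils.py"]),
]


def co_edit_edges(commits: list[tuple[str, list[str]]]) -> set[tuple[str, str]]:
    """Return the developer pairs that edited at least one common file."""
    editors_by_file: dict[str, set[str]] = defaultdict(set)
    for author, files in commits:
        for path in files:
            editors_by_file[path].add(author)

    edges: set[tuple[str, str]] = set()
    for editors in editors_by_file.values():
        # Every pair of developers who touched the same file gets one edge.
        for pair in combinations(sorted(editors), 2):
            edges.add(pair)
    return edges


edges = co_edit_edges(commits)
print(sorted(edges))
```

In this toy data only alice and bob share files, so they form the single edge; carol edited a file alone and stays unconnected.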

The toolset was first developed by Jose Apolinário Teixeira during his doctoral studies, with some guidance from Software Engineering scholars with expertise in the mining of software repositories. The tool's merit lies in considering both individuals and organizations. It maps developers to organizations using the commit email address and external APIs, such as the REST and GraphQL APIs provided by GitHub.
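The email part of that mapping can be illustrated with a naive heuristic. The sketch below is an assumption about how such a mapping could work, not the project's implementation: the function name `organization_from_email` and the "second-level domain label" rule are hypothetical, and the API-based lookups are not covered.

```python
def organization_from_email(email: str) -> str:
    """Guess an organization label from the domain of a commit email."""
    if "@" not in email:
        return "unknown"
    _, _, domain = email.rpartition("@")
    labels = domain.lower().split(".")
    # Naive rule: keep the second-level label, e.g. "example" in "dev.example.com".
    return labels[-2] if len(labels) >= 2 else labels[0]


print(organization_from_email("alice@dev.example.com"))
```

A rule this simple mislabels addresses under multi-part suffixes such as `co.uk` and cannot tell apart employees who commit from personal email providers, which is one reason to combine it with other sources.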

Newer features allow you to:

  • Transform a network of individuals into a network of organizations/firms. The weighted edge between organizations is the sum of developers that worked together (i.e., co-edited the same source-code files).
  • Filter developers by email (handy for dealing with bots that commit code).
  • Use parallel edges (i.e., multiple edges between two nodes) to attribute weight to a cooperative relationship between two developers (e.g., the number of times they co-edited a source-code file).
  • Visualize collaborations dynamically using NetworkX and Matplotlib.
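The first two features can be sketched together in plain Python. This is an illustration under stated assumptions, not the code in the repository: the edge list, the `org_of` mapping, the ignored bot address, and the function name `organization_edges` are all hypothetical, and the edge weight is interpreted here as the number of collaborating developer pairs across two organizations.

```python
from collections import Counter

# Hypothetical developer-to-developer edges (pairs that co-edited a file).
dev_edges = [
    ("alice@acme.example", "bob@globex.example"),
    ("carol@acme.example", "bob@globex.example"),
    ("alice@acme.example", "carol@acme.example"),
    ("ci-bot@acme.example", "bob@globex.example"),
]

# Hypothetical developer-to-organization mapping.
org_of = {
    "alice@acme.example": "acme",
    "carol@acme.example": "acme",
    "ci-bot@acme.example": "acme",
    "bob@globex.example": "globex",
}

# Emails to filter out, e.g. bots that commit code.
ignored_emails = {"ci-bot@acme.example"}


def organization_edges(
    dev_edges: list[tuple[str, str]],
    org_of: dict[str, str],
    ignored_emails: set[str],
) -> Counter:
    """Collapse developer edges into weighted organization edges."""
    weights: Counter = Counter()
    for dev_a, dev_b in dev_edges:
        if dev_a in ignored_emails or dev_b in ignored_emails:
            continue  # drop edges that involve a filtered developer
        org_a, org_b = org_of[dev_a], org_of[dev_b]
        if org_a == org_b:
            continue  # intra-organization work is skipped in this sketch
        weights[tuple(sorted((org_a, org_b)))] += 1
    return weights


org_weights = organization_edges(dev_edges, org_of, ignored_emails)
print(dict(org_weights))
```

Sorting each organization pair before counting keeps the edge undirected, so `("acme", "globex")` and `("globex", "acme")` accumulate into the same weight.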

The code was also recently (i.e., Spring 2024) made compliant with NetworkX data structures and the Python 3.10 runtime, which simplified the original codebase.

For more information, see the publication and related website:

Problem statement

It is hard to figure out (visualize) who works with whom in complex software projects.

Vision

A world where software co-production analytics put social network visualizations at the side of standard quantitative statistical data. All towards the improved management and engineering of complex software projects orchestrated on Git.

Contributor Guide

Welcome to the project! We're excited to have you contribute to ScrapLogGit2Net. This guide will help you get started and ensure that your contributions are aligned with our project's standards.

Table of Contents

  1. Setting Up Your Development Environment
  2. Coding Style
  3. Architecture
  4. Logging
  5. Progress Bars
  6. Git Workflow
  7. Easy Hacks
  8. Contact

Setting Up Your Development Environment

  1. Fork the repository on GitHub. To fork the ScrapLogGit2Net repository on GitHub, go to https://github.com/jaateixeira/ScrapLogGit2Net/. In the top right corner of the page, you will see a "Fork" button. Click on this button, and GitHub will create a copy of the repository under your GitHub account. This forked repository is now independent of the original repository, allowing you to freely make changes without affecting the original project. You can then clone your forked repository to your local machine, make your changes, and push them back to your fork on GitHub.

  2. Clone your forked repository to your local machine (replace your-username with your GitHub username):

    git clone https://github.com/your-username/ScrapLogGit2Net.git
  3. Install the dependencies. See dependencies.sh.

Coding Style

We adopt the PEP 8 style guide to write clean, readable Python code.

ScrapLogGit2Net started as a quick script for scientific research, where the main goal was to quickly obtain and process data for research papers. It is not a large, clean, object-oriented, test-driven masterpiece. Still, good principles for Python programming apply: (1) follow naming conventions, (2) type-check your function parameters, and (3) be careful with global variables. Variables should have descriptive names in snake_case for readability and consistency. Type hints should be used in function definitions to specify expected input and output types, enhancing code clarity and facilitating debugging. Access to global variables should be minimized, as it can lead to code that is difficult to understand and maintain. Instead, use function parameters and return values to manage data flow whenever possible, promoting modularity and reducing side effects.
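As an illustration of these three principles, here is a small sketch. The function and its data are hypothetical and do not come from the codebase; the point is the snake_case names, the type hints, and the fact that data flows through parameters and the return value rather than through globals.

```python
def count_commits_per_author(commit_authors: list[str]) -> dict[str, int]:
    """Count commits per author without reading or writing global state."""
    commits_per_author: dict[str, int] = {}
    for author_email in commit_authors:
        commits_per_author[author_email] = commits_per_author.get(author_email, 0) + 1
    return commits_per_author


commit_counts = count_commits_per_author(
    ["alice@example.org", "bob@example.com", "alice@example.org"]
)
print(commit_counts)
```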

PEP 8 Coding Style Guide

PEP 8 is the style guide for Python code. It promotes readability and consistency in Python codebases. Following these guidelines will help improve the readability and maintainability of your code.

For the full PEP 8 documentation, please visit the official page: PEP 8 -- Style Guide for Python Code

Using flake8 for PEP 8 Compliance

flake8 is a tool that checks your Python code against the PEP 8 style guide. It helps identify and fix stylistic issues in your code.

Setting Up flake8

To install flake8, run the following command:

pip install flake8

To check your Python files for PEP 8 compliance, navigate to your project directory and run:

flake8 your_module.py

Allowed global variables

Please use the built-in globals() function to access the global scope’s name table. This signals to other developers that we are dealing with an important global variable that we should not mess with.

TODO Table

File | Variable | Type | Description
scrapLog.py | G_network_Dev2Dev_singleEdges | nx.Graph() | Inter-individual network - edges are unweighted
scrapLog.py | G_network_Dev2Dev_multiEdges | nx.MultiGraph() | Inter-individual network - edges can be weighted
scrapLog.py | stats | Dictionary (immutable keys) | Keeps statistics of the scraping
formatFilterAndViz-nofi-GraphML.py | TODO | TODO | TODO
transform-nofi-2-nofo-GraphML.py | TODO | TODO | TODO
formatFilterAndViz-nofo-GraphML.py | TODO | TODO | TODO
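A minimal sketch of the globals() convention follows. The variable name `stats` is taken from the table above, but its keys and the helper `record_commit` are hypothetical and only serve the illustration.

```python
# Module-level global named after the `stats` entry in the table above.
# The keys used here are hypothetical.
stats = {"n_commits": 0, "n_skipped": 0}


def record_commit() -> None:
    """Bump the commit counter, reaching the global explicitly via globals()."""
    globals()["stats"]["n_commits"] += 1


record_commit()
record_commit()
print(stats["n_commits"])
```

Writing `globals()["stats"]` instead of a bare `stats` makes the access to shared state stand out when reading the function body.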

Project Architecture

The ScrapLogGit2Net project leverages several powerful Python libraries to achieve its functionality. This section provides an overview of the key libraries used and how they fit into the project's architecture.

ScrapLogGit2Net Architecture Diagram

NumPy

NumPy is used for numerical operations, including the creation and manipulation of arrays and matrices. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

  • Usage: NumPy is typically used for data manipulation, mathematical calculations, and handling large datasets.
  • Documentation: NumPy Documentation

NetworkX

NetworkX is utilized for creating, manipulating, and studying the structure, dynamics, and functions of complex networks. It allows for the creation of both undirected and directed graphs, along with various algorithms to analyze them.

  • Usage: NetworkX is used for constructing and analyzing network graphs, which is a core part of the project's functionality.
  • Documentation: NetworkX Documentation
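The difference between parallel edges and a single weighted edge can be sketched with NetworkX as follows. This assumes the networkx package is installed; the node names are hypothetical, and the conversion loop is one possible approach rather than the project's own code.

```python
import networkx as nx

# Each parallel edge stands for one co-edit between two developers.
multi = nx.MultiGraph()
multi.add_edge("alice", "bob")
multi.add_edge("alice", "bob")
multi.add_edge("bob", "carol")

# Collapse parallel edges into one edge whose weight counts the co-edits.
weighted = nx.Graph()
for u, v in multi.edges():
    if weighted.has_edge(u, v):
        weighted[u][v]["weight"] += 1
    else:
        weighted.add_edge(u, v, weight=1)

print(weighted["alice"]["bob"]["weight"])
```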

Matplotlib

Matplotlib is a plotting library used for creating static, interactive, and animated visualizations in Python. It is heavily used for generating plots, charts, and other graphical representations of data.

  • Usage: Matplotlib is used to visualize data, such as network graphs and other statistical plots.
  • Documentation: Matplotlib Documentation

Argparse

Rich

Rich is a library for rich text and beautiful formatting in the terminal. It is used to create aesthetically pleasing and user-friendly command-line interfaces with features like progress bars, tables, and syntax highlighting.

  • Usage: Rich is used to enhance the terminal output, making it more informative and visually appealing, especially for progress indicators and formatted output.
  • Documentation: Rich Documentation

Rich is used for:

  • Printing colored text in the console (e.g., debug information)
  • Printing text in Markdown format for better readability
  • Printing emojis that reflect the state or mood in the console
  • Using the inspect function to learn about objects
  • Colored logging, integrated with Loguru
  • Good-looking Tables
  • Progress Bars and Wait Spinners
  • Better Looking Errors, with colored stack traces

See video tutorial for more information.

Loguru

Loguru is a library designed for simple and effective logging. It simplifies the process of logging by providing an easy-to-use and powerful logging mechanism.

  • Usage: Loguru is used to handle logging throughout the project, ensuring that logs are informative, easy to read, and useful for debugging.
  • Documentation: Loguru Documentation

Integration and Workflow

The integration of these libraries follows a well-structured workflow:

  1. Data Handling: NumPy is used to preprocess and handle data efficiently.
  2. Network Construction: NetworkX is used to construct and manipulate network graphs from the data.
  3. Visualization: Matplotlib is used to create visual representations of the network graphs and other data.
  4. User Interface: Rich is used to create an enhanced command-line interface for better user interaction.
  5. Logging: Loguru is used throughout the project to log important information, errors, and debugging details.

By leveraging these libraries, ScrapLogGit2Net achieves a robust, efficient, and user-friendly architecture that simplifies complex data operations, network analysis, visualization, and interaction.

For more detailed guidelines on how to contribute to the project, please refer to the rest of the CONTRIBUTING.md file.

Thank you for your contributions and for helping improve ScrapLogGit2Net!

Git Workflow

  1. Fork the Repository
  2. Clone the Forked Repository
  3. Create a Feature Branch
  4. Make Your Changes
  5. Commit Your Changes
  6. Push Your Feature Branch
  7. Create a Pull Request
  8. Respond to Feedback

Fork the Repository

  1. Go to the ScrapLogGit2Net repository on GitHub.
  2. Click the Fork button in the upper right corner of the page. This will create a copy of the repository under your GitHub account.

Clone the Forked Repository

  1. Open your terminal or command prompt.
  2. Clone your forked repository to your local machine:
    git clone https://github.com/your-username/ScrapLogGit2Net.git

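Sync your fork (this assumes a remote named upstream that points to the original repository; add it once if it does not exist yet):

git remote add upstream https://github.com/jaateixeira/ScrapLogGit2Net.git
git checkout main
git pull upstream main
git push origin main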

Create a Feature Branch

Create a new branch for your feature/bugfix:

git checkout -b feature-branch

Make your changes and commit them:

git add .
git commit -m "Description of your changes"

Push your branch to GitHub:

git push origin feature-branch

Submitting a Pull Request

  1. Go to your fork on GitHub.
  2. Click on the "New Pull Request" button.
  3. Select the base fork and branch (our repository's main branch) and compare it with your feature-branch.
  4. Create the pull request with a clear and detailed description of your changes.

Code Review Process

Once you submit your pull request, it will be reviewed by the project maintainers. Here’s what to expect:

  1. Initial Review: We will review your code for adherence to the coding standards and overall implementation.
  2. Feedback: You might receive feedback or requests for changes.
  3. Approval: Once your pull request passes review, it will be merged into the main branch.

Please be responsive to feedback and make the necessary changes promptly to expedite the review process.

Contact

Jose Teixeira [email protected]

Easy Hacks

TODO
