
[OSS101] Task 6: Open source repository collaboration network and npm artifact library dependency network mapping dataset #62

Open
bifenglin opened this issue May 14, 2024 · 0 comments


bifenglin commented May 14, 2024

Objective

This dataset maps npm packages to their corresponding open source repositories. Metadata in the npm registry is often incomplete or outdated, for example when packages are published by individual contributors or when repositories are renamed, so the dataset is intended to support accurate prediction and mapping between the two networks.

Contents

The two networks cannot be fully mapped onto each other, but subsets of them do correspond. The mapping can be built from the repo_url field in the npm package info.
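The repo_url-based mapping could be sketched as follows. This is a minimal illustration, not the task's reference method: it assumes each package record is a dict with a `name` key (hypothetical) next to `repo_url`, it only recognises GitHub-style URLs, and the function names are made up for this example.

```python
import re
from typing import Optional

# Matches the owner/repo part of common GitHub URL spellings, e.g.
# "git+https://github.com/owner/repo.git" or "git@github.com:owner/repo.git".
_GITHUB_RE = re.compile(
    r"github\.com[/:](?P<owner>[^/\s]+)/(?P<repo>[^/\s#?]+)",
    re.IGNORECASE,
)


def normalize_repo_url(repo_url: Optional[str]) -> Optional[str]:
    """Return a lowercase 'owner/repo' key, or None if the URL is not recognised."""
    if not repo_url:
        return None
    match = _GITHUB_RE.search(repo_url.strip())
    if match is None:
        return None
    repo = match.group("repo")
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{match.group('owner')}/{repo}".lower()


def map_packages_to_repos(packages):
    """Build {package_name: 'owner/repo'} for packages with a usable repo_url.

    Assumption: each record has 'name' and 'repo_url' keys.
    """
    mapping = {}
    for pkg in packages:
        key = normalize_repo_url(pkg.get("repo_url"))
        if key is not None:
            mapping[pkg["name"]] = key
    return mapping
```

Packages whose repo_url is missing or unrecognised are simply left out of the mapping, which matches the point that only subsets of the two networks correspond. Renamed repositories would still need a separate resolution step.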

Open Source Repository Collaboration Network:

Nodes: Represent individual developers or teams.
Edges: Represent collaborative relationships, including contributions like commits, reviews, and discussions.
Attributes: Include metrics such as number of contributions, nature of the contributions (code, documentation, etc.), and duration of collaboration.
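One possible way to derive such a network, sketched under assumptions: the input is a list of (repository, developer) contribution events, and two developers are linked when they contributed to the same repository, with the edge weight counting shared repositories. The event shape and the weighting rule are illustrative choices, not something the task prescribes.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_edges(events):
    """Return {(dev_a, dev_b): number_of_shared_repos} from (repo, developer) events.

    Developer pairs are stored in sorted order so each undirected edge appears once.
    """
    devs_by_repo = defaultdict(set)
    for repo, dev in events:
        devs_by_repo[repo].add(dev)

    weights = defaultdict(int)
    for devs in devs_by_repo.values():
        for a, b in combinations(sorted(devs), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

Other attributes listed above, such as contribution type or duration, could be carried as extra fields on each event and aggregated the same way.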

npm Artifact Library Dependency Network:

Nodes: Represent individual npm packages.
Edges: Represent dependency links, where one package is dependent on another.
Attributes: Include version numbers, update frequency, and package metadata such as download counts (a popularity metric) and descriptions.

Data Collection Methods:

Data for the collaboration network is collected from the public APIs of popular source code hosting platforms such as GitHub, GitLab, and Bitbucket. Alternatively, you can download the sample dataset provided by OpenDigger directly; comparing one year of behavioral data is recommended. https://github.com/X-lab2017/open-digger/blob/master/sample_data/README.md

Data for the npm artifact library dependency network is extracted from the npm registry's public API, focusing on package.json files to map dependencies. You can also crawl npmjs.com. The global npm libraries and their dependencies are provided here:

npm dependencies: npm_dependencies.zip 7.15M
npm packages: npm_packages.zip 69.28M
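As a rough sketch of the package.json step, the function below turns one manifest, already parsed into a dict, into dependency edges. It relies only on the standard `name`, `dependencies`, and `devDependencies` fields; the function name and the edge tuple layout are assumptions made for illustration, and the layout of the provided zip files may differ.

```python
def dependency_edges(package_json, include_dev=False):
    """Return (package, dependency, version_range) tuples for one manifest.

    Manifests without a name yield no edges. devDependencies are skipped
    unless include_dev is True.
    """
    name = package_json.get("name")
    if not name:
        return []

    fields = ["dependencies"]
    if include_dev:
        fields.append("devDependencies")

    edges = []
    for field in fields:
        # A missing or null field is treated as "no dependencies".
        for dep, version_range in (package_json.get(field) or {}).items():
            edges.append((name, dep, version_range))
    return edges
```

Whether development dependencies belong in the network is a modelling decision; they are excluded by default here because they are not installed by downstream consumers.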

Potential Use Cases:

  • Compute metrics for both networks: degree, clustering coefficient, average path length, diameter, centrality, density, modularity, connected components, etc.
  • Visualize the mapping between the two networks.
  • Study the resilience of software ecosystems by examining dependency chains and their impact on software reliability.
  • Evaluate trends in software development practices over time.
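A few of the metrics named in this list can be computed with the standard library alone. The sketch below treats the network as a simple undirected graph given as (node, node) pairs, which is an assumption; the dependency network is naturally directed, and a graph library such as NetworkX would be a reasonable choice for the heavier metrics (modularity, centrality, diameter).

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degrees(adj):
    """Return {node: degree}."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def density(adj):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def connected_components(adj):
    """Return a list of node sets, one per connected component (breadth-first search)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components
```

Isolated nodes do not appear in an edge list, so they would have to be added to the adjacency map separately if they matter for the analysis.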

Format

The dataset is provided in formats suitable for machine learning and network analysis, such as CSV for tabular data and JSON for structured metadata.
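As an illustration of the CSV and JSON split, the sketch below serialises an edge list to CSV and reads it back, and dumps per-node metadata as JSON. The column names `source`, `target`, and `version_range` are hypothetical and are not the schema of the provided files.

```python
import csv
import io
import json

# Hypothetical column names for the edge table.
EDGE_COLUMNS = ["source", "target", "version_range"]


def edges_to_csv(edges):
    """Serialise (source, target, version_range) tuples to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(EDGE_COLUMNS)
    writer.writerows(edges)
    return buf.getvalue()


def csv_to_edges(text):
    """Parse CSV text produced by edges_to_csv back into a list of tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [tuple(row[col] for col in EDGE_COLUMNS) for row in reader]


def metadata_to_json(metadata):
    """Serialise per-node metadata (a dict keyed by node name) to stable JSON text."""
    return json.dumps(metadata, sort_keys=True, indent=2)
```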

Output results

  • A complete dataset containing open-source repository collaboration networks and npm artifact library dependency networks.

  • Usage instructions for the dataset, detailing data items, sources, collection, and processing methods.

  • Data analysis report summarizing key findings and insights.

@PureNatural PureNatural changed the title [OSS101] Open source repository collaboration network and npm artifact library dependency network mapping dataset [OSS101] Task 6:Open source repository collaboration network and npm artifact library dependency network mapping dataset May 14, 2024