Objective
This dataset is designed to map the relationships between packages in the npm registry and their corresponding open source repositories. It addresses the incomplete or outdated metadata found in the npm registry, which results from individual contributions and repository name changes, so that the links between the two networks can be predicted and mapped accurately.
Contents
The two networks cannot be fully mapped onto each other, but subsets of them do correspond, and the mapping can be made through the repo_url field in the npm package info.
Open Source Repository Collaboration Network:
Nodes: Represent individual developers or teams.
Edges: Represent collaborative relationships, including contributions like commits, reviews, and discussions.
Attributes: Include metrics such as number of contributions, nature of the contributions (code, documentation, etc.), and duration of collaboration.
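The node, edge, and attribute description above could be represented as an attributed edge list. The following is only a minimal sketch: the event tuples, the developer names, and the attribute keys (count, kinds, duration_days) are hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records: (developer_a, developer_b, kind, day).
events = [
    ("alice", "bob", "review", date(2023, 1, 10)),
    ("bob", "alice", "commit", date(2023, 3, 2)),
    ("alice", "carol", "discussion", date(2023, 2, 1)),
]

def build_collaboration_edges(events):
    """Aggregate pairwise events into undirected, attributed edges."""
    edges = defaultdict(
        lambda: {"count": 0, "kinds": set(), "first": None, "last": None}
    )
    for a, b, kind, day in events:
        key = tuple(sorted((a, b)))  # undirected: (alice, bob) == (bob, alice)
        edge = edges[key]
        edge["count"] += 1           # number of contributions on this edge
        edge["kinds"].add(kind)      # nature of the contributions
        edge["first"] = day if edge["first"] is None else min(edge["first"], day)
        edge["last"] = day if edge["last"] is None else max(edge["last"], day)
    # Duration of collaboration in days, from first to last recorded event.
    for edge in edges.values():
        edge["duration_days"] = (edge["last"] - edge["first"]).days
    return dict(edges)

edges = build_collaboration_edges(events)
```

Treating edges as undirected is a simplifying assumption; a directed variant (for example reviewer to author) would keep the tuple order instead of sorting it.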
npm Artifact Library Dependency Network:
Nodes: Represent individual npm packages.
Edges: Represent dependency links, where one package depends on another.
Attributes: Include version numbers, frequency of updates, popularity metrics (such as downloads), and the package description.
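As an illustration of the dependency network described above, the sketch below builds directed edges from package records shaped like the dependencies field of a package.json file. The package names and the record layout are hypothetical; the layout of the files in the provided archives may differ.

```python
from collections import defaultdict

# Hypothetical package records, loosely shaped like package.json content.
packages = {
    "app": {"version": "1.2.0", "dependencies": {"lib-a": "^2.0.0", "lib-b": "~1.1.0"}},
    "lib-a": {"version": "2.3.1", "dependencies": {"lib-b": "^1.0.0"}},
    "lib-b": {"version": "1.1.4", "dependencies": {}},
}

def build_dependency_graph(packages):
    """Return (edges, dependents): directed edges plus a reverse index."""
    edges = []                     # (dependent, dependency, version_range)
    dependents = defaultdict(set)  # dependency -> packages that depend on it
    for name, info in packages.items():
        for dep, version_range in info.get("dependencies", {}).items():
            edges.append((name, dep, version_range))
            dependents[dep].add(name)
    return edges, dict(dependents)

edges, dependents = build_dependency_graph(packages)

# Number of direct dependents per package (the node's in-degree).
in_degree = {name: len(dependents.get(name, ())) for name in packages}
```

The in-degree computed here counts direct dependents only and is a structural measure; it is not a substitute for the download-based popularity metrics listed above.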
Data Collection Methods:
Data for the collaboration network is collected from the public APIs of popular source code hosting platforms such as GitHub, GitLab, and Bitbucket. Alternatively, you can download the sample dataset provided by OpenDigger directly; comparing one year of behavioral data is recommended. https://github.com/X-lab2017/open-digger/blob/master/sample_data/README.md
Data for the npm artifact library dependency network is extracted from the npm registry's public API, focusing on the package.json files to map dependencies. You can also crawl npmjs.com. The global npm libraries and their dependencies are provided below:
npm dependencies: npm_dependencies.zip (7.15M)
npm packages: npm_packages.zip (69.28M)
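The repo_url-based mapping mentioned under Contents could be approached by first normalizing each URL to a canonical key and then looking that key up among known repositories. The sketch below assumes that repo_url values use common git URL spellings and that host, owner, and repository names can be compared case-insensitively; the function names, package names, and sample URLs are hypothetical.

```python
import re

def normalize_repo_url(repo_url):
    """Reduce common git URL spellings to 'host/owner/repo', or None."""
    if not repo_url:
        return None
    url = repo_url.strip().lower()  # assumption: case-insensitive matching
    url = re.sub(r"^git\+", "", url)                        # git+https:// -> https://
    url = re.sub(r"^git@([^:/]+):", r"ssh://git@\1/", url)  # scp-style git@host:owner/repo
    url = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url)         # drop the scheme
    url = re.sub(r"^[^@/]+@", "", url)                      # drop user info
    url = url.split("#", 1)[0]                              # drop any fragment
    url = re.sub(r"\.git$", "", url.rstrip("/"))            # drop trailing .git
    parts = url.split("/")
    if len(parts) < 3 or not all(parts[:3]):
        return None
    return "/".join(parts[:3])

def map_packages_to_repos(package_urls, known_repos):
    """Map package name -> normalized repo key, keeping only known repos."""
    mapping = {}
    for name, repo_url in package_urls.items():
        key = normalize_repo_url(repo_url)
        if key in known_repos:
            mapping[name] = key
    return mapping

# Hypothetical sample input.
package_urls = {
    "pkg-a": "git+https://github.com/Owner/Repo-A.git",
    "pkg-b": "git@github.com:owner/repo-b.git",
    "pkg-c": None,  # missing metadata stays unmapped
}
known_repos = {"github.com/owner/repo-a"}
mapping = map_packages_to_repos(package_urls, known_repos)
```

Exact matching of this kind covers only packages whose metadata is present and current. Packages with missing URLs or renamed repositories remain unmapped, and shorthand forms such as github:owner/repo are not handled by this sketch.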
Potential Use Cases:
Format
The dataset is provided in formats suitable for machine learning and network analysis, such as CSV for tabular data and JSON for structured metadata.
Output results
A complete dataset containing open-source repository collaboration networks and npm artifact library dependency networks.
Usage instructions for the dataset, detailing data items, sources, collection, and processing methods.
Data analysis report summarizing key findings and insights.