This repo contains structured data files of public activity on GitHub, aggregated by economy on a quarterly basis from 2020 onward.
Through offerings such as the GitHub Innovation Graph, we hope to inform research and public policy that could benefit from data on software development activity globally. We welcome developers, data analysts, researchers, policymakers, and all other interested stakeholders to explore the data, discover insights, create visualizations, and more.
The GitHub Innovation Graph provides data on the following areas:
See the datasheet for more information.
For an overview of the dataset, check out the charts and tables at the GitHub Innovation Graph website.
To dive deeper into the data and run your own analyses, feel free to fork this repo, explore the structured data files using the exploratory data analysis tool of your choice, and share your findings on our Discussions page.
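As a starting point for that kind of exploration, here is a minimal sketch using only the Python standard library. The column names (`iso2_code`, `year`, `quarter`, `developers`) and every value in the sample are assumptions made for illustration, not real figures; check the actual CSV headers in this repo before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for one of the structured data files.
# Column names and values are assumptions for illustration only.
SAMPLE_CSV = """iso2_code,year,quarter,developers
US,2023,1,1000
US,2023,2,1100
IN,2023,1,800
IN,2023,2,900
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "iso2_code": row["iso2_code"],
            "year": int(row["year"]),
            "quarter": int(row["quarter"]),
            "developers": int(row["developers"]),
        })
    return rows


def change_over_period(rows):
    """Return {economy: last reported value minus first reported value}."""
    by_economy = defaultdict(list)
    for r in rows:
        by_economy[r["iso2_code"]].append(r)
    changes = {}
    for code, series in by_economy.items():
        # Order each economy's rows chronologically before comparing.
        series.sort(key=lambda r: (r["year"], r["quarter"]))
        changes[code] = series[-1]["developers"] - series[0]["developers"]
    return changes


rows = load_rows(SAMPLE_CSV)
changes = change_over_period(rows)
```

To run this against a real file, replace `io.StringIO(text)` with an opened file handle and adjust the column names to match the file's header row.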
The GitHub Innovation Graph dataset contains data on (1) public activity (2) on GitHub (3) aggregated by economy (4) on a quarterly basis. As such, this dataset would not be useful for understanding:
- private activity;
- activity outside of GitHub;
- activity at a more granular geographic level than economy; or
- activity at a more granular temporal level than quarterly.
Additionally, economies that have fewer developers on GitHub (which generally correlates with the population of an economy) will have less data associated with them in this dataset.
See the datasheet for more information on limitations.
We endeavor to publish as much data about public activity on GitHub as possible. However, the number of developers varies considerably by economy. Out of an abundance of caution for developers’ privacy, we decline in some cases to publish specific statistics for economies with fewer than 100 unique developers performing the relevant activity during the specified quarter. You can find more information on our methodology in the datasheet.
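To make the threshold concrete, the following is a rough sketch of a minimum-count filter. It is not the actual publication pipeline: it applies the cutoff strictly to every record, whereas the text above says statistics are withheld only in some cases. The record layout, the field name `unique_developers`, and the placeholder economy codes are hypothetical.

```python
MIN_DEVELOPERS = 100  # threshold stated in the text above


def publishable(records, threshold=MIN_DEVELOPERS):
    """Keep only records with at least `threshold` unique developers."""
    return [r for r in records if r["unique_developers"] >= threshold]


# Hypothetical records; economy codes and counts are placeholders.
records = [
    {"economy": "AA", "quarter": "2023-Q1", "unique_developers": 250},
    {"economy": "BB", "quarter": "2023-Q1", "unique_developers": 99},
    {"economy": "CC", "quarter": "2023-Q1", "unique_developers": 100},
]

kept = publishable(records)
kept_codes = [r["economy"] for r in kept]
```

Because the rule is "fewer than 100", a record with exactly 100 unique developers passes this filter while one with 99 does not.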
The heatmap below shows the count of economies reported for each data file by quarter:
You can also find the CSV for this heatmap in the data/representativeness_data directory.
We aggregate GitHub activity for economies using a definition broader than recognized UN member states. For example, AQ reports activity from developers stationed in Antarctica. The heatmap below reports the count of data files for each economy by quarter:
You can also find the CSV for this heatmap in the data/representativeness_data directory.
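The two counts behind these heatmaps can be sketched as follows. This is only an illustration of the counting logic, not the code used to produce the published CSVs; the `(data_file, economy, quarter)` tuple layout, the data file labels, and the quarter format are assumptions.

```python
from collections import defaultdict

# Hypothetical observations: one tuple per (data file, economy, quarter)
# combination that has at least one reported row.
observations = [
    ("developers", "US", "2023-Q1"),
    ("developers", "AQ", "2023-Q1"),
    ("git_pushes", "US", "2023-Q1"),
    ("developers", "US", "2023-Q2"),
]


def economies_per_file(obs):
    """Count distinct economies reported for each (data_file, quarter)."""
    seen = defaultdict(set)
    for data_file, economy, quarter in obs:
        seen[(data_file, quarter)].add(economy)
    return {key: len(values) for key, values in seen.items()}


def files_per_economy(obs):
    """Count distinct data files reporting each (economy, quarter)."""
    seen = defaultdict(set)
    for data_file, economy, quarter in obs:
        seen[(economy, quarter)].add(data_file)
    return {key: len(values) for key, values in seen.items()}


by_file = economies_per_file(observations)
by_economy = files_per_economy(observations)
```

Using sets keeps the counts distinct even if the same combination appears more than once in the input.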
This project is released under CC0-1.0.
See CODEOWNERS
See SUPPORT