These Datasets by Dawn Foster are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The goal is to allow people to use these datasets freely with attribution.
If you find any issues with these datasets, please feel free to file an issue in this GitHub repo. Please do not send PRs against the CSV files, since they are auto-generated by scripts.
These datasets use Kubernetes OWNERS files along with CNCF Affiliation data to gather information about leads, approvers, and reviewers by SIG / Subproject (where available), including corporate affiliations for most people. Each dataset also records the OWNERS file the information came from, so the data can be validated.
Uses only the OWNERS files listed in sigs.yaml, plus the OWNERS_ALIASES file, which contains the leads. For details, see:
https://github.com/kubernetes/community/blob/master/sigs.yaml
https://github.com/kubernetes/kubernetes/blob/master/OWNERS_ALIASES
Caveat: sigs.yaml is always a little out of date. See this GitHub issue for more details.
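As a rough illustration of what "the OWNERS files found in sigs.yaml" means, the sketch below pulls OWNERS URLs out of the text of a sigs.yaml file. It assumes OWNERS files are referenced there as plain URLs ending in `/OWNERS`; the function name `find_owners_urls` is hypothetical and this is not necessarily how the scripts in this repo do it.

```python
import re

# Assumption: sigs.yaml references OWNERS files as URLs ending in "/OWNERS".
# The word boundary keeps ".../OWNERS_ALIASES" from matching.
OWNERS_URL = re.compile(r"https?://\S+/OWNERS\b")


def find_owners_urls(sigs_yaml_text):
    """Return the unique OWNERS URLs in the text, in first-seen order."""
    matches = OWNERS_URL.findall(sigs_yaml_text)
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(matches))
```

A plain regular expression is used here instead of a YAML parser to keep the sketch dependency-free; a real script would more likely parse the YAML structure.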
owners_data_2022-04-18_xtra_owners.csv
Contains the above data plus extra OWNERS files from the kubernetes GitHub org, found via the GitHub search API using get_more_owners.py in this repo.
Caveat: This probably includes OWNERS files that are in deprecated bits of the code that are no longer in use, and the GitHub search API is a bit flaky, so it's also likely missing some OWNERS files.
You can run the owners_details.py program in this repo to generate your own up-to-date dataset.
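Once you have one of these CSV files, a quick way to explore it is to count rows by a column. The sketch below uses only the standard library; the column name is passed in because the exact headers are not listed here, so check the first line of the CSV before choosing one. The function name `count_by_column` is hypothetical.

```python
import csv
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file.

    `csv_file` is any iterable of lines (for example, an open file).
    `column` must match a header in the file; the headers are not
    documented here, so this raises KeyError if the name is wrong.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Example usage ("company" is an assumed column name, not confirmed):
# with open("owners_data_2022-04-18_xtra_owners.csv", newline="") as f:
#     print(count_by_column(f, "company").most_common(10))
```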
This dataset uses the Istio teams.yaml file along with CNCF Affiliation data and the GitHub API for emails listed on GitHub profiles to gather information about maintainers and other leadership positions.
owners_data_istio_2022-04-25.csv
There is a Jupyter Notebook with some basic analysis using this Istio dataset.
You can run the istio_owners.py program in this repo to generate your own up-to-date dataset.