Datasets

Creative Commons License
These Datasets by Dawn Foster are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The goal is to allow people to use these datasets freely with attribution.

If you find any issues with these datasets, please file an issue in this GitHub repo. Please do not send PRs against the CSV files, since they are auto-generated by scripts.

Kubernetes OWNERS Datasets

These datasets combine Kubernetes OWNERS files with CNCF affiliation data to list leads, approvers, and reviewers by SIG and, where available, subproject, along with corporate affiliations for most people. The dataset also records the OWNERS file each entry came from, so the data can be validated.

owners_data_2022-04-18.csv

Uses only the OWNERS files listed in sigs.yaml, plus the OWNERS_ALIASES file, which contains the leads. For details, see https://github.com/kubernetes/community/blob/master/sigs.yaml and https://github.com/kubernetes/kubernetes/blob/master/OWNERS_ALIASES

Caveat: sigs.yaml is always a little out of date. See this GitHub issue for more details.
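
As a rough illustration of how OWNERS file locations can be pulled out of sigs.yaml, the sketch below walks an already-parsed copy of the file. It assumes a layout where each top-level group (such as `sigs`) is a list of entries with a `subprojects` list, and each subproject carries an `owners` list of URLs. The function name `collect_owners_urls` and the sample data are hypothetical, and this is not necessarily how owners_details.py does it.

```python
def collect_owners_urls(sigs_data, group_key="sigs"):
    """Return (group name, subproject name, OWNERS URL) tuples.

    sigs_data is the dict produced by parsing sigs.yaml, for example with
    yaml.safe_load from PyYAML. The key names used here are an assumption
    about the file layout.
    """
    rows = []
    for group in sigs_data.get(group_key) or []:
        for sub in group.get("subprojects") or []:
            # Subprojects without an "owners" list are skipped.
            for url in sub.get("owners") or []:
                rows.append((group.get("name"), sub.get("name"), url))
    return rows


# Tiny hand-written sample in the assumed layout, not real data.
sample = {
    "sigs": [
        {
            "name": "Example SIG",
            "subprojects": [
                {
                    "name": "example-subproject",
                    "owners": ["https://example.com/OWNERS"],
                },
                {"name": "no-owners-listed"},
            ],
        }
    ]
}

for row in collect_owners_urls(sample):
    print(row)
```

Because sigs.yaml can lag behind the repositories, any URL collected this way may point at a file that has since moved or been removed.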

owners_data_2022-04-18_xtra_owners.csv

Contains the above data plus extra OWNERS files found in the kubernetes GitHub org through the GitHub search API, using get_more_owners.py in this repo.

Caveat: This probably includes OWNERS files that are in deprecated bits of the code that are no longer in use, and the GitHub search API is a bit flaky, so it's also likely missing some OWNERS files.
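
To give a feel for what a search like this involves, here is a minimal sketch that builds a GitHub code search request for files named OWNERS in an org and pulls repository and path pairs out of a response. It is an illustration only, not the code in get_more_owners.py: the helper names `build_owners_search_url` and `extract_owners_paths` are hypothetical, and the sample response is hand-written.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/code"


def build_owners_search_url(org, page=1, per_page=100):
    """Build a code search URL for files named OWNERS in one org."""
    query = f"filename:OWNERS org:{org}"
    params = {"q": query, "per_page": per_page, "page": page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def extract_owners_paths(response_json):
    """Return (repository full name, file path) pairs from a search response."""
    results = []
    for item in response_json.get("items", []):
        repo = item.get("repository", {}).get("full_name")
        results.append((repo, item.get("path")))
    return results


print(build_owners_search_url("kubernetes"))

# Hand-written stand-in for a search response, not real API output.
fake_response = {
    "items": [
        {"path": "pkg/example/OWNERS",
         "repository": {"full_name": "kubernetes/example"}},
    ]
}
print(extract_owners_paths(fake_response))
```

The request itself is left out on purpose: code search requires an authenticated request, is rate limited, and caps the number of results returned per query, which is one reason a search-based list can miss files.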

Generating updates

You can run the owners_details.py program in this repo to generate your own up-to-date dataset.

Istio Leadership Dataset

This dataset uses the Istio teams.yaml file along with CNCF Affiliation data and the GitHub API for emails listed on GitHub profiles to gather information about maintainers and other leadership positions.

owners_data_istio_2022-04-25.csv

There is a Jupyter Notebook with some basic analysis using this Istio dataset.
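
If you would rather start from plain Python than the notebook, a simple tally over one column is a reasonable first look at any of the CSV files here. The sketch below uses only the standard library. The column names in the sample (`company`, `username`) are made up for illustration, so check the header row of the real file before choosing a column.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows per distinct value of one column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"{column!r} not in columns: {reader.fieldnames}")
    # Empty or missing cells are grouped under "(blank)".
    return Counter(
        (row[column] or "").strip() or "(blank)" for row in reader
    )


# Made-up rows with hypothetical column names, not taken from the dataset.
sample_csv = io.StringIO(
    "company,username\n"
    "Example Corp,alice\n"
    "Example Corp,bob\n"
    ",carol\n"
)
counts = count_by_column(sample_csv, "company")
print(counts.most_common())
```

To run it on a real file, open it with `open("owners_data_istio_2022-04-25.csv", newline="")` and pass the file object along with a column name taken from that file's header.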

Generating updates

You can run the istio_owners.py program in this repo to generate your own up-to-date dataset.