These scripts measure how many developers work on projects in different ecosystems. An ecosystem here is the set of projects that publish their packages to a language-specific repository such as NPM or CRAN.
The code assumes a file data/common/-repos.txt for each ecosystem of interest, listing all repositories in that ecosystem using WoC's naming scheme for repository names. Most of these lists came from libraries.io.
Reads all commits for each project in an ecosystem and writes authact..e.csv, which has one row per commit recording the author, ecosystem, package, and commit time (epoch and time zone).
Optional arguments: a list of ecosystems; if omitted, a hard-coded list is used.
ex: author-activity.py NPM CRAN Go
Sample output:
author,eco,package,epoch,tz,core,files
<email redacted>,Lua,mah0x211_lua-tointeger,1508289637.0,+0900,1,
<email redacted>,Lua,mah0x211_lua-tointeger,1508289514.0,+0900,1,
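The per-commit rows above are already enough to answer the headline question of how many developers work in each ecosystem. A minimal sketch, assuming authact files with the header shown in the sample: it counts distinct author emails per ecosystem, so until aliases are merged (the alias mapping step below) one person using several addresses is counted more than once. The function names and the count_devs.py script name are illustrative, not part of this repository.

```python
import csv
import sys
from collections import defaultdict


def open_all(paths):
    """Yield each file in turn; a file is closed once the caller moves on."""
    for path in paths:
        with open(path, newline="") as f:
            yield f


def developers_per_ecosystem(sources):
    """Count distinct author emails per ecosystem.

    `sources` is an iterable of line iterables (open files or lists of strings),
    each holding authact rows under an author,eco,package,... header.
    """
    authors_by_eco = defaultdict(set)
    for lines in sources:
        for row in csv.DictReader(lines):
            authors_by_eco[row["eco"]].add(row["author"])
    return {eco: len(authors) for eco, authors in sorted(authors_by_eco.items())}


if __name__ == "__main__":
    # Hypothetical usage: python count_devs.py authact.*.e.csv
    for eco, n in developers_per_ecosystem(open_all(sys.argv[1:])).items():
        print(f"{eco},{n}")
```

Author sets are pooled across all input files before counting, so an ecosystem split over several authact files is not double counted.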
Reads a set of authact files and creates a single list of unique authors (authors.csv).
ex: extract-authors.py authact.*.e.csv
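A sketch of this deduplication step, assuming each authact file carries the header shown in the sample above. The one-column layout of authors.csv used here (an "author" header, then one email per row) is a guess, not the script's documented format.

```python
import csv
import sys


def unique_authors(sources):
    """Return the sorted set of author emails found in authact-format sources.

    Each source is an iterable of CSV lines (an open file or a list of strings).
    """
    authors = set()
    for lines in sources:
        for row in csv.DictReader(lines):
            authors.add(row["author"])
    return sorted(authors)


def write_authors(authors, out_path="authors.csv"):
    # Assumed layout: an "author" header followed by one email per row.
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["author"])
        writer.writerows([a] for a in authors)


if __name__ == "__main__":
    authors = set()
    for path in sys.argv[1:]:
        with open(path, newline="") as f:
            authors.update(unique_authors([f]))
    write_authors(sorted(authors))
```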
See https://github.com/ssc-oscar/titan for software that creates the author alias mapping between email addresses.
To translate this mapping into a pickled Python dictionary that maps each author email to a canonical author email:
ex: make-alias-dictionary.py <matchfile>.tgz data/mapping.dict
Reads authact files and author alias lookup dictionaries, and writes a unified author activity file that summarizes all of a person's activities across multiple ecosystems.
The author map file should be a pickled dictionary mapping an email address to a canonical email address for that person
ex: substitute-authors.py authormap.dict.pickle authact*.csv
Writes to unified.authors.csv.gz
Sample output:
author,package,epoch,eco
<redacted email>,mah0x211_lua-tointeger,1508289637.0,Lua
<redacted email>,mah0x211_lua-tointeger,1508289514.0,Lua
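A minimal sketch of the substitution, assuming the alias map is a plain dict pickled with the standard pickle module. Treating emails missing from the map as their own canonical form is an assumption, not confirmed behavior of substitute-authors.py. Output columns follow the sample above.

```python
import csv
import gzip
import pickle
import sys

OUT_FIELDS = ["author", "package", "epoch", "eco"]


def canonical(author, alias):
    # Assumption: an email absent from the alias map stands for itself.
    return alias.get(author, author)


def unify_rows(rows, alias):
    """Turn authact rows into unified rows with canonical author emails."""
    for row in rows:
        yield {
            "author": canonical(row["author"], alias),
            "package": row["package"],
            "epoch": row["epoch"],
            "eco": row["eco"],
        }


def main(map_path, authact_paths, out_path="unified.authors.csv.gz"):
    with open(map_path, "rb") as f:
        alias = pickle.load(f)  # dict: email -> canonical email
    with gzip.open(out_path, "wt", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=OUT_FIELDS)
        writer.writeheader()
        for path in authact_paths:
            with open(path, newline="") as f:
                writer.writerows(unify_rows(csv.DictReader(f), alias))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2:])
```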
Tallies a measure of influence between pairs of ecosystems by counting people who have migrated from one to the other.
ex: tally-ecosystem-influences.py
Sample output:
year,influence,proportion,raw_influence,raw_scale,commit_count
2016,Lua,6.560709534926016e-10,3061,42270547,110376
2016,Hex,9.05928449583639e-06,42267486,42270547,110376
2017,Lua,2.3208423734345453e-09,13421,48532270,119154
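The sample rows are consistent with proportion = raw_influence / raw_scale / commit_count, and in 2016 raw_scale equals the sum of raw_influence over the two listed source ecosystems (3061 + 42267486 = 42270547). This relationship is inferred from the numbers, not read from the script's source; the sketch below checks it against the sample rows.

```python
import math

# (year, influence, proportion, raw_influence, raw_scale, commit_count), copied from the sample
SAMPLE = [
    (2016, "Lua", 6.560709534926016e-10, 3061, 42270547, 110376),
    (2016, "Hex", 9.05928449583639e-06, 42267486, 42270547, 110376),
    (2017, "Lua", 2.3208423734345453e-09, 13421, 48532270, 119154),
]


def implied_proportion(raw_influence, raw_scale, commit_count):
    """Inferred relationship: share of the year's raw influence, per commit."""
    return raw_influence / raw_scale / commit_count


for year, eco, prop, raw_infl, raw_scale, commits in SAMPLE:
    assert math.isclose(prop, implied_proportion(raw_infl, raw_scale, commits), rel_tol=1e-9)

# In 2016 the two listed source ecosystems account for the whole raw_scale.
assert 3061 + 42267486 == 42270547
```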
Tallies a measure of influence from all other ecosystems on each package in each ecosystem.
Also writes pickled count files for later calculations (serialized with the PyPI package dill): cmtcount.dill, cmtcountp.dill, and infl.dill.
ex: tally-package-influences.py unified.authors.csv.gz
Sample output: graph_auth_pkgs_Lua.csv.gz:
year,repo,influence,proportion,raw_influence,raw_scale,commit_count_repo,commit_count_eco,current_ecosystem_use
2017,stefano-m_lua-media_player,Lua,0.05,1200,1200,20,21498,2960,1
2017,stefano-m_lua-media_player,Hex,0.0,0,1200,20,21498,2960,1
2011,mkottman_luaspell,Lua,1.0,19,19,1,4681,9,1
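The package-level rows appear to follow a similar pattern, normalized by the package's own commit count: proportion = raw_influence / raw_scale / commit_count_repo (for example 1200 / 1200 / 20 = 0.05). This is again inferred from the sample only. The sample rows carry ten values while the header names nine columns, so the sketch relies only on the first seven positions, which line up with the header.

```python
import csv
import math

# Sample rows from graph_auth_pkgs_Lua.csv.gz; only fields 0-6 are interpreted.
SAMPLE_LINES = [
    "2017,stefano-m_lua-media_player,Lua,0.05,1200,1200,20,21498,2960,1",
    "2017,stefano-m_lua-media_player,Hex,0.0,0,1200,20,21498,2960,1",
    "2011,mkottman_luaspell,Lua,1.0,19,19,1,4681,9,1",
]


def check_row(fields):
    """Compare the stored proportion with raw_influence / raw_scale / commit_count_repo."""
    proportion = float(fields[3])
    raw_influence, raw_scale, commit_count_repo = (int(x) for x in fields[4:7])
    implied = raw_influence / raw_scale / commit_count_repo
    return math.isclose(proportion, implied, rel_tol=1e-9, abs_tol=1e-12)


results = [check_row(row) for row in csv.reader(SAMPLE_LINES)]
print(results)  # [True, True, True]
```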
Sample output: authtotals.csv.gz
author,year,ecosystem,commits
<email redacted>,2015,Lua,0
<email redacted>,2016,Hex,0
<email redacted>,2017,Hex,17
<email redacted>,2018,Hex,7
Reads the infl.dill file and creates a file for each ecosystem pair summarizing the influence of one ecosystem on the other, listing the influence of each individual package on the other ecosystem.
ex: extract-specific-influence.py
Reads the infl files and summarizes influence over the destination ecosystem, with and without normalization. Creates the infl_total and infl_normalized files.
ex: calc-net-influence.py
data/common/-repos.txt: list of repositories for each ecosystem