-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathVisualizations.txt
49 lines (33 loc) · 2.63 KB
/
Visualizations.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
Questions Tim suggested and some comments:
How successful are commercial repositories compared to private repositories -> Don't know how to determine if the repo is commercial?
How to make a GitHub repository more successful -> Maybe try something without a model? Can we see some trends of popular contributors connecting
popular GitHub repos? Is that a reliable metric? That would run in parallel to how one can make it in the industry (-> If you know somebody who already made it big)
-> Look at the same creation year, same topic/keyword -> Very tough question, but high impact
General purpose system that allows general exploration -> Need to argue why our system is good/what questions it can answer.
-----------------------------------------------
How do we evaluate our visualization platform
-> formulate some questions by ourselves
-> build a system that can answer most if not all of them
-> We also need to explain the whole process, so that they know how challenging the project
-> Include that we had to pivot our project right before the first check in and why
-> What were the challenges with the new approach, scraping etc., why did we choose the visualization we did
-> 10-15min presentation
-----------------------------------------------
Brainstorming:
Question to answer:
- Does a successful contributor automatically make the repository successful (what's the metric of success)
- Can we group related topics together efficiently -> have a multilevel graph, KNN style
- Size of node: Popularity of Repo, width of edges: Popularity of contributors
-> Do popular contributors only work on popular repos
-> Filter the displayed repositors by number of popular contributors and then display them
- Is GitHub's keyword assignement accurate w.r.t. connectivity of the repos/topics
- Repo to repo, color by keyword -> are there distinct groups of colors, or mixed
Visualization ideas:
- topic to topic dataframe with # of shared repositories -> first and easiest one
- topic to topic dataframe with # of shared contributors -> slider to change the threshold of needed shared contributors -> How do the connections look like
color coding topic bubbles by language used (%) -> slider to change creation date of the repos
--> See changes in language distribution -> Can answer the question of how fast languages spread in the coding world & what their impact is?
Evaluation:
- Check if the questions can be answered and maybe include the graphs for each of the questions
- If time: Answer the questions with traditional visualizations (for example Emma's visualization of Jupyter Notebook's spread)
--> Compare the two, does the interactive network make sense?