Additional Core/Peripheral Classification Methods #276

Merged: 15 commits, Jan 29, 2025
2 changes: 2 additions & 0 deletions NEWS.md
@@ -20,6 +20,7 @@
- Add `remove.duplicate.edges` function that takes a network as input and conflates identical edges (PR #268, d9a4be417b340812b744f59398ba6460ba527e1c, 0c2f47c4fea6f5f2f582c0259f8cf23af985058a, c6e90dd9cb462232563f753f414da14a24b392a3)
- Add `cumulative` as an argument to `construct.ranges` which enables the creation of cumulative ranges from given revisions (PR #268, a135f6bb6f83ccb03ae27c735c2700fccc1ee0c8, 8ec207f1e306ef6a641fb0205a9982fa89c7e0d9)
- Add function `get.last.activity.data` to compute developers' last activities in a project, as well as function `add.vertex.attribute.author.last.activity` to add a developer's date of last activity as vertex attribute to a network, as well as helper functions `get.aggregated.activity.data` and `add.vertex.attribute.author.aggregated.activity` to allow for other activity aggregations than first and last activity (PR #275, 9f231612fcd33a283362c79b35a94295ff3d4ef9, 8660ed763ba4b69e909e7fbb01e27e1999522047)
- Add four new metrics that can be used for the classification of authors into core and peripheral developers: betweenness, closeness, PageRank, and eccentricity (PR #276, 65d5c9cc86708777ef458b0c2e744ab4b846bdd1, b392d1a125d0f306b4bce8d95032162a328a3ce2, c5d37d40024e32ad5778fa5971a45bc08f7631e0)

### Changed/Improved

@@ -30,6 +31,7 @@
- Explicitly add R version 4.4 to the CI test pipeline (c8e6f45111e487fadbe7f0a13c7595eb23f3af6e)
- Refactor function `construct.edge.list.from.key.value.list` to be more readable (PR #263, 05c3bc09cb1d396fd59c34a88030cdca58fd04dd)
- Update necessary `igraph` version to 2.1.0 in `README.md` (PR #274, 6c3bcd1a2366d0d3a176d9fde95b8356b0158da3)
- Include core/peripheral classification in the `README.md` (PR #276, )

### Fixed

52 changes: 52 additions & 0 deletions README.md
@@ -34,6 +34,9 @@ If you wonder: The name `coronet` derives as an acronym from the words "configur
- [Splitting data and networks based on defined time windows](#splitting-data-and-networks-based-on-defined-time-windows)
- [Cutting data to unified date ranges](#cutting-data-to-unified-date-ranges)
- [Handling data independently](#handling-data-independently)
- [Core/Peripheral classification](#coreperipheral-classification)
- [Count-based metrics](#count-based-metrics)
- [Network-based metrics](#network-based-metrics)
- [How-to](#how-to)
- [File/Module overview](#filemodule-overview)
- [Configuration classes](#configuration-classes)
@@ -375,6 +378,55 @@ Analogously, the `NetworkConf` parameter `unify.date.ranges` enables this very f

In some cases, it is not necessary to build a network to get the information you need. Therefore, please remember that we offer the possibility to get the raw data or mappings between, e.g., authors and the files they edited. The data inside an instance of `ProjectData` can be accessed independently. Examples can be found in the file `showcase.R`.

#### Core/Peripheral classification

Core/peripheral classification describes the process of dividing the authors of a project into either `core` or `peripheral` developers, based on the principle that the core developers contribute most of the work in a project. The concrete threshold can be configured via `CORE.THRESHOLD` and defaults to 80%, a value commonly used in the literature. In practice, scores are assigned to developers to approximate their importance in the project, and the authors are then divided into `core` and `peripheral` based on these scores such that the desired split is achieved.
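The cumulative-threshold idea behind the split can be illustrated with a small, self-contained sketch. Note that this is not `coronet`'s actual implementation: the data are made up, and whether the author that crosses the threshold ends up in the core class is an assumption here.

```r
## Illustrative sketch of the cumulative-threshold split (not coronet's
## actual implementation; data and the tie-handling are made up).
core.threshold = 0.8

## hypothetical commit counts per author
scores = c(Olaf = 50, Thomas = 30, Karl = 15, Udo = 5)

## sort scores in decreasing order and compute their cumulative share
scores = sort(scores, decreasing = TRUE)
cumulative.share = cumsum(scores) / sum(scores)

## authors up to (and including) the one crossing the threshold are core
core.cutoff = which(cumulative.share >= core.threshold)[1]
core = names(scores)[seq_len(core.cutoff)]
peripheral = names(scores)[-seq_len(core.cutoff)]
## core: "Olaf", "Thomas"; peripheral: "Karl", "Udo"
```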

##### Count-based metrics

In this section, we describe the count-based metrics that can be used to classify authors as either core or peripheral.
- `commit.count`
* calculates scores based on the number of commits per author
- `loc.count`
* calculates scores based on the number of lines of code changed by each author
- `mail.count`
* calculates scores based on the number of mails sent per author
- `mail.thread.count`
* calculates scores based on the number of mail threads each author participated in
- `issue.count`
* calculates scores based on the number of issues each author participated in
- `issue.comment.count`
* calculates scores based on the number of comments each author made in issues
- `issue.commented.in.count`
* calculates scores based on the number of issues each author commented in
- `issue.created.count`
* calculates scores based on the number of issues each author created
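A hedged usage sketch of a count-based classifier follows. The function name below is assumed for illustration based on the naming of the network-based classifiers and may differ from the actual API; see `showcase.R` for actual usage examples.

```r
## Hypothetical usage sketch: the function name is an assumption and may
## differ from coronet's actual API. `proj.data` is assumed to be an
## already-configured `ProjectData` instance.
author.class = get.author.class.commit.count(proj.data)

## the result is a list holding the two classes as data frames,
## each containing author names alongside their scores
author.class[["core"]]
author.class[["peripheral"]]
```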

##### Network-based metrics

In this section, we describe the metrics that can be used to classify authors as either core or peripheral based on author networks. Note that these methods can be applied to any network, not just author networks; the classification then pertains to the network's vertex type, i.e., an artifact network results in a classification of the artifacts based on their importance in the network.
- `network.degree`
* calculates scores based on the vertex degrees in a network
* the degree of a vertex is the number of adjacent edges
- `network.eigen`
* calculates scores based on the eigenvector centralities in a network
* eigenvector centrality measures the importance of vertices within a network by awarding scores for adjacent edges proportional to the score of the connected vertex
- `network.hierarchy`
* calculates scores based on the hierarchy found within a network
* hierarchical scores are calculated by dividing the vertex degree by the clustering coefficient of each vertex
- `network.betweenness`
* calculates scores based on the betweenness of vertices in a network
* betweenness counts, for each vertex, the number of shortest paths between pairs of other vertices that pass through it
- `network.closeness`
* calculates scores based on the closeness of vertices in a network
* closeness measures how close a vertex is to all other vertices, computed as the inverse of the sum of its shortest-path distances to them
- `network.pagerank`
* calculates scores based on the pagerank of vertices in a network
* PageRank refers to the algorithm originally used by Google to rank web pages; it is closely related to eigenvector centrality
- `network.eccentricity`
* calculates scores based on the eccentricity of vertices in a network
* eccentricity measures the length of the shortest path to each vertex's furthest reachable vertex
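The four new network-based classifiers can be called directly on a network, as exercised by this PR's tests. A minimal sketch follows; it assumes `network` is an author network (an igraph object) built beforehand with coronet's network builder.

```r
## `network` is assumed to be an author network (an igraph object)
## constructed beforehand via coronet's network builder.

## each classifier returns a list with a `core` and a `peripheral`
## data frame containing author names and their centrality scores
class.betweenness = get.author.class.network.betweenness(network)
class.closeness = get.author.class.network.closeness(network)
class.pagerank = get.author.class.network.pagerank(network)
class.eccentricity = get.author.class.network.eccentricity(network)

## inspect the core developers according to betweenness centrality
class.betweenness[["core"]]
```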

### How-to

In this section, we give a short example on how to initialize all needed objects and build a bipartite network.
Expand Down
112 changes: 112 additions & 0 deletions tests/test-core-peripheral.R
@@ -18,6 +18,7 @@
## Copyright 2019 by Christian Hechtl <[email protected]>
## Copyright 2021 by Christian Hechtl <[email protected]>
## Copyright 2023-2024 by Maximilian Löffler <[email protected]>
## Copyright 2024 by Leo Sendelbach <[email protected]>
## All Rights Reserved.


@@ -105,6 +106,117 @@ test_that("Eigenvector classification", {
expect_equal(expected, result, tolerance = 0.0001)
})

test_that("Hierarchy classification", {

vertices = data.frame(
name = c("Olaf", "Thomas", "Karl"),
kind = TYPE.AUTHOR,
type = TYPE.AUTHOR
)
edges = data.frame(
from = c("Olaf", "Thomas", "Karl", "Thomas"),
to = c("Thomas", "Karl", "Olaf", "Thomas"),
func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
hash = c("0a1a5c523d835459c42f33e863623138555e2526",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f",
"5a5ec9675e98187e1e92561e1888aa6f04faa338",
"d01921773fae4bed8186b0aa411d6a2f7a6626e6"),
file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774",
"0a1a5c523d835459c42f33e863623138555e2526",
"1143db502761379c2bfcecc2007fc34282e7ee61",
"0a1a5c523d835459c42f33e863623138555e2526"),
base.func = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2"),
base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
artifact.type = c("CommitInteraction", "CommitInteraction", "CommitInteraction", "CommitInteraction"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
)
test.network = igraph::graph_from_data_frame(edges, directed = FALSE, vertices = vertices)

## Act
result = get.author.class.network.hierarchy(test.network)
## Assert
expected.core = data.frame(author.name = c("Thomas"),
hierarchy = c(4))
expected.peripheral = data.frame(author.name = c("Olaf", "Karl"),
hierarchy = c(2, 2))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result)
})

test_that("Betweenness classification", {

## Act
result = get.author.class.network.betweenness(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
betweenness.centrality = c(1))
expected.peripheral = data.frame(author.name = c("Björn", "udo", "Thomas", "Fritz [email protected]",
"georg", "Hans"),
betweenness.centrality = c(0, 0, 0, 0, 0, 0))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result)
})

test_that("Closeness classification", {

## Act
result = get.author.class.network.closeness(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
closeness.centrality = c(0.5))
expected.peripheral = data.frame(author.name = c("Björn", "Thomas", "udo", "Fritz [email protected]",
"georg", "Hans"),
closeness.centrality = c(0.33333, 0.33333, 0.0, 0.0, 0.0, 0.0))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result, tolerance = 0.0001)
})

test_that("Pagerank classification", {

## Act
result = get.author.class.network.pagerank(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
pagerank.centrality = c(0.40541))
expected.peripheral = data.frame(author.name = c("Björn", "Thomas", "udo", "Fritz [email protected]",
"georg", "Hans"),
pagerank.centrality = c(0.21396, 0.21396, 0.041667, 0.041667, 0.041667, 0.041667))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result, tolerance = 0.0001)
})

test_that("Eccentricity classification", {

## Act
result = get.author.class.network.eccentricity(network)

## Assert
expected.core = data.frame(author.name = c("Olaf"),
eccentricity = c(1))
expected.peripheral = data.frame(author.name = c("Björn", "udo", "Thomas", "Fritz [email protected]",
"georg", "Hans"),
eccentricity = c(0, 0, 0, 0, 0, 0))
expected = list(core = expected.core, peripheral = expected.peripheral)
row.names(result[["core"]]) = NULL
row.names(result[["peripheral"]]) = NULL
expect_equal(expected, result)
})


test_that("Commit-count classification using 'result.limit'" , {