\section{Algorithms}
\label{sec:algorithms}
%-------------------------------------------------------------------------------
\subsection{Breadth-First Search (BFS)}
%-------------------------------------------------------------------------------
\label{sec:bfs}
The breadth-first search (BFS)
builds on the observation that vector-matrix multiplication $\grbv{f}\grbt\grbm{A}$ expresses
navigation from the nodes selected by vector $\grbv{f}$ in the graph represented
by $\grbm{A}$.
The direction-optimizing push/pull BFS \cite{DBLP:conf/sc/BeamerAP12} is simple
to express in GraphBLAS \cite{DBLP:conf/icpp/YangBO18}. If $\grbm{A}$ is held by row,
then $\grbv{f}\grbt\grbm{A}$ is a push step, while $\grbm{B}\grbv{f}$ is a pull step, where
$\grbm{B}=\grbm{A}\grbt$ is the explicit transpose of $\grbm{A}$, also held by row.
Other \grb libraries, \eg GraphBLAST, store both directions and perform
direction-optimization automatically~\cite{DBLP:journals/corr/abs-1908-01407}.
The push-only BFS is shown in
\autoref{alg:bfs-parents}, while the push/pull BFS is \autoref{alg:bfs-parents-do}.
The GraphBLAS BFS relies on the $\grbanysecondi$ %ANY-SECONDI
semiring to compute a single step,
$\grbv{q}\grbt \grbmaskreplace{\grbneg \grbstr{\grbv{p}\grbt}} = \grbv{q}\grbt\grbm{A}$, where $\grbv{q}$ is the current frontier,
%$\grbv{q} \grbmask{\grbneg \grbstr{\grbv{p}}} = \grbv{q}\grbt\grbm{A}$, where $\grbv{q}$ is the current frontier
$\grbv{p}$ is the parent vector, and $\grbm{A}$ is the adjacency matrix.
The mask is a \emph{complemented structural mask}, which means the mask corresponds to the
empty elements of the mask vector $\grbv{p}$. Replace semantics are indicated (by the \emph{r} in the mask expression),
so any elements of the output vector other than those selected by the mask are deleted.
As a result, the assignment to the parent vector on line 8 updates the vector with the parents of the newly visited nodes.
%
%This step assigns the parents of newly visited nodes, which do not yet have a parent,
%using the complemented structural mask with replace semantics $\grbmaskreplace{\grbneg \grbstr{\grbv{p}}}$.
%using the complemented structural mask $\grbmask{\grbneg \grbstr{\grbv{p}}}$.
Consider a matrix multiply for conventional linear algebra, where the $\grbplus$ %PLUS
monoid sums a set of $t$ entries to obtain a single scalar for computing
$c_{ij} = \sum a_{ik} b_{kj}$ in the matrix multiply $\grbm{C} = \grbm{A}\grbm{B}$. The $\grbany$ %ANY
monoid performs the reduction of $t$ entries to a single number by merely selecting
any one of the $t$ entries as the result $c_{ij}$. The selection is done
non-deterministically, allowing for a benign race condition. In the BFS, this
corresponds to selecting any valid parent of a newly discovered node. Indeed,
the creation of the $\grbany$ %ANY
operator was inspired by Scott Beamer's \verb'bfs.cc'
method in the GAP benchmark, which has the same benign race condition. The $\grbany$ %ANY
monoid translates this benign race condition, used to construct a
valid BFS tree, into a linear algebraic operation suitable for implementation
in GraphBLAS.
The $\grbsecondi$ %SECONDI
operator is the multiplicative operator in the $\grbanysecondi$ %ANY-SECONDI
semiring: the result of $a_{ik} b_{kj}$ is simply the index $k$, for the matrix
multiply $\grbm{C} = \grbm{A}\grbm{B}$. This gives the id of the parent node for a newly
discovered node in the next frontier. The $\grbany$ %ANY
monoid then selects any valid
parent $k$.
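A single push step can be written directly with the GraphBLAS C API. The
following is a minimal sketch rather than the LAGraph code itself: it assumes
the SuiteSparse-specific \verb'GxB_ANY_SECONDI_INT64' semiring and the
predefined \verb'GrB_DESC_RSC' and \verb'GrB_DESC_S' descriptors, and omits
most error handling.
\begin{verbatim}
#include <GraphBLAS.h>

// One push step of the parent BFS (illustrative sketch): q is the current
// frontier, p the parent vector, and A the n-by-n adjacency matrix.
// GrB_DESC_RSC requests the complemented structural mask with replace
// semantics described above.
GrB_Info bfs_push_step (GrB_Vector q, GrB_Vector p, GrB_Matrix A, GrB_Index n)
{
    // q'<!struct(p),replace> = q'*A, with ANY_SECONDI computing parent ids
    GrB_Info info = GrB_vxm (q, p, NULL, GxB_ANY_SECONDI_INT64, q, A,
        GrB_DESC_RSC) ;
    if (info != GrB_SUCCESS) return (info) ;
    // p<struct(q)> = q : record the parents of the newly visited nodes
    return (GrB_assign (p, q, NULL, q, GrB_ALL, n, GrB_DESC_S)) ;
}
\end{verbatim}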
%
\input{algorithms/bfs}
%-------------------------------------------------------------------------------
\subsection{Betweenness Centrality (BC)}
%-------------------------------------------------------------------------------
\label{sec:bc}
\input{algorithms/bc}
%
The vertex betweenness centrality metric is based on the number of
shortest paths through any given node,
$ \sum_{s \ne i \ne t} {\sigma (s, t|i)}/{\sigma(s,t)}, $
where $\sigma(s,t)$ is the total number of shortest paths from node $s$ to $t$,
and $\sigma(s,t|i)$ is the total number of shortest paths from node $s$ to $t$
that pass through node $i$. This is expensive to compute, so in practice
a random subset of source nodes (a {\em batch}) of size $\mathit{ns}$ is chosen.
As in the BFS, direction optimization is simple to add to the LAGraph
algorithm for batched betweenness centrality (BC).
It only requires a simple heuristic to determine which
direction to use, followed by masked matrix-matrix multiplication with the
matrix or its transpose: $\grbm{F} \grbmask{\grbneg \grbstr{\grbm{P}}} = \grbm{F}\grbm{B}\grbt$ (the pull) or $\grbm{F}
\grbmask{\grbneg \grbstr{\grbm{P}}} = \grbm{F} \grbm{A}$ (the push), where $\grbm{A}$ is the adjacency matrix of
the graph and $\grbm{B} = \grbm{A}\grbt$ is its explicit transpose, $\grbm{F}$ is the frontier, and the
complemented structural mask $\grbneg \grbstr{\grbm{P}}$ is the set of unvisited nodes. The multiplication
$\grbm{F} \grbm{B}\grbt$ relies on the descriptor to represent the transpose of $\grbm{B}$, which is not
explicitly transposed. In the backward phase, the pull step is $\grbm{W} = \grbm{W} \grbm{A}\grbt$ while
the push is $\grbm{W} = \grbm{W} \grbm{B}$, where $\grbm{W}$ is the $\mathit{ns}$-by-$n$ matrix in which centrality is
accumulated (where $\mathit{ns}=4$ is a typical batch size).
To simplify the presentation of the entire BC algorithm, \autoref{alg:bc} does
not show the direction optimization. It is the same transformation as in the pair
of BFS algorithms, where the push-only step (line 5 of
\autoref{alg:bfs-parents}) is expanded to a push/pull heuristic (lines 4--7 of
\autoref{alg:bfs-parents-do}).
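As an illustration, the forward push/pull step can be sketched with masked
\verb'GrB_mxm' calls. This is a sketch only: the semiring is left as a
parameter, the push/pull heuristic itself is not shown, and the predefined
descriptors \verb'GrB_DESC_SC' and \verb'GrB_DESC_SCT1' (complemented
structural mask, with the second input transposed in the pull case) are
assumed.
\begin{verbatim}
#include <stdbool.h>
#include <GraphBLAS.h>

// One forward step of batched BC (illustrative sketch).  F is the ns-by-n
// frontier, P the matrix whose structure marks visited nodes (used as a
// complemented structural mask), A the adjacency matrix, and B = A' its
// explicit transpose.  do_pull stands in for the push/pull heuristic.
GrB_Info bc_forward_step (GrB_Matrix F, GrB_Matrix P, GrB_Matrix A,
    GrB_Matrix B, GrB_Semiring semiring, bool do_pull)
{
    if (do_pull)
    {
        // pull: F<!struct(P)> = F*B', with B transposed via the descriptor
        return (GrB_mxm (F, P, NULL, semiring, F, B, GrB_DESC_SCT1)) ;
    }
    else
    {
        // push: F<!struct(P)> = F*A
        return (GrB_mxm (F, P, NULL, semiring, F, A, GrB_DESC_SC)) ;
    }
}
\end{verbatim}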
%-------------------------------------------------------------------------------
\subsection{PageRank (PR)}
%-------------------------------------------------------------------------------
\label{sec:pagerank}
PageRank (PR) computes the importance of each node as a recursively-defined
metric: a web page is important if important pages link to it.
\autoref{alg:pagerank} shows the GraphBLAS implementation of PR as specified in
the GAP benchmark. It uses the $\grbplussecond$ semiring, where
$\grboperator{second}(x,y)=y$, so it can ignore any edge weights in the input
matrix. The PR in GAP does not properly handle dangling vertices in the graph.
The Graphalytics benchmark has a PageRank variant which avoids this
problem~\cite{DBLP:journals/corr/abs-2011-15028}. We have included this
version to compare its performance with the GAP benchmark algorithm
\verb'pr.cc'.
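The sketch below shows the core of one PR iteration, built from spec-level
objects only. It is illustrative, not the LAGraph code: the helper name and
single-precision types are chosen for the example, the damping and teleport
terms are omitted, and dangling nodes are not handled, mirroring the
limitation noted above.
\begin{verbatim}
#include <GraphBLAS.h>

// Core of one PageRank iteration (illustrative sketch): t = A'*(r./d),
// where r holds the current ranks and d the out-degrees.  The plus_second
// semiring uses the values of the vector and ignores any values stored in A.
GrB_Info pr_core_step (GrB_Vector t, GrB_Matrix A, GrB_Vector r, GrB_Vector d)
{
    GrB_Semiring plus_second ;
    GrB_Semiring_new (&plus_second, GrB_PLUS_MONOID_FP32, GrB_SECOND_FP32) ;
    GrB_Index n ;
    GrB_Vector_size (&n, r) ;
    GrB_Vector w ;
    GrB_Vector_new (&w, GrB_FP32, n) ;
    // w = r ./ d : divide each rank by its node's out-degree
    GrB_eWiseMult (w, NULL, NULL, GrB_DIV_FP32, r, d, NULL) ;
    // t = A'*w : GrB_DESC_T0 transposes the matrix input
    GrB_mxv (t, NULL, NULL, plus_second, A, w, GrB_DESC_T0) ;
    GrB_Vector_free (&w) ;
    GrB_Semiring_free (&plus_second) ;
    return (GrB_SUCCESS) ;
}
\end{verbatim}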
%
\input{algorithms/pagerank}
%-------------------------------------------------------------------------------
\subsection{Single-Source Shortest Paths (SSSP)}
%-------------------------------------------------------------------------------
\label{sec:sssp}
A Delta-Stepping Single-Source Shortest Paths algorithm in GraphBLAS is shown in
\autoref{alg:sssp-delta-stepping}. It relies on the $\grbminplus$ semiring.
Since the algorithm is fairly complex, we refer the reader to
\cite{DBLP:conf/ipps/SridharBMSLM19} for a description of the method.
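The delta-stepping algorithm itself is too long to sketch here, but the
min-plus relaxation it builds on is a single GraphBLAS call. The sketch below
is a plain Bellman-Ford-style relaxation of every edge, not the delta-stepping
code, and constructs the semiring from spec-level objects.
\begin{verbatim}
#include <GraphBLAS.h>

// One Bellman-Ford-style relaxation over all edges (illustrative sketch,
// not the delta-stepping code): d = min (d, d min.+ A), where d is the
// current distance vector and A holds the edge weights.
GrB_Info relax_all_edges (GrB_Vector d, GrB_Matrix A)
{
    GrB_Semiring min_plus ;
    GrB_Info info = GrB_Semiring_new (&min_plus, GrB_MIN_MONOID_FP64,
        GrB_PLUS_FP64) ;
    if (info != GrB_SUCCESS) return (info) ;
    // d<accum=min> = d'*A : relax every edge once
    info = GrB_vxm (d, NULL, GrB_MIN_FP64, min_plus, d, A, NULL) ;
    GrB_Semiring_free (&min_plus) ;
    return (info) ;
}
\end{verbatim}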
%
\input{algorithms/sssp-delta-stepping}
%-------------------------------------------------------------------------------
\subsection{Triangle Counting (TC)}
%-------------------------------------------------------------------------------
\label{sec:triangle-count}
The triangle counting (TC) problem is to compute the number of unique cliques
of size 3 in a graph. The TC algorithm is shown in
\autoref{alg:triangle-count-sandiadot}, based on \cite{8091043}.
%
\input{algorithms/triangle-count}
%
It starts with a heuristic that decides when
to sort the input graph by ascending degree. Next, it constructs the lower and
upper triangular parts and computes a masked matrix multiply using the
$\grbpluspair$ semiring. Internally, a dot-product method is used in SS:GrB,
because $\grbm{U}$ is transposed via the descriptor. The $\grboperator{pair}$
operator is the simple function $\grboperator{pair}(x,y)=1$. When used in a semiring,
it acts like the $\grboperator{times}$ operator of the conventional semiring,
except that it can ignore the values of its inputs and treat them both as~1.
This semiring is useful for structural computations, such as triangle counting,
when the edge weights of a graph may be present but should be ignored in a
particular algorithm.
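The masked multiply and reduction can be sketched as follows. This sketch
assumes the SuiteSparse-specific \verb'GxB_PLUS_PAIR_INT64' semiring and the
predefined \verb'GrB_DESC_ST1' descriptor (structural mask, second input
transposed); the sort heuristic and error handling are omitted.
\begin{verbatim}
#include <GraphBLAS.h>

// Core of the dot-product triangle count (illustrative sketch):
// C<struct(L)> = L*U', then ntri = sum (C), where L and U are the strictly
// lower and upper triangular parts of the symmetric adjacency matrix.
GrB_Info count_triangles (int64_t *ntri, GrB_Matrix L, GrB_Matrix U)
{
    GrB_Index n ;
    GrB_Matrix_nrows (&n, L) ;
    GrB_Matrix C ;
    GrB_Matrix_new (&C, GrB_INT64, n, n) ;
    // masked multiply with plus_pair: only entries in L are computed
    GrB_mxm (C, L, NULL, GxB_PLUS_PAIR_INT64, L, U, GrB_DESC_ST1) ;
    // ntri = sum (C)
    GrB_Matrix_reduce_INT64 (ntri, NULL, GrB_PLUS_MONOID_INT64, C, NULL) ;
    GrB_Matrix_free (&C) ;
    return (GrB_SUCCESS) ;
}
\end{verbatim}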
%-------------------------------------------------------------------------------
\subsection{Connected Components}
%-------------------------------------------------------------------------------
\label{sec:connected-components}
The connected components algorithm in LAGraph (\autoref{alg:fastsv})
was developed by Zhang, Azad, and Bulu{\c{c}}
\cite{ZHANG202014,DBLP:conf/ppsc/ZhangAH20}. The method maintains a forest of
trees represented by a parent vector, and iteratively merges trees until no
more merging is possible. The method as shown in \autoref{alg:fastsv} is a
simplified variant that operates on the entire graph. In the LAGraph
version, a subgraph is constructed first, the method finds the connected
components of that subgraph, and it then operates on the entire graph.
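To illustrate the parent-vector representation, the scalar sketch below shows
the tree-shortcutting idea only; it is not the GraphBLAS formulation used in
\autoref{alg:fastsv}.
\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

// Scalar illustration of the parent-vector idea (not the FastSV code):
// repeatedly replace each node's parent with its grandparent until every
// tree in the forest is a star, i.e., every parent is a root.
void shortcut (int64_t *parent, int64_t n)
{
    bool changed = true ;
    while (changed)
    {
        changed = false ;
        for (int64_t i = 0 ; i < n ; i++)
        {
            int64_t gp = parent [parent [i]] ;   // grandparent of node i
            if (gp != parent [i])
            {
                parent [i] = gp ;
                changed = true ;
            }
        }
    }
}
\end{verbatim}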
\input{algorithms/fastsv}