Skip to content
Abdullah Al Raqibul Islam edited this page Aug 19, 2021 · 8 revisions

Introduction to the example applications.

The best way to get familiar with GraphChi is to run and study the example applications. You should start with the simple applications, Pagerank and Connected Components prior to looking at matrix factorization or triangle counting.

Input data

To run the examples, you need an input graph (network). Currently !GraphChi can read graphs in the standard [Edge-List-Format] or [Adjacency-List-Format], but it is fairly easy to write your own parsers. Remember that GraphChi can run even very big graphs on a normal PC or laptop!

A good set of graphs of different sizes are available below:

A great collection of graphs is available at ( http://law.di.unimi.it/datasets.php ), but notice that they are in the compressed format of WebGraph Framework ( http://webgraph.dsi.unimi.it/ ).

Preprocessing of graphs

All example programs automatically convert the input graph to GraphChi's internal shards (see Introduction-To-GraphChi). That is, you just need to compile and run the application. This conversion needs to be done only once for each application, and the system will detect whether preprocessing is needed. Also, if the value type stored in edges is of the same size, different applications can use the same shards.

Example Applications

Pagerank (Easy)

What you will learn:

  • Basic structure of GraphChi programs
  • How to "communicate" via edges
  • Flavor of asynchronous computation

Source: example_apps/pagerank.cpp

To build and run: make example_apps/pagerank bin/example_apps/pagerank file GRAPH-NAME

Above, replace GRAPH with the input graph. If the graph has not been preprocessed, the program will ask for the format of the graph (edgelist or adjacency-list).

You may adjust the number of iterations by specifying parameter "niters" as follows:

> bin/example_apps/pagerank file GRAPH-NAME niters 10

The application will print the ids of the top 20 vertices with the highest PageRank score.

Analyzing the output

GraphChi will write the values of the edges in a binary file, which is easy to handle in other programs. The name of the file containing vertex values is GRAPH-NAME.4B.vout. Here "4B" refers to the vertex-value being a 4-byte type (float).

You can also analyze the vertex-data programmatically, in memory-efficient manner, by using VertexAggregators.

Remark

While this Pagerank example is simple, it is not the fastest way to compute Pagerank with GraphChi. Particularly, it unnecessarily loads the values for out-edges, while the value is never read, it is just written. There is another version of Pagerank available, example_apps/pagerank_functional.cpp, which uses an experimental alternative "functional" API, which is roughly 50% more efficient for Pagerank.

Connected Components (Easy)

What you will learn:

  • Dealing with undirected edges
  • Understanding asynchronous computation
  • Selective (dynamic) scheduling

Source: example_apps/connectedcomponents.cpp

To build and run:

> make example_apps/connectedcomponents
> bin/example_apps/connectedcomponents file GRAPH-NAME

Output

The program will produce output GRAPHNAME_components.txt, which on each line has component id, num-of-vertices. As an interesting analysis, you can plot a histogram of the sizes of the connected components.

Community Detection (Easy+)

What you will learn:

  • How to exchange information via edges in both directions

Source: example_apps/communitydetection.cpp

To build and run:

> make example_apps/communitydetection
> bin/example_apps/communitydetection file GRAPH-NAME

Output

Same output as connected components, but instead of components, it lists community-ids and their sizes.

Matrix factorization: Alternative Least Squares - ALS (Moderate)

What you will learn:

  • How to integrate linear algebra package (Eigen)
  • How to deal with a bipartite graph
  • Special graph input / output
  • Two strategies of reading neighbor vertex value (via edges, and in-memory)

There are two versions of the program. The first version, edge-version, writes a vertex's value (a latent factor vector) to its adjacent edges; the vertices-in-memory version uses a separate array to store vertices in-memory. See the code for more details.

Dependency: Eigen

You need the Eigen matrix library for compiling the matrix factorization programs. It is available here, as a headers-only installation.

Edge-version

Source: example_apps/als_edgefactors.cpp

To build and run:

> make example_apps/matrix_factorization/als_edgefactors
> bin/example_apps/matrix_factorization/als_edgefactors file MATRIX-NAME

Vertices in-memory version

Source: example_apps/als_vertices_inmem.cpp

> make example_apps/matrix_factorization/als_vertices_inmem
> bin/example_apps/matrix_factorization/als_vertices_inmem file MATRIX-NAME

Input

For this program, you need to provide your input in the Matrix Market format: http://math.nist.gov/MatrixMarket/

For example, if you are running ALS for the Netflix movie rating dataset, the matrix is a sparse matrix containing the ratings that users have given each movie.

Output

The matrix factors are output in the Matrix Market format with names

  • GRAPH-NAME_U.mm
  • GRAPH-NAME_V.mm

Triangle Counting (Advanced)

What you will learn:

  • A special design pattern, which requires working outside the basic abstraction
  • Writing a special preprocessor that orders the vertices by their degree

Source: https://github.com/GraphChi/graphchi-cpp/blob/master/example_apps/trianglecounting.cpp

Streaming Graph + Pagerank

This application starts with a base-graph, and streams additional edges to the graph from a streaming graph file (which must be in the EdgeListFormat), while simultaneously computing pagerank. The source code contains also an example of how to use an HTTP-based admin dashboard for !GraphChi applications. More information is added later, and be warned that the code in the example application is not finished and contains a lot of demonstration code.

Source: https://github.com/GraphChi/graphchi-cpp/blob/master/example_apps/streaming_pagerank.cpp