Problem statement

Community Detection

A problem of Community Detection in an undirected graph with weights that are non-negative real numbers is proposed.

A graph G = (V,E) is given where V is a vertex set, E is an independent edge set, w(u,v) is the weight of each edge e(u,v) \in E , u,v \in V .

The objective of the task is to find such a partition of G into a set C of disjoint communities of vertices c_i \subseteq V :

\cup c_i = V , \forall c_i \in C

c_i \cap c_j = \varnothing , \forall c_i,c_j \in C, i \ne j ,

to maximize the modularity Q:

Q = \sum\limits_{c \in C} \left(\frac{\sum_{in}^c}{m} - \lbrack\frac{\sum_{tot}^c}{m}\rbrack^2 \right), where

\sum_{in}^c = \sum_{u,v \in c, \; e(u,v) \in E} w(u,v),

\sum_{tot}^c = \sum_{u \in c \; or \; v \in c, \; e(u,v) \in E} w(u,v) =
= \sum_{in}^c + \sum_{u \in c, v \notin c, e(u,v) \in E} w(u,v),

m = \sum_{e(u,v) \in E} w(u,v) .

Informally, the vertices of one community should be strongly connected while the vertices from different communities should be loosely connected.

Testing

To assess the efficiency of the implementation three types of synthetic graphs are used: RMAT, SSCA2 and LFR*. There are 3 fixed size secret graphs about 3 GB.

  • R-MAT graphs [1] provide good simulation of real graphs from social networks, Internet.
  • SSCA2 graph [2] is a set of independent components connected with each other by a number of edges.
  • LFR graph [3] is specially designed for testing of algorithms for solving Community Detection problems. Depending on the parameters the communities could be easily isolated, or vice versa, merged. A slightly simplified LFR* graph is used in the contest.

The position of the participant’s solution in the final list of the contest is determined by a rating function that primarily depends on the solution time. If the solution modularity is greater than the reference modularity, a significant bonus is given, if modularity is greater than 90% of the reference modularity there is a small penalty, otherwise there is a great penalty.

    # mod - solution modularity 
    # mod_ref - reference modularity 
    # time - average time
    ratio = mod / mod_ref 
    if ratio > 1 then
        f = (ratio - 1)*10 + 1
    else if 0.9 < ratio <= 1 then
        f = ratio
    else
        f = 1e-10 * exp(25.0531 * ratio)
    rating = 10000 * f / time

Rules

A problem solution may use any algorithm. Developers can change the internal representation of the graph at their discretion using different data structures, but time of internal representation construction must be included to the solution time.

Requirements for the MST algorithm output. Output data should contain:

  • an array of community numbers in which a community number for each vertex is stored.

Computing systems

The following type of computing systems is provided for problem implementation:

  • Single computing node (CPU and/or GPU).

Organizers will provide all the computing resources for the participants. Implementations for other architectures are encouraged and will be considered separately.

Example implementation

Version 1.0.1 of the example implementation of the Louvain algorithm [4] of solving the problem, as well as the necessary infrastructure for serial (single-node) generation of graphs and calculation of modularity can be downloaded here: GraphHPC-1.0.1.tgz.

Questions

If you have any question, please email at contest at dislab org.

References

  1. R-MAT: A Recursive Model for Graph Mining (English paper, PDF).
  2. SSCA2. Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors (English paper, разделы 2 и 3, PDF).
  3. LFR. Benchmark graphs for testing community detection algorithms (English paper, PDF).
  4. Louvain algorithm. Fast unfolding of communities in large networks (English paper, PDF).