Community Detection and Classification in Hierarchical Stochastic Blockmodels

Human Language Technology Center of Excellence
Department of Applied Mathematics and Statistics
Center for Imaging Science
Johns Hopkins University

V. Lyzinski, M. Tang, A. Athreya, Y. Park, C.E. Priebe, “Community Detection and Classification in Hierarchical Stochastic Blockmodels,” IEEE Transactions on Network Science and Engineering, vol. 4, no. 1, pp 13-26, 2017. publisher site

Abstract

We propose a robust, scalable, integrated methodology for community detection and community comparison in graphs. In our procedure, we first embed a graph into an appropriate Euclidean space to obtain a low-dimensional representation, and then cluster the vertices into communities. We next employ nonparametric graph inference techniques to identify structural similarity among these communities. These two steps are then applied recursively on the communities, allowing us to detect more fine-grained structure. We describe a hierarchical stochastic blockmodel — namely, a stochastic blockmodel with a natural hierarchical structure — and establish conditions under which our algorithm yields consistent estimates of model parameters and motifs, which we define to be stochastically similar groups of sub-graphs. Finally, we demonstrate the effectiveness of our algorithm in both simulated and real data. Specifically, we address the problem of locating similar subcommunities in a partially reconstructed Drosophila connectome and in the social network Friendster.

Algorithm

The following algorithm is our methodology for identifying and estimating the structural properties of repeated motifs.

Detecting hierarchical structure for graphs
Input: Adjacency matrix \(A \in \{0,1\}^{n\times n}\) for a latent position random graph.
Output: Subgraphs and characterization of their dissimilarity
while Cluster size exceeds threshold do
  Step 1: Compute the adjacency spectral embedding \(\hat{X}\);
  Step 2: Project the rows of \(\hat{X}\) onto the sphere yielding \(\hat{Y}\); i.e., for each \(i \in [n]\), \(\hat{Y}_i := \hat{X}_i / \|\hat{X}_i\|_2\);
  Step 3: Cluster \(\hat{Y}\) to obtain subgraphs \(\hat{H}_1,\cdots,\hat{H}_R\);
  Step 4: For each \(i\in R\), use ASE to re-embed \(\hat{H}_i\), obtaining \(\hat{X}_{\hat{H}_i}\);
  Step 5: Compute \(\hat{S} :=[T_{\hat{n}_{r},\hat{n}_{s}}(\hat{X}_{\hat{H}_r},\hat{X}_{\hat{H}_s})]\) producing a pairwise dissimilarity matrix on induced subgraphs;
  Step 6: Cluster induced subgraphs into motifs according to \(\hat{S}\);
  Step 7: Recurse on each motif;
end while

Definition & Corollary

Definition 5: Let \((A,X) \sim RDPG(F)\) and \((B,Y) \sim RDPG(G)\). We say that \(A\) and \(B\) are of the same motif if there exists a unitary transformation \(U\) such that \(F = G \circ U\).
Corollary 8: Clustering \(\hat{S}\) matrix yields a consistent clustering of \(\{\hat{H}_i\}^{R}_{i=1}\) into motifs.

Experiments

simulation: A simulation with a HSBM graph with 8 blocks in 3 motifs in Section 3 in the manuscript. The vlclust version is here.
real data:
- fly structural data: Motif detection in the Drosophila Connectome data (“fly” in this link) in Section 5.1 in the manuscript. The vlclust version is here.
- Friendster network data: Motif detection in the Friendster network data in Section 5.2 in the manuscript.

`R` Package

The latest R source package can be downloaded from here.
It can be installed via:

if (!require(hsbm)) {
    install.packages("http://www.cis.jhu.edu/~parky/HSBM/hsbm_0.1.0.tar.gz",type="source",method="wget")
    library(hsbm)
}

and the simulation demo can be run by typing

demo(hsbm)

library(help='hsbm')

##      Information on package 'hsbm'
## 
## Description:
## 
## Package:       hsbm
## Type:          Package
## Title:         Hierarchical Stochastic Blockmodels
## Version:       0.1.0
## Depends:       R (>= 3.0)
## Imports:       igraph, Matrix, lattice, irlba
## Author:        Minh Tang, Youngser Park
## Maintainer:    Youngser Park <[email protected]>
## Description:   Routines to perform community detection and
##                classification in Hierarchical Stochastic
##                Blockmodels
## License:       GPL (>= 2)
## URL:           http://www.cis.jhu.edu/~parky/HSBM/
## LazyData:      TRUE
## RoxygenNote:   5.0.0
## Built:         R 3.3.2; ; 2016-12-14 15:55:56 UTC; unix
## 
## Index:
## 
## computeS                auxiliary functions to compute the kernel test
##                         statistic.
## find.transform          auxiliary functions to compute the kernel test
##                         statistic.
## getElbows               Dimensionality selection for singular values
##                         using profile likelihood.
## kernel.stat             auxiliary functions to compute the kernel test
##                         statistic.
## my.image2               auxiliary functions for visualization
## plotmemb                auxiliary functions for visualization
## rect.dist               auxiliary functions to compute the kernel test
##                         statistic.
## reembed                 auxiliary functions to compute the kernel test
##                         statistic.
## rg.sample               Make a random graph
## stfp                    Spectral Embedding of Adjacency Matrices
## vlclust                 Angle based clustering

prepared by [email protected] on Sat Mar 25 12:11:05 2017