**Human Language Technology Center of Excellence**

**Department of Applied Mathematics and Statistics**

**Center for Imaging Science**

**Johns Hopkins University**

V. Lyzinski, M. Tang, A. Athreya, Y. Park, C.E. Priebe, “Community Detection and Classification in Hierarchical Stochastic Blockmodels,”

IEEE Transactions on Network Science and Engineering, vol. 4, no. 1, pp 13-26, 2017. publisher site

We propose a robust, scalable, integrated methodology for

community detectionandcommunity comparisonin graphs. In our procedure, we first embed a graph into an appropriate Euclidean space to obtain a low-dimensional representation, and then cluster the vertices into communities. We next employ nonparametric graph inference techniques to identify structural similarity among these communities. These two steps are then applied recursively on the communities, allowing us to detect more fine-grained structure. We describe ahierarchical stochastic blockmodel— namely, a stochastic blockmodel with a natural hierarchical structure — and establish conditions under which our algorithm yields consistent estimates of model parameters andmotifs, which we define to be stochastically similar groups of sub-graphs. Finally, we demonstrate the effectiveness of our algorithm in both simulated and real data. Specifically, we address the problem of locating similar subcommunities in a partially reconstructedDrosophilaconnectome and in the social network Friendster.

The following algorithm is our methodology for identifying and estimating the structural properties of repeated motifs.

**Detecting hierarchical structure for graphs**

**Input**: Adjacency matrix \(A \in \{0,1\}^{n\times n}\) for a latent position random graph.

**Output**: Subgraphs and characterization of their dissimilarity

**while** Cluster size exceeds threshold **do**

**Step 1**: Compute the adjacency spectral embedding \(\hat{X}\);

**Step 2**: Project the rows of \(\hat{X}\) onto the sphere yielding \(\hat{Y}\); i.e., for each \(i \in [n]\), \(\hat{Y}_i := \hat{X}_i / \|\hat{X}_i\|_2\);

**Step 3**: Cluster \(\hat{Y}\) to obtain subgraphs \(\hat{H}_1,\cdots,\hat{H}_R\);

**Step 4**: For each \(i\in R\), use ASE to re-embed \(\hat{H}_i\), obtaining \(\hat{X}_{\hat{H}_i}\);

**Step 5**: Compute \(\hat{S} :=[T_{\hat{n}_{r},\hat{n}_{s}}(\hat{X}_{\hat{H}_r},\hat{X}_{\hat{H}_s})]\) producing a pairwise dissimilarity matrix on induced subgraphs;

**Step 6**: Cluster induced subgraphs into motifs according to \(\hat{S}\);

**Step 7**: Recurse on each motif;

**end while**

**Definition 5**: Let \((A,X) \sim RDPG(F)\) and \((B,Y) \sim RDPG(G)\). We say that \(A\) and \(B\) are of the same *motif* if there exists a unitary transformation \(U\) such that \(F = G \circ U\).

**Corollary 8**: Clustering \(\hat{S}\) matrix yields a consistent clustering of \(\{\hat{H}_i\}^{R}_{i=1}\) into motifs.

- simulation: A simulation with a HSBM graph with 8 blocks in 3 motifs in Section 3 in the manuscript. The
`vlclust`

version is here. - real data:
- fly structural data: Motif detection in the
*Drosophila*Connectome data (“fly” in this link) in Section 5.1 in the manuscript. The`vlclust`

version is here. `Friendster`

network data: Motif detection in the Friendster network data in Section 5.2 in the manuscript.

- fly structural data: Motif detection in the

`R`

PackageThe latest `R`

source package can be downloaded from here.

It can be installed via:

```
if (!require(hsbm)) {
install.packages("http://www.cis.jhu.edu/~parky/HSBM/hsbm_0.1.0.tar.gz",type="source",method="wget")
library(hsbm)
}
```

and the *simulation* demo can be run by typing

`demo(hsbm)`

`library(help='hsbm')`

```
## Information on package 'hsbm'
##
## Description:
##
## Package: hsbm
## Type: Package
## Title: Hierarchical Stochastic Blockmodels
## Version: 0.1.0
## Depends: R (>= 3.0)
## Imports: igraph, Matrix, lattice, irlba
## Author: Minh Tang, Youngser Park
## Maintainer: Youngser Park <youngser@jhu.edu>
## Description: Routines to perform community detection and
## classification in Hierarchical Stochastic
## Blockmodels
## License: GPL (>= 2)
## URL: http://www.cis.jhu.edu/~parky/HSBM/
## LazyData: TRUE
## RoxygenNote: 5.0.0
## Built: R 3.3.2; ; 2016-12-14 15:55:56 UTC; unix
##
## Index:
##
## computeS auxiliary functions to compute the kernel test
## statistic.
## find.transform auxiliary functions to compute the kernel test
## statistic.
## getElbows Dimensionality selection for singular values
## using profile likelihood.
## kernel.stat auxiliary functions to compute the kernel test
## statistic.
## my.image2 auxiliary functions for visualization
## plotmemb auxiliary functions for visualization
## rect.dist auxiliary functions to compute the kernel test
## statistic.
## reembed auxiliary functions to compute the kernel test
## statistic.
## rg.sample Make a random graph
## stfp Spectral Embedding of Adjacency Matrices
## vlclust Angle based clustering
```

*prepared by youngser@jhu.edu on Sat Mar 25 12:11:05 2017*