Graph Convolutional Neural Networks via Scattering
Zou, Dongmian;Lerman, Gilad
2018-03-31 00:00:00
We generalize the scattering transform to graphs and consequently construct a convolutional neural network on graphs. We show that under certain conditions, any feature generated by such a network is approximately invariant to permutations and stable to graph manipulations. Numerical results demonstrate competitive performance on relevant datasets.

1 Introduction

Many interesting and modern datasets can be described by graphs. Examples include social [1], physical [2], and transportation [3] networks. The recent survey paper of Bronstein et al. [4] on geometric deep learning emphasizes the need to develop deep learning tools for such datasets and even more importantly to understand the mathematical properties of these tools, in particular, their invariances. They also mention two types of problems that may be addressed by such tools. The first problem is signal analysis on graphs with applications such as classification, prediction and inference on graphs. The second problem is learning the graph structure with applications such as graph clustering and graph matching.

Several recent works address the first problem [5, 6, 7, 8]. In these works, the filters of the networks are designed to be parametric functions of graph operators, such as the graph adjacency and Laplacian, and the parameters of those functions have to be trained. The second problem is often explored with random graphs generated according to two common models: Erdős–Rényi, which is used for graph matching, and the Stochastic Block Model (SBM), which is used for community detection. Some recent graph neural networks have obtained state-of-the-art performance for graph matching [9] and community detection [10, 11] with synthetic data generated from the respective graph models. As above, the filters in these works are parametric functions of either the graph adjacency or Laplacian, where the parameters are trained.
Despite the impressive progress in developing graph neural networks for solving these two problems, the performance of these methods is poorly understood. Of main interest is their invariance or stability to basic signal and graph manipulations. In the Euclidean case, the stability of a convolutional neural network [12] to rigid transformations and deformations is best understood in view of the scattering transform [13]. The scattering transform has a multilayer structure and uses wavelet filters to propagate signals. It can be viewed as a convolutional neural network where no training is required to design the filters. Training is only required for the classifiers given the transformed data. Nevertheless, there is freedom in the selection and design of the wavelets. The scattering transform is approximately invariant to translation and rotation. More precisely, under strong assumptions on the wavelet and scaling functions and as the coarsest scale $J$ approaches $\infty$, the scattering transform becomes invariant to translations and rotations. Moreover, it is Lipschitz continuous with respect to smooth deformation. These properties are shown in [13] for signals in $L^2(\mathbb{R}^d)$ and $L^2(H)$, where $H$ is a compact Lie group.

It is interesting to note that the design of filters in existing graph neural networks is related to the design of wavelets on graphs in the signal processing literature. Indeed, the construction of wavelets on graphs uses special operators on graphs such as the graph adjacency and Laplacian. As mentioned above, these operators are commonly used in graph neural networks. The earliest works on graph wavelets [14, 15] apply the normalized graph Laplacian to define the diffusion wavelets on graphs and use them to study multiresolution decomposition of graph signals. Hammond et al.
[16] use the unnormalized graph Laplacian to define analogous graph wavelets and study properties of these wavelets such as reconstructibility and locality. One can easily construct a graph scattering transform by using any of these wavelets. A main question is whether this scattering transform enjoys the desired invariance and stability properties. In this work, we use a special instance of the graph wavelets of [16] to form a graph scattering network and establish its covariance and approximate invariance to permutations and stability to graph manipulations. We also demonstrate the practical effectiveness of this transform in solving the two types of problems discussed above.

The rest of the paper is organized as follows. The scattering transform on graphs is defined in Section 2. Section 3 shows that the full scattering transform preserves the energy of the input signal. This section also provides an absolute bound on the energy decay rate of components of the transform at each layer. Section 4 proves the permutation covariance and approximate invariance of the graph scattering transform. It also briefly discusses previously suggested candidates for the notion of translation or localization on graphs and the possible covariance and approximate invariance of the scattering transform with respect to them. Furthermore, it clarifies why some special permutations are good substitutes for Euclidean rigid transformations. Section 5 establishes the stability of the scattering transform with respect to graph manipulations. Section 6 demonstrates competitive performance of the proposed graph neural network in solving the two types of problems.

arXiv:1804.00099v2 [cs.IT] 18 Nov 2018

2 Wavelet graph convolutional neural network

We first review the graph wavelets of [16] in Section 2.1. We then use these wavelets and ideas of [13] to construct a graph scattering transform in Section 2.2.

2.1 Wavelets on graphs

We review the wavelet construction of Hammond et al. [16] and adapt it to our setting. Our general theory applies to what we call simple graphs, that is, weighted, undirected and connected graphs with no self-loops. We remark that we may also address self-loops, but for simplicity we exclude them. Throughout the paper we fix an arbitrary simple graph $G = (V, E)$ with $N$ vertices. We also consistently use uppercase boldface letters to denote matrices and lowercase boldface letters to denote vectors or vector-valued functions.

The weight matrix of $G$ is an $N \times N$ symmetric matrix $\mathbf{W}$ with zero diagonal, where $\mathbf{W}(n, m)$ denotes the weight assigned to the edge $\{n, m\}$ of $G$. The degree matrix of $G$ is an $N \times N$ diagonal matrix $\mathbf{D}$ with

$$\mathbf{D}(n, n) = \sum_{m=1}^{N} \mathbf{W}(n, m), \quad 1 \le n \le N. \tag{1}$$

The (unnormalized) Laplacian of $G$ is the $N \times N$ matrix

$$\mathbf{L} = \mathbf{D} - \mathbf{W}. \tag{2}$$

The eigenvalues of $\mathbf{L}$ are non-negative and the smallest one is 0. Since the graph is connected, the eigenspace of 0 (that is, the kernel of $\mathbf{L}$) has dimension one. It is spanned by a vector with equal nonzero entries for all vertices. This vector represents a signal of the lowest possible "frequency".
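The definitions of the weight, degree and Laplacian matrices above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `graph_laplacian` and the 4-vertex weighted path graph are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np


def graph_laplacian(W):
    """Return the degree matrix D and the unnormalized Laplacian L = D - W.

    W is assumed to be the weight matrix of a simple graph:
    square, symmetric, and with zero diagonal (no self-loops).
    """
    W = np.asarray(W, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("W must be a square matrix")
    if not np.allclose(W, W.T):
        raise ValueError("W must be symmetric (undirected graph)")
    if not np.allclose(np.diag(W), 0.0):
        raise ValueError("W must have zero diagonal (no self-loops)")
    # D(n, n) is the sum of the weights in row n of W.
    D = np.diag(W.sum(axis=1))
    return D, D - W


# Illustrative example (not from the paper): a weighted path graph 0-1-2-3.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
D, L = graph_laplacian(W)

# eigh is intended for symmetric matrices and returns eigenvalues in
# ascending order, with orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(L)

print("eigenvalues:", eigvals)
print("eigenvector for the smallest eigenvalue:", eigvecs[:, 0])
```

For this connected example, the smallest eigenvalue should be zero up to rounding, the second smallest should be strictly positive, and the first eigenvector should have entries of equal magnitude, in line with the statements above.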
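The graph Laplacian $\mathbf{L}$ is symmetric and can be represented as

$$\mathbf{L} = \sum_{l=0}^{N-1} \lambda_l \mathbf{u}_l \mathbf{u}_l^{*}, \tag{3}$$

where $0 = \lambda_0 < \lambda_1 \le \cdots \le \lambda_{N-1}$ are the eigenvalues of $\mathbf{L}$, $\mathbf{u}_0, \ldots, \mathbf{u}_{N-1}$ are the corresponding eigenvectors, and the superscript $^{*}$ denotes the conjugate transpose. We remark that the phases of the eigenvectors of $\mathbf{L}$ and their order within any eigenspace of dimension larger than 1 can be arbitrarily chosen without affecting our theory for the graph scattering transform formulated below.

Let $\mathbf{f} \in L^2(G)$ be a graph signal. Note that in our setting we can regard $L^2(G) \simeq L^2(V) \simeq \mathbb{C}^N$, and without further specification we shall consider $\mathbf{f} \in \mathbb{C}^N$. We define the Fourier transform $\mathcal{F}: \mathbb{C}^N \to \mathbb{C}^N$ by

$$\mathcal{F}\mathbf{f} = \hat{\mathbf{f}} := \left(\mathbf{u}_l^{*} \mathbf{f}\right)_{l=0}^{N-1}, \tag{4}$$

and the inverse Fourier transform $\mathcal{F}^{-1}: \mathbb{C}^N \to \mathbb{C}^N$ by

$$\mathcal{F}^{-1}\hat{\mathbf{f}} := \sum_{l=0}^{N-1} \hat{\mathbf{f}}(l)\, \mathbf{u}_l. \tag{5}$$

Let $\odot$ denote the Hadamard product, that is, for $\mathbf{g}_1, \mathbf{g}_2 \in \mathbb{C}^N$, $(\mathbf{g}_1 \odot \mathbf{g}_2)(l) = \mathbf{g}_1(l)\,\mathbf{g}_2(l)$, $l = 0, \ldots, N-1$. Define the convolution of $\mathbf{f}_1$ and $\mathbf{f}_2$ in $L^2(G)$ as the inverse Fourier transform of $\hat{\mathbf{f}}_1 \odot \hat{\mathbf{f}}_2$, that is,

$$\mathbf{f}_1 \ast \mathbf{f}_2 = \mathcal{F}^{-1}\left(\hat{\mathbf{f}}_1 \odot \hat{\mathbf{f}}_2\right) = \sum_{l=0}^{N-1} \mathbf{u}_l\, \hat{\mathbf{f}}_1(l)\, \hat{\mathbf{f}}_2(l) = \sum_{l=0}^{N-1} \mathbf{u}_l \mathbf{u}_l^{*} \mathbf{f}_1\, \hat{\mathbf{f}}_2(l) = \sum_{l=0}^{N-1} \mathbf{u}_l\, \mathbf{u}_l^{*} \mathbf{f}_1\, \mathbf{u}_l^{*} \mathbf{f}_2. \tag{6}$$

When emphasizing the dependence of $\ast$ on the graph $G$, we denote it by $\ast_G$.

Euclidean wavelets use shift and scale in Euclidean space. For signals defined on graphs, which are discrete, the notions of translation and dilation need to be defined in the spectral domain. Hammond et al. [16] view $\mathbb{R}$ as the spectral domain since it contains the eigenvalues of $\mathbf{L}$. Their procedure assumes a scaling function $\phi$ and a wavelet function $\psi$ [17, 18] with corresponding Fourier transforms $\hat{\phi}$ and $\hat{\psi}$. They have minimal assumptions on $\phi$ and $\psi$. In our construction, we consider dyadic wavelets, that is,

$$\hat{\psi}_j(\omega) = \hat{\psi}(2^{-j}\omega), \quad j \in \mathbb{Z}. \tag{7}$$

Also, we fix a scale $J \in \mathbb{Z}$ of coarsest resolution and assume that $\phi$ and $\psi$ can be constructed from multiresolution analysis, that is, 2 2 ^ ^ + = 1 : (8) J j j>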