# Communications of the ACM

Research highlights

## Spectral Sparsification of Graphs: Theory and Algorithms

Graph sparsification is the approximation of an arbitrary graph by a sparse graph.

We explain what it means for one graph to be a spectral approximation of another and review the development of algorithms for spectral sparsification. In addition to being an interesting concept, spectral sparsification has been an important tool in the design of nearly linear-time algorithms for solving systems of linear equations in symmetric, diagonally dominant matrices. The fast solution of these linear systems has already led to breakthrough results in combinatorial optimization, including a faster algorithm for finding approximate maximum flows and minimum cuts in an undirected network.

### 1. Introduction

A sparse graph is one whose number of edges is reasonably viewed as being proportional to its number of vertices. In contrast, the complete graph on n vertices, with about n2/2 edges, is the paradigmatic dense graph. Sparse graphs are often easier to handle than dense ones. Most graph algorithms run faster, sometimes by orders of magnitude, when there are fewer edges, and the graph itself can be stored more compactly. By approximating a dense graph of interest by a suitable sparse one, one can save time and space.

We will work with weighted graphs, where the weights might represent capacities, conductance, similarity, or just coefficients in a linear system. In a sparse graph, all of the edges can be important for a graph's structure. In a tree, for example, each edge provides the only path between its endpoints. Not so in a dense graph, where some edges will serve similar functions. A collection of low-weight edges connecting two clusters of vertices in a graph might be approximable by a single high-weight edge connecting vertices representative of those clusters. Sparsification can be viewed as a procedure for finding a set of representative edges and weighting them appropriately.

What exactly do we mean by sparse? We would certainly consider a graph sparse if its average degree were less than 10, and we would probably consider a graph sparse if it had one billion vertices and average degree one hundred. We formalize the notion of sparsity in the usual analysis-of-algorithms way by considering infinite families of graphs, and proclaiming sparse those whose average degrees are bounded by some constant, or perhaps by a polynomial in the logarithm of their number of vertices.

One may at first think that sparsification is unnecessary, as common wisdom holds that all large real-world graphs are sparse. While this may be true of natural graphs such as social networks, it is not true of the graphs that arise inside algorithms. Many algorithms involve the construction of graphs that are dense, even when solving problems on graphs that are sparse. Moreover, the common wisdom may be an artifact of the difficulty of storing and manipulating a large dense graph. Improvements in sparsification may one day ameliorate these difficulties.

The use of sparse graphs to approximate dense ones is not unique to algorithm design. In a parallel computer, for instance, information needs to be able to flow from any processor to any other. Hardwiring all those pairwise connections would be physically difficult, so a sparse graph simulating the connectivity properties of the complete graph is needed. The hypercube graph plays this role in the CM5, built by Thinking Machines.25 Intuitively, the hypercube has no "bottlenecks." Formally, the (weighted) hypercube is a good spectral sparsifier for the complete graph defined on its nodes. We have shown that every graph has a very sparse spectral approximation, with constant average degree.

### 2. Notions of Similarity

A few conventions: we specify a weighted graph by a 3-tuple, G = (V, E, w), with vertex set V = {1, ..., n}, edge set E ⊆ {(u, v) | u, vV}, and weights w(u,v) > 0 for each (u, v) ∈ E. All graphs will be undirected and weighted, unless otherwise stated. We sometimes express a graph simply by G = (V, w), as E can be defined implicitly by setting w(u,v) = 0 for all (u, v) ∉ E. We will always write n for the number of vertices in a graph and m for the number of edges. When measuring the similarity between two graphs, we will always assume that they have the same set of vertices.

2.1. Cut similarity

The notion of cut similarity of graphs was first considered by Benczúr and Karger8 as part of an effort to develop fast algorithms for the minimum cut and maximum flow problems. In these problems, one is interested in the sum of the weights of edges that are cut when one divides the vertices of a graph into two pieces. Two weighted graphs on the same vertex set are cut-similar if the sum of the weights of the edges cut is approximately the same in each such division.

To write this symbolically, we first observe that a division of the vertices into two parts can be specified by identifying the subset of vertices in one part. For a weighted graph G = (V, w) and a subset of vertices SV, we define

We say that G = (V, w) and = (V, ) are σ-cut similar if

for all SV. Surprisingly, every graph is cut-similar to a graph with average degree O(log n), and that graph can be computed in polylogarithmic time.

THEOREM 1 (BENCZÚR-KARGER). For all ε > 0, every G = (V, E, w) has a (1 + ε)-cut similar graph = (V, , ) such that E and || = O(n log n2). Moreover can be computed in O(m log3 n + m log n2) time.

The sizes of cuts in a graph tell us a lot about its structurefor starters, the weighted degrees of vertices are given by cuts of size |S| = 1. Most ways of defining a cluster of vertices in a graph involve comparing the number of edges in the cut defined by the set of vertices to the number of edges internal to that set.

2.2. Spectral similarity

Motivated by problems in numerical linear algebra and spectral graph theory, Spielman and Teng34 introduced a notion of spectral similarity for two graphs. We will first describe it as a generalization of cut similarity.

Given a weighted graph G = (V, w), we define the Laplacian quadratic form of G to be the function QG from V to given by

If S is a set of vertices and x is the characteristic vector of S (1 inside S and 0 outside), then it is easy to see that

We say two graphs G = (V, w) and = (V, ) are σ-spectrally similar if

Thus, cut similarity can be viewed as the special case of spectral similarity in which we only consider vectors x that take values in {0, 1}.

It is possible to construct graphs that have very similar cuts, but which are highly dissimilar from the spectral perspective; for instance, the n-vertex path is 2-cut similar but only n-spectrally similar to the n-vertex cycle. Although spectral similarity is strictly stronger than cut similarity, it is easier to check if two graphs are spectrally similar. In particular, one can estimate the spectral similarity of two graphs to precision ε in time polynomial in n and log(1/ε), but it is NP-hard to approximately compute the cut-similarity of two graphs.

Graphs that are spectrally similar share many algebraic properties. For example, the effective resistance distances between all pairs of vertices are similar in spectrally similar graphs. The effective resistance distance is defined by viewing each edge in a graph as a resistor: an edge of weight w becomes a resistor of resistance 1/w. The entire graph is then viewed as a resistive circuit, and the effective resistance between two vertices is just the electrical resistance in the network between them. It equals the potential difference induced between the vertices when a unit current is injected at one and extracted at the other. In terms of the Laplacian quadratic form, the effective resistance between vertices u and v may be written as

an identity which follows from the well-known energy minimizing property of electrical flows.

The Laplacian quadratic form provides a natural way to solve regression problems on graphs. In these problems, one is told the values of x on some subset of the nodes S and is asked to infer the values on the remaining nodes. One approach to solving these problems is to view the known values as voltages that have been fixed, and the values at the other nodes as the induced voltages. That is, one seeks the vector x that minimizes QG(x) while agreeing with the known values.36 One can show that if two graphs are spectrally similar, then the solutions to all such regression problems on the graphs will be similar as well.

The problems of regression and computing effective resistances are special cases of the problem that motivated the definition of spectral similarity: the solution of linear equations in Laplacian matrices. The Laplacian quadratic form can be written as

where LG is the Laplacian matrix of G. The Laplacian matrix of a graph G = (V, w) is defined by

The problem of solving systems of linear equations in Laplacian matrices arises in many areas of computational science and optimization. In fact, the spectral similarity measure is identical to the concept of relative condition number in numerical linear algebra. If two graphs are spectrally similar, then through the technique of preconditioning one can use solutions to linear equations in the Laplacian matrix of one graph to solve systems of linear equations in the Laplacian of the other.

2.3. Spectral similarity of matrices

For two symmetric matrices A and B in n×n, we write A B to indicate that

We say A and B are σ-spectrally similar if

We have named this relation spectral similarity because it implies that the two matrices have similar eigenvalues. The Courant-Fisher Theorem tells us that

Thus, if λ1, ..., λn are the eigenvalues of A and 1,...,n are the eigenvalues of B, then for all i, λi/σ ≤ i ≤ σ · λi.

Using this notation, we can now write inequality (1) as

That is, two graphs are σ-spectrally similar if their Laplacian matrices are. We remark that the Laplacian matrix of a graph is (i) symmetric, that is, LG = LTG, (ii) positive semi-definite, that is, all eigenvalues of LG are non-negative, and (iii) (weakly) diagonally dominant, that is, for all i, LG(i, i)≥ ∑ji|LG(i, j)|. From consideration of the Laplacian quadratic form, it is easy to verify that if G is connected, then the null space of LG is just the span of the all 1's vector. Thus all connected graphs have the same Laplacian null space and exactly one zero eigenvalue.

2.4. Distance similarity

It is worth mentioning an interesting alternative to cut- and spectral-similarity. If one assigns a length to every edge in a graph, then these lengths induce a shortest-path distance between every pair of vertices. We say that two different graphs on the same vertex set are σ-distance similar if the distance between each pair of vertices in one graph is within a multiplicative factor of σ of the distance between the corresponding pair of vertices in the other graph. Formally, if G and are the graphs and if distG(u, v) is the distance between vertices u and v in G, then G and are σ-distance similar if for all u, vV,

When is a subgraph of G, the inequality distG(u, v) ≤ dist(u, v) is automatically satisfied. Peleg and Ullman29 defined a t-spanner of a graph G to be a subgraph such that for all u, vV,

They were interested in finding sparse t-spanners. It has been shown4 that every weighed graph has a (2t + 1)-spanner with O(n1 + 1/t) edges. The most extreme form of a sparse spanner is the low stretch spanning tree, which has only n  1 edges, but which only approximately preserves distances on average,1 up to polylogarithmic distortion.

### 3. Finding Sparse Substitutes

A (σ, d)-spectral sparsifier of a graph G is a graph satisfying

1. is σ-spectrally similar to G
2. The edges of consist of reweighted edges of G
3. has at most d|V| edges

Since spectral similarity implies that the total edge weight of a graph is preserved, the spectral sparsifier can only have fewer edges than G if those edges have larger weights.

In this section, we begin by discussing sparsifiers of the complete graph, which have been known for some time. We then describe a sequence of ever-stronger existence theorems, culminating in the statement that any graph G has a (1 + ε, 4/ε2)-spectral sparsifier for every ε ∈ (0, 1).

3.1. Cliques have constant-degree sparsifers

To warm up, let us first examine the quality of the hypercube as a spectral sparsifier of the complete graph. Assume for convenience that n is a power of two. Let G be the complete graph on n vertices. All the non-zero eigenvalues of LG equal n, so for every unit vector x orthogonal to the all-1s vector,

The non-zero eigenvalues of the Laplacian of the hypercube with n vertices in which every edge has weight 1 are (2, ..., 2 log n). Let H be this hypercube, but with edge weights n/(2). The non-zero eigenvalues of the Laplacian of H are then

which implies that for every unit vector x orthogonal to the all-1s vector,

Thus, H is a (, log n/2)-spectral sparsifier of the n-clique.

In fact, the complete graph has much better spectral sparsifiers. Consider the Ramanujan graphs,26, 27 which are d-regular graphs all of whose non-zero Laplacian eigenvalues lie between d  2 and d + 2. If we let be a Ramanujan graph with edge weight n/d, then for every unit vector x orthogonal to the all-1s vector,

Thus, is a ((1  2/d)−1, d/2)-spectral sparsifier of the complete graph G.

3.2. Sampling and decomposition: every graph has a good sparsifier

Ramanujan graphs are members of the family of expander graphs. These are sparse graphs in which every subset of vertices has a significant fraction of its edges connecting to vertices outside the subset (see below for details). As with the hypercube and Ramanujan graphs, we can show that every expander can be rescaled to be a good spectral sparsifier of the complete graph.

It is well known that random sparse graphs are usually good expanders (see Bollobás9 and Friedman17). Therefore, one can obtain a good spectral sparsifier of the complete graph by random sampling. Spielman and Teng34 took this view one step further to show that graph sampling can be used to obtain a good spectral sparsifier for every graph. Their construction was strongly motivated by the work of Benczúr and Karger8 and Achlioptas and McSherry.2

The sampling procedure involves assigning a probability pu,v to each edge (u, v) ∈ G, and then selecting edge (u, v) to be in the graph with probability pu,v. When edge (u, v) is chosen to be in the graph, we multiply its weight by 1/pu,v.

This procedure guarantees that

The key step in this approach is to determine the sampling probability pu,v for each edge; there is a tension between choosing small pu,v to generate a sparser and choosing larger pu,v to more accurately approximate G. Spielman and Teng recognized that some edges are more essential than others, and used a graph decomposition process to implicitly identify these edges and set the sampling probabilities accordingly.

Conductance and graph decomposition. For an unweighted graph G = (V, E) and disjoint subsets S, TV, we let E(S, T) denote the set of edges in E connecting one vertex of S with one vertex of T. We define Vol (S) = ∑i∈Sdi and observe that Vol (V) = 2|E|. We define the conductance of a set S of vertices to be

and we define

The normalized Laplacian matrix of a graph (see [14]), is defined to be

where D is the diagonal matrix whose u-th entry is du. The discrete version3, 31 of Cheeger's11 inequality (Theorem 2) relates the second eigenvalue of the normalized Laplacian to the conductance of a graph.

THEOREM 2 (DISCRETE CHEEGER INEQUALITY).

We define a decomposition of G to be a partition of V into sets (A1, ..., Ak), for some k. The boundary of a decomposition (A1, ..., Ak) is then the set of edges between different vertex sets in the partition:

We say that the decomposition is a φ-decomposition if for all i, where G(Ai) denotes the subgraph induced on Ai. It is a λ-spectral decomposition if the smallest non-zero normalized Laplacian eigenvalue of G(Ai) is at least λ, for all i. By Cheeger's inequality, every φ-decomposition is a (φ2/2)-spectral decomposition.

The following two theorems (see Spielman and Teng34) together imply that for all ε, every graph has a((1 + ε), O−2 log7 n))-spectral sparsifier.

THEOREM 3 (SPECTRAL DECOMPOSITION). Every G has an Ω (log−2n)-spectral decomposition with boundary size at most |E|/2.

THEOREM 4 (SAMPLING WORKS FOR EXPANDERS). Suppose ε ∈ (0, 1/2) and G = (V, E) is an unweighted graph with smallest non-zero normalized Laplacian eigenvalue at least λ. Let = (V, , ) be a graph obtained by sampling the edges of G with probabilities

where

and setting weights (e) = 1/pe for e. Then, with probability at least 1/2, is a (1 + ε)-spectral approximation of G, and the average degree of is O((log n)2 (ελ)−2)

To construct a spectral sparsifier of an arbitrary unweighted graph, we first apply Theorem 3 to find a Ω(1/log2n)-spectral decomposition of the graph in which the boundary has at most half the edges. We then sparsify each of the components by random sampling, and we sparsify the graph formed by the boundary edges recursively. Adding the sparsifiers obtained yields a sparsifier for the original graph, as desired.

3.3. Sampling by effective resistance

By using effective resistances to define the edge sampling probabilities pe, Spielman and Srivastava32 proved that every graph has a ((1 + ε), O(log n2))-spectral sparsifier. These spectral sparsifiers have a similar number of edges to the cut sparsifiers described in Theorem 1, and many fewer edges than those produced by Spielman and Teng34. We define Re, the effective resistance of an edge e, to be the effective resistance between its endpoints. It is well-known that Re is proportional to the commute time between the end-vertices of e,10 and is equal to the probability that e appears in a random spanning tree of G. Spielman and Srivastava proved that sampling with edge probability pe proportional to weRe is the "right" distribution for creating spectral sparsifiers.

THEOREM 5 (SAMPLING BY EFFECTIVE RESISTANCE). For any weighted graph G = (V, E, w) and 0 < ε ≤ 1, let be the graph obtained by the following random process:

Set q = 8n log n2. Choose a random edge of G with probability pe proportional to weRe, and add e to the edge set of with weight we/qpe. Take q samples independently with replacement, summing weights if an edge is chosen more than once.

Then with probability at least 1/2, is a (1 + ε)-spectral approximation of G.

The proof of Theorem 5 is matrix-analytic. We begin by observing that the Laplacian matrix of G can be expressed as a sum of outer products of vectors:

where eu denotes the elementary unit vector in direction u. Since the edges in are a subset of the edges in G, its Laplacian may also be expressed as a (differently weighted) sum of the same outer products:

Suppose the non-zero eigenvalue-and-eigenvector pairs of LG are (Λ1,u1),...,(Λn−1,un−1). Then we can write

Let LG+ be the Moore-Penrose Pseudoinverse of LG, that is,

Then is the projection matrix onto the span of {ui}.

The key to the analysis of Theorem 5, and the improved construction of Section 3.4, is the observation that

where LG+/2 is the square root of LG+. We will show that the sampling procedure described is likely to satisfy this latter condition. To this end, define random variables {se}eE to capture the outcome of the sampling procedure:

Then

and E[s(u,v)] = 1 for all (u, v) ∈ E, whence E[L] = LG. We now write:

where the Yi are i.i.d. random vectors sampled from the distribution

Notice that E[YYT] = Π so the expectation of each term is correct. To analyze the sum, we apply the following concentration theorem for sums of independent rank one matrices due to Rudelson.30

THEOREM 6 (RUDELSON). Let p be a probability distribution over Ω ⊂ d such that supy∈Ω||y||2M and ||E||[yyT]|| ≤ 1. Let y1, ..., yq be independent samples drawn from p with replacement. Then,

To optimize the application of Rudelson's theorem, we choose the {p(u,v)} so that all possible values of Y have the same norm, that is,

for some fixed γ, for all (u, v) ∈ E. An elementary calculation reveals that the value of γ that causes the sum of the probabilities to be one is . Thus if we sample according to probabilities

then we can take M = in Theorem 6. This tells us that q = O(n log n2) samples are sufficient to obtain a (1 + ε)-spectral approximation with high probability, from which Theorem 5 follows.

As stated earlier, the probabilities we have chosen for sampling edges have a natural meaning:

where

is the effective resistance between the vertices u and v.

3.4. Twice-Ramanujan sparsifiers

In a nutshell, Spielman and Srivastava first reduced the sparsification of G = (V, E, w) to the following algebraic problem: compute scalars {su,v≤0|(u, v)∈E} such that ={e|su, v > 0} has small cardinality, and

They then applied sampling, based on effective resistances, to generate the {su,v} and .

Batson et al.7 gave a deterministic polynomial-time algorithm for computing {se} and , and obtained the following theorem, which is essentially the best possible result for spectral sparsification.

THEOREM 7 (BATSON-SPIELMAN-SRIVASTAVA). For every d > 1, every undirected, weighted n-node graph G = (V, E, w) has a

In particular, G has a ((1 + 2ε), 4/ε2)-spectral sparsifier, for every 0 < ε < 1.

At the heart of their construction is the following purely linear algebraic theorem, which may be shown to imply Theorem 7 by an argument similar to that in Section 3.3.

THEOREM 8. Suppose d > 1 and v1, ..., vmn satisfy , where In is the n × n identity matrix. Then, there exist scalars si ≥ 0 with |{i: si ≠ 0}| ≤ dn such that

The proof for Theorem 8 builds the sum ∑isiviviT iteratively, by adding one vector at a time. For = dn, it chooses a sequence of vectors π(1), ..., π() and weights sπ(1),...,sπ(), which in turn defines a sequence of matrices 0 = A0, ..., A, where . Observe that and we always have At At−1. The goal is to control the eigenvalues of At at each step and guarantee that they grow with t in a steady manner, so that the final matrix A = Adn has all eigenvalues within a constant factor of each other and is hence a good approximation to the identity.

Batson, Spielman, and Srivastava use two "barrier" potential functions to guide their choice of π(i) and sπ(i) and ensure steady progress. Specifically, for u, l, and A a symmetric matrix with eigenvalues λ1, ..., λn, they define

When l · In A u · In, small values of these potentials indicate that the eigenvalues of A do not cluster near u or l. This turns out to be a sufficient induction hypothesis to sustain the following iterative process:

(1) Begin by setting the lower barrier l, to −n and the upper barrier, u to n. It can be checked that both potentials are bounded by 1. (2) At each step, increase the upper barrier u by a fixed constant δu and the lower barrier l by another fixed constant δl < δu. It can then be shown that as long as the potentials remain bounded, there must exist at every time t a choice of a vector vπ(i) and a weight sπ(i) so that the addition of to At−1 and the increments ll + δl and uu + δu do not increase either potential and keep all the eigenvalues λi(At) between the barriers. Iterating the above process ensures steady growth of all the eigenvalues and yields Theorem 8.

3.5. Some extensions

In a recent work, de Carli Silva et al.15 extended spectral sparsification to the sums of positive semidefinite matrices that have arbitrary rank. They proved the following theorem.

THEOREM 9. Let B1, ..., Bm be symmetric (or Hermitian) positive semidefinite n × n matrices and . Then for any ε ∈ (0, 1), there exist nonnegative s1, ..., sm, at most O(n2) of which are nonzero, such that

Moreover, {s1, ..., sm} can be computed in O(mn32) time.

Another extension is subgraph spectral sparsification,22 in which one is given a union of two graphs G and W and an integer κ, and asked to find a κ-edge graph Wκ such that G + Wκ is a good spectral sparsifier of G + W. When combined with the best-known construction of low-stretch spanning trees,1 this provides nearly optimal ultra-sparsifiers.

THEOREM 10 (KOLLA, MAKARYCHEV, SABERI, TENG). For each positive integer κ, every n-vertex graph has an O()-spectral approximation with at most (n  1 + κ)-edges.

Ultra-sparsifiers have so few edges that they have a large number of vertices of degree 1 or 2. They are a key component of the algorithms for solving linear equations described in Section 4.1.

### 4. Algorithmic Applications

4.1. Numerical algorithms: Laplacian systems

One of the most fundamental computational problems is that of solving a system of linear equations. One is given an n × n matrix A and an n-dimensional vector b, and is asked to find a vector x such that Ax = b. It is well known that every linear system can be solved by the classic Gaussian elimination method in polynomial time. However, Gaussian elimination usually takes super-linear or even super-quadratic time in the number of non-zero entries of A, making its use impractical for large matrices.

In many real-world applications, the matrix A is sparse and one only requires an approximate solution to the linear system. For example, given a precision parameter ε, we may be asked to produce an such that || Ab ||2 ||b ||2. For sparse positive semi-definite linear systems the fastest general purpose algorithm is the Conjugate Gradient (CG). It essentially solves Ax = b by multiplying a sequence of vectors by A. As the multiplication of a vector by A takes time proportional to the number of non-zero entries in A, CG can run quickly when A is sparse The number of matrix-vector products performed by CG depends on the condition number κ(A) of A, the ratio of its largest eigenvalue to its smallest eigenvalue. It is well-known in numerical analysis19 that it is sufficient to compute O() matrix-vector products to find a solution of accuracy ε.

Preconditioning is the strategy of finding a relatively easily invertible matrix B which is σ-spectrally similar to A, and solving the related system B−1Ax = B−1b. In each iteration, the preconditioned CG algorithm solves a linear system in B and performs a matrix-vector product in A, and only O() iterations are required. If it is easy to solve systems of linear equations in B, then the cost of each iteration is small and this algorithm will run quickly.

In 1990, Vaidya brought graph theory into the picture. Using preconditioners consisting of a maximum spanning tree of a graph plus a small number of carefully chosen edges, Vaidya obtained an O(m1.75 log(1/ε))-time algorithm for solving linear systems in Laplacian matrices with m non-zero entries. The exponent was still too large to be practical, but the idea was powerful. Spielman and Teng33 were able to enhance Vaidya's approach with spectral sparsifiers and low-stretch spanning trees to obtain the first nearly linear time algorithm for solving Laplacian linear systems.

THEOREM 11 (SPIELMAN-TENG). Linear systems in a graph Laplacian LG can be solved to precision ε in time O(m logO(1) n log(1/ε)).

In spite of its strong asymptotic behavior, the large exponent on the log factor makes this algorithm slow in practice. The Spielman-Srivastava sparsification algorithm offered no improvementeffective resistances do give the ideal probabilities with which to sample edges for a sparsifier, but computing them requires solving yet more Laplacian linear systems.

Koutis et al.24 removed this dependency problem by using low-stretch trees to compute less aggressive sampling probabilities which are strictly greater than those suggested by effective resistances. This can be done more quickly, and along with some other elegant ideas and fast data structures, is sufficient to yield a Laplacian linear system solver which runs in time

4.2. Fast sparsification algorithms

The algorithms from Section 3 certified the existence of good sparsifiers, but run quite slowly in practice. Those techniques have been significantly refined, and now there are three major ways to produce sparsifiers quickly.

First, the bottleneck in sampling via effective resistances is approximating the effective resistances themselves. The Laplacian system solver of Koutis, Miller, and Peng described above can be used to calculate those resistances, which can then be used to sample the graph. The best analysis is given by Koutis et al.23 who give an O(m log n log log n log (1/ε)) time algorithm for generating ((1 + ε), O(log3 n2))-spectral sparsifiers.

Second, the decomposition-and-sampling algorithm of Spielman and Teng34 can be sped up by improving the local clusterings used to create a decomposition. In the local clustering problem, one is given a vertex and cluster size as input, and one tries to find a cluster of low conductance near that vertex of size at most the target size, in time proportional to the target cluster size. Faster algorithms for local clustering have been developed by Andersen et al.5 and by Andersen and Peres.6

Third, unions of random spanning trees of a graph G can make good cut-sparsifiers: the union of two random spanning trees is (log n)-cut similar to G,20 while the union of O(log2 n2) random spanning trees, reweighed proportionally to effective resistance, is (1 + ε)-cut similar to G.18 Although it remains to be seen if the union of a small number of random spanning trees can produce a spectral sparsifier, Kapralov and Panigrahy showed that one can build a (1 + ε)-spectral sparsifier of a graph from the union of spanners of O(log4 n4) random subgraphs of G.21

4.3. Network algorithms

Fast algorithms for spectral sparsification and Laplacian systems provide a set of powerful tools for network analysis. In particular, they lead to nearly-linear time algorithms for the following basic graph problems:

APPROXIMATE FIEDLER VECTORS33: Input: G = (V, E, w) and ε > 0. Output: an ε-approximation of the second smallest eigenvalue, λ2(LG), (also known as the Fiedler value) of LG, along with a vector v orthogonal to the all 1s vector such that vTLGv ≤ (1 + ∈)λ2(LG).

ELECTRICAL FLOWS13,32,33: Input: G = (V, E, w) where weights are resistances, s, tV, and ε > 0. Output: an ε-approximation of the electrical flows over all edges when 1 unit of flow is injected into s and extracted from t.

EFFECTIVE RESISTANCE APPROXIMATION32: Input: G = (V, E, w) where weights are resistances and ε > 0. Output: a data structure for computing an ε-approximation of the effective resistance of any pair of vertices in G. Data Structure: O(n log n2) space, and O(log/ε2n) query time, and O(m log2 n log log2 n2) preprocessing time.

LEARNING FROM LABELED DATA ON A GRAPH35: Input a strongly connected (aperiodic) directed graph G = (V, E) and a labeling function y, where y assigns a label from a label set Y = {1, 1} to each vertex of a subset SV and 0 to vertices in V  S, and a parameter μ. Output the function f: V that minimizes

where

and π is the stationary distribution of the random walk on the graph with transition probability function p.

COVER TIME OF RANDOM WALK16: Input: G = (V, E), Output: a constant approximation of the cover time for random walks.

The algorithm for ELECTRICAL FLOWS led to a breakthrough for the following fundamental combinatorial optimization problem, for which the best previously known algorithm ran in time Õ(mn1/2/ε), Õ(mn2/3−1) and Õ(m3/2−1).

MAXIMUM FLOWS AND MINIMUM CUTS13: Input: G = (V, E, w) where w are capacities, s, tV, and ε > 0, Output: an ε-approximation of s-t maximum flow and minimum cut. Algorithm: Õ(mn1/3 · poly (1/ε)) time.

Spectral graph sparsification also played a role in understanding other network phenomena. For example, Chierichetti et al.12 discovered a connection between rumor spreading in a network and the spectral sparsification procedure of Spielman and Teng,34 and applied this connection to bound the speed of rumor spreading that arises in social networks.

### 5. Open Question

The most important open question about spectral sparsification is whether one can design a nearly linear time algorithm that computes (σ, d)-spectral sparsifiers for any constants σ and d. The algorithms based on Theorem 7 are polynomial time, but slow. All of the nearly-linear time algorithms of which we are aware produce sparsifiers with d logarithmic in n.

### 6. Conclusion

Spectral sparsification has proved to be a remarkably useful tool in algorithm design, linear algebra, combinatorial optimization, machine learning, and network analysis. Theorem 8 has already been applied many times within pure mathematics (see, e.g., Naor28). We hope this body of work will encourage more exchange of ideas between numerical analysis, pure mathematics and theoretical computer science, and inspire and enable the development of faster algorithms and novel analyses of network phenomena.

### References

1. Abraham, I., Neiman, O. Using petal-decompositions to build a low stretch spanning tree. In (2012).

2. Achlioptas, D., Mcsherry, F. Fast computation of low-rank matrix approximations., 2 (2007), 9.

3. Alon, N., Milman, V.D. λ1, isoperimetric inequalities for graphs, and superconcentrators., 1 (1985), 7388.

4. Althöfer, I., Das, G., Dobkin, D., Joseph, D., Soares, J. On sparse spanners of weighted graphs. (1993), 81100, 10.1007/BF02189308.

5. Andersen, R., Chung, F., Lang, K. Local graph partitioning using pagerank vectors. In (2006), 475486.

6. Andersen, R., Yuval, P. Finding sparse cuts locally using evolving sets. In (2009), ACM, 235244.

7. Batson, J.D., Spielman, D.A., Srivastava, N. Twice-Ramanujan sparsifiers. 6 (2012), 17041721.

8. Benczúr, A.A., Karger, D.R. Approximating s-t minimum cuts in O(n2) time. In (1996), 4755.

9. Bollobás, B. The isoperimetric number of random regular graphs., 3 (May 1988), 241244.

10. Chandra, A.K., Raghavan, P., Ruzzo, W.L., Smolensky, R., Tiwari, P. The electrical resistance of a graph captures its commute and cover times. In (1989), ACM, New York, NY, USA, 574586.

11. Cheeger, J. A lower bound for smallest eigenvalue of Laplacian. In (1970), Princeton University Press, 195199.

12. Chierichetti, F., Lattanzi, S., Panconesi, A. Rumour spreading and graph conductance. In (2010), 16571663.

13. Christiano, P., Kelner, J.A., Madry, A., Spielman, D.A., Teng, S.H. Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs. In (2011), 273282.

14. Chung, F.R.K. Spectral graph theory. CBMS Regional Conference Series in Mathematics, American Mathematical Society, 1997.

15. de Carli Silva, M.K., Harvey, N.J.A., Sato, C.M. Sparse sums of positive semidefinite matrices. (2011), abs/1107.0088.

16. Ding, J., Lee, J.R., Peres, Y. Cover times, blanket times, and majorizing measures. In (2011), 6170.

17. Friedman, J. A Proof of Alon's Second Eigenvalue Conjecture and Related Problems. Number 910 in Memoirs of the American Mathematical Society. American Mathematical Society, 2008.

18. Fung, W.S., Hariharan, R., Harvey, N.J.A., Panigrahi, D. A general framework for graph sparsification. In (2011), 7180.

19. Golub, G.H., Van Loan, C.F. Matrix Computations, 2nd edn, Johns Hopkins University Press, 1989.

20. Goyal, N., Rademacher, L., Vempala, S. Expanders via random spanning trees. In (2009), 576585.

21. Kapralov, M., Panigrahy, R. Spectral sparsification via random spav nners. In (2012), 393398.

22. Kolla, A., Makarychev, Y., Saberi, A., Teng, S.H. Subgraph sparsification and nearly optimal ultrasparsifiers. In (2010), 5766.

23. Koutis, I., Levin, A., Peng, R. Improved spectral sparsification and numerical algorithms for SDD matrices. In (2012), 266277.

24. Koutis, I., Miller, G.L., Peng, R. A nearly-m log n time solver for SDD linear systems. In (2011), 590598.

25. Leiserson, C. Fat-trees: universal networks for hardware-efficient supercomputing., 10 (0ctober1985), 892901.

26. Lubotzky, A., Phillips, R., Sarnak, P. Ramanujan graphs., 3 (1988), 261277.

27. Margulis, G.A. Explicit group theoretical constructions of combinatorial schemes and their application to the design of expanders and concentrators., 1 (July 1988), 3946.

28. Naor, A. Sparse quadratic forms and their geometric applications. (2011), 1033.

29. Peleg, D., Ullman, J.D. An optimal synchronizer for the hypercube., 4 (1989), 740747.

30. Rudelson, M. Random vectors in the isotropic position., 1 (1999), 6072.

31. Sinclair, A., Jerrum, M. Approximate counting, uniform generation and rapidly mixing Markov chains. 1 (July 1989), 93133.

32. Spielman, D.A., Srivastava, N. Graph sparsification by effective resistances. 6 (2011), 19131926.

33. Spielman, D.A., Teng, S.H. Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. (2008), abs/cs/0607105.

34. Spielman, D.A., Teng, S.H. Spectral sparsification of graphs., 4 (2011), 9811025.

35. Zhou, D., Huang, J., Scholkopf, B. Learning from labeled and unlabeled data on a directed graph. In (2005), 10411048.

36. Zhu, X., Ghahramani, Z., Lafferty, J.D. Semi-supervised learning using Gaussian fields and harmonic functions. In (2003).

### Authors

Joshua Batson, Mathematics, MIT.

Daniel A. Spielman, Computer Science & Applied Mathematics, Yale University.

Nikhil Srivastava, Microsoft Research, Bangalore.

Shang-Hua Teng, Computer Science, USC.

### Footnotes

A previous version of the paper, "Twice-Ramanujan Sparsifiers," was published in the Proceedings of the 41st Annual ACM Symposium on the Theory of Computing (2009), 255262.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

No entries found