Conference Proceedings | Yuichi Yoshida

# Conference Proceedings

## 2019

• Testability of Homomorphism Inadmissibility: Property Testing Meets Database Theory. Hubie Chen and Yuichi Yoshida. PODS 2019.

In this paper, we utilize the perspective of property testing to consider the testability of relational database queries. A primary motivation is the desire to avoid reading an entire database to decide a property thereof. We focus on conjunctive queries, which are the most basic and heavily studied database queries. Each conjunctive query can be represented as a relational structure A such that deciding if the conjunctive query is satisfied by a relational structure B is equivalent to deciding if there exists a homomorphism from A to B. We phrase our results in terms of homomorphisms. Precisely, we study, for each relational structure A, the testability of homomorphism inadmissibility from A. We consider algorithms that have oracle access to an input relational structure B and that distinguish, with high probability, the case where there is no homomorphism from A to B, from the case where one needs to remove a constant fraction of tuples from B in order to suppress all such homomorphisms. We provide a complete characterization of the structures A from which one can test homomorphism inadmissibility with one-sided error by making a constant number of queries to B. Our characterization shows that homomorphism inadmissibility from A is constant-query testable with one-sided error if and only if the core of A is $$\alpha$$-acyclic. We also show that the injective version of the problem is constant-query testable with one-sided error if A is $$\alpha$$-acyclic; this result generalizes existing results for testing subgraph-freeness in the general graph model.

Sublinear-Time Algorithms,Constraint Satisfaction Problems
• Cheeger Inequalities for Submodular Transformations. Yuichi Yoshida. SODA 2019. [arxiv]

The Cheeger inequality for undirected graphs, which relates the conductance of an undirected graph and the second smallest eigenvalue of its normalized Laplacian, is a cornerstone of spectral graph theory. The Cheeger inequality has been extended to directed graphs and hypergraphs using normalized Laplacians for those, that are no longer linear but piecewise linear transformations. In this paper, we introduce the notion of a submodular transformation $$F:\{0,1\}^n\to\mathbb{R}^m$$, which applies $$m$$ submodular functions to the $$n$$-dimensional input vector, and then introduce the notions of its Laplacian and normalized Laplacian. With these notions, we unify and generalize the existing Cheeger inequalities by showing a Cheeger inequality for submodular transformations, which relates the conductance of a submodular transformation and the smallest non-trivial eigenvalue of its normalized Laplacian. This result recovers the Cheeger inequalities for undirected graphs, directed graphs, and hypergraphs, and derives novel Cheeger inequalities for mutual information and directed information. Computing the smallest non-trivial eigenvalue of a normalized Laplacian of a submodular transformation is NP-hard under the small set expansion hypothesis. In this paper, we present a polynomial-time $$O(\log n)$$-approximation algorithm for the symmetric case, which is tight, and a polynomial-time $$O(\log^2 n+\log n \cdot \log m)$$-approximation algorithm for the general case. We expect the algebra concerned with submodular transformations, or submodular algebra, to be useful in the future not only for generalizing spectral graph theory but also for analyzing other problems that involve piecewise linear transformations, e.g., deep learning.

Submodular Functions,Spectral Graph Theory
• Spectral Sparsification of Hypergraphs. Tasuku Soma and Yuichi Yoshida. SODA 2019. [arXiv]

For an undirected/directed hypergraph $$G=(V,E)$$, its Laplacian $$L_G:\mathbb{R}^V\to\mathbb{R}^V$$ is defined such that its quadratic form'' $$\mathbf{x}^\top L_G(\mathbf{x})$$ captures the cut information of $$G$$. In particular, $$\mathbf{1}^\top_S L_G(\mathbf{1}_S)$$ coincides with the cut size of $$S\subseteq V$$, where $$\mathbf{1}_S \in \mathbb{R}^V$$ is the characteristic vector of $$S$$. A weighted subgraph $$H$$ of a hypergraph $$G$$ on a vertex set $$V$$ is said to be an $$\epsilon$$-spectral sparsifier of $$G$$ if $$(1-\epsilon)\mathbf{x}^\top L_H(\mathbf{x})\leq \mathbf{x}^\top L_G(\mathbf{x}) \leq (1+\epsilon)\mathbf{x}^\top L_H(\mathbf{x})$$ holds for every $$\mathbf{x} \in \mathbb{R}^V$$. In this paper, we present a polynomial-time algorithm that, given an undirected/directed hypergraph $$G$$ on $$n$$ vertices, constructs an $$\epsilon$$-spectral sparsifier of $$G$$ with $$O(n^3\log n/\epsilon^2)$$ hyperedges/hyperarcs. The proposed spectral sparsification can be used to improve the time and space complexities of algorithms for solving problems that involve the quadratic form, such as computing the eigenvalues of $$L_G$$, computing the effective resistance between a pair of vertices in $$G$$, semi-supervised learning based on $$L_G$$, and cut problems on $$G$$. In addition, our sparsification result implies that any submodular function $$f:2^V\to \mathbb{R}_+$$ with $$f(\emptyset)=f(V)=0$$ can be concisely represented by a directed hypergraph. Accordingly, we show that, for any distribution, we can properly and agnostically learn submodular functions $$f:2^V\to [0,1]$$ with $$f(\emptyset)=f(V)=0$$, with $$O(n^4\log(n/\epsilon)/\epsilon^4)$$ samples.

Submodular Functions,Spectral Graph Theory

## 2018

• 0/1/all CSPs, Half-Integral A-path Packing, and Linear-Time FPT Algorithms. Yoichi Iwata, Yutaro Yamaguchi, and Yuichi Yoshida. FOCS 2018. [arxiv]

A recent trend in the design of FPT algorithms is exploiting the half-integrality of LP relaxations. In other words, starting with a half-integral optimal solution to an LP relaxation, we assign integral values to variables one-by-one by branch and bound. This technique is general and the resulting time complexity has a low dependency on the parameter. However, the time complexity often becomes a large polynomial in the input size because we need to compute half-integral optimal LP solutions. In this paper, we address this issue by providing an $$O(km)$$-time algorithm for solving the LPs arising from various FPT problems, where $$k$$ is the optimal value and m is the number of edges/constraints. Our algorithm is based on interesting connections among 0/1/all constraints, which has been studied in the field of constraints satisfaction, $$A$$-path packing, which has been studied in the field of combinatorial optimization, and the LPs used in FPT algorithms. With the aid of this algorithm, we obtain improved FPT algorithms for various problems, including Group Feedback Vertex Set, Subset Feedback Vertex Set, Node Multiway Cut, Node Unique Label Cover, and Non-monochromatic Cycle Transversal. The obtained running time for each of these problems is linear in the input size and has the current smallest dependency on the parameter. In particular, these algorithms are the first linear-time FPT algorithms for problems including Group Feedback Vertex Set and Non-monochromatic Cycle Transversal.

Constraint Satisfaction Problems,FPT
• We design a sublinear-time approximation algorithm for quadratic function minimization problems with a better error bound than the previous algorithm by Hayashi and Yoshida (NIPS’16). Our approximation algorithm can be modified to handle the case where the minimization is done over a sphere. The analysis of our algorithms is obtained by combining results from graph limit theory, along with a novel spectral decomposition of matrices. Specifically, we prove that a matrix A can be decomposed into a structured part and a pseudorandom part, where the structured part is a block matrix with a polylogarithmic number of blocks, such that in each block all the entries are the same, and the pseudorandom part has a small spectral norm, achieving better error bound than the existing decomposition theorem of Frieze and Kannan (FOCS’96). As an additional application of the decomposition theorem, we give a sublinear-time approximation algorithm for computing the top singular values of a matrix.

Sublinear-Time Algorithms
• In a cooperative game, the utility of a coalition of players is given by the characteristic function, and the goal is to find a stable value division of the total utility to the players. In real-world applications, however, multiple scenarios could exist, each of which determines a characteristic function, and which scenario is more important is unknown. To handle such situations, the notion of multi-scenario cooperative games and several solution concepts have been proposed. However, computing the value divisions in those solution concepts is intractable in general.

To resolve this issue, we focus on supermodular two-scenario cooperative games in which the number of scenarios is two and the characteristic functions are supermodular and study the computational aspects of a major solution concept called the preference core. First, we show that we can compute the value division in the preference core of a supermodular two-scenario game in polynomial time. Then, we reveal the relations among preference cores with different parameters. Finally, we provide more efficient algorithms for deciding the non-emptiness of the preference core for several specific supermodular two-scenario cooperative games such as the generalized induced subgraph game, airport game, and multicast tree game.

Submodular Functions,Game Theory
• In monotone submodular function maximization, approximation guarantees based on the curvature of the objective function have been extensively studied in the literature. However, the notion of curvature is often pessimistic, and we rarely obtain improved approximation guarantees, even for very simple objective functions. In this paper, we provide a novel approximation guarantee by extracting an $$M^\natural$$-concave function $$h:2^E \to \mathbb{R}_+$$, a notion in discrete convex analysis, from the objective function $$f:2^E\to \mathbb{R}_+$$. We introduce the notion of $$h$$-curvature, which measures how much $$f$$ deviates from $$h$$, and show that we can obtain a $$(1−\gamma/e-\epsilon)$$-approximation to the problem of maximizing $$f$$ under a cardinality constraint in polynomial time for any constant $$\epsilon>0$$. Then, we show that we can obtain nontrivial approximation guarantees for various problems by applying the proposed algorithm.

Submodular Functions
• Spectral Normalization for Generative Adversarial Networks. Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. ICLR 2018 (oral). [code] [article by Christian Cosgrove] [Available in pytorch]

One of the challenges in the study of generative adversarial networks is the instability of its training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on CIFAR10, STL-10, and ILSVRC2012 dataset, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) is capable of generating images of better or equal quality relative to the previous training stabilization techniques.

Machine Learning
• Statistically Efficient Estimation for Non-Smooth Probability Densities. Masaaki Imaizumi, Takanori Maehara, and Yuichi Yoshida. AISTATS 2018 (best paper award, oral).

We investigate statistical efficiency of estimators for non-smooth density functions. The density estimation problem appears in various situations, and it is intensively used in statistics and machine learning. The statistical efficiencies of estimators, i.e., their convergence rates, play a central role in advanced statistical analysis. Although estimators and their convergence rates for smooth density functions are well investigated in the literature, those for non-smooth density functions remain elusive despite their importance in application fields. In this paper, we propose new estimators for non-smooth density functions by employing the notion of Szemerédi partitions from graph theory. We derive convergence rates of the proposed estimators. One of them has the optimal convergence rate in minimax sense, and the other has slightly worse convergence rate but runs in polynomial time. Experimental results support the theoretical performance of our esti- mators.

Machine Learning
• Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization. Fanhua Shang, Yuanyuan Liu, Kaiwen Zhou, James Cheng, Kelvin Kai Wing Ng, and Yuichi Yoshida. AISTATS 2018.

In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. In order to make sufficient decrease for stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms such as SVRG-SD and SAGA-SD as a byproduct. We introduce a coefficient to scale current iterate and to satisfy the sufficient decrease property, which takes the decisions to shrink, expand or even move in the opposite direction, and then give two specific update rules of the coefficient for Lasso and ridge regres- sion. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, which show that both of our algorithms attain linear convergence rates. We also provide the convergence guarantees of our algorithms for non-strongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.

Machine Learning
• Using k-way Co-occurrences for Learning Word Embeddings. Danushka Bollegala, Yuichi Yoshida, and Ken-ichi Kawarabayashi. AAAI 2018. [arXiv]

Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior work on word embedding learning has used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and co-occurring in the same context. We extend the notion of co-occurrences to cover $$k(\geq 2)$$-way co-occurrences among a set of $$k$$-words. Specifically, we prove a theoretical relationship between the joint probability of $$k(\geq 2)$$ words, and the sum of $$\ell_2$$ norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises $$k$$-way cooccurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller $$k(\leq 5)$$ values, $$k$$-way embeddings perform comparably or better than 2-way embeddings in a range of tasks.

Natural Language Processing

## 2017

• Fitting Low-Rank Tensors in Constant Time. Kohei Hayashi and Yuichi Yoshida. NIPS 2017 (spotlight).

In this paper, we develop an algorithm that approximates the residual error of Tucker decomposition, one of the most popular tensor decomposition methods, with a provable guarantee. Given an order-$$K$$ tensor $$X\in\mathbb{R}^{N_1\times\cdots\times N_K}$$, our algorithm randomly samples a constant number $$s$$ of indices for each mode and creates a mini'' tensor $$\tilde{X}\in\mathbb{R}^{s\times\cdots\times s}$$, whose elements are given by the intersection of the sampled indices on $$X$$. Then, we show that the residual error of the Tucker decomposition of $$\tilde{X}$$ is sufficiently close to that of $$X$$ with high probability. This result implies that we can figure out how much we can fit a low-rank tensor to $$X$$ in constant time, regardless of the size of $$X$$. This is useful for guessing the favorable rank of Tucker decomposition. Finally, we demonstrate how the sampling method works quickly and accurately using multiple real datasets.

Sublinear-Time Algorithms,Machine Learning
• Streaming Algorithms for Maximizing Monotone Submodular Functions under a Knapsack Constraint. Chien-Chung Huang, Naonori Kakimura and Yuichi Yoshida. APPROX 2017.

In this paper, we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially and at any point of time, the algorithm has access only to a small fraction of the data stored in primary memory. For this problem, we propose a $$(0.363 - \epsilon)$$-approximation algorithm, requiring only a single pass through the data; moreover, we propose a $$(0.4 - \epsilon)$$-approximation algorithm requiring a constant number of passes through the data. The required memory space of both algorithms depends only on the size of the knapsack capacity and ε.

Submodular Functions
• Landmark indexing for Evaluation of Label-Constrained Reachability Queries. Lucien Valstar, George Fletcher and Yuichi Yoshida. SIGMOD 2017. [slides] [code]

Consider a directed edge-labeled graph, such as a social net- work or a citation network. A fundamental query on such data is to determine if there is a path in the graph from a given source vertex to a given target vertex, using only edges with labels in a restricted subset of the edge labels in the graph. Such label-constrained reachability (LCR) queries play an important role in graph analytics, for example, as a core fragment of the so-called regular path queries which are supported in practical graph query languages such as the W3C’s SPARQL 1.1, Neo4j’s Cypher, and Oracle’s PGQL. Current solutions for LCR evaluation, however, do not scale to large graphs which are increasingly common in a broad range of application domains. In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmark-based indexes for large graphs. We show through extensive experiments that our indexes are significantly smaller than state-of-the-art LCR indexing techniques, while supporting up to orders of magnitude faster query evaluation times. Our complete C++ codebase is available as open source for further research.

Network Algorithms
• Portfolio Optimization for Influence Spread. Naoto Ohsaka and Yuichi Yoshida. WWW 2017. [slides (by Ohsaka)]

Motivated by viral marketing, stochastic diffusion processes that model influence spread on a network have been studied intensively. The primary interest in such models has been to find a seed set of a fixed size that maximizes the expected size of the cascade from it. Practically, however, it is not desirable to have the risk of ending with a small cascade, even if the expected size of the cascade is large. To address this issue, we adopt conditional value at risk (CVaR) as a risk measure, and propose an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR. Using real-world social networks, we demonstrate that the portfolio computed by our algorithm has a significantly better CVaR than seed sets computed by other baseline methods.

Network Algorithms,Submodular Functions
• Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective func-tions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a nontrivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable trade-off between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.

Submodular Functions
• Non-monotone DR-Submodular Function Maximization. Tasuku Soma and Yuichi Yoshida. AAAI 2017. [arXiv] [poster]

We consider non-monotone DR-submodular function maximization, where DR-submodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing non-monotone DR-submodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a $$1/(2+\epsilon)$$-approximation algorithm with a running time of roughly $$O(n/\epsilon \log^2B)$$, where $$n$$ is the size of the ground set, B is the maximum value of a coordinate, and $$\epsilon > 0$$ is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and real-world datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

Submodular Functions
• Computing Least Cores of Supermodular Cooperative Games. Daisuke Hatano and Yuichi Yoshida. AAAI 2017

One of the goals of a cooperative game is to compute a valuedivision to the players from which they have no incentive todeviate. This concept is formalized as the notion of the core.To obtain a value division that motivates players to cooperate to a greater extent or that is more robust under noise, the notions of the strong least core and the weak least core have been considered. In this paper, we characterize the strong and the weak least cores of supermodular cooperative games using the theory of minimizing crossing submodular functions. We then apply our characterizations to two representative supermodular cooperative games, namely, the induced subgraph game generalized to hypergraphs and the airport game. For these games, we derive explicit forms of the strong and weak least core values, and provide polynomial-time algorithms that compute value divisions in the strong and weak least cores.

Submodular Functions,Game Theory
• Random-Radius Ball Method for Estimating Closeness Centrality. Wataru Inariba, Takuya Akiba and Yuichi Yoshida. AAAI 2017.

In the analysis of real-world complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many of distance-based centrality measures embrace some issues in treating disconnected networks, which are resolved by the recently emerged harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the random-radius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on real-world networks.

Network Algorithms

## 2016

• Minimizing Quadratic Functions in Constant Time. Kohei Hayashi and Yuichi Yoshida. NIPS 2016. [arXiv] [poster]

A sampling-based optimization method for quadratic functions is proposed. Our method approximately solves the following $$n$$-dimensional quadratic minimization problem in constant time, which is independent of $$n$$: $$z^*=\min_{\mathbf{v} \in \mathbb{R}^n}\langle \mathbf{v},A\mathbf{v}\rangle+n\langle \mathbf{v}\mathrm{diag}(\mathbf{d})\mathbf{v}\rangle+n\langle \mathbf{b},\mathbf{v}\rangle$$, where $$A \in \mathbb{R}^{n \times n}$$ a matrix and $$\mathbf{d},\mathbf{b} \in \mathbb{R}^n$$ are vectors. Our theoretical analysis specifies the number of samples $$k(\delta,\epsilon)$$ such that the approximated solution $$z$$ satisfies $$|z-z^*|=O(\epsilon n^2)$$ with probability $$1-\delta$$. The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.

Sublinear-Time Algorithms,Machine Learning
• Testing Assignments to Constraint Satisfaction Problems. Hubie Chen, Matt Valeriote and Yuichi Yoshida. FOCS 2016. [arXiv] [slides]

For a finite relational structure $$A$$, let $$\mathrm{CSP}(A)$$ denote the CSP instances whose constraint relations are taken from $$A$$. The resulting family of problems $$\mathrm{CSP}(A)$$ has been considered heavily in a variety of computational contexts. In this article, we consider this family from the perspective of property testing: given an instance of a CSP and query access to an assignment, one wants to decide whether the assignment satisfies the instance, or is far from so doing. While previous work on this scenario studied concrete templates or restricted classes of structures, this article presents comprehensive classification theorems. Our first contribution is a dichotomy theorem completely characterizing the structures $$A$$ such that $$\mathrm{CSP}(A)$$ is constant-query testable: (i) If $$A$$ has a majority polymorphism and a Maltsev polymorphism, then $$\mathrm{CSP}(A)$$ is constant-query testable with one-sided error. (ii) Else, testing $$\mathrm{CSP}(A)$$ requires a super-constant number of queries. Let $$\exists\mathrm{CSP}(A)$$ denote the extension of $$\mathrm{CSP}(A)$$ to instances which may include existentially quantified variables. Our second contribution is to classify all structures $$A$$ in terms of the number of queries needed to test assignments to instances of $$\exists\mathrm{CSP}(A)$$, with one-sided error. More specifically, we show the following trichotomy (i) If $$A$$ has a majority polymorphism and a Maltsev polymorphism, then $$\exists\mathrm{CSP}(A)$$ is constant-query testable with one-sided error. (ii) Else, if A has a $$(k + 1)$$-ary near-unanimity polymorphism for some $$k \geq 2$$, and no Maltsev polymorphism then $$\exists\mathrm{CSP}(A)$$ is not constant-query testable (even with two-sided error) but is sublinear-query testable with one-sided error. (iii) Else, testing $$\exists\mathrm{CSP}(A)$$ with one-sided error requires a linear number of queries.

Constraint Satisfaction Problems,Sublinear-Time Algorithms
• Dynamic Influence Analysis in Evolving Networks. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and Ken-ichi Kawarabayashi. Proceedings of the VLDB Endowment 2016. [slides (by Ohsaka)] [code]

We propose the first real-time fully-dynamic index data structure designed for influence analysis on evolving networks. With this aim, we carefully redesign the data structure of the state-of-the-art sketching method introduced by Borgs et al., and construct corresponding update algorithms. Using this index, we present algorithms for two kinds of queries, influence estimation and influence maximization, which are strongly motivated by practical applications, such as viral marketing. We provide a thorough theoretical analysis, which guarantees the non-degeneracy of the solution accuracy after an arbitrary number of updates. Furthermore, we introduce a reachability-tree-based technique and a skipping method, which greatly reduce the time consumption required for edge/vertex deletions and vertex additions, respectively, and counter-based random number generators, which improve the space efficiency. Experimental evaluations using real dynamic networks with tens of millions of edges demonstrate the efficiency, scalability, and accuracy of our proposed indexing scheme. Specifically, it can reflect a graph modification within a time of several orders of magnitude smaller than that required to reconstruct an index from scratch, estimate the influence spread of a vertex set accurately within a millisecond, and select highly influential vertices at least ten times faster than state-of-the-art static algorithms.

Network Algorithms
• Efficient Algorithms for Spanning Tree Centrality. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. IJCAI 2016.

In a connected graph, spanning tree centralities of a vertex and an edge measure how crucial they are for the graph to be connected. In this paper, we propose efficient algorithms for estimating spanning tree centralities with theoretical guarantees on their accuracy. We experimentally demonstrate that our methods are orders of magnitude faster than previous methods. Then, we propose a novel centrality notion of a vertex, called aggregated spanning tree centrality, which also considers the number of connected components obtained by removing the vertex. We also give an efficient algorithm for estimating aggregated spanning tree cen- trality. Finally, we experimentally show that those spanning tree centralities are useful to identify vulnerable edges and vertices in infrastructure networks.

Network Algorithms
• The problem of maximizing non-negative monotone submodular functions under a certain constraint has been intensively studied in the last decade. In this paper, we address the problem for functions defined over the integer lattice. Suppose that a non-negative monotone submodular function $$f:\mathbb{Z}^n_+ \to \mathbb{R}_+$$ is given via an evaluation oracle. Assume further that $$f$$ satisfies the diminishing return property, which is not an immediate consequence of the submodularity when the domain is the integer lattice. Then, we show polynomial-time $$(1-1/e-\epsilon)$$-approximation algorithm for cardinality constraints, polymatroid constraints, and knapsack constraints. For a cardinality constraint, we also show a $$(1-1/e-\epsilon)$$-approximation algorithm with slightly worse time complexity that does not rely on the diminishing return property. Our algorithms for a polymatroid constraint and a knapsack constraint first extend the domain of the objective function to the Euclidean space and then run the continuous greedy algorithm. We give two different kinds of continuous extensions, one is for polymatroid constraints and the other is for knapsack constraints, which might be of independent interest.

Submodular Functions
• In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheeger-like inequality, which relates the conductance of a digraph and the smallest non-zero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of real-world networks and obtain encouraging results.

Network Algorithms,Submodular Functions,Spectral Graph Theory
• We consider stable signal recovery in $$\ell_q$$ quasi-norm for $$0 < q \leq 1$$. In this problem, given a measurement vector $$y = Ax$$ for some unknown signal vector $$x \in \mathbb{R}^n$$ and a known matrix $$A \in \mathbb{R}^{m \times n}$$, we want to recover $$z \in \mathbb{R}^n$$ with $$\|x - z\|_q = O(\|x - x^*\|_q)$$ from a measurement vector, where $$x^*$$ is the s-sparse vector closest to $$x$$ in $$\ell_q$$ quasi-norm. Although a small value of $$q$$ is favorable for measuring the distance to sparse vectors, previous methods for $$q < 1$$ involve $$\ell_q$$ quasi-norm minimization which is computationally intractable. In this paper, we overcome this issue by using the sum-of-squares method, and give the first polynomial-time stable recovery scheme for a large class of matrices $$A$$ in $$\ell_q$$ quasi-norm for any fixed constant $$0 < q \leq 1$$.

• Let $$\{f_i:\mathbb{F}_p^i\to \{0,1\}\}$$ be a sequence of functions, where $$p$$ is a fixed prime and $$\mathbb{F}_p$$ is the finite field of order $$p$$. The limit of the sequence can be syntactically defined using the notion of ultralimit. Inspired by the Gowers norm, we introduce a metric over limits of function sequences, and study properties of it. One application of this metric is that it provides a simpler characterization of affine-invariant parameters of functions that are constant-query estimable than the previous one obtained by Yoshida (STOC'14). Using this characterization, we show that the property of being a function of a constant number of low-degree polynomials and a constant number of factored polynomials (of arbitrary degrees) is constant-query testable if it is closed under blowing-up. Examples of this property include the property of having a constant spectral norm and degree-structural properties with rank conditions.

Boolean Functions,Sublinear-Time Algorithms
• Improved Approximation Algorithms for k-Submodular Function Maximization. Satoru Iwata, Shin-ichi Tanigawa and Yuichi Yoshida. SODA 2016. [arXiv]

This paper presents a polynomial-time $$1/2$$-approximation algorithm for maximizing nonnegative k-submodular functions. This improves upon the previous $$\max\{1/3, 1/(1 + a)\}$$-approximation by Ward and Živný [18], where $$a = \max\{1,\sqrt{(k-1)/4}\}$$. We also show that for monotone $$k$$-submodular functions there is a polynomial-time $$k/(2k – 1)$$-approximation algorithm while for any $$\epsilon > 0$$ a $$((k + 1)/2k + \epsilon)$$-approximation algorithm for maximizing monotone $$k$$-submodular functions would require exponentially many queries. In particular, our hardness result implies that our algorithms are asymptotically tight. We also extend the approach to provide constant factor approximation algorithms for maximizing skewbisubmodular functions, which were recently introduced as generalizations of bisubmodular functions.

Submodular Functions

## 2015

• We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly $$O(n\log(nr)\log r)$$, where $$n$$ is the size of the ground set and $$r$$ is the maximum value of a coordinate. The dependency on $$r$$ is exponentially better than the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.

Submodular Functions
• A $$k$$-submodular function is a generalization of a submodular function, where the input consists of $$k$$ disjoint subsets, instead of a single subset, of the domain. Many machine learning problems, including influence maximization with $$k$$ kinds of topics and sensor placement with $$k$$ kinds of sensors, can be naturally modeled as the problem of maximizing monotone $$k$$-submodular functions.In this paper, we give constant-factor approximation algorithms for maximizing monotone $$k$$-submodular functions subject to several size constraints.The running time of our algorithms are almost linear in the domain size.We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.

Submodular Functions
• Fully Dynamic Betweenness Centrality Maintenance on Massive Networks. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. Proceedings of the VLDB Endowment 2015. [code]

Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks in network analysis. Among several importance measures, betweenness centrality, in particular, plays key roles in many real applications. Considerable effort has been made for developing algorithms for static settings. However, real networks today are highly dynamic and are evolving rapidly, and scalable dynamic methods that can instantly reflect graph changes into centrality values are required. In this paper, we present the first fully dynamic method for managing betweenness centrality of all vertices in a large dynamic network. Its main data structure is the weighted hyperedge representation of shortest paths called hypergraph sketch. We carefully design dynamic update procedure with theoretical accuracy guarantee. To accelerate updates, we further propose two auxiliary data structures called two-ball index and special-purpose reachability index. Experimental results using real networks demonstrate its high scalability and efficiency. In particular, it can reflect a graph change in less than a millisecond on average for a large-scale web graph with 106M vertices and 3.7B edges, which is several orders of magnitude larger than the limits of previous dynamic methods.

Network Algorithms
• On the Equivalence among Problems of Bounded Width. Yoichi Iwata and Yuichi Yoshida. ESA 2015. [arXiv]

In this paper, we introduce a methodology, called decomposition-based reductions, for showing the equivalence among various problems of bounded-width. First, we show that the following are equivalent for any $$α > 0$$:

• SAT can be solved in $$O^*(2^{\alpha \mathrm{tw}})$$ time,
• 3-SAT can be solved in $$O^*(2^{\alpha \mathrm{tw}})$$ time,
• Max 2-SAT can be solved in $$O^*(2^{\alpha\mathrm{tw}})$$ time,
• Independent Set can be solved in $$O^*(2^{\alpha\mathrm{tw}})$$ time, and
• Independent Set can be solved in $$O^*(2^{\alpha\mathrm{cw}})$$ time,

where $$\mathrm{tw}$$ and $$\mathrm{cw}$$ are the tree-width and clique-width of the instance, respectively. Then, we introduce a new parameterized complexity class EPNL, which includes Set Cover and TSP, and show that SAT, 3-SAT, Max 2-SAT, and Independent Set parameterized by path-width are EPNL-complete. This implies that if one of these \textup{EPNL}-complete problems can be solved in $$O^*(c^k)$$ time, then any problem in \textup{EPNL} can be solved in $$O^*(c^k)$$ time.

Combinatorial Optimization
• Learning Word Representations from Relational Graphs. Danushka Bollegala, Takanori Maehara, Yuichi Yoshida and Ken-ichi Kawarabayashi. AAAI 2015. [arXiv]

Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are inter-connected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic semantics between words.

Natural Language Processing
• Distributed Multiplicative Weights Methods for DCOP. Daisuke Hatano and Yuichi Yoshida. AAAI 2015.

We introduce a new framework for solving distributed constraint optimization problems that extend the domain of each variable into a simplex.We propose two methods for searching the extended domain for good assignments.The first one relaxes the problem using linear programming, finds the optimum LP solution, and rounds it to an assignment.The second one plays a cost-minimization game, finds a certain kind of equilibrium, and rounds it to an assignment.Both methods are realized by performing the multiplicative weights method in a distributed manner.We experimentally demonstrate that our methods have good scalability,and in particular, the second method outperforms existing algorithms in terms of solution quality and efficiency.

Constraint Satisfaction Problems
• Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling. Takuya Akiba, Takanori Hayashi, Nozomi Nori, Yoichi Iwata and Yuichi Yoshida. AAAI 2015.

We propose an indexing scheme for top-$$k$$ shortest-path distance queries on graphs, which is useful in a wide range of important applications such as network-aware search and link prediction. While considerable effort has been made for efficiently answering standard (top-1) distance queries, none of previous methods can be directly extended for top-k distance queries. We propose a new framework for top-$$k$$ distance queries based on 2-hop cover and then present an efficient indexing algorithm based on the simple but effective recent notion of pruned landmark labeling. Extensive experimental results on real social and web graphs show the scalability, efficiency and robustness of our method. Moreover, we demonstrate the usefulness of top-$$k$$ distance queries through an application to link prediction.

Network Algorithms

## 2014

• Robust Approximation of Temporal CSP. Suguru Tamaki and Yuichi Yoshida. APPROX 2014.

A temporal constraint language $$\Gamma$$ is a set of relations with first-order definitions in $$(\mathbb{Q};<)$$. Let $$\mathrm{CSP}(\Gamma)$$ denote the set of constraint satisfaction problem instances with relations from $$\Gamma$$. $$\mathrm{CSP}(\Gamma)$$ admits robust approximation if, for any $$\varepsilon \geq 0$$, given a $$(1-\varepsilon)$$-satisfiable instance of $$\mathrm{CSP}(\Gamma)$$, we can compute an assignment that satisfies at least a $$(1-f(\varepsilon))$$-fraction of constraints in polynomial time. Here, $$f(\varepsilon)$$ is some function satisfying $$f(0)=0$$ and $$\lim_{\varepsilon \to 0}f(\varepsilon)=0$$.

Firstly, we give a qualitative characterization of robust approximability: Assuming the Unique Games Conjecture, we give a necessary and sufficient condition on $$\Gamma$$ under which $$\mathrm{CSP}(\Gamma)$$ admits robust approximation. Secondly, we give a quantitative characterization of robust approximability: Assuming the Unique Games Conjecture, we precisely characterize how $$f(\varepsilon)$$ depends on $$\varepsilon$$ for each $$\Gamma$$. We show that our robust approximation algorithms can be run in almost linear time.

Constraint Satisfaction Problems
• Betweenness centrality measures the importance of a vertex by quantifying the number of times it acts as a midpoint of the shortest paths between other vertices. This measure is widely used in network analysis. In many applications, we wish to choose the $$k$$ vertices with the maximum adaptive betweenness centrality, which is the betweenness centrality without considering the shortest paths that have been taken into account by already-chosen vertices.

All previous methods are designed to compute the betweenness centrality in a fixed graph. Thus, to solve the above task, we have to run these methods $$k$$ times. In this paper, we present a method that directly solves the task, with an almost linear runtime no matter how large the value of $$k$$. Our method first constructs a hypergraph that encodes the betweenness centrality, and then computes the adaptive betweenness centrality by examining this graph. Our technique can be utilized to handle other centrality measures.

We theoretically prove that our method is very accurate, and experimentally confirm that it is three orders of magnitude faster than previous methods. Relying on the scalability of our method, we experimentally demonstrate that strategies based on adaptive betweenness centrality are effective in important applications studied in the network science and database communities.

Network Algorithms
• Fast and Accurate Influence Maximization on Large Networks with Pruned Monte-Carlo Simulations. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and Ken-Ichi Kawarabayashi. AAAI 2014. [slides (by Ohsaka)] [code]

Influence maximization is a problem to find small sets of highly influential individuals in a social network to maximize the spread of influence under stochastic cascade models of propagation. Although the problem has been well-studied, it is still highly challenging to find solutions of high quality in large-scale networks of the day. While Monte-Carlo-simulation-based methods produce near-optimal solutions with a theoretical guarantee, they are prohibitively slow for large graphs. As a result, many heuristic methods without any theoretical guarantee have been developed, but all of them substantially compromise solution quality. To address this issue, we propose a new method for the influence maximization problem. Unlike other recent heuristic methods, the proposed method is a Monte-Carlo-simulation-based method, and thus it consistently produces solutions of high quality with the theoretical guarantee. On the other hand, unlike other previous Monte-Carlo-simulation-based methods, it runs as fast as other state-of-the-art methods, and can be applied to large networks of the day. Through our extensive experiments, we demonstrate the scalability and the solution quality of the proposed method.

Network Algorithms
• We consider the problem of testing if two input forests are isomorphic or are far from being so. An algorithm is called an $$\epsilon$$-tester for forest-isomorphism if given an oracle access to two forests $$G$$ and $$H$$ in the adjacency list model, with high probability, accepts if $$G$$ and $$H$$ are isomorphic and rejects if we must modify at least $$\epsilon n$$ edges to make $$G$$ isomorphic to $$H$$. We show an $$\epsilon$$-tester for forest-isomorphism with a query complexity $$\mathrm{polylog}(n)$$ and a lower bound of $$\Omega(\sqrt{\log n/\log \log n})$$. Further, with the aid of the tester, we show that every graph property is testable in the adjacency list model with $$\mathrm{polylog}(n)$$ queries if the input graph is a forest.

Sublinear-Time Algorithms
• Let $$P$$ be a property of function $$\mathbb{F}^n_p \to \{0, 1\}$$ for a fixed prime $$p$$. An algorithm is called a tester for $$P$$ if, given a query access to the input function $$f$$, with high probability, it accepts when $$f$$ satisfies $$P$$ and rejects when $$f$$ is "far" from satisfying $$P$$. In this paper, we give a characterization of affine-invariant properties that are (two-sided error) testable with a constant number of queries. The characterization is stated in terms of decomposition theorems, which roughly claim that any function can be decomposed into a structured part that is a function of a constant number of polynomials, and a pseudo-random part whose Gowers norm is small. We first give an algorithm that tests whether the structured part of the input function has a specific form. Then we show that an affine-invariant property is testable with a constant number of queries if and only if it can be reduced to the problem of testing whether the structured part of the input function is close to one of a constant number of candidates.

Boolean Functions,Sublinear-Time Algorithms
• We propose two dynamic indexing schemes for shortest-path and distance queries on large time-evolving graphs, which are useful in a wide range of important applications such as real-time network-aware search and network evolution analysis. To the best of our knowledge, these methods are the first practical exact indexing methods to efficiently process distance queries and dynamic graph updates.

We first propose a dynamic indexing scheme for queries on the last snapshot. The scalability and efficiency of its offline indexing algorithm and query algorithm are competitive even with previous static methods. Meanwhile, the method is dynamic, that is, it can incrementally update indices as the graph changes over time. Then, we further design another dynamic indexing scheme that can also answer two kinds of historical queries with regard to not only the latest snapshot but also previous snapshots.

Through extensive experiments on real and synthetic evolving networks, we show the scalability and efficiency of our methods. Specifically, they can construct indices from large graphs with millions of vertices, answer queries in microseconds, and update indices in milliseconds.

Network Algorithms
• We consider approximation schemes for the maximum constraint satis- faction problems and the maximum assignment problems. Though they are NP-Hard in general, if the instance is “dense” or “locally dense”, then they are known to have approximation schemes that run in polynomial time or quasi-polynomial time. In this paper, we give a unified method of showing these approximation schemes based on the Sherali-Adams linear programming relaxation hierarchy. We also use our linear programming-based framework to show new algorithmic results on the optimization version of the hypergraph isomorphism problem.

Constraint Satisfaction Problems
• Parameterized Testability. Kazuo Iwama and Yuichi Yoshida. ITCS 2014. [slides]

This paper studies property testing for NP optimization problems with parameter $$k$$ under the general graph model with an augmentation of random edge sampling capability. It is shown that a variety of such problems, including $$k$$-Vertex Cover, $$k$$-Feedback Vertex Set, $$k$$-Multicut, $$k$$-path-freeness and $$k$$-Dominating Set, are constant-time testable if $$k$$ is constant. It should be noted that the first four problems are fixed parameter tractable (FPT) and it turns out that algorithmic techniques for their FPT algorithms (branch-and-bound search, color coding, etc.) are also useful for our testers. $$k$$-Dominating Set is $$W[2]$$-hard, but we can still test the property in constant time since the definition of $$\epsilon$$-farness makes the problem trivial for non-sparse graphs that are the source of hardness for the original optimization problem. We also consider $$k$$-Odd Cycle Transversal, which is another well-known FPT problem, but we only give a sublinear-time tester when $$k$$ is a constant.

Sublinear-Time Algorithms,FPT
• Linear Time FPT Algorithms via Network Flow. Yoichi Iwata, Keigo Oka and Yuichi Yoshida. SODA 2014. [arXiv] [video (talk by Iwata)]

In the area of parameterized complexity, to cope with NP-Hard problems, we introduce a parameter $$k$$ besides the input size $$n$$, and we aim to design algorithms (called FPT algorithms) that run in $$O(f(k)n^d)$$ time for some function $$f(k)$$ and constant $$d$$. Though FPT algorithms have been successfully designed for many problems, typically they are not sufficiently fast because of huge $$f(k)$$ and $$d$$.

In this paper, we give FPT algorithms with small $$f(k)$$ and d for many important problems including Odd Cycle Transversal and Almost 2-SAT. More specifically, we can choose $$f(k)$$ as a single exponential $$(4^k)$$ and d as one, that is, linear in the input size. To the best of our knowledge, our algorithms achieve linear time complexity for the first time for these problems.

To obtain our algorithms for these problems, we consider a large class of integer programs, called BIP2. Then we show that, in linear time, we can reduce BIP2 to Vertex Cover Above LP preserving the parameter $$k$$, and we can compute an optimal LP solution for Vertex Cover Above LP using network flow.

Then, we perform an exhaustive search by fixing half-integral values in the optimal LP solution for Vertex Cover Above LP. A bottleneck here is that we need to recompute an LP optimal solution after branching. To address this issue, we exploit network flow to update the optimal LP solution in linear time.

FPT

## 2013

• Answering reachability queries on directed graphs is ubiquitous in many applications involved with graph-shaped data as one of the most fundamental and important operations. However, it is still highly challenging to efficiently process them on large-scale graphs. Transitive-closure-based methods consume prohibitively large index space, and online-search-based methods answer queries too slowly. Labeling-based methods attain both small index size and query time, but previous indexing algorithms are not scalable at all for processing large graphs of the day.

In this paper, we propose new labeling-based methods for reachability queries, referred to as pruned landmark labeling and pruned path labeling. They follow the frameworks of 2-hop cover and 3-hop cover, but their indexing algorithms are based on the recent notion of pruned labeling and improve the indexing time by several orders of magnitude, resulting in applicability to large graphs with tens of millions of vertices and edges. Our experimental results show that they attain remarkable trade-offs between fast query time, small index size and scalability, which previous methods have never been able to achieve. Furthermore, we also discuss the ingredients of the efficiency of our methods by a novel theoretical analysis based on the graph minor theory.

Network Algorithms
• Capturing sets of closely related vertices from large networks is an essential task in many applications such as social network analysis, bioinformatics, and web link research. Decomposing a graph into $$k$$-core components is a standard and efficient method for this task, but obtained clusters might not be well-connected. The idea of using maximal $$k$$-edge-connected subgraphs was recently proposed to address this issue. Although we can obtain better clusters with this idea, the state-of-the-art method is not efficient enough to process large networks with millions of vertices.

In this paper, we propose a new method to decompose a graph into maximal $$k$$-edge-connected components, based on random contraction of edges. Our method is simple to implement but improves performance drastically. We experimentally show that our method can successfully decompose large networks and it is thousands times faster than the previous method. Also, we theoretically explain why our method is efficient in practice. To see the importance of maximal $$k$$-edge-connected subgraphs, we also conduct experiments using real-world networks to show that many $$k$$-core components have small edge-connectivity and they can be decomposed into a lot of maximal $$k$$-edge-connected subgraphs.

Network Algorithms
• Mining for Analogous Tuples from an Entity-Relation Graph. Danushka Bollegala, Mitsuru Kushimoto, Yuichi Yoshida and Ken-ichi Kawarabayashi. IJCAI 2013.

The ability to recognize analogies is an important factor that is closely related to human intelligence. Verbal analogies have been used for evaluating both examinees at university entrance exams as well as algorithms for measuring relational similarity. However, relational similarity measures proposed so far are confined to measuring the similarity between pairs of words. Unfortunately, such pairwise approaches ignore the rich relational structure that exists in real-world knowledge bases containing millions of entities and semantic relations. We propose a method to efficiently identify analogous entity tuples from a given entity-relation graph. First, we present an efficient approach for extracting potential analogous tuples from a given entity-relation graph. Second, to measure the structural similarity between two tuples, we propose two types of kernel functions: vertex-feature kernels, and edge-feature kernels. Moreover, we combine those kernels to construct composite kernels that simultaneously consider both vertex and edge features. Experimental results show that our proposed method accurately identifies analogous tuples and significantly outperforms a state-of-the-art pairwise relational similarity measure, extended to tuples.

Natural Language Processing
• Testing Linear-Invariant Function Isomorphism. Karl Wimmer and Yuichi Yoshida. ICALP 2013.

A function $$f:\mathbb{F}^n_2 \to \{-1,1\}$$ is called linear-isomorphic to $$g$$ if $$f = g \circ A$$ for some non-singular matrix $$A$$. In the $$g$$-isomorphism problem, we want a randomized algorithm that distinguishes whether an input function $$f$$ is linear-isomorphic to $$g$$ or far from being so.

We show that the query complexity to test $$g$$-isomorphism is essentially determined by the spectral norm of $$g$$. That is, if $$g$$ is close to having spectral norm $$s$$, then we can test g-isomorphism with $$\mathrm{poly}(s)$$ queries, and if $$g$$ is far from having spectral norm $$s$$, then we cannot test $$g$$-isomorphism with $$o(\log s)$$ queries. The upper bound is almost tight since there is indeed a function $$g$$ close to having spectral norm s whereas testing $$g$$-isomorphism requires $$\Omega(s)$$ queries. As far as we know, our result is the first characterization of this type for functions. Our upper bound is essentially the Kushilevitz-Mansour learning algorithm, modified for use in the implicit setting.

Exploiting our upper bound, we show that any property is testable if it can be well-approximated by functions with small spectral norm. We also extend our algorithm to the setting where $$A$$ is allowed to be singular.

Boolean Functions,Sublinear-Time Algorithms
• An Algebraic Characterization of Testable CSPs. Arnab Bhattacharyya and Yuichi Yoshida. ICALP 2013. [slides]

Given an instance $$\mathcal{I}$$ of a CSP, a tester for $$\mathcal{I}$$ distinguishes assignments satisfying $$\mathcal{I}$$ from those which are far from any assignment satisfying $$\mathcal{I}$$. The efficiency of a tester is measured by its query complexity, the number of variable assignments queried by the algorithm. In this paper, we characterize the hardness of testing Boolean CSPs in terms of the algebra generated by the relations used to form constraints. In terms of computational complexity, we show that if a non-trivial Boolean CSP is sublinear-query testable (resp., not sublinear-query testable), then the CSP is in NL (resp., P-complete, ⊕L-complete or NL-complete) and that if a sublinear-query testable Boolean CSP is constant-query testable (resp., not constant-query testable), then counting the number of solutions of the CSP is in P (resp., #P-complete).

Also, we conjecture that a CSP instance is testable in sublinear time if its Gaifman graph has bounded treewidth. We confirm the conjecture when a near-unanimity operation is a polymorphism of the CSP.

Constraint Satisfaction Problems,Sublinear-Time Algorithms
• We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.

Network Algorithms
• Testing a property $$P$$ of graphs in the bounded-degree model deals with the following problem: given a graph $$G$$ of bounded degree $$d$$, we should distinguish (with probability $$2/3$$, say) between the case that $$G$$ satisfies $$P$$ and the case that one should add/remove at least $$\epsilon dn$$ edges of $$G$$ to make it satisfy $$P$$. In sharp contrast to property testing of dense graphs, which is relatively well understood, only few properties are known to be testable with a constant number of queries in the bounded-degree model. In particular, no global monotone (i.e,closed under edge deletions) property that expander graphs can satisfy has been shown to be testable in constant time so far.

In this paper, we identify for the first time a natural family of global monotone property that expander graphs can satisfy and can be efficiently tested in the bounded degree model. Specifically, we show that, for any integer $$t \geq 1$$, $$K_t$$-subdivision-freeness is testable with a constant number of queries in the bounded-degree model. This property was not previously known to be testable even with $$o(n)$$ queries. Note that an expander graph with all degree less than $$t-1$$ does not have a $$K_t$$-subdivision.

The proof is based on a novel combination of some results that develop the framework of partitioning oracles, together with structural graph theory results that develop the seminal graph minor theory by Robertson and Seymour. As far as we aware, this is the first result that bridges property testing and structural graph theory. Although we know a rough structure for graphs without $$H$$-minors from the famous graph minor theory by Robertson and Seymour, there is no corresponding structure theorem for graphs without $$H$$-subdivisions so far, even $$K_5$$-subdivision-free graphs. Therefore, subdivisions and minors are very different in a graph structural sense.

Sublinear-Time Algorithms
• We study the constraint satisfaction problem over the point algebra. In this problem, an instance consists of a set of variables and a set of binary constraints of forms $$(x < y)$$, $$(x \leq y)$$, $$(x \neq y)$$ or $$(x = y)$$. Then, the objective is to assign integers to variables so as to satisfy as many constraints as possible.This problem contains many important problems such as Correlation Clustering, Maximum Acyclic Subgraph, and Feedback Arc Set. We first give an exact algorithm that runs in $$O^*(3^{\frac{\log 5}{\log 6}n})$$ time, which improves the previous best $$O^*(3^n)$$ obtained by a standard dynamic programming. Our algorithm combines the dynamic programming with the split-and-list technique. The split-and-list technique involves matrix products and we make use of sparsity of matrices to speed up the computation. As for approximation, we give a $$0.4586$$-approximation algorithm when the objective is maximizing the number of satisfied constraints, and give an $$O(\log n \log \log n)$$-approximation algorithm when the objective is minimizing the number of unsatisfied constraints.

Constraint Satisfaction Problems

## 2012

• We propose a constant-time algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edge-weighted digraph $$G$$ of average degree $$d$$ whose weights are real values in $$[0, 1]$$ is given as an oracle access, and we are allowed to ask degrees and incoming edges for any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in $$G$$ with an absolute error of at most εn with query complexity $$O(d/\epsilon^3)$$, where $$n$$ is the number of vertices. We also show a lower bound of $$\Omega(d/\epsilon^2)$$. Additionally, our algorithm can be modified to run with query complexity $$O(1/\epsilon^4)$$ for unweighted digraphs, i.e., it runs in time independent of the input size even for digraphs with $$d = \Omega(n)$$ edges. In contrast, we show that it requires $$\Omega(n)$$ queries to approximate the weight of the minimum (or maximum) spanning arborescence in a weighted digraph.

Sublinear-Time Algorithms
• Given a Boolean function $$f$$, the $$f$$-isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to $$f$$ up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions $$f$$ we can test $$f$$-isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functions -- functions invariant to the reordering of all but a constant number of their variables -- are efficiently isomorphism-testable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphism-testable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearly-optimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testing-by-implicit-learning approach to complete the proof that partially symmetric functions are efficiently isomorphism-testable.

Boolean Functions,Sublinear-Time Algorithms
• Given an undirected graph $$G = (V,E)$$ and positive edge weights $$\{w_e\}_{e \in E}$$, a linear arrangement is a permutation $$\pi: V \to [n]$$. The value of the arrangement is $$\mathrm{val}(G,\pi):=\frac{1}{n}\sum_{e=\{u,v\}}w_e|\pi(u)-\pi(v)|$$. In the minimum linear arrangement problem (MLA), the goal is to find a linear arrangement $$\pi^*$$ that achieves $$\mathrm{val}(G,\pi^*) = \mathrm{MLA}(G): = \min_{\pi}\mathrm{val}(G,\pi)$$.

In this paper, we show that for any $$\epsilon > 0$$ and positive integer $$r$$, there is an $$O(nr/\epsilon)$$-time randomized algorithm which, given a graph $$G$$, returns a permutation $$\pi$$ such that $\mathrm{val}(G,\pi) \leq \left(1 + \frac{2}{(1-\epsilon)\lambda_{r+1}(\mathcal{L})}\right)\mathrm{MLA}(G) + O\left(\frac{\log n}{\sqrt{n}}\sum_{e \in E}w_e\right)$ with high probability. Here $$\mathcal{L}$$ is the normalized Laplacian of $$G$$ and $$\lambda_r(\mathcal{L})$$ is the $$r$$-th eigenvalue of $$\mathcal{L}$$. Our algorithm gives a constant factor approximation for regular graphs that are weak expanders.

Constraint Satisfaction Problems
• Constant-Time Algorithms for Sparsity Matroids. Hiro Ito, Shin-ichi Tanigawa and Yuichi Yoshida. ICALP 2012. [slides]

A graph $$G = (V, E)$$ is called $$(k, \ell)$$-sparse if $$|F| ≤ k|V(F)| -\ell$$ for any $$F \subseteq E$$ with $$F \neq \emptyset$$. Here, $$V(F)$$ denotes the set of vertices incident to $$F$$. A graph $$G = (V,E)$$ is called $$(k,\ell)$$-full if $$G$$ contains a $$(k,\ell)$$-sparse subgraph with $$|V|$$ vertices and $$k|V| - \ell$$ edges. The family of edge sets of $$(k,\ell)$$-sparse subgraphs forms a family of independent sets of a matroid on $$E$$, known as the sparsity matroid of $$G$$. In this paper, we give a constant-time algorithm that approximates the rank of the sparsity matroid associated with a degree-bounded undirected graph. This algorithm leads to a constant-time tester for $$(k,\ell)$$-fullness in the bounded-degree model, (i.e., we can decide with high probability whether the input graph satisfies a property or far from it). Depending on the values of $$k$$ and $$\ell$$, our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed in a unified manner.

Based on this result, we also propose a constant-time tester for $$(k,\ell)$$-edge-connected-orientability in the bounded-degree model, where an undirected graph $$G$$ is called $$(k,\ell)$$-edge-connected-orientable if there exists an orientation $$\vec{G}$$ of $$G$$ with a vertex $$r\in V$$ such that $$\vec{G}$$ contains $$k$$ arc-disjoint dipaths from r to each vertex $$v \in V$$ and $$\ell$$ arc-disjoint dipaths from each vertex $$v \in V$$ to $$r$$.

A tester is called a one-sided error tester for $$P$$ if it always accepts a graph satisfying $$P$$. We show, for any $$k \geq 2$$ and (proper) $$\ell \geq 0$$, every one-sided error tester for $$(k,\ell)$$-fullness and $$(k,\ell)$$-edge-connected-orientability requires $$\Omega(n)$$ queries.

Sublinear-Time Algorithms,Submodular Functions
• Testing List H-Homomorphisms. Yuichi Yoshida. CCC 2012. [slides]

Let $$H$$ be an undirected graph. In the List $$H$$-Homomorphism Problem, given an undirected graph $$G$$ with a list constraint $$L(v) \subseteq V(H)$$ for each variable $$v∈V(G)$$, the objective is to find a list $$H$$-homomorphism $$f:V(G)\to V(H)$$, that is, $$f(v)\in L(v)$$ for every $$v\in V(G)$$ and $$(f(u),f(v)) \in E(H)$$ whenever $$(u,v)\in E(G)$$. We consider the following problem: given a map $$f:V(G) \to V(H)$$ as an oracle access, the objective is to decide with high probability whether $$f$$ is a list $$H$$-homomorphism or \textit{far} from any list $$H$$-homomorphisms. The efficiency of an algorithm is measured by the number of accesses to $$f$$. In this paper, we classify graphs $$H$$ with respect to the query complexity for testing list $$H$$-homomorphisms and show the following trichotomy holds: (i) List $$H$$-homomorphisms are testable with a constant number of queries if and only if $$H$$ is a reflexive complete graph or an irreflexive complete bipartite graph. (ii) List $$H$$-homomorphisms are testable with a sublinear number of queries if and only if $$H$$ is a bi-arc graph. (iii) Testing list $$H$$-homomorphisms requires a linear number of queries if $$H$$ is not a bi-arc graph.

Constraint Satisfaction Problems,Sublinear-Time Algorithms
• Constant-Time Approximation Algorithms for the Knapsack Problem. Hiro Ito, Susumu Kiyoshima and Yuichi Yoshida. TAMC 2012.

In this paper, we give a constant-time approximation algorithm for the knapsack problem. Using weighted sampling, with which we can sample items with probability proportional to their profits, our algorithm runs with query complexity $$O(\epsilon^{- 4} \log \epsilon^{-1})$$, and it approximates the optimal profit with probability at least $$2/3$$ up to error at most an $$\epsilon$$-fraction of the total profit. For the subset sum problem, which is a special case of the knapsack problem, we can improve the query complexity to $$O(\epsilon^{-1} \log \epsilon^{-1})$$.

Sublinear-Time Algorithms
• Algorithms and Complexity of Generalized River Crossing Problems. Stefan Langerman, Hiro Ito and Yuichi Yoshida. FUN 2012.

Three men, each with a sister, must cross a river using a boat which can carry only two people, so that a woman whose brother is not present is never left in the company of another man. This is a very famous problem appeared in Latin book “Problems to Sharpen the Young,” one of the earliest collections on recreational mathematics. This paper considers a generalization of such “River-Crossing Problems.” It shows that the problem is NP-hard if the boat size is three, and a large class of sub-problems can be solved in polynomial time if the boat size is two. It’s also conjectured that determining whether a river crossing problem has a solution without bounding the number of transportations, can be solved in polynomial time even when the size of the boat is large.

• Linear programming, width-1 CSPs, and robust satisfaction. Gabor Kun, Ryan O’Donnell, Suguru Tamaki, Yuichi Yoshida and Yuan Zhou. ITCS 2012.

We say that an algorithm robustly decides a constraint satisfaction problem $$\Pi$$ if it distinguishes at-least-$$(1-\epsilon)$$-satisfiable instances from less-than-$$(1 - r(\epsilon))$$-satisfiable instances for some function $$r(\epsilon)$$ with $$r(\epsilon) \to 0$$ as $$\epsilon \to 0$$. In this paper we show that the canonical linear programming relaxation robustly decides $$\Pi$$ if and only if $$\Pi$$ has "width 1" (in the sense of Feder and Vardi).

Constraint Satisfaction Problems
• Studies on Constant-Time Algorithms for Bounded-Degree Graphs and Constraint Satisfaction Problems. Yuichi Yoshida. Ph.D. thesis, Kyoto University, 2012.

We are concerned with constant-time algorithms, which aim for solving decision problems or optimization problems in time independent of sizes of inputs. To obtain such efficient algorithms, we relax the concept of decision and optimization as follows. An algorithm is said to test a property if, with high probability, it decides whether an input satisfies the property or we must modify an $$\epsilon$$-fraction of the input to make it satisfy the property. An algorithm is said to $$(\alpha, \epsilon n)$$-approximate an optimization problem if, with high probability, it outputs an $$(\alpha, \epsilon n)$$-approximation to the optimal value. Here, a value $$x$$ is called an $$(\alpha, \epsilon n)$$-approximation to $$x^*$$ if $$\alpha x^* - \epsilon n \leq x \leq x^*$$ where $$n$$ is the size of an input. By relaxing this way, various problems become solvable in constant time.

Constant-time algorithms were first studied for algebraic properties of functions for applications to computational complexity. However, because of their theoretical richness and potential to practical applications, they have been studied for many different areas in the last decade. In this thesis, we concentrate on problems on bounded-degree graphs and constraint satisfaction problems (CSPs). In this setting, each vertex (or variable) is incident to a constant number of edges (or constraints). Bounded-degree instances are practically important since real-world problems often give rise to such instances, and they are also theoretically important since we become able to study various fundamental problems that are trivial in general instances.

We first give a constant-time $$(1, \epsilon n)$$-approximation algorithm for the maximum matching problem. Compared to previous works, our algorithm runs exponentially faster with respect to $$1/\epsilon$$. Using the algorithm as a building block, we show various related problems are also testable in constant time. A graph $$G = (V , E )$$ is called $$(k,l)$$-sparse if $$|F| \leq k|V(F)|-l$$ for any non-empty $$F \subseteq E$$. Here, $$V(F)$$ denotes the set of vertices incident to $$F$$. We give a constant-time algorithm that tests a graph $$G = (V, E)$$ has a $$(k, l)$$-sparse subgraph with $$|V|$$ vertices and $$k|V | − l$$ edges. Depending on values of $$k$$ and $$l$$, our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed. Based on this result, we also give a constant-time testing algorithm for (k, l)-edge-connected- orientability, where a graph $$G$$ is called $$(k, l)$$-edge-connected-orientable if there exists an orientation $$\vec{G}$$ of $$G$$ with a vertex $$r \in V$$ such that $$\vec{G}$$ contains $$k$$ arc-disjoint dipaths from $$r$$ to each vertex $$v \in V$$ and $$l$$ arc-disjoint dipaths from each vertex $$v \in V$$ to $$r$$.

Secondly, we consider constant-time algorithms for CSPs. Let $$\Lambda$$ be a set of relations. Then, an instance of $$\mathrm{CSP}(\Lambda)$$ is a pair of variables and constraints over the variables such that each constraint belongs to $$\Lambda$$. In the problem, called $$\mathrm{MaxCSP}(\Lambda)$$, given an instance of $$\mathrm{CSP}(\Lambda)$$, we want to compute an assignment that maximizes the number of satisfied constraints. Here, we give the “best” constant-time approxi- mation algorithm for every $$\mathrm{MaxCSP}(\Lambda)$$. Specifically, for each $$\Lambda$$, there exists some constant $$\alpha_\Lambda$$ with the following properties. (1) for any $$\epsilon > 0$$, there exists a constant-time $$(\alpha_\Lambda - \epsilon, \epsilon n)$$-approximation algorithm for $$\mathrm{MaxCSP}(\Lambda)$$. (2) for any $$\epsilon > 0$$, there exists $$\delta > 0$$ such that no constant-time $$(\alpha_\Lambda + \epsilon, \delta n)$$-approximation algorithm for $$\mathrm{MaxCSP}(\Lambda)$$ exists. Here, $$\alpha_\Lambda$$ coincides with the integrality gap of a standard LP relaxation for $$\mathrm{MaxCSP}(\Lambda)$$.

We consider testing satisfiability of CSP instances. Using the result above, we can derive the characterization of the constant-time testability of $$\mathrm{CSP}(\Lambda)$$. Let $$\mathrm{opt}(I)$$ and $$\mathrm{lp}(I)$$ denote the optimal value and the LP value for an instance $$I$$, respectively. Then, we have the followings. (i) if any instance of $$\mathrm{CSP}(\Lambda)$$ with $$lp(I) \geq 1−\epsilon$$ satisfies $$\mathrm{opt}(I) \geq 1 − r(\epsilon)$$ where $$\lim_{\epsilon \to 0} r(\epsilon) = 0$$, then $$CSP(\Lambda)$$ is testable in constant time. (ii) if there is an instance $$I$$ of $$\mathrm{CSP}(\Lambda)$$ with $$\mathrm{lp}(I) = 1$$ and $$\mathrm{opt}(I) < 1$$, then $$\mathrm{CSP}(\Lambda)$$ is not testable in constant time.

To achieve a combinatorial characterization of the testability of $$\mathrm{CSP}(\Lambda)$$, we show a close connection between constant-time testability and the notion of width 1. We say $$\mathrm{CSP}(\Lambda)$$ has width 1 if satisfiability of instances of $$\mathrm{CSP}(\Lambda)$$ can be decided by a certain simple propagation algorithm. Then, we prove that any CSP of width 1 satisfies (i), which means that any such CSP is testable in constant time.

In a subsequent work (Kun et al. ITCS 2012), which includes the part of the present thesis, we have shown that any CSP of width more than 1 satisfies (ii). Thus, a CSP is constant-time testable if and only if it has width 1.

Sublinear-Time Algorithms

## 2011

• Algorithms for Finding a Maximum Non-k-Linked Graph. Yusuke Kobayashi and Yuichi Yoshida. ESA 2011.

A graph with at least $$2k$$ vertices is said to be k-linked if for any ordered $$k$$-tuples $$(s_1, \ldots,s_k)$$ and $$(t_1, \ldots, t_k)$$ of $$2k$$ distinct vertices, there exist pairwise vertex-disjoint paths $$P_1,\ldots, P_k$$ such that $$P_i$$ connects si and ti for $$i = 1, \ldots, k$$. For a given graph $$G$$, we consider the problem of finding a maximum induced subgraph of $$G$$ that is not $$k$$-linked. This problem is a common generalization of computing the vertex-connectivity and testing the $$k$$-linkedness of $$G$$, and it is closely related to the concept of $$H$$-linkedness. In this paper, we give the first polynomial-time algorithm for the case of $$k = 2$$, whereas a similar problem that finds a maximum induced subgraph without 2-vertex-disjoint paths connecting fixed terminal pairs is NP-hard. For the case of general $$k$$, we give an $$(8k − 2)$$-additive approximation algorithm. We also investigate the computational complexities of the edge-disjoint case and the directed case.

Combinatorial Optimization
• Property Testing for Cyclic Groups and Beyond. Francois Le Gall and Yuichi Yoshida. COCOON 2011.

This paper studies the problem of testing if an input (Γ,¿), where Γ is a finite set of unknown size and ¿ is a binary operation over Γ given as an oracle, is close to a specified class of groups. Friedl et al. (Proc. of STOC, 2005) have constructed an efficient tester using poly(log|Γ|) queries for the class of abelian groups. We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: Ω(|Γ|1/6) queries are necessary to test if the input is close to a cyclic group, and Ω(|Γ| c ) queries for some constant c are necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed integer k¿1. We also show that knowledge of the size of the ground set Γ helps only for k=1, in which case we construct an efficient tester using poly(log|Γ|) queries; for any other value k¿2 the query complexity remains Ω(|Γ| c ). All our upper and lower bounds hold for both the edit distance and the Hamming distance. These are, to the best of our knowledge, the first nontrivial lower bounds for such group-theoretic problems in the property testing model and, in particular, they imply the first exponential separations between the classical and quantum query complexities of testing closeness to classes of groups.

Sublinear-Time Algorithms
• In this paper, we consider lower bounds on the query complexity for testing CSPs in the bounded-degree model. We mainly consider Boolean CSPs allowing literals. First, for any "symmetric" predicate P : {0, 1}k → {0,1} except EQU where k ≥ 3, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances (|P-1 (0)|/2k - ϵ)-far from satisfiability requires Ω(n1/2+δ) queries where n is the number of variables and δ >; 0 is a constant that depends on P and e. This breaks a natural lower bound Ω(n1/2), which is obtained by the birthday paradox. We also show that every one-sided error tester requires Ω(n) queries for such P. These results are hereditary in the sense that the same results hold for any predicate Q such that P-1 (1) ⊆ Q-1(1). For EQU, we give a one-sided error tester whose query complexity is O̅(n1/2). Also, for 2-XOR (or, equivalently E2LIN2), we show an Ω(n1/2+δ) lower bound for distinguishing instances between e-close to and (1/2 -ϵ)-far from satisfiability. Next, for the general k-CSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances (1 - 2k/2k - ϵ)-far from satisfiability requires Ω(n) queries. The matching NP-hardness is not known, even assuming the Unique Games Conjecture or the d-to-1 Conjecture. As a corollary, for Maximum Independent Set on graphs with n vertices and a degree bound d, we show that every approximation algorithm within a factor d/poly log d and an additive error of ϵn requires Ω(n) queries. Previously, only super-constant lower bounds were known.

Sublinear-Time Algorithms,Constraint Satisfaction Problems
• Raghavendra (STOC 2008) gave an elegant and surprising result: if Khot's Unique Games Conjecture (STOC 2002) is true, then for every constraint satisfaction problem (CSP), the best approximation ratio is attained by a certain simple semidefinite programming and a rounding scheme for it.

In this paper, we show that similar results hold for constant-time approximation algorithms in the bounded-degree model. Specifically, we present the following: (i) For every CSP, we construct an oracle that serves an access, in constant time, to a nearly optimal solution to a basic LP relaxation of the CSP. (ii) Using the oracle, we give a constant-time rounding scheme that achieves an approximation ratio coincident with the integrality gap of the basic LP. (iii) Finally, we give a generic conversion from integrality gaps of basic LPs to hardness results. All of those results are unconditional. Therefore, for every bounded-degree CSP, we give the best constant-time approximation algorithm among all.

A CSP instance is called ε-far from satisfiability if we must remove at least an ε-fraction of constraints to make it satisfiable. A CSP is called testable if there is a constant-time algorithm that distinguishes satisfiable instances from ε-far instances with probability at least 2/3. Using the results above, we also derive, under a technical assumption, an equivalent condition under which a CSP is testable in the bounded-degree model.

Sublinear-Time Algorithms,Constraint Satisfaction Problems

## 2010

• A Query Efficient Non-Adaptive Long Code Test with Perfect Completeness. Suguru Tamaki and Yuichi Yoshida. RANDOM 2010.

Long Code testing is a fundamental problem in the area of property testing and hardness of approximation. In this paper, we study how small the soundness $$s$$ of the Long Code test with perfect completeness can be by using non-adaptive $$q$$ queries. We show that $$s = (2q + 3)/2^q$$ is possible, where $$q$$ is of the form math formula for $$2^{k}-1$$ any integer $$k > 2$$.

Sublinear-Time Algorithms,Boolean Functions
• Testing Outerplanarity of Bounded Degree Graphs. Yuichi Yoshida and Hiro Ito. RANDOM 2010.

We present an efficient algorithm for testing outerplanarity of graphs in the bounded-degree model. In this model, given a graph $$G$$ with degree bound $$d$$, we should distinguish with high probability the case that $$G$$ is outerplanar from the case that modifying at least an $$\epsilon$$-fraction of the edge set is necessary to make $$G$$ outerplanar. Our algorithm runs in time polynomial in $$d$$ and $$1/\epsilon$$ only. To achieve the time complexity, we exploit the tree-like structure inherent to an outerplanar graph using the microtree/macrotree decomposition of a tree. As a corollary, we show that the property of being a cactus is testable in time polynomial in $$d$$ and $$1/\epsilon$$.

Sublinear-Time Algorithms
• Conjunctive Filter: Breaking the Entropy Barrier. Daisuke Okanohara and Yuichi Yoshida. ALENEX 2010.

We consider a problem for storing a map that associates a key with a set of values. To store $$n$$ values from the universe of size $$m$$, it requires log 2(m/n) bits of space, which can be approximated as $$(1.44 + n) \log 2 m/n$$ bits when n m. If we allow ε fraction of errors in outputs, we can store it with roughly n log21/ε bits, which matches the entropy bound. Bloom filter is a wellknown example for such data structures. Our objective is to break this entropy bound and construct more space-efficient data structures. In this paper, we propose a novel data structure called a conjunctive filter, which supports conjunctive queries on k distinct keys for fixed k. Although a conjunctive filter cannot return the set of values itself associated with a queried key, it can perform conjunctive queries with O(1/√m) fraction of errors. Also, the consumed space is n/k log2m bits and it is significantly smaller than the entropy bound n/2 log2m when k ≥ 3. We will show that many problems can be solved by using a conjunctive filter such as full-text search and database join queries. Also, we conducted experiments using a real-world data set, and show that a conjunctive filter answers conjunctive queries almost correctly using about 1/2 ∼ 1/4 space as the entropy bound.

• In this paper, we study lower bounds on the query complexity of testing algorithms for various problems. Given an oracle that returns information of an input object, a testing algorithm distinguishes the case that the object has a given property $$P$$ from the case that it has a large distance to having $$P$$ with probability at least $$2/3$$. The query complexity of an algorithm is measured by the number of accesses to the oracle. We introduce two reductions that preserve the query complexity. One is derived from the gap-preserving local reduction and the other is from the L-reduction. By using the former reduction, we show linear lower bounds on the query complexity for testing basic NP-complete properties, i.e., 3-edge-colorability, directed Hamiltonian path/cycle, undirected Hamiltonian path/cycle, 3-dimensional matching and NP-complete generalized satisfiability problems. Also, using the second reduction, we show a linear lower bound on the query complexity of approximating the size of the maximum 3-dimensional matching.

Sublinear-Time Algorithms

## 2009

• An Improved Constant-Time Approximation Algorithm for Maximum Matchings. Yuichi Yoshida, Masaki Yamamoto and Hiro Ito. STOC 2009

This paper studies approximation algorithms for problems on degree-bounded graphs. Let $$n$$ and $$d$$ be the number of vertices and the degree bound, respectively. This paper presents an algorithm to approximate the size of some maximal independent set with additive error $$\epsilon n$$ whose running time is $$O(d^2)$$. Using this algorithm, it also shows that there are approximation algorithms for many other problems, e.g., the maximum matching problem, the minimum vertex cover problem, and the minimum set cover problem, that run exponentially faster than existing algorithms with respect to d and 1/ε. Its approximation algorithm for the maximum matching problem can be transformed to a testing algorithm for the property of having a perfect matching with two-sided error. On the contrary, it also shows that every one-sided error tester for the property requires at least $$\Omega(n)$$ queries.

Sublinear-Time Algorithms
• Fast Calculation of Levenshtein Distance on the Cell Processor. Shun Sakuraba and Yuichi Yoshida. SACSIS 2009. (in Japanese)

Cell Challenge 2009の規定課題において、Cell Broadband Engineを用いて文字列の編集距 離を計算するプログラムを作成するという課題が出題された。本稿では我々が規定課題において考案した最適化技法について報告する。基本となるアルゴリズムはビット並列化法と呼ばれるものであり、実行委員会によるサンプルプログラムと比べて約 315 倍の高速化を達成した。

## 2008

• Property Testing on k-Vertex Connectivity of Graphs. Yuichi Yoshida and Hiro Ito. ICALP 2008.

Wepresentanalgorithmfortestingthek-vertex-connectivity of graphs with given maximum degree. The time complexity of the algorithm is independent of the number of vertices and edges of graphs. A graph $$G$$ with $$n$$ vertices and maximum degree at most $$d$$ is called $$\epsilon$$-far from $$k$$-vertex-connectivity when at least $$\frac{\epsilon dn}{2}$$ edges must be added to or removed from $$G$$ to obtain a $$k$$-vertex-connected graph with maximum degree at most $$d$$. The algorithm always accepts every graph that is $$k$$-vertex-connected and rejects every graph that is $$\epsilon$$-far from $$k$$-vertex-connectivity with a probability of at least $$2/3$$. The algorithm runs in $$O\left(d \left(\frac{c}{\epsilon d}\right)^k \log \frac{1}{\epsilon d}\right)$$ time ($$c > 1$$ is a constant) for given $$(k − 1)$$-vertex-connected graphs, and $$O\left( d \left(\frac{ck}{\epsilon d}\right)^k \log \frac{k}{\epsilon d}\right)$$ time ($$c > 1$$ is a constant) for given general graphs. It is the first constant-time $$k$$-vertex-connectivity testing algorithm for general $$k \geq 4$$.

Sublinear-Time Algorithms