Conference Proceedings
2018

Statistically Efficient Estimation for NonSmooth Probability Densities. Masaaki Imaizumi, Takanori Maehara, and Yuichi Yoshida. AISTATS 2018.
We investigate statistical efficiency of estimators for nonsmooth density functions. The density estimation problem appears in various situations, and it is intensively used in statistics and machine learning. The statistical efficiencies of estimators, i.e., their convergence rates, play a central role in advanced statistical analysis. Although estimators and their convergence rates for smooth density functions are well investigated in the literature, those for nonsmooth density functions remain elusive despite their importance in application fields. In this paper, we propose new estimators for nonsmooth density functions by employing the notion of Szemerédi partitions from graph theory. We derive convergence rates of the proposed estimators. One of them has the optimal convergence rate in minimax sense, and the other has slightly worse convergence rate but runs in polynomial time. Experimental results support the theoretical performance of our esti mators.

Guaranteed Sufficient Decrease for Stochastic Variance Reduced Gradient Optimization. Fanhua Shang, Yuanyuan Liu, Kaiwen Zhou, James Cheng, Kelvin Kai Wing Ng, and Yuichi Yoshida. AISTATS 2018.
In this paper, we propose a novel sufficient decrease technique for stochastic variance reduced gradient descent methods such as SVRG and SAGA. In order to make sufficient decrease for stochastic optimization, we design a new sufficient decrease criterion, which yields sufficient decrease versions of stochastic variance reduction algorithms such as SVRGSD and SAGASD as a byproduct. We introduce a coefficient to scale current iterate and to satisfy the sufficient decrease property, which takes the decisions to shrink, expand or even move in the opposite direction, and then give two specific update rules of the coefficient for Lasso and ridge regres sion. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, which show that both of our algorithms attain linear convergence rates. We also provide the convergence guarantees of our algorithms for nonstrongly convex problems. Our experimental results further verify that our algorithms achieve significantly better performance than their counterparts.

Using kway Cooccurrences for Learning Word Embeddings. Danushka Bollegala, Yuichi Yoshida, and Kenichi Kawarabayashi. AAAI 2018.
Cooccurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior work on word embedding learning has used cooccurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and cooccurring in the same context. We extend the notion of cooccurrences to cover \(k(\geq 2)\)way cooccurrences among a set of \(k\)words. Specifically, we prove a theoretical relationship between the joint probability of \(k(\geq 2)\) words, and the sum of \(\ell_2\) norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises \(k\)way cooccurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller \(k(\leq 5)\) values, \(k\)way embeddings perform comparably or better than 2way embeddings in a range of tasks.
2017

Fitting LowRank Tensors in Constant Time. Kohei Hayashi and Yuichi Yoshida. NIPS 2017.
In this paper, we develop an algorithm that approximates the residual error of Tucker decomposition, one of the most popular tensor decomposition methods, with a provable guarantee. Given an order\(K\) tensor \(X\in\mathbb{R}^{N_1\times\cdots\times N_K}\), our algorithm randomly samples a constant number \(s\) of indices for each mode and creates a ``mini'' tensor \(\tilde{X}\in\mathbb{R}^{s\times\cdots\times s}\), whose elements are given by the intersection of the sampled indices on \(X\). Then, we show that the residual error of the Tucker decomposition of \(\tilde{X}\) is sufficiently close to that of \(X\) with high probability. This result implies that we can figure out how much we can fit a lowrank tensor to \(X\) in constant time, regardless of the size of \(X\). This is useful for guessing the favorable rank of Tucker decomposition. Finally, we demonstrate how the sampling method works quickly and accurately using multiple real datasets.

Streaming Algorithms for Maximizing Monotone Submodular Functions under a Knapsack Constraint. ChienChung Huang, Naonori Kakimura and Yuichi Yoshida. APPROX 2017.
In this paper, we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially and at any point of time, the algorithm has access only to a small fraction of the data stored in primary memory. For this problem, we propose a \((0.363  \epsilon)\)approximation algorithm, requiring only a single pass through the data; moreover, we propose a \((0.4  \epsilon)\)approximation algorithm requiring a constant number of passes through the data. The required memory space of both algorithms depends only on the size of the knapsack capacity and ε.

Landmark indexing for Evaluation of LabelConstrained Reachability Queries. Lucien Valstar, George Fletcher and Yuichi Yoshida. SIGMOD 2017. [slides] [code]
Consider a directed edgelabeled graph, such as a social net work or a citation network. A fundamental query on such data is to determine if there is a path in the graph from a given source vertex to a given target vertex, using only edges with labels in a restricted subset of the edge labels in the graph. Such labelconstrained reachability (LCR) queries play an important role in graph analytics, for example, as a core fragment of the socalled regular path queries which are supported in practical graph query languages such as the W3C’s SPARQL 1.1, Neo4j’s Cypher, and Oracle’s PGQL. Current solutions for LCR evaluation, however, do not scale to large graphs which are increasingly common in a broad range of application domains. In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmarkbased indexes for large graphs. We show through extensive experiments that our indexes are significantly smaller than stateoftheart LCR indexing techniques, while supporting up to orders of magnitude faster query evaluation times. Our complete C++ codebase is available as open source for further research.

Portfolio Optimization for Influence Spread. Naoto Ohsaka and Yuichi Yoshida. WWW 2017. [slides (by Ohsaka)]
Motivated by viral marketing, stochastic diffusion processes that model influence spread on a network have been studied intensively. The primary interest in such models has been to find a seed set of a fixed size that maximizes the expected size of the cascade from it. Practically, however, it is not desirable to have the risk of ending with a small cascade, even if the expected size of the cascade is large. To address this issue, we adopt conditional value at risk (CVaR) as a risk measure, and propose an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR. Using realworld social networks, we demonstrate that the portfolio computed by our algorithm has a significantly better CVaR than seed sets computed by other baseline methods.

Regret Ratio Minimization in Multiobjective Submodular Function Maximization. Tasuku Soma and Yuichi Yoshida. AAAI 2017. [slides (by Soma)]
Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective functions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a nontrivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable tradeoff between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.

Nonmonotone DRSubmodular Function Maximization. Tasuku Soma and Yuichi Yoshida. AAAI 2017. [arXiv] [poster]
We consider nonmonotone DRsubmodular function maximization, where DRsubmodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing nonmonotone DRsubmodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a \(1/(2+\epsilon)\)approximation algorithm with a running time of roughly \(O(n/\epsilon \log^2B)\), where \(n\) is the size of the ground set, B is the maximum value of a coordinate, and \(\epsilon > 0\) is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and realworld datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

Computing Least Cores of Supermodular Cooperative Games. Daisuke Hatano and Yuichi Yoshida. AAAI 2017
One of the goals of a cooperative game is to compute a valuedivision to the players from which they have no incentive todeviate. This concept is formalized as the notion of the core.To obtain a value division that motivates players to cooperate to a greater extent or that is more robust under noise, the notions of the strong least core and the weak least core have been considered. In this paper, we characterize the strong and the weak least cores of supermodular cooperative games using the theory of minimizing crossing submodular functions. We then apply our characterizations to two representative supermodular cooperative games, namely, the induced subgraph game generalized to hypergraphs and the airport game. For these games, we derive explicit forms of the strong and weak least core values, and provide polynomialtime algorithms that compute value divisions in the strong and weak least cores.

RandomRadius Ball Method for Estimating Closeness Centrality. Wataru Inariba, Takuya Akiba and Yuichi Yoshida. AAAI 2017.
In the analysis of realworld complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many of distancebased centrality measures embrace some issues in treating disconnected networks, which are resolved by the recently emerged harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the randomradius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on realworld networks.
2016

Minimizing Quadratic Functions in Constant Time. Kohei Hayashi and Yuichi Yoshida. NIPS 2016. [arXiv] [poster]
A samplingbased optimization method for quadratic functions is proposed. Our method approximately solves the following \(n\)dimensional quadratic minimization problem in constant time, which is independent of \(n\): \(z^*=\min_{\mathbf{v} \in \mathbb{R}^n}\langle \mathbf{v},A\mathbf{v}\rangle+n\langle \mathbf{v}\mathrm{diag}(\mathbf{d})\mathbf{v}\rangle+n\langle \mathbf{b},\mathbf{v}\rangle \), where \(A \in \mathbb{R}^{n \times n}\) a matrix and \(\mathbf{d},\mathbf{b} \in \mathbb{R}^n\) are vectors. Our theoretical analysis specifies the number of samples \(k(\delta,\epsilon)\) such that the approximated solution \(z\) satisfies \(zz^*=O(\epsilon n^2)\) with probability \(1\delta\). The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.

Testing Assignments to Constraint Satisfaction Problems. Hubie Chen, Matt Valeriote and Yuichi Yoshida. FOCS 2016. [arXiv] [slides]
For a finite relational structure \(A\), let \(\mathrm{CSP}(A)\) denote the CSP instances whose constraint relations are taken from \(A\). The resulting family of problems \(\mathrm{CSP}(A)\) has been considered heavily in a variety of computational contexts. In this article, we consider this family from the perspective of property testing: given an instance of a CSP and query access to an assignment, one wants to decide whether the assignment satisfies the instance, or is far from so doing. While previous work on this scenario studied concrete templates or restricted classes of structures, this article presents comprehensive classification theorems. Our first contribution is a dichotomy theorem completely characterizing the structures \(A\) such that \(\mathrm{CSP}(A)\) is constantquery testable: (i) If \(A\) has a majority polymorphism and a Maltsev polymorphism, then \(\mathrm{CSP}(A)\) is constantquery testable with onesided error. (ii) Else, testing \(\mathrm{CSP}(A)\) requires a superconstant number of queries. Let \(\exists\mathrm{CSP}(A)\) denote the extension of \(\mathrm{CSP}(A)\) to instances which may include existentially quantified variables. Our second contribution is to classify all structures \(A\) in terms of the number of queries needed to test assignments to instances of \(\exists\mathrm{CSP}(A)\), with onesided error. More specifically, we show the following trichotomy (i) If \(A\) has a majority polymorphism and a Maltsev polymorphism, then \(\exists\mathrm{CSP}(A)\) is constantquery testable with onesided error. (ii) Else, if A has a \((k + 1)\)ary nearunanimity polymorphism for some \(k \geq 2\), and no Maltsev polymorphism then \(\exists\mathrm{CSP}(A)\) is not constantquery testable (even with twosided error) but is sublinearquery testable with onesided error. (iii) Else, testing \(\exists\mathrm{CSP}(A)\) with onesided error requires a linear number of queries.

Dynamic Influence Analysis in Evolving Networks. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and Kenichi Kawarabayashi. Proceedings of the VLDB Endowment 2016. [slides (by Ohsaka)] [code]
We propose the first realtime fullydynamic index data structure designed for influence analysis on evolving networks. With this aim, we carefully redesign the data structure of the stateoftheart sketching method introduced by Borgs et al., and construct corresponding update algorithms. Using this index, we present algorithms for two kinds of queries, influence estimation and influence maximization, which are strongly motivated by practical applications, such as viral marketing. We provide a thorough theoretical analysis, which guarantees the nondegeneracy of the solution accuracy after an arbitrary number of updates. Furthermore, we introduce a reachabilitytreebased technique and a skipping method, which greatly reduce the time consumption required for edge/vertex deletions and vertex additions, respectively, and counterbased random number generators, which improve the space efficiency. Experimental evaluations using real dynamic networks with tens of millions of edges demonstrate the efficiency, scalability, and accuracy of our proposed indexing scheme. Specifically, it can reflect a graph modification within a time of several orders of magnitude smaller than that required to reconstruct an index from scratch, estimate the influence spread of a vertex set accurately within a millisecond, and select highly influential vertices at least ten times faster than stateoftheart static algorithms.

Efficient Algorithms for Spanning Tree Centrality. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. IJCAI 2016.
In a connected graph, spanning tree centralities of a vertex and an edge measure how crucial they are for the graph to be connected. In this paper, we propose efficient algorithms for estimating spanning tree centralities with theoretical guarantees on their accuracy. We experimentally demonstrate that our methods are orders of magnitude faster than previous methods. Then, we propose a novel centrality notion of a vertex, called aggregated spanning tree centrality, which also considers the number of connected components obtained by removing the vertex. We also give an efficient algorithm for estimating aggregated spanning tree cen trality. Finally, we experimentally show that those spanning tree centralities are useful to identify vulnerable edges and vertices in infrastructure networks.

Maximizing Monotone Submodular Functions over the Integer Lattice. Tasuku Soma and Yuichi Yoshida. IPCO 2016. [arXiv] [slides (by Soma)]
The problem of maximizing nonnegative monotone submodular functions under a certain constraint has been intensively studied in the last decade. In this paper, we address the problem for functions defined over the integer lattice. Suppose that a nonnegative monotone submodular function \(f:\mathbb{Z}^n_+ \to \mathbb{R}_+\) is given via an evaluation oracle. Assume further that \(f\) satisfies the diminishing return property, which is not an immediate consequence of the submodularity when the domain is the integer lattice. Then, we show polynomialtime \((11/e\epsilon)\)approximation algorithm for cardinality constraints, polymatroid constraints, and knapsack constraints. For a cardinality constraint, we also show a \((11/e\epsilon)\)approximation algorithm with slightly worse time complexity that does not rely on the diminishing return property. Our algorithms for a polymatroid constraint and a knapsack constraint first extend the domain of the objective function to the Euclidean space and then run the continuous greedy algorithm. We give two different kinds of continuous extensions, one is for polymatroid constraints and the other is for knapsack constraints, which might be of independent interest.

Nonlinear Laplacian for Digraphs and its Applications to Network Analysis. Yuichi Yoshida. WSDM 2016. [slides] [poster] [code]
In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheegerlike inequality, which relates the conductance of a digraph and the smallest nonzero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of realworld networks and obtain encouraging results.

Nonconvex Compressed Sensing with the SumofSquares Method. Tasuku Soma and Yuichi Yoshida. SODA 2016. [slides (by Soma)]
We consider stable signal recovery in \(\ell_q\) quasinorm for \(0 < q \leq 1\). In this problem, given a measurement vector \(y = Ax\) for some unknown signal vector \(x \in \mathbb{R}^n\) and a known matrix \(A \in \mathbb{R}^{m \times n}\), we want to recover \(z \in \mathbb{R}^n\) with \(\x  z\_q = O(\x  x^*\_q)\) from a measurement vector, where \(x^*\) is the ssparse vector closest to \(x\) in \(\ell_q\) quasinorm. Although a small value of \(q\) is favorable for measuring the distance to sparse vectors, previous methods for \(q < 1\) involve \(\ell_q\) quasinorm minimization which is computationally intractable. In this paper, we overcome this issue by using the sumofsquares method, and give the first polynomialtime stable recovery scheme for a large class of matrices \(A\) in \(\ell_q\) quasinorm for any fixed constant \(0 < q \leq 1\).

Gowers Norm, Function Limits, and Parameter Estimation. Yuichi Yoshida. SODA 2016. [arXiv] [slides]
Let \(\{f_i:\mathbb{F}_p^i\to \{0,1\}\}\) be a sequence of functions, where \(p\) is a fixed prime and \(\mathbb{F}_p\) is the finite field of order \(p\). The limit of the sequence can be syntactically defined using the notion of ultralimit. Inspired by the Gowers norm, we introduce a metric over limits of function sequences, and study properties of it. One application of this metric is that it provides a simpler characterization of affineinvariant parameters of functions that are constantquery estimable than the previous one obtained by Yoshida (STOC'14). Using this characterization, we show that the property of being a function of a constant number of lowdegree polynomials and a constant number of factored polynomials (of arbitrary degrees) is constantquery testable if it is closed under blowingup. Examples of this property include the property of having a constant spectral norm and degreestructural properties with rank conditions.

Improved Approximation Algorithms for kSubmodular Function Maximization. Satoru Iwata, Shinichi Tanigawa and Yuichi Yoshida. SODA 2016. [arXiv]
This paper presents a polynomialtime \(1/2\)approximation algorithm for maximizing nonnegative ksubmodular functions. This improves upon the previous \(\max\{1/3, 1/(1 + a)\}\)approximation by Ward and Živný [18], where \(a = \max\{1,\sqrt{(k1)/4}\}\). We also show that for monotone \(k\)submodular functions there is a polynomialtime \(k/(2k – 1)\)approximation algorithm while for any \(\epsilon > 0\) a \(((k + 1)/2k + \epsilon)\)approximation algorithm for maximizing monotone \(k\)submodular functions would require exponentially many queries. In particular, our hardness result implies that our algorithms are asymptotically tight. We also extend the approach to provide constant factor approximation algorithms for maximizing skewbisubmodular functions, which were recently introduced as generalizations of bisubmodular functions.
2015

A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice. Tasuku Soma and Yuichi Yoshida. NIPS 2015. [slides (by Soma) (in Japanese)]
We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a logfactor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly \(O(n\log(nr)\log r)\), where \(n\) is the size of the ground set and \(r\) is the maximum value of a coordinate. The dependency on \(r\) is exponentially better than the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.

Monotone kSubmodular Function Maximization with Size Constraints. Naoto Ohsaka and Yuichi Yoshida. NIPS 2015. [poster (by Ohsaka)]
A \(k\)submodular function is a generalization of a submodular function, where the input consists of \(k\) disjoint subsets, instead of a single subset, of the domain. Many machine learning problems, including influence maximization with \(k\) kinds of topics and sensor placement with \(k\) kinds of sensors, can be naturally modeled as the problem of maximizing monotone \(k\)submodular functions.In this paper, we give constantfactor approximation algorithms for maximizing monotone \(k\)submodular functions subject to several size constraints.The running time of our algorithms are almost linear in the domain size.We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.

Fully Dynamic Betweenness Centrality Maintenance on Massive Networks. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. Proceedings of the VLDB Endowment 2015. [code]
Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks in network analysis. Among several importance measures, betweenness centrality, in particular, plays key roles in many real applications. Considerable effort has been made for developing algorithms for static settings. However, real networks today are highly dynamic and are evolving rapidly, and scalable dynamic methods that can instantly reflect graph changes into centrality values are required. In this paper, we present the first fully dynamic method for managing betweenness centrality of all vertices in a large dynamic network. Its main data structure is the weighted hyperedge representation of shortest paths called hypergraph sketch. We carefully design dynamic update procedure with theoretical accuracy guarantee. To accelerate updates, we further propose two auxiliary data structures called twoball index and specialpurpose reachability index. Experimental results using real networks demonstrate its high scalability and efficiency. In particular, it can reflect a graph change in less than a millisecond on average for a largescale web graph with 106M vertices and 3.7B edges, which is several orders of magnitude larger than the limits of previous dynamic methods.

On the Equivalence among Problems of Bounded Width. Yoichi Iwata and Yuichi Yoshida. ESA 2015. [arXiv]
In this paper, we introduce a methodology, called decompositionbased reductions, for showing the equivalence among various problems of boundedwidth. First, we show that the following are equivalent for any \(α > 0\):
 SAT can be solved in \(O^*(2^{\alpha \mathrm{tw}})\) time,
 3SAT can be solved in \(O^*(2^{\alpha \mathrm{tw}})\) time,
 Max 2SAT can be solved in \(O^*(2^{\alpha\mathrm{tw}})\) time,
 Independent Set can be solved in \(O^*(2^{\alpha\mathrm{tw}})\) time, and
 Independent Set can be solved in \(O^*(2^{\alpha\mathrm{cw}})\) time,
where \(\mathrm{tw}\) and \(\mathrm{cw}\) are the treewidth and cliquewidth of the instance, respectively. Then, we introduce a new parameterized complexity class EPNL, which includes Set Cover and TSP, and show that SAT, 3SAT, Max 2SAT, and Independent Set parameterized by pathwidth are EPNLcomplete. This implies that if one of these \textup{EPNL}complete problems can be solved in \(O^*(c^k)\) time, then any problem in \textup{EPNL} can be solved in \(O^*(c^k)\) time.

Learning Word Representations from Relational Graphs. Danushka Bollegala, Takanori Maehara, Yuichi Yoshida and Kenichi Kawarabayashi. AAAI 2015. [arXiv]
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the cooccurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words cooccur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic semantics between words.

Distributed Multiplicative Weights Methods for DCOP. Daisuke Hatano and Yuichi Yoshida. AAAI 2015.
We introduce a new framework for solving distributed constraint optimization problems that extend the domain of each variable into a simplex.We propose two methods for searching the extended domain for good assignments.The first one relaxes the problem using linear programming, finds the optimum LP solution, and rounds it to an assignment.The second one plays a costminimization game, finds a certain kind of equilibrium, and rounds it to an assignment.Both methods are realized by performing the multiplicative weights method in a distributed manner.We experimentally demonstrate that our methods have good scalability,and in particular, the second method outperforms existing algorithms in terms of solution quality and efficiency.

Efficient Topk ShortestPath Distance Queries on Large Networks by Pruned Landmark Labeling. Takuya Akiba, Takanori Hayashi, Nozomi Nori, Yoichi Iwata and Yuichi Yoshida. AAAI 2015.
We propose an indexing scheme for top\(k\) shortestpath distance queries on graphs, which is useful in a wide range of important applications such as networkaware search and link prediction. While considerable effort has been made for efficiently answering standard (top1) distance queries, none of previous methods can be directly extended for topk distance queries. We propose a new framework for top\(k\) distance queries based on 2hop cover and then present an efficient indexing algorithm based on the simple but effective recent notion of pruned landmark labeling. Extensive experimental results on real social and web graphs show the scalability, efficiency and robustness of our method. Moreover, we demonstrate the usefulness of top\(k\) distance queries through an application to link prediction.
2014

Robust Approximation of Temporal CSP. Suguru Tamaki and Yuichi Yoshida. APPROX 2014.
A temporal constraint language \(\Gamma\) is a set of relations with firstorder definitions in \((\mathbb{Q};<)\). Let \(\mathrm{CSP}(\Gamma)\) denote the set of constraint satisfaction problem instances with relations from \(\Gamma\). \(\mathrm{CSP}(\Gamma)\) admits robust approximation if, for any \(\varepsilon \geq 0\), given a \((1\varepsilon)\)satisfiable instance of \(\mathrm{CSP}(\Gamma)\), we can compute an assignment that satisfies at least a \((1f(\varepsilon))\)fraction of constraints in polynomial time. Here, \(f(\varepsilon)\) is some function satisfying \(f(0)=0\) and \(\lim_{\varepsilon \to 0}f(\varepsilon)=0\).
Firstly, we give a qualitative characterization of robust approximability: Assuming the Unique Games Conjecture, we give a necessary and sufficient condition on \(\Gamma\) under which \(\mathrm{CSP}(\Gamma)\) admits robust approximation. Secondly, we give a quantitative characterization of robust approximability: Assuming the Unique Games Conjecture, we precisely characterize how \(f(\varepsilon)\) depends on \(\varepsilon\) for each \(\Gamma\). We show that our robust approximation algorithms can be run in almost linear time.

Almost LinearTime Algorithms for Adaptive Betweenness Centrality using Hypergraph Sketches. Yuichi Yoshida. KDD 2014. [slides] [code]
Betweenness centrality measures the importance of a vertex by quantifying the number of times it acts as a midpoint of the shortest paths between other vertices. This measure is widely used in network analysis. In many applications, we wish to choose the \(k\) vertices with the maximum adaptive betweenness centrality, which is the betweenness centrality without considering the shortest paths that have been taken into account by alreadychosen vertices.
All previous methods are designed to compute the betweenness centrality in a fixed graph. Thus, to solve the above task, we have to run these methods \(k\) times. In this paper, we present a method that directly solves the task, with an almost linear runtime no matter how large the value of \(k\). Our method first constructs a hypergraph that encodes the betweenness centrality, and then computes the adaptive betweenness centrality by examining this graph. Our technique can be utilized to handle other centrality measures.
We theoretically prove that our method is very accurate, and experimentally confirm that it is three orders of magnitude faster than previous methods. Relying on the scalability of our method, we experimentally demonstrate that strategies based on adaptive betweenness centrality are effective in important applications studied in the network science and database communities.

Fast and Accurate Influence Maximization on Large Networks with Pruned MonteCarlo Simulations. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and KenIchi Kawarabayashi. AAAI 2014. [slides (by Ohsaka)] [code]
Influence maximization is a problem to find small sets of highly influential individuals in a social network to maximize the spread of influence under stochastic cascade models of propagation. Although the problem has been wellstudied, it is still highly challenging to find solutions of high quality in largescale networks of the day. While MonteCarlosimulationbased methods produce nearoptimal solutions with a theoretical guarantee, they are prohibitively slow for large graphs. As a result, many heuristic methods without any theoretical guarantee have been developed, but all of them substantially compromise solution quality. To address this issue, we propose a new method for the influence maximization problem. Unlike other recent heuristic methods, the proposed method is a MonteCarlosimulationbased method, and thus it consistently produces solutions of high quality with the theoretical guarantee. On the other hand, unlike other previous MonteCarlosimulationbased methods, it runs as fast as other stateoftheart methods, and can be applied to large networks of the day. Through our extensive experiments, we demonstrate the scalability and the solution quality of the proposed method.

Testing ForestIsomorphism in the Adjacency List Model. Mitsuru Kusumoto and Yuichi Yoshida. ICALP 2014. [arXiv] [slides (by Kusumoto)]
We consider the problem of testing if two input forests are isomorphic or are far from being so. An algorithm is called an \(\epsilon\)tester for forestisomorphism if given an oracle access to two forests \(G\) and \(H\) in the adjacency list model, with high probability, accepts if \(G\) and \(H\) are isomorphic and rejects if we must modify at least \(\epsilon n\) edges to make \(G\) isomorphic to \(H\). We show an \(\epsilon\)tester for forestisomorphism with a query complexity \(\mathrm{polylog}(n)\) and a lower bound of \(\Omega(\sqrt{\log n/\log \log n})\). Further, with the aid of the tester, we show that every graph property is testable in the adjacency list model with \(\mathrm{polylog}(n)\) queries if the input graph is a forest.

A Characterization of Locally Testable AffineInvariant Properties via Decomposition Theorems. Yuichi Yoshida. STOC 2014. [arXiv] [slides]
Let \(P\) be a property of function \(\mathbb{F}^n_p \to \{0, 1\}\) for a fixed prime \(p\). An algorithm is called a tester for \(P\) if, given a query access to the input function \(f\), with high probability, it accepts when \(f\) satisfies \(P\) and rejects when \(f\) is "far" from satisfying \(P\). In this paper, we give a characterization of affineinvariant properties that are (twosided error) testable with a constant number of queries. The characterization is stated in terms of decomposition theorems, which roughly claim that any function can be decomposed into a structured part that is a function of a constant number of polynomials, and a pseudorandom part whose Gowers norm is small. We first give an algorithm that tests whether the structured part of the input function has a specific form. Then we show that an affineinvariant property is testable with a constant number of queries if and only if it can be reduced to the problem of testing whether the structured part of the input function is close to one of a constant number of candidates.

Dynamic and Historical ShortestPath Distance Queries on Large Evolving Networks by Pruned Landmark Labeling. Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. WWW 2014.
We propose two dynamic indexing schemes for shortestpath and distance queries on large timeevolving graphs, which are useful in a wide range of important applications such as realtime networkaware search and network evolution analysis. To the best of our knowledge, these methods are the first practical exact indexing methods to efficiently process distance queries and dynamic graph updates.
We first propose a dynamic indexing scheme for queries on the last snapshot. The scalability and efficiency of its offline indexing algorithm and query algorithm are competitive even with previous static methods. Meanwhile, the method is dynamic, that is, it can incrementally update indices as the graph changes over time. Then, we further design another dynamic indexing scheme that can also answer two kinds of historical queries with regard to not only the latest snapshot but also previous snapshots.
Through extensive experiments on real and synthetic evolving networks, we show the scalability and efficiency of our methods. Specifically, they can construct indices from large graphs with millions of vertices, answer queries in microseconds, and update indices in milliseconds.

Approximation Schemes via SheraliAdams Hierarchy for Dense Constraint Satisfaction Problems and Assignment Problems. Yuichi Yoshida and Yuan Zhou. ITCS 2014.
We consider approximation schemes for the maximum constraint satis faction problems and the maximum assignment problems. Though they are NPHard in general, if the instance is “dense” or “locally dense”, then they are known to have approximation schemes that run in polynomial time or quasipolynomial time. In this paper, we give a unified method of showing these approximation schemes based on the SheraliAdams linear programming relaxation hierarchy. We also use our linear programmingbased framework to show new algorithmic results on the optimization version of the hypergraph isomorphism problem.

Parameterized Testability. Kazuo Iwama and Yuichi Yoshida. ITCS 2014. [slides]
This paper studies property testing for NP optimization problems with parameter \(k\) under the general graph model with an augmentation of random edge sampling capability. It is shown that a variety of such problems, including \(k\)Vertex Cover, \(k\)Feedback Vertex Set, \(k\)Multicut, \(k\)pathfreeness and \(k\)Dominating Set, are constanttime testable if \(k\) is constant. It should be noted that the first four problems are fixed parameter tractable (FPT) and it turns out that algorithmic techniques for their FPT algorithms (branchandbound search, color coding, etc.) are also useful for our testers. \(k\)Dominating Set is \(W[2]\)hard, but we can still test the property in constant time since the definition of \(\epsilon\)farness makes the problem trivial for nonsparse graphs that are the source of hardness for the original optimization problem. We also consider \(k\)Odd Cycle Transversal, which is another wellknown FPT problem, but we only give a sublineartime tester when \(k\) is a constant.

Linear Time FPT Algorithms via Network Flow. Yoichi Iwata, Keigo Oka and Yuichi Yoshida. SODA 2014. [arXiv] [video (talk by Iwata)]
In the area of parameterized complexity, to cope with NPHard problems, we introduce a parameter \(k\) besides the input size \(n\), and we aim to design algorithms (called FPT algorithms) that run in \(O(f(k)n^d)\) time for some function \(f(k)\) and constant \(d\). Though FPT algorithms have been successfully designed for many problems, typically they are not sufficiently fast because of huge \(f(k)\) and \(d\).
In this paper, we give FPT algorithms with small \(f(k)\) and d for many important problems including Odd Cycle Transversal and Almost 2SAT. More specifically, we can choose \(f(k)\) as a single exponential \((4^k)\) and d as one, that is, linear in the input size. To the best of our knowledge, our algorithms achieve linear time complexity for the first time for these problems.
To obtain our algorithms for these problems, we consider a large class of integer programs, called BIP2. Then we show that, in linear time, we can reduce BIP2 to Vertex Cover Above LP preserving the parameter \(k\), and we can compute an optimal LP solution for Vertex Cover Above LP using network flow.
Then, we perform an exhaustive search by fixing halfintegral values in the optimal LP solution for Vertex Cover Above LP. A bottleneck here is that we need to recompute an LP optimal solution after branching. To address this issue, we exploit network flow to update the optimal LP solution in linear time.
2013

Fast and Scalable Reachability Queries on Graphs by Pruned Labeling with Landmarks and Paths. Yosuke Yano, Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. CIKM 2013. [slides (by Yano)] [code]
Answering reachability queries on directed graphs is ubiquitous in many applications involved with graphshaped data as one of the most fundamental and important operations. However, it is still highly challenging to efficiently process them on largescale graphs. Transitiveclosurebased methods consume prohibitively large index space, and onlinesearchbased methods answer queries too slowly. Labelingbased methods attain both small index size and query time, but previous indexing algorithms are not scalable at all for processing large graphs of the day.
In this paper, we propose new labelingbased methods for reachability queries, referred to as pruned landmark labeling and pruned path labeling. They follow the frameworks of 2hop cover and 3hop cover, but their indexing algorithms are based on the recent notion of pruned labeling and improve the indexing time by several orders of magnitude, resulting in applicability to large graphs with tens of millions of vertices and edges. Our experimental results show that they attain remarkable tradeoffs between fast query time, small index size and scalability, which previous methods have never been able to achieve. Furthermore, we also discuss the ingredients of the efficiency of our methods by a novel theoretical analysis based on the graph minor theory.

LinearTime Enumeration of Maximal kEdgeConnected Subgraphs in Large Networks by Random Contraction. Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. CIKM 2013.
Capturing sets of closely related vertices from large networks is an essential task in many applications such as social network analysis, bioinformatics, and web link research. Decomposing a graph into \(k\)core components is a standard and efficient method for this task, but obtained clusters might not be wellconnected. The idea of using maximal \(k\)edgeconnected subgraphs was recently proposed to address this issue. Although we can obtain better clusters with this idea, the stateoftheart method is not efficient enough to process large networks with millions of vertices.
In this paper, we propose a new method to decompose a graph into maximal \(k\)edgeconnected components, based on random contraction of edges. Our method is simple to implement but improves performance drastically. We experimentally show that our method can successfully decompose large networks and it is thousands times faster than the previous method. Also, we theoretically explain why our method is efficient in practice. To see the importance of maximal \(k\)edgeconnected subgraphs, we also conduct experiments using realworld networks to show that many \(k\)core components have small edgeconnectivity and they can be decomposed into a lot of maximal \(k\)edgeconnected subgraphs.

Mining for Analogous Tuples from an EntityRelation Graph. Danushka Bollegala, Mitsuru Kushimoto, Yuichi Yoshida and Kenichi Kawarabayashi. IJCAI 2013.
The ability to recognize analogies is an important factor that is closely related to human intelligence. Verbal analogies have been used for evaluating both examinees at university entrance exams as well as algorithms for measuring relational similarity. However, relational similarity measures proposed so far are confined to measuring the similarity between pairs of words. Unfortunately, such pairwise approaches ignore the rich relational structure that exists in realworld knowledge bases containing millions of entities and semantic relations. We propose a method to efficiently identify analogous entity tuples from a given entityrelation graph. First, we present an efficient approach for extracting potential analogous tuples from a given entityrelation graph. Second, to measure the structural similarity between two tuples, we propose two types of kernel functions: vertexfeature kernels, and edgefeature kernels. Moreover, we combine those kernels to construct composite kernels that simultaneously consider both vertex and edge features. Experimental results show that our proposed method accurately identifies analogous tuples and significantly outperforms a stateoftheart pairwise relational similarity measure, extended to tuples.

Testing LinearInvariant Function Isomorphism. Karl Wimmer and Yuichi Yoshida. ICALP 2013.
A function \(f:\mathbb{F}^n_2 \to \{1,1\}\) is called linearisomorphic to \(g\) if \(f = g \circ A\) for some nonsingular matrix \(A\). In the \(g\)isomorphism problem, we want a randomized algorithm that distinguishes whether an input function \(f\) is linearisomorphic to \(g\) or far from being so.
We show that the query complexity to test \(g\)isomorphism is essentially determined by the spectral norm of \(g\). That is, if \(g\) is close to having spectral norm \(s\), then we can test gisomorphism with \(\mathrm{poly}(s)\) queries, and if \(g\) is far from having spectral norm \(s\), then we cannot test \(g\)isomorphism with \(o(\log s)\) queries. The upper bound is almost tight since there is indeed a function \(g\) close to having spectral norm s whereas testing \(g\)isomorphism requires \(\Omega(s)\) queries. As far as we know, our result is the first characterization of this type for functions. Our upper bound is essentially the KushilevitzMansour learning algorithm, modified for use in the implicit setting.
Exploiting our upper bound, we show that any property is testable if it can be wellapproximated by functions with small spectral norm. We also extend our algorithm to the setting where \(A\) is allowed to be singular.

An Algebraic Characterization of Testable CSPs. Arnab Bhattacharyya and Yuichi Yoshida. ICALP 2013. [slides]
Given an instance \(\mathcal{I}\) of a CSP, a tester for \(\mathcal{I}\) distinguishes assignments satisfying \(\mathcal{I}\) from those which are far from any assignment satisfying \(\mathcal{I}\). The efficiency of a tester is measured by its query complexity, the number of variable assignments queried by the algorithm. In this paper, we characterize the hardness of testing Boolean CSPs in terms of the algebra generated by the relations used to form constraints. In terms of computational complexity, we show that if a nontrivial Boolean CSP is sublinearquery testable (resp., not sublinearquery testable), then the CSP is in NL (resp., Pcomplete, ⊕Lcomplete or NLcomplete) and that if a sublinearquery testable Boolean CSP is constantquery testable (resp., not constantquery testable), then counting the number of solutions of the CSP is in P (resp., #Pcomplete).
Also, we conjecture that a CSP instance is testable in sublinear time if its Gaifman graph has bounded treewidth. We confirm the conjecture when a nearunanimity operation is a polymorphism of the CSP.

Fast Exact ShortestPath Distance Queries on Large Networks by Pruned Landmark Labeling. Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. SIGMOD 2013. [arXiv]
We propose a new exact method for shortestpath distance queries on largescale networks. Our method precomputes distance labels for vertices by performing a breadthfirst search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadthfirst searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadthfirst searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of largescale realworld networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.

Testing SubdivisionFreeness: – Property Testing Meets Structural Graph Theory –. Kenichi Kawarabayashi and Yuichi Yoshida. STOC 2013.
Testing a property \(P\) of graphs in the boundeddegree model deals with the following problem: given a graph \(G\) of bounded degree \(d\), we should distinguish (with probability \(2/3\), say) between the case that \(G\) satisfies \(P\) and the case that one should add/remove at least \(\epsilon dn\) edges of \(G\) to make it satisfy \(P\). In sharp contrast to property testing of dense graphs, which is relatively well understood, only few properties are known to be testable with a constant number of queries in the boundeddegree model. In particular, no global monotone (i.e,_{closed} under edge deletions) property that expander graphs can satisfy has been shown to be testable in constant time so far.
In this paper, we identify for the first time a natural family of global monotone property that expander graphs can satisfy and can be efficiently tested in the bounded degree model. Specifically, we show that, for any integer \(t \geq 1\), \(K_t\)subdivisionfreeness is testable with a constant number of queries in the boundeddegree model. This property was not previously known to be testable even with \(o(n)\) queries. Note that an expander graph with all degree less than \(t1\) does not have a \(K_t\)subdivision.
The proof is based on a novel combination of some results that develop the framework of partitioning oracles, together with structural graph theory results that develop the seminal graph minor theory by Robertson and Seymour. As far as we aware, this is the first result that bridges property testing and structural graph theory. Although we know a rough structure for graphs without \(H\)minors from the famous graph minor theory by Robertson and Seymour, there is no corresponding structure theorem for graphs without \(H\)subdivisions so far, even \(K_5\)subdivisionfree graphs. Therefore, subdivisions and minors are very different in a graph structural sense.

Exact and Approximation Algorithms for the Constraint Satisfaction Problem over the Point Algebra. Iwata Yoichi and Yuichi Yoshida. STACS 2013.
We study the constraint satisfaction problem over the point algebra. In this problem, an instance consists of a set of variables and a set of binary constraints of forms \((x < y)\), \((x \leq y)\), \((x \neq y)\) or \((x = y)\). Then, the objective is to assign integers to variables so as to satisfy as many constraints as possible.This problem contains many important problems such as Correlation Clustering, Maximum Acyclic Subgraph, and Feedback Arc Set. We first give an exact algorithm that runs in \(O^*(3^{\frac{\log 5}{\log 6}n})\) time, which improves the previous best \(O^*(3^n)\) obtained by a standard dynamic programming. Our algorithm combines the dynamic programming with the splitandlist technique. The splitandlist technique involves matrix products and we make use of sparsity of matrices to speed up the computation. As for approximation, we give a \(0.4586\)approximation algorithm when the objective is maximizing the number of satisfied constraints, and give an \(O(\log n \log \log n)\)approximation algorithm when the objective is minimizing the number of unsatisfied constraints.
2012

ConstantTime Approximation Algorithms for the Optimum Branching Problem on Sparse Graphs. Mitsuru Kusumoto, Yuichi Yoshida and Hiro Ito. ICNC 2012.
We propose a constanttime algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edgeweighted digraph \(G\) of average degree \(d\) whose weights are real values in \([0, 1]\) is given as an oracle access, and we are allowed to ask degrees and incoming edges for any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in \(G\) with an absolute error of at most εn with query complexity \(O(d/\epsilon^3)\), where \(n\) is the number of vertices. We also show a lower bound of \(\Omega(d/\epsilon^2)\). Additionally, our algorithm can be modified to run with query complexity \(O(1/\epsilon^4)\) for unweighted digraphs, i.e., it runs in time independent of the input size even for digraphs with \(d = \Omega(n)\) edges. In contrast, we show that it requires \(\Omega(n)\) queries to approximate the weight of the minimum (or maximum) spanning arborescence in a weighted digraph.

Partially Symmetric Functions are Efficiently IsomorphismTestable. Eric Blais, Amit Weinstein and Yuichi Yoshida. FOCS 2012. [arXiv] [slides (by Weinstein)] [video (talk by Weinstein)]
Given a Boolean function \(f\), the \(f\)isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to \(f\) up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions \(f\) we can test \(f\)isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functions  functions invariant to the reordering of all but a constant number of their variables  are efficiently isomorphismtestable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphismtestable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearlyoptimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testingbyimplicitlearning approach to complete the proof that partially symmetric functions are efficiently isomorphismtestable.

Approximation Guarantees for the Minimum Linear Arrangement Problem by Higher Eigenvalues. Suguru Tamaki and Yuichi Yoshida. APPROX 2012.
Given an undirected graph \(G = (V,E)\) and positive edge weights \(\{w_e\}_{e \in E}\), a linear arrangement is a permutation \(\pi: V \to [n]\). The value of the arrangement is \(\mathrm{val}(G,\pi):=\frac{1}{n}\sum_{e=\{u,v\}}w_e\pi(u)\pi(v)\). In the minimum linear arrangement problem (MLA), the goal is to find a linear arrangement \(\pi^*\) that achieves \(\mathrm{val}(G,\pi^*) = \mathrm{MLA}(G): = \min_{\pi}\mathrm{val}(G,\pi)\).
In this paper, we show that for any \(\epsilon > 0\) and positive integer \(r\), there is an \(O(nr/\epsilon)\)time randomized algorithm which, given a graph \(G\), returns a permutation \(\pi\) such that \[ \mathrm{val}(G,\pi) \leq \left(1 + \frac{2}{(1\epsilon)\lambda_{r+1}(\mathcal{L})}\right)\mathrm{MLA}(G) + O\left(\frac{\log n}{\sqrt{n}}\sum_{e \in E}w_e\right) \] with high probability. Here \(\mathcal{L}\) is the normalized Laplacian of \(G\) and \(\lambda_r(\mathcal{L})\) is the \(r\)th eigenvalue of \(\mathcal{L}\). Our algorithm gives a constant factor approximation for regular graphs that are weak expanders.

ConstantTime Algorithms for Sparsity Matroids. Hiro Ito, Shinichi Tanigawa and Yuichi Yoshida. ICALP 2012. [slides]
A graph \(G = (V, E)\) is called \((k, \ell)\)sparse if \(F ≤ kV(F) \ell\) for any \(F \subseteq E\) with \(F \neq \emptyset\). Here, \(V(F)\) denotes the set of vertices incident to \(F\). A graph \(G = (V,E)\) is called \((k,\ell)\)full if \(G\) contains a \((k,\ell)\)sparse subgraph with \(V\) vertices and \(kV  \ell\) edges. The family of edge sets of \((k,\ell)\)sparse subgraphs forms a family of independent sets of a matroid on \(E\), known as the sparsity matroid of \(G\). In this paper, we give a constanttime algorithm that approximates the rank of the sparsity matroid associated with a degreebounded undirected graph. This algorithm leads to a constanttime tester for \((k,\ell)\)fullness in the boundeddegree model, (i.e., we can decide with high probability whether the input graph satisfies a property or far from it). Depending on the values of \(k\) and \(\ell\), our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed in a unified manner.
Based on this result, we also propose a constanttime tester for \((k,\ell)\)edgeconnectedorientability in the boundeddegree model, where an undirected graph \(G\) is called \((k,\ell)\)edgeconnectedorientable if there exists an orientation \(\vec{G}\) of \(G\) with a vertex \(r\in V\) such that \(\vec{G}\) contains \(k\) arcdisjoint dipaths from r to each vertex \(v \in V\) and \(\ell\) arcdisjoint dipaths from each vertex \(v \in V\) to \(r\).
A tester is called a onesided error tester for \(P\) if it always accepts a graph satisfying \(P\). We show, for any \(k \geq 2\) and (proper) \(\ell \geq 0\), every onesided error tester for \((k,\ell)\)fullness and \((k,\ell)\)edgeconnectedorientability requires \(\Omega(n)\) queries.

Testing List HHomomorphisms. Yuichi Yoshida. CCC 2012. [slides]
Let \(H\) be an undirected graph. In the List \(H\)Homomorphism Problem, given an undirected graph \(G\) with a list constraint \(L(v) \subseteq V(H)\) for each variable \(v∈V(G)\), the objective is to find a list \(H\)homomorphism \(f:V(G)\to V(H)\), that is, \(f(v)\in L(v)\) for every \(v\in V(G)\) and \((f(u),f(v)) \in E(H)\) whenever \((u,v)\in E(G)\). We consider the following problem: given a map \(f:V(G) \to V(H)\) as an oracle access, the objective is to decide with high probability whether \(f\) is a list \(H\)homomorphism or \textit{far} from any list \(H\)homomorphisms. The efficiency of an algorithm is measured by the number of accesses to \(f\). In this paper, we classify graphs \(H\) with respect to the query complexity for testing list \(H\)homomorphisms and show the following trichotomy holds: (i) List \(H\)homomorphisms are testable with a constant number of queries if and only if \(H\) is a reflexive complete graph or an irreflexive complete bipartite graph. (ii) List \(H\)homomorphisms are testable with a sublinear number of queries if and only if \(H\) is a biarc graph. (iii) Testing list \(H\)homomorphisms requires a linear number of queries if \(H\) is not a biarc graph.

ConstantTime Approximation Algorithms for the Knapsack Problem. Hiro Ito, Susumu Kiyoshima and Yuichi Yoshida. TAMC 2012.
In this paper, we give a constanttime approximation algorithm for the knapsack problem. Using weighted sampling, with which we can sample items with probability proportional to their profits, our algorithm runs with query complexity \(O(\epsilon^{ 4} \log \epsilon^{1})\), and it approximates the optimal profit with probability at least \(2/3\) up to error at most an \(\epsilon\)fraction of the total profit. For the subset sum problem, which is a special case of the knapsack problem, we can improve the query complexity to \(O(\epsilon^{1} \log \epsilon^{1})\).

Algorithms and Complexity of Generalized River Crossing Problems. Stefan Langerman, Hiro Ito and Yuichi Yoshida. FUN 2012.
Three men, each with a sister, must cross a river using a boat which can carry only two people, so that a woman whose brother is not present is never left in the company of another man. This is a very famous problem appeared in Latin book “Problems to Sharpen the Young,” one of the earliest collections on recreational mathematics. This paper considers a generalization of such “RiverCrossing Problems.” It shows that the problem is NPhard if the boat size is three, and a large class of subproblems can be solved in polynomial time if the boat size is two. It’s also conjectured that determining whether a river crossing problem has a solution without bounding the number of transportations, can be solved in polynomial time even when the size of the boat is large.

Linear programming, width1 CSPs, and robust satisfaction. Gabor Kun, Ryan O’Donnell, Suguru Tamaki, Yuichi Yoshida and Yuan Zhou. ITCS 2012.
We say that an algorithm robustly decides a constraint satisfaction problem \(\Pi\) if it distinguishes atleast\((1\epsilon)\)satisfiable instances from lessthan\((1  r(\epsilon))\)satisfiable instances for some function \(r(\epsilon)\) with \(r(\epsilon) \to 0\) as \(\epsilon \to 0\). In this paper we show that the canonical linear programming relaxation robustly decides \(\Pi\) if and only if \(\Pi\) has "width 1" (in the sense of Feder and Vardi).

Studies on ConstantTime Algorithms for BoundedDegree Graphs and Constraint Satisfaction Problems. Yuichi Yoshida. Ph.D. thesis, Kyoto University, 2012.
We are concerned with constanttime algorithms, which aim for solving decision problems or optimization problems in time independent of sizes of inputs. To obtain such efficient algorithms, we relax the concept of decision and optimization as follows. An algorithm is said to test a property if, with high probability, it decides whether an input satisfies the property or we must modify an \(\epsilon\)fraction of the input to make it satisfy the property. An algorithm is said to \((\alpha, \epsilon n)\)approximate an optimization problem if, with high probability, it outputs an \((\alpha, \epsilon n)\)approximation to the optimal value. Here, a value \(x\) is called an \((\alpha, \epsilon n)\)approximation to \(x^*\) if \(\alpha x^*  \epsilon n \leq x \leq x^*\) where \(n\) is the size of an input. By relaxing this way, various problems become solvable in constant time.
Constanttime algorithms were first studied for algebraic properties of functions for applications to computational complexity. However, because of their theoretical richness and potential to practical applications, they have been studied for many different areas in the last decade. In this thesis, we concentrate on problems on boundeddegree graphs and constraint satisfaction problems (CSPs). In this setting, each vertex (or variable) is incident to a constant number of edges (or constraints). Boundeddegree instances are practically important since realworld problems often give rise to such instances, and they are also theoretically important since we become able to study various fundamental problems that are trivial in general instances.
We first give a constanttime \((1, \epsilon n)\)approximation algorithm for the maximum matching problem. Compared to previous works, our algorithm runs exponentially faster with respect to \(1/\epsilon\). Using the algorithm as a building block, we show various related problems are also testable in constant time. A graph \(G = (V , E )\) is called \((k,l)\)sparse if \(F \leq kV(F)l\) for any nonempty \(F \subseteq E\). Here, \(V(F)\) denotes the set of vertices incident to \(F\). We give a constanttime algorithm that tests a graph \(G = (V, E)\) has a \((k, l)\)sparse subgraph with \(V\) vertices and \(kV  − l\) edges. Depending on values of \(k\) and \(l\), our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed. Based on this result, we also give a constanttime testing algorithm for (k, l)edgeconnected orientability, where a graph \(G\) is called \((k, l)\)edgeconnectedorientable if there exists an orientation \(\vec{G}\) of \(G\) with a vertex \(r \in V\) such that \(\vec{G}\) contains \(k\) arcdisjoint dipaths from \(r\) to each vertex \(v \in V\) and \(l\) arcdisjoint dipaths from each vertex \(v \in V\) to \(r\).
Secondly, we consider constanttime algorithms for CSPs. Let \(\Lambda\) be a set of relations. Then, an instance of \(\mathrm{CSP}(\Lambda)\) is a pair of variables and constraints over the variables such that each constraint belongs to \(\Lambda\). In the problem, called \(\mathrm{MaxCSP}(\Lambda)\), given an instance of \(\mathrm{CSP}(\Lambda)\), we want to compute an assignment that maximizes the number of satisfied constraints. Here, we give the “best” constanttime approxi mation algorithm for every \(\mathrm{MaxCSP}(\Lambda)\). Specifically, for each \(\Lambda\), there exists some constant \(\alpha_\Lambda\) with the following properties. (1) for any \(\epsilon > 0\), there exists a constanttime \((\alpha_\Lambda  \epsilon, \epsilon n)\)approximation algorithm for \(\mathrm{MaxCSP}(\Lambda)\). (2) for any \(\epsilon > 0\), there exists \(\delta > 0\) such that no constanttime \((\alpha_\Lambda + \epsilon, \delta n)\)approximation algorithm for \(\mathrm{MaxCSP}(\Lambda)\) exists. Here, \(\alpha_\Lambda\) coincides with the integrality gap of a standard LP relaxation for \(\mathrm{MaxCSP}(\Lambda)\).
We consider testing satisfiability of CSP instances. Using the result above, we can derive the characterization of the constanttime testability of \(\mathrm{CSP}(\Lambda)\). Let \(\mathrm{opt}(I)\) and \(\mathrm{lp}(I)\) denote the optimal value and the LP value for an instance \(I\), respectively. Then, we have the followings. (i) if any instance of \(\mathrm{CSP}(\Lambda)\) with \(lp(I) \geq 1−\epsilon\) satisfies \(\mathrm{opt}(I) \geq 1 − r(\epsilon)\) where \(\lim_{\epsilon \to 0} r(\epsilon) = 0\), then \(CSP(\Lambda)\) is testable in constant time. (ii) if there is an instance \(I\) of \(\mathrm{CSP}(\Lambda)\) with \(\mathrm{lp}(I) = 1\) and \(\mathrm{opt}(I) < 1\), then \(\mathrm{CSP}(\Lambda)\) is not testable in constant time.
To achieve a combinatorial characterization of the testability of \(\mathrm{CSP}(\Lambda)\), we show a close connection between constanttime testability and the notion of width 1. We say \(\mathrm{CSP}(\Lambda)\) has width 1 if satisfiability of instances of \(\mathrm{CSP}(\Lambda)\) can be decided by a certain simple propagation algorithm. Then, we prove that any CSP of width 1 satisfies (i), which means that any such CSP is testable in constant time.
In a subsequent work (Kun et al. ITCS 2012), which includes the part of the present thesis, we have shown that any CSP of width more than 1 satisfies (ii). Thus, a CSP is constanttime testable if and only if it has width 1.
2011

Algorithms for Finding a Maximum NonkLinked Graph. Yusuke Kobayashi and Yuichi Yoshida. ESA 2011.
A graph with at least \(2k\) vertices is said to be klinked if for any ordered \(k\)tuples \((s_1, \ldots,s_k)\) and \((t_1, \ldots, t_k)\) of \(2k\) distinct vertices, there exist pairwise vertexdisjoint paths \(P_1,\ldots, P_k\) such that \(P_i\) connects si and ti for \(i = 1, \ldots, k\). For a given graph \(G\), we consider the problem of finding a maximum induced subgraph of \(G\) that is not \(k\)linked. This problem is a common generalization of computing the vertexconnectivity and testing the \(k\)linkedness of \(G\), and it is closely related to the concept of \(H\)linkedness. In this paper, we give the first polynomialtime algorithm for the case of \(k = 2\), whereas a similar problem that finds a maximum induced subgraph without 2vertexdisjoint paths connecting fixed terminal pairs is NPhard. For the case of general \(k\), we give an \((8k − 2)\)additive approximation algorithm. We also investigate the computational complexities of the edgedisjoint case and the directed case.

Property Testing for Cyclic Groups and Beyond. Francois Le Gall and Yuichi Yoshida. COCOON 2011.
This paper studies the problem of testing if an input (Γ,¿), where Γ is a finite set of unknown size and ¿ is a binary operation over Γ given as an oracle, is close to a specified class of groups. Friedl et al. (Proc. of STOC, 2005) have constructed an efficient tester using poly(logΓ) queries for the class of abelian groups. We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: Ω(Γ1/6) queries are necessary to test if the input is close to a cyclic group, and Ω(Γ c ) queries for some constant c are necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed integer k¿1. We also show that knowledge of the size of the ground set Γ helps only for k=1, in which case we construct an efficient tester using poly(logΓ) queries; for any other value k¿2 the query complexity remains Ω(Γ c ). All our upper and lower bounds hold for both the edit distance and the Hamming distance. These are, to the best of our knowledge, the first nontrivial lower bounds for such grouptheoretic problems in the property testing model and, in particular, they imply the first exponential separations between the classical and quantum query complexities of testing closeness to classes of groups.

Lower Bounds on the Query Complexity for Testing Bounded Degree CSPs. Yuichi Yoshida. CCC 2011. [slides]
In this paper, we consider lower bounds on the query complexity for testing CSPs in the boundeddegree model. We mainly consider Boolean CSPs allowing literals. First, for any "symmetric" predicate P : {0, 1}k → {0,1} except EQU where k ≥ 3, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances (P1 (0)/2k  ϵ)far from satisfiability requires Ω(n1/2+δ) queries where n is the number of variables and δ >; 0 is a constant that depends on P and e. This breaks a natural lower bound Ω(n1/2), which is obtained by the birthday paradox. We also show that every onesided error tester requires Ω(n) queries for such P. These results are hereditary in the sense that the same results hold for any predicate Q such that P1 (1) ⊆ Q1(1). For EQU, we give a onesided error tester whose query complexity is O̅(n1/2). Also, for 2XOR (or, equivalently E2LIN2), we show an Ω(n1/2+δ) lower bound for distinguishing instances between eclose to and (1/2 ϵ)far from satisfiability. Next, for the general kCSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances (1  2k/2k  ϵ)far from satisfiability requires Ω(n) queries. The matching NPhardness is not known, even assuming the Unique Games Conjecture or the dto1 Conjecture. As a corollary, for Maximum Independent Set on graphs with n vertices and a degree bound d, we show that every approximation algorithm within a factor d/poly log d and an additive error of ϵn requires Ω(n) queries. Previously, only superconstant lower bounds were known.

Raghavendra (STOC 2008) gave an elegant and surprising result: if Khot's Unique Games Conjecture (STOC 2002) is true, then for every constraint satisfaction problem (CSP), the best approximation ratio is attained by a certain simple semidefinite programming and a rounding scheme for it.
In this paper, we show that similar results hold for constanttime approximation algorithms in the boundeddegree model. Specifically, we present the following: (i) For every CSP, we construct an oracle that serves an access, in constant time, to a nearly optimal solution to a basic LP relaxation of the CSP. (ii) Using the oracle, we give a constanttime rounding scheme that achieves an approximation ratio coincident with the integrality gap of the basic LP. (iii) Finally, we give a generic conversion from integrality gaps of basic LPs to hardness results. All of those results are unconditional. Therefore, for every boundeddegree CSP, we give the best constanttime approximation algorithm among all.
A CSP instance is called εfar from satisfiability if we must remove at least an εfraction of constraints to make it satisfiable. A CSP is called testable if there is a constanttime algorithm that distinguishes satisfiable instances from εfar instances with probability at least 2/3. Using the results above, we also derive, under a technical assumption, an equivalent condition under which a CSP is testable in the boundeddegree model.
2010

A Query Efficient NonAdaptive Long Code Test with Perfect Completeness. Suguru Tamaki and Yuichi Yoshida. RANDOM 2010.
Long Code testing is a fundamental problem in the area of property testing and hardness of approximation. In this paper, we study how small the soundness \(s\) of the Long Code test with perfect completeness can be by using nonadaptive \(q\) queries. We show that \(s = (2q + 3)/2^q\) is possible, where \(q\) is of the form math formula for \(2^{k}1\) any integer \(k > 2\).

Testing Outerplanarity of Bounded Degree Graphs. Yuichi Yoshida and Hiro Ito. RANDOM 2010.
We present an efficient algorithm for testing outerplanarity of graphs in the boundeddegree model. In this model, given a graph \(G\) with degree bound \(d\), we should distinguish with high probability the case that \(G\) is outerplanar from the case that modifying at least an \(\epsilon\)fraction of the edge set is necessary to make \(G\) outerplanar. Our algorithm runs in time polynomial in \(d\) and \(1/\epsilon\) only. To achieve the time complexity, we exploit the treelike structure inherent to an outerplanar graph using the microtree/macrotree decomposition of a tree. As a corollary, we show that the property of being a cactus is testable in time polynomial in \(d\) and \(1/\epsilon\).

Conjunctive Filter: Breaking the Entropy Barrier. Daisuke Okanohara and Yuichi Yoshida. ALENEX 2010.
We consider a problem for storing a map that associates a key with a set of values. To store \(n\) values from the universe of size \(m\), it requires log 2(m/n) bits of space, which can be approximated as \((1.44 + n) \log 2 m/n\) bits when n m. If we allow ε fraction of errors in outputs, we can store it with roughly n log21/ε bits, which matches the entropy bound. Bloom filter is a wellknown example for such data structures. Our objective is to break this entropy bound and construct more spaceefficient data structures. In this paper, we propose a novel data structure called a conjunctive filter, which supports conjunctive queries on k distinct keys for fixed k. Although a conjunctive filter cannot return the set of values itself associated with a queried key, it can perform conjunctive queries with O(1/√m) fraction of errors. Also, the consumed space is n/k log2m bits and it is significantly smaller than the entropy bound n/2 log2m when k ≥ 3. We will show that many problems can be solved by using a conjunctive filter such as fulltext search and database join queries. Also, we conducted experiments using a realworld data set, and show that a conjunctive filter answers conjunctive queries almost correctly using about 1/2 ∼ 1/4 space as the entropy bound.

QueryNumber Preserving Reductions and Linear Lower Bounds for Testing. Yuichi Yoshida and Hiro Ito. IEICE 2010.
In this paper, we study lower bounds on the query complexity of testing algorithms for various problems. Given an oracle that returns information of an input object, a testing algorithm distinguishes the case that the object has a given property \(P\) from the case that it has a large distance to having \(P\) with probability at least \(2/3\). The query complexity of an algorithm is measured by the number of accesses to the oracle. We introduce two reductions that preserve the query complexity. One is derived from the gappreserving local reduction and the other is from the Lreduction. By using the former reduction, we show linear lower bounds on the query complexity for testing basic NPcomplete properties, i.e., 3edgecolorability, directed Hamiltonian path/cycle, undirected Hamiltonian path/cycle, 3dimensional matching and NPcomplete generalized satisfiability problems. Also, using the second reduction, we show a linear lower bound on the query complexity of approximating the size of the maximum 3dimensional matching.
2009

An Improved ConstantTime Approximation Algorithm for Maximum Matchings. Yuichi Yoshida, Masaki Yamamoto and Hiro Ito. STOC 2009
This paper studies approximation algorithms for problems on degreebounded graphs. Let \(n\) and \(d\) be the number of vertices and the degree bound, respectively. This paper presents an algorithm to approximate the size of some maximal independent set with additive error \(\epsilon n\) whose running time is \(O(d^2)\). Using this algorithm, it also shows that there are approximation algorithms for many other problems, e.g., the maximum matching problem, the minimum vertex cover problem, and the minimum set cover problem, that run exponentially faster than existing algorithms with respect to d and 1/ε. Its approximation algorithm for the maximum matching problem can be transformed to a testing algorithm for the property of having a perfect matching with twosided error. On the contrary, it also shows that every onesided error tester for the property requires at least \(\Omega(n)\) queries.

Fast Calculation of Levenshtein Distance on the Cell Processor. Shun Sakuraba and Yuichi Yoshida. SACSIS 2009. (in Japanese)
Cell Challenge 2009の規定課題において、Cell Broadband Engineを用いて文字列の編集距 離を計算するプログラムを作成するという課題が出題された。本稿では我々が規定課題において考案した最適化技法について報告する。基本となるアルゴリズムはビット並列化法と呼ばれるものであり、実行委員会によるサンプルプログラムと比べて約 315 倍の高速化を達成した。
2008

Property Testing on kVertex Connectivity of Graphs. Yuichi Yoshida and Hiro Ito. ICALP 2008.
Wepresentanalgorithmfortestingthekvertexconnectivity of graphs with given maximum degree. The time complexity of the algorithm is independent of the number of vertices and edges of graphs. A graph \(G\) with \(n\) vertices and maximum degree at most \(d\) is called \(\epsilon\)far from \(k\)vertexconnectivity when at least \(\frac{\epsilon dn}{2}\) edges must be added to or removed from \(G\) to obtain a \(k\)vertexconnected graph with maximum degree at most \(d\). The algorithm always accepts every graph that is \(k\)vertexconnected and rejects every graph that is \(\epsilon\)far from \(k\)vertexconnectivity with a probability of at least \(2/3\). The algorithm runs in \(O\left(d \left(\frac{c}{\epsilon d}\right)^k \log \frac{1}{\epsilon d}\right)\) time (\(c > 1\) is a constant) for given \((k − 1)\)vertexconnected graphs, and \(O\left( d \left(\frac{ck}{\epsilon d}\right)^k \log \frac{k}{\epsilon d}\right)\) time (\(c > 1\) is a constant) for given general graphs. It is the first constanttime \(k\)vertexconnectivity testing algorithm for general \(k \geq 4\).