Publications
2017

An Iterative Approach for the Global Estimation of Sentence Similarity. Tomoyuki Kajiwara, Danushka Bollegala, Yuichi Yoshida and Kenichi Kawarabayashi. PLOS ONE 2017.
Measuring the similarity between two sentences is often difficult due to their small lexical overlap. Instead of focusing on the sets of features in two given sentences between which we must measure similarity, we propose a sentence similarity method that considers two types of constraints that must be satisfied by all pairs of sentences in a given corpus. Namely, (a) if two sentences share many features in common, then it is likely that the remaining features in each sentence are also related, and (b) if two sentences contain many related features, then those two sentences are themselves similar. The two constraints are utilized in an iterative bootstrapping procedure that simultaneously updates both word and sentence similarity scores. Experimental results on SemEval 2015 Task 2 dataset show that the proposed iterative approach for measuring sentence semantic similarity is significantly better than the noniterative counterparts.

Streaming Algorithms for Maximizing Monotone Submodular Functions under a Knapsack Constraint. ChienChung Huang, Naonori Kakimura and Yuichi Yoshida. APPROX 2017.
In this paper, we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially and at any point of time, the algorithm has access only to a small fraction of the data stored in primary memory. For this problem, we propose a \((0.363  \epsilon)\)approximation algorithm, requiring only a single pass through the data; moreover, we propose a \((0.4  \epsilon)\)approximation algorithm requiring a constant number of passes through the data. The required memory space of both algorithms depends only on the size of the knapsack capacity and ε.

Landmark indexing for Evaluation of LabelConstrained Reachability Queries. Lucien Valstar, George Fletcher and Yuichi Yoshida. SIGMOD 2017. [slides] [code]
Consider a directed edgelabeled graph, such as a social net work or a citation network. A fundamental query on such data is to determine if there is a path in the graph from a given source vertex to a given target vertex, using only edges with labels in a restricted subset of the edge labels in the graph. Such labelconstrained reachability (LCR) queries play an important role in graph analytics, for example, as a core fragment of the socalled regular path queries which are supported in practical graph query languages such as the W3C’s SPARQL 1.1, Neo4j’s Cypher, and Oracle’s PGQL. Current solutions for LCR evaluation, however, do not scale to large graphs which are increasingly common in a broad range of application domains. In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmarkbased indexes for large graphs. We show through extensive experiments that our indexes are significantly smaller than stateoftheart LCR indexing techniques, while supporting up to orders of magnitude faster query evaluation times. Our complete C++ codebase is available as open source for further research.

Portfolio Optimization for Influence Spread. Naoto Ohsaka and Yuichi Yoshida. WWW 2017. [slides (by Ohsaka)]
Motivated by viral marketing, stochastic diffusion processes that model influence spread on a network have been studied intensively. The primary interest in such models has been to find a seed set of a fixed size that maximizes the expected size of the cascade from it. Practically, however, it is not desirable to have the risk of ending with a small cascade, even if the expected size of the cascade is large. To address this issue, we adopt conditional value at risk (CVaR) as a risk measure, and propose an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR. Using realworld social networks, we demonstrate that the portfolio computed by our algorithm has a significantly better CVaR than seed sets computed by other baseline methods.

Regret Ratio Minimization in Multiobjective Submodular Function Maximization. Tasuku Soma and Yuichi Yoshida. AAAI 2017. [slides (by Soma)]
Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective functions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a nontrivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable tradeoff between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.

Nonmonotone DRSubmodular Function Maximization. Tasuku Soma and Yuichi Yoshida. AAAI 2017. [arXiv] [poster]
We consider nonmonotone DRsubmodular function maximization, where DRsubmodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing nonmonotone DRsubmodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a \(1/(2+\epsilon)\)approximation algorithm with a running time of roughly \(O(n/\epsilon \log^2B)\), where \(n\) is the size of the ground set, B is the maximum value of a coordinate, and \(\epsilon > 0\) is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and realworld datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

Computing Least Cores of Supermodular Cooperative Games. Daisuke Hatano and Yuichi Yoshida. AAAI 2017
One of the goals of a cooperative game is to compute a valuedivision to the players from which they have no incentive todeviate. This concept is formalized as the notion of the core.To obtain a value division that motivates players to cooperate to a greater extent or that is more robust under noise, the notions of the strong least core and the weak least core have been considered. In this paper, we characterize the strong and the weak least cores of supermodular cooperative games using the theory of minimizing crossing submodular functions. We then apply our characterizations to two representative supermodular cooperative games, namely, the induced subgraph game generalized to hypergraphs and the airport game. For these games, we derive explicit forms of the strong and weak least core values, and provide polynomialtime algorithms that compute value divisions in the strong and weak least cores.

RandomRadius Ball Method for Estimating Closeness Centrality. Wataru Inariba, Takuya Akiba and Yuichi Yoshida. AAAI 2017.
In the analysis of realworld complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many of distancebased centrality measures embrace some issues in treating disconnected networks, which are resolved by the recently emerged harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the randomradius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on realworld networks.
2016

Minimizing Quadratic Functions in Constant Time. Kohei Hayashi and Yuichi Yoshida. NIPS 2016. [arXiv] [poster]
A samplingbased optimization method for quadratic functions is proposed. Our method approximately solves the following \(n\)dimensional quadratic minimization problem in constant time, which is independent of \(n\): \(z^*=\min_{\mathbf{v} \in \mathbb{R}^n}\langle \mathbf{v},A\mathbf{v}\rangle+n\langle \mathbf{v}\mathrm{diag}(\mathbf{d})\mathbf{v}\rangle+n\langle \mathbf{b},\mathbf{v}\rangle \), where \(A \in \mathbb{R}^{n \times n}\) a matrix and \(\mathbf{d},\mathbf{b} \in \mathbb{R}^n\) are vectors. Our theoretical analysis specifies the number of samples \(k(\delta,\epsilon)\) such that the approximated solution \(z\) satisfies \(zz^*=O(\epsilon n^2)\) with probability \(1\delta\). The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.

Testing List HHomomorphisms. Yuichi Yoshida. Computational Complexity 2016. [slides]
In the List \(H\)Homomorphism Problem, for a graph \(H\) that is a parameter of the problem, an instance consists of an undirected graph \(G\) with a list constraint \(L(v) \subseteq V(H)\) for each variable \(v \in V(G)\), and the objective is to determine whether there is a list \(H\)homomorphism \(f:V(G) \to V(H)\), that is, \(f(v) \in L(v)\) for every \(v \in V(G)\) and \((f(u),f(v)) \in E(H)\) whenever \((u,v) \in E(G)\). We consider the problem of testing list \(H\)homomorphisms in the following weighted setting: An instance consists of an undirected graph \(G\), list constraints \(L\), weights imposed on the vertices of \(G\), and a map \(f:V(G) \to V(H)\) given as an oracle access. The objective is to determine whether \(f\) is a list \(H\)homomorphism or far from any list \(H\)homomorphism. The farness is measured by the total weight of vertices \(v \in V(G)\) for which \(f(v)\) must be changed so as to make \(f\) a list \(H\)homomorphism. In this paper, we classify graphs H with respect to the number of queries to \(f\) required to test the list \(H\)homomorphisms. Specifically, we show that (i) list \(H\)homomorphisms are testable with a constant number of queries if and only if \(H\) is a reflexive complete graph or an irreflexive complete bipartite graph and (ii) list \(H\)homomorphisms are testable with a sublinear number of queries if and only if \(H\) is a biarc graph.

Cycle and Flow Trusses in Directed Networks. Taro Takaguchi and Yuichi Yoshda. Royal Society Open Science 2016. [arXiv] [code]
When we represent realworld systems as networks, the directions of links often convey valuable information. Finding module structures that respect link directions is one of the most important tasks for analysing directed networks. Although many notions of a directed module have been proposed, no consensus has been reached. This lack of consensus results partly because there might exist distinct types of modules in a single directed network, whereas most previous studies focused on an independent criterion for modules. To address this issue, we propose a generic notion of the socalled truss structures in directed networks. Our definition of truss is able to extract two distinct types of trusses, named the cycle truss and the flow truss, from a unified framework. By applying the method for finding trusses to empirical networks obtained from a wide range of research fields, we find that most real networks contain both cycle and flow trusses. In addition, the abundance of (and the overlap between) the two types of trusses may be useful to characterize module structures in a wide variety of empirical networks. Our findings shed light on the importance of simultaneously considering different types of modules in directed networks.

Testing Assignments to Constraint Satisfaction Problems. Hubie Chen, Matt Valeriote and Yuichi Yoshida. FOCS 2016. [arXiv] [slides]
For a finite relational structure \(A\), let \(\mathrm{CSP}(A)\) denote the CSP instances whose constraint relations are taken from \(A\). The resulting family of problems \(\mathrm{CSP}(A)\) has been considered heavily in a variety of computational contexts. In this article, we consider this family from the perspective of property testing: given an instance of a CSP and query access to an assignment, one wants to decide whether the assignment satisfies the instance, or is far from so doing. While previous work on this scenario studied concrete templates or restricted classes of structures, this article presents comprehensive classification theorems. Our first contribution is a dichotomy theorem completely characterizing the structures \(A\) such that \(\mathrm{CSP}(A)\) is constantquery testable: (i) If \(A\) has a majority polymorphism and a Maltsev polymorphism, then \(\mathrm{CSP}(A)\) is constantquery testable with onesided error. (ii) Else, testing \(\mathrm{CSP}(A)\) requires a superconstant number of queries. Let \(\exists\mathrm{CSP}(A)\) denote the extension of \(\mathrm{CSP}(A)\) to instances which may include existentially quantified variables. Our second contribution is to classify all structures \(A\) in terms of the number of queries needed to test assignments to instances of \(\exists\mathrm{CSP}(A)\), with onesided error. More specifically, we show the following trichotomy (i) If \(A\) has a majority polymorphism and a Maltsev polymorphism, then \(\exists\mathrm{CSP}(A)\) is constantquery testable with onesided error. (ii) Else, if A has a \((k + 1)\)ary nearunanimity polymorphism for some \(k \geq 2\), and no Maltsev polymorphism then \(\exists\mathrm{CSP}(A)\) is not constantquery testable (even with twosided error) but is sublinearquery testable with onesided error. (iii) Else, testing \(\exists\mathrm{CSP}(A)\) with onesided error requires a linear number of queries.

Testing Properties of Functions on Finite Groups. Kenta Oono and Yuichi Yoshida. Random Structures & Algorithms 2016. [arXiv]
We study testing properties of functions on finite groups. First we consider functions of the form math formula, where \(G\) is a finite group. We show that conjugate invariance, homomorphism, and the property of being proportional to an irreducible character is testable with a constant number of queries to \(f\), where a character is a crucial notion in representation theory. Our proof relies on representation theory and harmonic analysis on finite groups. Next we consider functions of the form math formula, where \(d\) is a fixed constant and math formula is the family of \(d\) by \(d\) matrices with each element in math formula. For a function math formula, we show that the unitary isomorphism to \(g\) is testable with a constant number of queries to \(f\), where we say that \(f\) and \(g\) are unitary isomorphic if there exists a unitary matrix \(U\) such that math formula for any math formula.

Halfintegrality, LPbranching and FPT Algorithms. Yoichi Iwata, Magnus Wahlström and Yuichi Yoshida. SIAM Journal on Computing 2016. [arXiv]
A recent trend in parameterized algorithms is the application of polytope tools to fixedparameter tractable (FPT) algorithms [e.g., Cygan et al., 2011; Narayanaswamy et al., 2012]. Although this approach has yielded significant speedups for a range of important problems, it requires the underlying polytope to have very restrictive properties, including halfintegrality and NemhauserTrotterstyle persistence properties. To date, these properties are essentially known to hold only for two classes of polytopes, covering the cases of Vertex Cover [Nemhauser and Trotter, 1975] and Node Multiway Cut [Garg et al., 2004]. Taking a slightly different approach, we view halfintegrality as a discrete relaxation of a problem, e.g., a relaxation of the search space from \(\{0,1\}^V\) to \(\{0,1/2,1\}^V\) such that the new problem admits a polynomialtime exact solution. Using tools from constraint satisfaction problems [in particular Thapper and Živný, 2012] to study the existence of such relaxations, we are able to provide a much broader class of halfintegral polytopes with the required properties. Our results unify and significantly extend the previously known cases, and yield a range of new and improved FPT algorithms, including an \(O^*(\Sigma^{2k})\)time algorithm for nodedeletion Unique Label Cover and an \(O^*(4^k)\)time algorithm for Group Feedback Vertex Set where the group is given by oracle access. The latter result also implies the first singleexponential time FPT algorithm for Subset Feedback Vertex Set, answering an open question of Cygan et al., 2016]. Additionally, we propose a networkflowbased approach to solve several cases of the relaxation problem. This gives the first lineartime FPT algorithm to edgedeletion Unique Label Cover.

Dynamic Influence Analysis in Evolving Networks. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and Kenichi Kawarabayashi. Proceedings of the VLDB Endowment 2016. [slides (by Ohsaka)] [code]
We propose the first realtime fullydynamic index data structure designed for influence analysis on evolving networks. With this aim, we carefully redesign the data structure of the stateoftheart sketching method introduced by Borgs et al., and construct corresponding update algorithms. Using this index, we present algorithms for two kinds of queries, influence estimation and influence maximization, which are strongly motivated by practical applications, such as viral marketing. We provide a thorough theoretical analysis, which guarantees the nondegeneracy of the solution accuracy after an arbitrary number of updates. Furthermore, we introduce a reachabilitytreebased technique and a skipping method, which greatly reduce the time consumption required for edge/vertex deletions and vertex additions, respectively, and counterbased random number generators, which improve the space efficiency. Experimental evaluations using real dynamic networks with tens of millions of edges demonstrate the efficiency, scalability, and accuracy of our proposed indexing scheme. Specifically, it can reflect a graph modification within a time of several orders of magnitude smaller than that required to reconstruct an index from scratch, estimate the influence spread of a vertex set accurately within a millisecond, and select highly influential vertices at least ten times faster than stateoftheart static algorithms.

Efficient Algorithms for Spanning Tree Centrality. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. IJCAI 2016.
In a connected graph, spanning tree centralities of a vertex and an edge measure how crucial they are for the graph to be connected. In this paper, we propose efficient algorithms for estimating spanning tree centralities with theoretical guarantees on their accuracy. We experimentally demonstrate that our methods are orders of magnitude faster than previous methods. Then, we propose a novel centrality notion of a vertex, called aggregated spanning tree centrality, which also considers the number of connected components obtained by removing the vertex. We also give an efficient algorithm for estimating aggregated spanning tree cen trality. Finally, we experimentally show that those spanning tree centralities are useful to identify vulnerable edges and vertices in infrastructure networks.

Maximizing Monotone Submodular Functions over the Integer Lattice. Tasuku Soma and Yuichi Yoshida. IPCO 2016. [arXiv] [slides (by Soma)]
The problem of maximizing nonnegative monotone submodular functions under a certain constraint has been intensively studied in the last decade. In this paper, we address the problem for functions defined over the integer lattice. Suppose that a nonnegative monotone submodular function \(f:\mathbb{Z}^n_+ \to \mathbb{R}_+\) is given via an evaluation oracle. Assume further that \(f\) satisfies the diminishing return property, which is not an immediate consequence of the submodularity when the domain is the integer lattice. Then, we show polynomialtime \((11/e\epsilon)\)approximation algorithm for cardinality constraints, polymatroid constraints, and knapsack constraints. For a cardinality constraint, we also show a \((11/e\epsilon)\)approximation algorithm with slightly worse time complexity that does not rely on the diminishing return property. Our algorithms for a polymatroid constraint and a knapsack constraint first extend the domain of the objective function to the Euclidean space and then run the continuous greedy algorithm. We give two different kinds of continuous extensions, one is for polymatroid constraints and the other is for knapsack constraints, which might be of independent interest.

Nonlinear Laplacian for Digraphs and its Applications to Network Analysis. Yuichi Yoshida. WSDM 2016. [slides] [poster] [code]
In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheegerlike inequality, which relates the conductance of a digraph and the smallest nonzero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of realworld networks and obtain encouraging results.

Nonconvex Compressed Sensing with the SumofSquares Method. Tasuku Soma and Yuichi Yoshida. SODA 2016. [slides (by Soma)]
We consider stable signal recovery in \(\ell_q\) quasinorm for \(0 < q \leq 1\). In this problem, given a measurement vector \(y = Ax\) for some unknown signal vector \(x \in \mathbb{R}^n\) and a known matrix \(A \in \mathbb{R}^{m \times n}\), we want to recover \(z \in \mathbb{R}^n\) with \(\x  z\_q = O(\x  x^*\_q)\) from a measurement vector, where \(x^*\) is the ssparse vector closest to \(x\) in \(\ell_q\) quasinorm. Although a small value of \(q\) is favorable for measuring the distance to sparse vectors, previous methods for \(q < 1\) involve \(\ell_q\) quasinorm minimization which is computationally intractable. In this paper, we overcome this issue by using the sumofsquares method, and give the first polynomialtime stable recovery scheme for a large class of matrices \(A\) in \(\ell_q\) quasinorm for any fixed constant \(0 < q \leq 1\).

Gowers Norm, Function Limits, and Parameter Estimation. Yuichi Yoshida. SODA 2016. [arXiv] [slides]
Let \(\{f_i:\mathbb{F}_p^i\to \{0,1\}\}\) be a sequence of functions, where \(p\) is a fixed prime and \(\mathbb{F}_p\) is the finite field of order \(p\). The limit of the sequence can be syntactically defined using the notion of ultralimit. Inspired by the Gowers norm, we introduce a metric over limits of function sequences, and study properties of it. One application of this metric is that it provides a simpler characterization of affineinvariant parameters of functions that are constantquery estimable than the previous one obtained by Yoshida (STOC'14). Using this characterization, we show that the property of being a function of a constant number of lowdegree polynomials and a constant number of factored polynomials (of arbitrary degrees) is constantquery testable if it is closed under blowingup. Examples of this property include the property of having a constant spectral norm and degreestructural properties with rank conditions.

Improved Approximation Algorithms for kSubmodular Function Maximization. Satoru Iwata, Shinichi Tanigawa and Yuichi Yoshida. SODA 2016. [arXiv]
This paper presents a polynomialtime \(1/2\)approximation algorithm for maximizing nonnegative ksubmodular functions. This improves upon the previous \(\max\{1/3, 1/(1 + a)\}\)approximation by Ward and Živný [18], where \(a = \max\{1,\sqrt{(k1)/4}\}\). We also show that for monotone \(k\)submodular functions there is a polynomialtime \(k/(2k – 1)\)approximation algorithm while for any \(\epsilon > 0\) a \(((k + 1)/2k + \epsilon)\)approximation algorithm for maximizing monotone \(k\)submodular functions would require exponentially many queries. In particular, our hardness result implies that our algorithms are asymptotically tight. We also extend the approach to provide constant factor approximation algorithms for maximizing skewbisubmodular functions, which were recently introduced as generalizations of bisubmodular functions.

Coverage Centralities for Temporal Networks. Taro Takaguchi, Yosuke Yano and Yuichi Yoshida. The European Physical Journal B 2016. [arXiv]
Structure of real networked systems, such as social relationship, can be modeled as temporal networks in which each edge appears only at the prescribed time. Understanding the structure of temporal networks requires quantifying the importance of a temporal vertex, which is a pair of vertex index and time. In this paper, we define two centrality measures of a temporal vertex based on the fastest temporal paths which use the temporal vertex. The definition is free from parameters and robust against the change in time scale on which we focus. In addition, we can efficiently compute these centrality values for all temporal vertices. Using the two centrality measures, we reveal that distributions of these centrality values of realworld temporal networks are heterogeneous. For various datasets, we also demonstrate that a majority of the highly central temporal vertices are located within a narrow time window around a particular time. In other words, there is a bottleneck time at which most information sent in the temporal network passes through a small number of temporal vertices, which suggests an important role of these temporal vertices in spreading phenomena.
2015

A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice. Tasuku Soma and Yuichi Yoshida. NIPS 2015. [slides (by Soma) (in Japanese)]
We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a logfactor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly \(O(n\log(nr)\log r)\), where \(n\) is the size of the ground set and \(r\) is the maximum value of a coordinate. The dependency on \(r\) is exponentially better than the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.

Monotone kSubmodular Function Maximization with Size Constraints. Naoto Ohsaka and Yuichi Yoshida. NIPS 2015. [poster (by Ohsaka)]
A \(k\)submodular function is a generalization of a submodular function, where the input consists of \(k\) disjoint subsets, instead of a single subset, of the domain. Many machine learning problems, including influence maximization with \(k\) kinds of topics and sensor placement with \(k\) kinds of sensors, can be naturally modeled as the problem of maximizing monotone \(k\)submodular functions.In this paper, we give constantfactor approximation algorithms for maximizing monotone \(k\)submodular functions subject to several size constraints.The running time of our algorithms are almost linear in the domain size.We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.

Testing Outerplanarity of Bounded Degree Graphs. Hiro Ito and Yuichi Yoshida. Algorithmica 2015.
We present an efficient algorithm for testing outerplanarity of graphs in the boundeddegree model. In this model, given a graph \(G\) with degree bound \(d\), we should distinguish with high probability the case that \(G\) is outerplanar from the case that modifying at least an \(\epsilon\)fraction of the edge set is necessary to make \(G\) outerplanar. Our algorithm runs in time polynomial in \(d\) and \(1/\epsilon\) only. To achieve the time complexity, we exploit the treelike structure inherent to an outerplanar graph using the microtree/macrotree decomposition of a tree. As a corollary, we show that the property of being a cactus is testable in time polynomial in \(d\) and \(1/\epsilon\).

Fully Dynamic Betweenness Centrality Maintenance on Massive Networks. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. Proceedings of the VLDB Endowment 2015. [code]
Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks in network analysis. Among several importance measures, betweenness centrality, in particular, plays key roles in many real applications. Considerable effort has been made for developing algorithms for static settings. However, real networks today are highly dynamic and are evolving rapidly, and scalable dynamic methods that can instantly reflect graph changes into centrality values are required. In this paper, we present the first fully dynamic method for managing betweenness centrality of all vertices in a large dynamic network. Its main data structure is the weighted hyperedge representation of shortest paths called hypergraph sketch. We carefully design dynamic update procedure with theoretical accuracy guarantee. To accelerate updates, we further propose two auxiliary data structures called twoball index and specialpurpose reachability index. Experimental results using real networks demonstrate its high scalability and efficiency. In particular, it can reflect a graph change in less than a millisecond on average for a largescale web graph with 106M vertices and 3.7B edges, which is several orders of magnitude larger than the limits of previous dynamic methods.

A Query Efficient NonAdaptive Long Code Test with Perfect Completeness. Suguru Tamaki and Yuichi Yoshida. Random Structures & Algorithms 2015.
Long Code testing is a fundamental problem in the area of property testing and hardness of approximation. In this paper, we study how small the soundness s of the Long Code test with perfect completeness can be by using nonadaptive \(q\) queries. We show that \(s = (2q + 3)/2^q\) is possible, where \(q\) is of the form \(2^k1\) for any integer \(k > 2\).

On the Equivalence among Problems of Bounded Width. Yoichi Iwata and Yuichi Yoshida. ESA 2015. [arXiv]
In this paper, we introduce a methodology, called decompositionbased reductions, for showing the equivalence among various problems of boundedwidth. First, we show that the following are equivalent for any \(α > 0\):
 SAT can be solved in \(O^*(2^{\alpha \mathrm{tw}})\) time,
 3SAT can be solved in \(O^*(2^{\alpha \mathrm{tw}})\) time,
 Max 2SAT can be solved in \(O^*(2^{\alpha\mathrm{tw}})\) time,
 Independent Set can be solved in \(O^*(2^{\alpha\mathrm{tw}})\) time, and
 Independent Set can be solved in \(O^*(2^{\alpha\mathrm{cw}})\) time,
where \(\mathrm{tw}\) and \(\mathrm{cw}\) are the treewidth and cliquewidth of the instance, respectively. Then, we introduce a new parameterized complexity class EPNL, which includes Set Cover and TSP, and show that SAT, 3SAT, Max 2SAT, and Independent Set parameterized by pathwidth are EPNLcomplete. This implies that if one of these \textup{EPNL}complete problems can be solved in \(O^*(c^k)\) time, then any problem in \textup{EPNL} can be solved in \(O^*(c^k)\) time.

Partially Symmetric Functions are Efficiently IsomorphismTestable. Eric Blais, Amit Weinstein and Yuichi Yoshida. SIAM Journal on Computing 2015. [arXiv] [slides (by Weinstein)] [video (talk by Weinstein)]
Given a Boolean function \(f\), the \(f\)isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to \(f\) up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions \(f\) we can test \(f\)isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functionsfunctions invariant to the reordering of all but a constant number of their variablesare efficiently isomorphism testable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphismtestable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearly optimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testingbyimplicitlearning approach to complete the proof that partially symmetric functions are efficiently isomorphism testable.

Testing Supermodularcut Condition. Shinichi Tanigawa and Yuichi Yoshida. Algorithmica 2015.
For a function \(f:2^V \to \mathbb{Z}_+\) on a finite set \(V\) with \(f(\emptyset)=f(V)=0\), a digraph \(D=(V,A)\) is called \(f\)connected if it satisfies the \(f\)cut condition, that is, \(\delta_D(X) \geq f(X)\) for any \(X \subseteq V\), where \(\delta_D(X)\) is the number of arcs from \(X\) to \(V \setminus X\). We show that, for any crossing supermodular function \(f\), the \(f\)connectivity can be tested with a constant number of queries in the general digraph model with average degree bound. As immediate corollaries, we obtain constanttime testers for \(k\)edgeconnectivity, rooted\((k,l)\)edgeconnectivity, and the property of having \(k\) arcdisjoint arborescences. We also give a corresponding result for the undirected case.

Generalized River Crossing Problems. Hiro Ito, Stefan Langerman and Yuichi Yoshida. Theory of Computing Systems 2015.
Three men, each with a sister, must cross a river using a boat that can carry only two people in such a way that a sister is never left in the company of another man if her brother is not present. This very famous problem appeared in the Latin book “Problems to Sharpen the Young,” one of the earliest collections of recreational mathematics. This paper considers a generalization of such “river crossing problems” and provides a new formulation that can treat wide variations. The main result is that, if there is no upper bound on the number of transportations (river crossings), a large class of subproblems can be solved in polynomial time even when the passenger capacity of the boat is arbitrarily large. The authors speculated this fact at FUN 2012. On the other hand, this paper also demonstrates that, if an upper bound on the number of transportations is given, the problem is NPhard even when the boat capacity is three, although a large class of subproblems can be solved in polynomial time if the boat capacity is two.

Learning Word Representations from Relational Graphs. Danushka Bollegala, Takanori Maehara, Yuichi Yoshida and Kenichi Kawarabayashi. AAAI 2015. [arXiv]
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the cooccurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words cooccur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic semantics between words.

Distributed Multiplicative Weights Methods for DCOP. Daisuke Hatano and Yuichi Yoshida. AAAI 2015.
We introduce a new framework for solving distributed constraint optimization problems that extend the domain of each variable into a simplex.We propose two methods for searching the extended domain for good assignments.The first one relaxes the problem using linear programming, finds the optimum LP solution, and rounds it to an assignment.The second one plays a costminimization game, finds a certain kind of equilibrium, and rounds it to an assignment.Both methods are realized by performing the multiplicative weights method in a distributed manner.We experimentally demonstrate that our methods have good scalability,and in particular, the second method outperforms existing algorithms in terms of solution quality and efficiency.

Efficient Topk ShortestPath Distance Queries on Large Networks by Pruned Landmark Labeling. Takuya Akiba, Takanori Hayashi, Nozomi Nori, Yoichi Iwata and Yuichi Yoshida. AAAI 2015.
We propose an indexing scheme for top\(k\) shortestpath distance queries on graphs, which is useful in a wide range of important applications such as networkaware search and link prediction. While considerable effort has been made for efficiently answering standard (top1) distance queries, none of previous methods can be directly extended for topk distance queries. We propose a new framework for top\(k\) distance queries based on 2hop cover and then present an efficient indexing algorithm based on the simple but effective recent notion of pruned landmark labeling. Extensive experimental results on real social and web graphs show the scalability, efficiency and robustness of our method. Moreover, we demonstrate the usefulness of top\(k\) distance queries through an application to link prediction.
2014

Robust Approximation of Temporal CSP. Suguru Tamaki and Yuichi Yoshida. APPROX 2014.
A temporal constraint language \(\Gamma\) is a set of relations with firstorder definitions in \((\mathbb{Q};<)\). Let \(\mathrm{CSP}(\Gamma)\) denote the set of constraint satisfaction problem instances with relations from \(\Gamma\). \(\mathrm{CSP}(\Gamma)\) admits robust approximation if, for any \(\varepsilon \geq 0\), given a \((1\varepsilon)\)satisfiable instance of \(\mathrm{CSP}(\Gamma)\), we can compute an assignment that satisfies at least a \((1f(\varepsilon))\)fraction of constraints in polynomial time. Here, \(f(\varepsilon)\) is some function satisfying \(f(0)=0\) and \(\lim_{\varepsilon \to 0}f(\varepsilon)=0\).
Firstly, we give a qualitative characterization of robust approximability: Assuming the Unique Games Conjecture, we give a necessary and sufficient condition on \(\Gamma\) under which \(\mathrm{CSP}(\Gamma)\) admits robust approximation. Secondly, we give a quantitative characterization of robust approximability: Assuming the Unique Games Conjecture, we precisely characterize how \(f(\varepsilon)\) depends on \(\varepsilon\) for each \(\Gamma\). We show that our robust approximation algorithms can be run in almost linear time.

Almost LinearTime Algorithms for Adaptive Betweenness Centrality using Hypergraph Sketches. Yuichi Yoshida. KDD 2014. [slides] [code]
Betweenness centrality measures the importance of a vertex by quantifying the number of times it acts as a midpoint of the shortest paths between other vertices. This measure is widely used in network analysis. In many applications, we wish to choose the \(k\) vertices with the maximum adaptive betweenness centrality, which is the betweenness centrality without considering the shortest paths that have been taken into account by alreadychosen vertices.
All previous methods are designed to compute the betweenness centrality in a fixed graph. Thus, to solve the above task, we have to run these methods \(k\) times. In this paper, we present a method that directly solves the task, with an almost linear runtime no matter how large the value of \(k\). Our method first constructs a hypergraph that encodes the betweenness centrality, and then computes the adaptive betweenness centrality by examining this graph. Our technique can be utilized to handle other centrality measures.
We theoretically prove that our method is very accurate, and experimentally confirm that it is three orders of magnitude faster than previous methods. Relying on the scalability of our method, we experimentally demonstrate that strategies based on adaptive betweenness centrality are effective in important applications studied in the network science and database communities.

Suppressing Epidemics on Networks by Exploiting Observer Nodes. Taro Takaguchi, Takehisa Hasegawa and Yuichi Yoshida. Physical Review E 2014. [arXiv]
To control infection spreading on networks, we investigate the effect of observer nodes that recognize infection in a neighboring node and make the rest of the neighbor nodes immune. We numerically show that random placement of observer nodes works better on networks with clustering than on locally treelike networks, implying that our model is promising for realistic social networks. The efficiency of several heuristic schemes for observer placement is also examined for synthetic and empirical networks. In parallel with numerical simulations of epidemic dynamics, we also show that the effect of observer placement can be assessed by the size of the largest connected component of networks remaining after removing observer nodes and links between their neighboring nodes.

Fast and Accurate Influence Maximization on Large Networks with Pruned MonteCarlo Simulations. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and KenIchi Kawarabayashi. AAAI 2014. [slides (by Ohsaka)] [code]
Influence maximization is a problem to find small sets of highly influential individuals in a social network to maximize the spread of influence under stochastic cascade models of propagation. Although the problem has been wellstudied, it is still highly challenging to find solutions of high quality in largescale networks of the day. While MonteCarlosimulationbased methods produce nearoptimal solutions with a theoretical guarantee, they are prohibitively slow for large graphs. As a result, many heuristic methods without any theoretical guarantee have been developed, but all of them substantially compromise solution quality. To address this issue, we propose a new method for the influence maximization problem. Unlike other recent heuristic methods, the proposed method is a MonteCarlosimulationbased method, and thus it consistently produces solutions of high quality with the theoretical guarantee. On the other hand, unlike other previous MonteCarlosimulationbased methods, it runs as fast as other stateoftheart methods, and can be applied to large networks of the day. Through our extensive experiments, we demonstrate the scalability and the solution quality of the proposed method.

Testing ForestIsomorphism in the Adjacency List Model. Mitsuru Kusumoto and Yuichi Yoshida. ICALP 2014. [arXiv] [slides (by Kusumoto)]
We consider the problem of testing if two input forests are isomorphic or are far from being so. An algorithm is called an \(\epsilon\)tester for forestisomorphism if given an oracle access to two forests \(G\) and \(H\) in the adjacency list model, with high probability, accepts if \(G\) and \(H\) are isomorphic and rejects if we must modify at least \(\epsilon n\) edges to make \(G\) isomorphic to \(H\). We show an \(\epsilon\)tester for forestisomorphism with a query complexity \(\mathrm{polylog}(n)\) and a lower bound of \(\Omega(\sqrt{\log n/\log \log n})\). Further, with the aid of the tester, we show that every graph property is testable in the adjacency list model with \(\mathrm{polylog}(n)\) queries if the input graph is a forest.

Generalized Skew Bisubmodularity: A Characterization and a MinMax Theorem. Satoru Fujishige, Shinichi Tanigawa and Yuichi Yoshida. Discrete Optimization 2014.
Huber, Krokhin, and Powell (2013) introduced a concept of skew bisubmodularity, as a generalization of bisubmodularity, in their complexity dichotomy theorem for valued constraint satisfaction problems over the threevalue domain. In this paper we consider a natural generalization of the concept of skew bisubmodularity and show a connection between the generalized skew bisubmodularity and a convex extension over rectangles. We also analyze the dual polyhedra, called skew bisubmodular polyhedra, associated with generalized skew bisubmodular functions and derive a minmax theorem that characterizes the minimum value of a generalized skew bisubmodular function in terms of a minimumnorm point in the associated skew bisubmodular polyhedron.

A Characterization of Locally Testable AffineInvariant Properties via Decomposition Theorems. Yuichi Yoshida. STOC 2014. [arXiv] [slides]
Let \(P\) be a property of function \(\mathbb{F}^n_p \to \{0, 1\}\) for a fixed prime \(p\). An algorithm is called a tester for \(P\) if, given a query access to the input function \(f\), with high probability, it accepts when \(f\) satisfies \(P\) and rejects when \(f\) is "far" from satisfying \(P\). In this paper, we give a characterization of affineinvariant properties that are (twosided error) testable with a constant number of queries. The characterization is stated in terms of decomposition theorems, which roughly claim that any function can be decomposed into a structured part that is a function of a constant number of polynomials, and a pseudorandom part whose Gowers norm is small. We first give an algorithm that tests whether the structured part of the input function has a specific form. Then we show that an affineinvariant property is testable with a constant number of queries if and only if it can be reduced to the problem of testing whether the structured part of the input function is close to one of a constant number of candidates.

Dynamic and Historical ShortestPath Distance Queries on Large Evolving Networks by Pruned Landmark Labeling. Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. WWW 2014.
We propose two dynamic indexing schemes for shortestpath and distance queries on large timeevolving graphs, which are useful in a wide range of important applications such as realtime networkaware search and network evolution analysis. To the best of our knowledge, these methods are the first practical exact indexing methods to efficiently process distance queries and dynamic graph updates.
We first propose a dynamic indexing scheme for queries on the last snapshot. The scalability and efficiency of its offline indexing algorithm and query algorithm are competitive even with previous static methods. Meanwhile, the method is dynamic, that is, it can incrementally update indices as the graph changes over time. Then, we further design another dynamic indexing scheme that can also answer two kinds of historical queries with regard to not only the latest snapshot but also previous snapshots.
Through extensive experiments on real and synthetic evolving networks, we show the scalability and efficiency of our methods. Specifically, they can construct indices from large graphs with millions of vertices, answer queries in microseconds, and update indices in milliseconds.

Approximation Schemes via SheraliAdams Hierarchy for Dense Constraint Satisfaction Problems and Assignment Problems. Yuichi Yoshida and Yuan Zhou. ITCS 2014.
We consider approximation schemes for the maximum constraint satis faction problems and the maximum assignment problems. Though they are NPHard in general, if the instance is “dense” or “locally dense”, then they are known to have approximation schemes that run in polynomial time or quasipolynomial time. In this paper, we give a unified method of showing these approximation schemes based on the SheraliAdams linear programming relaxation hierarchy. We also use our linear programmingbased framework to show new algorithmic results on the optimization version of the hypergraph isomorphism problem.

Parameterized Testability. Kazuo Iwama and Yuichi Yoshida. ITCS 2014. [slides]
This paper studies property testing for NP optimization problems with parameter \(k\) under the general graph model with an augmentation of random edge sampling capability. It is shown that a variety of such problems, including \(k\)Vertex Cover, \(k\)Feedback Vertex Set, \(k\)Multicut, \(k\)pathfreeness and \(k\)Dominating Set, are constanttime testable if \(k\) is constant. It should be noted that the first four problems are fixed parameter tractable (FPT) and it turns out that algorithmic techniques for their FPT algorithms (branchandbound search, color coding, etc.) are also useful for our testers. \(k\)Dominating Set is \(W[2]\)hard, but we can still test the property in constant time since the definition of \(\epsilon\)farness makes the problem trivial for nonsparse graphs that are the source of hardness for the original optimization problem. We also consider \(k\)Odd Cycle Transversal, which is another wellknown FPT problem, but we only give a sublineartime tester when \(k\) is a constant.

Linear Time FPT Algorithms via Network Flow. Yoichi Iwata, Keigo Oka and Yuichi Yoshida. SODA 2014. [arXiv] [video (talk by Iwata)]
In the area of parameterized complexity, to cope with NPHard problems, we introduce a parameter \(k\) besides the input size \(n\), and we aim to design algorithms (called FPT algorithms) that run in \(O(f(k)n^d)\) time for some function \(f(k)\) and constant \(d\). Though FPT algorithms have been successfully designed for many problems, typically they are not sufficiently fast because of huge \(f(k)\) and \(d\).
In this paper, we give FPT algorithms with small \(f(k)\) and d for many important problems including Odd Cycle Transversal and Almost 2SAT. More specifically, we can choose \(f(k)\) as a single exponential \((4^k)\) and d as one, that is, linear in the input size. To the best of our knowledge, our algorithms achieve linear time complexity for the first time for these problems.
To obtain our algorithms for these problems, we consider a large class of integer programs, called BIP2. Then we show that, in linear time, we can reduce BIP2 to Vertex Cover Above LP preserving the parameter \(k\), and we can compute an optimal LP solution for Vertex Cover Above LP using network flow.
Then, we perform an exhaustive search by fixing halfintegral values in the optimal LP solution for Vertex Cover Above LP. A bottleneck here is that we need to recompute an LP optimal solution after branching. To address this issue, we exploit network flow to update the optimal LP solution in linear time.
2013

Property Testing for Cyclic Groups and Beyond. Francois Le Gall and Yuichi Yoshida. Journal of Combinatorial Optimization 2013.
This paper studies the problem of testing if an input \((\Gamma,\circ)\), where \(\Gamma\) is a finite set of unknown size and \(\circ\) is a binary operation over \(\Gamma\) given as an oracle, is close to a specified class of groups. Friedl et al. (Proc. of STOC, 2005) have constructed an efficient tester using \(\mathrm{poly}(\log\Gamma)\) queries for the class of abelian groups. We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: \(\Omega(\Gamma^{1/6})\) queries are necessary to test if the input is close to a cyclic group, and \(\Omega(\Gamma^c)\) queries for some constant c are necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed integer k≥1. We also show that knowledge of the size of the ground set \(\Gamma\) helps only for \(k=1\), in which case we construct an efficient tester using \(\mathrm{poly}(\log\Gamma)\) queries; for any other value \(k\geq 2\) the query complexity remains \(\Omega(\Gamma^c)\). All our upper and lower bounds hold for both the edit distance and the Hamming distance. These are, to the best of our knowledge, the first nontrivial lower bounds for such grouptheoretic problems in the property testing model and, in particular, they imply the first exponential separations between the classical and quantum query complexities of testing closeness to classes of groups.

Semistrong Coloring of Intersecting Hypergraphs. Eric Blais, Amit Weinstein and Yuichi Yoshida. Combinatorics Probability and Computing 2013.
For any \(c ≥ 2\), a \(c\)strong colouring of the hypergraph \(G\) is an assignment of colours to the vertices of \(G\) such that, for every edge e of \(G\), the vertices of e are coloured by at least \(\min{c,e}\) distinct colours. The hypergraph \(G\) is \(t\)intersecting if every two edges of \(G\) have at least \(t\) vertices in common.
A natural variant of a question of Erdős and Lovász is: For fixed \(c \geq 2\) and \(t \geq 1\), what is the minimum number of colours that is sufficient to \(c\)strong colour any \(t\)intersecting hypergraphs? The purpose of this note is to describe some open problems related to this question.

Fast and Scalable Reachability Queries on Graphs by Pruned Labeling with Landmarks and Paths. Yosuke Yano, Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. CIKM 2013. [slides (by Yano)] [code]
Answering reachability queries on directed graphs is ubiquitous in many applications involved with graphshaped data as one of the most fundamental and important operations. However, it is still highly challenging to efficiently process them on largescale graphs. Transitiveclosurebased methods consume prohibitively large index space, and onlinesearchbased methods answer queries too slowly. Labelingbased methods attain both small index size and query time, but previous indexing algorithms are not scalable at all for processing large graphs of the day.
In this paper, we propose new labelingbased methods for reachability queries, referred to as pruned landmark labeling and pruned path labeling. They follow the frameworks of 2hop cover and 3hop cover, but their indexing algorithms are based on the recent notion of pruned labeling and improve the indexing time by several orders of magnitude, resulting in applicability to large graphs with tens of millions of vertices and edges. Our experimental results show that they attain remarkable tradeoffs between fast query time, small index size and scalability, which previous methods have never been able to achieve. Furthermore, we also discuss the ingredients of the efficiency of our methods by a novel theoretical analysis based on the graph minor theory.

LinearTime Enumeration of Maximal kEdgeConnected Subgraphs in Large Networks by Random Contraction. Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. CIKM 2013.
Capturing sets of closely related vertices from large networks is an essential task in many applications such as social network analysis, bioinformatics, and web link research. Decomposing a graph into \(k\)core components is a standard and efficient method for this task, but obtained clusters might not be wellconnected. The idea of using maximal \(k\)edgeconnected subgraphs was recently proposed to address this issue. Although we can obtain better clusters with this idea, the stateoftheart method is not efficient enough to process large networks with millions of vertices.
In this paper, we propose a new method to decompose a graph into maximal \(k\)edgeconnected components, based on random contraction of edges. Our method is simple to implement but improves performance drastically. We experimentally show that our method can successfully decompose large networks and it is thousands times faster than the previous method. Also, we theoretically explain why our method is efficient in practice. To see the importance of maximal \(k\)edgeconnected subgraphs, we also conduct experiments using realworld networks to show that many \(k\)core components have small edgeconnectivity and they can be decomposed into a lot of maximal \(k\)edgeconnected subgraphs.

Mining for Analogous Tuples from an EntityRelation Graph. Danushka Bollegala, Mitsuru Kushimoto, Yuichi Yoshida and Kenichi Kawarabayashi. IJCAI 2013.
The ability to recognize analogies is an important factor that is closely related to human intelligence. Verbal analogies have been used for evaluating both examinees at university entrance exams as well as algorithms for measuring relational similarity. However, relational similarity measures proposed so far are confined to measuring the similarity between pairs of words. Unfortunately, such pairwise approaches ignore the rich relational structure that exists in realworld knowledge bases containing millions of entities and semantic relations. We propose a method to efficiently identify analogous entity tuples from a given entityrelation graph. First, we present an efficient approach for extracting potential analogous tuples from a given entityrelation graph. Second, to measure the structural similarity between two tuples, we propose two types of kernel functions: vertexfeature kernels, and edgefeature kernels. Moreover, we combine those kernels to construct composite kernels that simultaneously consider both vertex and edge features. Experimental results show that our proposed method accurately identifies analogous tuples and significantly outperforms a stateoftheart pairwise relational similarity measure, extended to tuples.

ConstantTime Approximation Algorithms for the Optimum Branching Problem on Sparse Graphs. Mitsuru Kusumoto, Yuichi Yoshida and Hiro Ito. International Journal of Networking and Computing 2013.
We propose a constanttime algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edgeweighted digraph \(G\) of average degree \(d\) whose weights are real values in \([0, 1]\) is given as an oracle access, and we are allowed to ask degrees and incoming edges for any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in \(G\) with an absolute error of at most εn with query complexity \(O(d/\epsilon^3)\), where \(n\) is the number of vertices. We also show a lower bound of \(\Omega(d/\epsilon^2)\). Additionally, our algorithm can be modified to run with query complexity \(O(1/\epsilon^4)\) for unweighted digraphs, i.e., it runs in time independent of the input size even for digraphs with \(d = \Omega(n)\) edges. In contrast, we show that it requires \(\Omega(n)\) queries to approximate the weight of the minimum (or maximum) spanning arborescence in a weighted digraph.

Testing LinearInvariant Function Isomorphism. Karl Wimmer and Yuichi Yoshida. ICALP 2013.
A function \(f:\mathbb{F}^n_2 \to \{1,1\}\) is called linearisomorphic to \(g\) if \(f = g \circ A\) for some nonsingular matrix \(A\). In the \(g\)isomorphism problem, we want a randomized algorithm that distinguishes whether an input function \(f\) is linearisomorphic to \(g\) or far from being so.
We show that the query complexity to test \(g\)isomorphism is essentially determined by the spectral norm of \(g\). That is, if \(g\) is close to having spectral norm \(s\), then we can test gisomorphism with \(\mathrm{poly}(s)\) queries, and if \(g\) is far from having spectral norm \(s\), then we cannot test \(g\)isomorphism with \(o(\log s)\) queries. The upper bound is almost tight since there is indeed a function \(g\) close to having spectral norm s whereas testing \(g\)isomorphism requires \(\Omega(s)\) queries. As far as we know, our result is the first characterization of this type for functions. Our upper bound is essentially the KushilevitzMansour learning algorithm, modified for use in the implicit setting.
Exploiting our upper bound, we show that any property is testable if it can be wellapproximated by functions with small spectral norm. We also extend our algorithm to the setting where \(A\) is allowed to be singular.

An Algebraic Characterization of Testable CSPs. Arnab Bhattacharyya and Yuichi Yoshida. ICALP 2013. [slides]
Given an instance \(\mathcal{I}\) of a CSP, a tester for \(\mathcal{I}\) distinguishes assignments satisfying \(\mathcal{I}\) from those which are far from any assignment satisfying \(\mathcal{I}\). The efficiency of a tester is measured by its query complexity, the number of variable assignments queried by the algorithm. In this paper, we characterize the hardness of testing Boolean CSPs in terms of the algebra generated by the relations used to form constraints. In terms of computational complexity, we show that if a nontrivial Boolean CSP is sublinearquery testable (resp., not sublinearquery testable), then the CSP is in NL (resp., Pcomplete, ⊕Lcomplete or NLcomplete) and that if a sublinearquery testable Boolean CSP is constantquery testable (resp., not constantquery testable), then counting the number of solutions of the CSP is in P (resp., #Pcomplete).
Also, we conjecture that a CSP instance is testable in sublinear time if its Gaifman graph has bounded treewidth. We confirm the conjecture when a nearunanimity operation is a polymorphism of the CSP.

Fast Exact ShortestPath Distance Queries on Large Networks by Pruned Landmark Labeling. Takuya Akiba, Yoichi Iwata and Yuichi Yoshida. SIGMOD 2013. [arXiv]
We propose a new exact method for shortestpath distance queries on largescale networks. Our method precomputes distance labels for vertices by performing a breadthfirst search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadthfirst searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadthfirst searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of largescale realworld networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.

Testing SubdivisionFreeness: – Property Testing Meets Structural Graph Theory –. Kenichi Kawarabayashi and Yuichi Yoshida. STOC 2013.
Testing a property \(P\) of graphs in the boundeddegree model deals with the following problem: given a graph \(G\) of bounded degree \(d\), we should distinguish (with probability \(2/3\), say) between the case that \(G\) satisfies \(P\) and the case that one should add/remove at least \(\epsilon dn\) edges of \(G\) to make it satisfy \(P\). In sharp contrast to property testing of dense graphs, which is relatively well understood, only few properties are known to be testable with a constant number of queries in the boundeddegree model. In particular, no global monotone (i.e,_{closed} under edge deletions) property that expander graphs can satisfy has been shown to be testable in constant time so far.
In this paper, we identify for the first time a natural family of global monotone property that expander graphs can satisfy and can be efficiently tested in the bounded degree model. Specifically, we show that, for any integer \(t \geq 1\), \(K_t\)subdivisionfreeness is testable with a constant number of queries in the boundeddegree model. This property was not previously known to be testable even with \(o(n)\) queries. Note that an expander graph with all degree less than \(t1\) does not have a \(K_t\)subdivision.
The proof is based on a novel combination of some results that develop the framework of partitioning oracles, together with structural graph theory results that develop the seminal graph minor theory by Robertson and Seymour. As far as we aware, this is the first result that bridges property testing and structural graph theory. Although we know a rough structure for graphs without \(H\)minors from the famous graph minor theory by Robertson and Seymour, there is no corresponding structure theorem for graphs without \(H\)subdivisions so far, even \(K_5\)subdivisionfree graphs. Therefore, subdivisions and minors are very different in a graph structural sense.

Exact and Approximation Algorithms for the Constraint Satisfaction Problem over the Point Algebra. Iwata Yoichi and Yuichi Yoshida. STACS 2013.
We study the constraint satisfaction problem over the point algebra. In this problem, an instance consists of a set of variables and a set of binary constraints of forms \((x < y)\), \((x \leq y)\), \((x \neq y)\) or \((x = y)\). Then, the objective is to assign integers to variables so as to satisfy as many constraints as possible.This problem contains many important problems such as Correlation Clustering, Maximum Acyclic Subgraph, and Feedback Arc Set. We first give an exact algorithm that runs in \(O^*(3^{\frac{\log 5}{\log 6}n})\) time, which improves the previous best \(O^*(3^n)\) obtained by a standard dynamic programming. Our algorithm combines the dynamic programming with the splitandlist technique. The splitandlist technique involves matrix products and we make use of sparsity of matrices to speed up the computation. As for approximation, we give a \(0.4586\)approximation algorithm when the objective is maximizing the number of satisfied constraints, and give an \(O(\log n \log \log n)\)approximation algorithm when the objective is minimizing the number of unsatisfied constraints.
2012

ConstantTime Approximation Algorithms for the Optimum Branching Problem on Sparse Graphs. Mitsuru Kusumoto, Yuichi Yoshida and Hiro Ito. ICNC 2012.
We propose a constanttime algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edgeweighted digraph \(G\) of average degree \(d\) whose weights are real values in \([0, 1]\) is given as an oracle access, and we are allowed to ask degrees and incoming edges for any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in \(G\) with an absolute error of at most εn with query complexity \(O(d/\epsilon^3)\), where \(n\) is the number of vertices. We also show a lower bound of \(\Omega(d/\epsilon^2)\). Additionally, our algorithm can be modified to run with query complexity \(O(1/\epsilon^4)\) for unweighted digraphs, i.e., it runs in time independent of the input size even for digraphs with \(d = \Omega(n)\) edges. In contrast, we show that it requires \(\Omega(n)\) queries to approximate the weight of the minimum (or maximum) spanning arborescence in a weighted digraph.

Partially Symmetric Functions are Efficiently IsomorphismTestable. Eric Blais, Amit Weinstein and Yuichi Yoshida. FOCS 2012. [arXiv] [slides (by Weinstein)] [video (talk by Weinstein)]
Given a Boolean function \(f\), the \(f\)isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to \(f\) up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions \(f\) we can test \(f\)isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functions  functions invariant to the reordering of all but a constant number of their variables  are efficiently isomorphismtestable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphismtestable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearlyoptimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testingbyimplicitlearning approach to complete the proof that partially symmetric functions are efficiently isomorphismtestable.

Approximation Guarantees for the Minimum Linear Arrangement Problem by Higher Eigenvalues. Suguru Tamaki and Yuichi Yoshida. APPROX 2012.
Given an undirected graph \(G = (V,E)\) and positive edge weights \(\{w_e\}_{e \in E}\), a linear arrangement is a permutation \(\pi: V \to [n]\). The value of the arrangement is \(\mathrm{val}(G,\pi):=\frac{1}{n}\sum_{e=\{u,v\}}w_e\pi(u)\pi(v)\). In the minimum linear arrangement problem (MLA), the goal is to find a linear arrangement \(\pi^*\) that achieves \(\mathrm{val}(G,\pi^*) = \mathrm{MLA}(G): = \min_{\pi}\mathrm{val}(G,\pi)\).
In this paper, we show that for any \(\epsilon > 0\) and positive integer \(r\), there is an \(O(nr/\epsilon)\)time randomized algorithm which, given a graph \(G\), returns a permutation \(\pi\) such that \[ \mathrm{val}(G,\pi) \leq \left(1 + \frac{2}{(1\epsilon)\lambda_{r+1}(\mathcal{L})}\right)\mathrm{MLA}(G) + O\left(\frac{\log n}{\sqrt{n}}\sum_{e \in E}w_e\right) \] with high probability. Here \(\mathcal{L}\) is the normalized Laplacian of \(G\) and \(\lambda_r(\mathcal{L})\) is the \(r\)th eigenvalue of \(\mathcal{L}\). Our algorithm gives a constant factor approximation for regular graphs that are weak expanders.

ConstantTime Algorithms for Sparsity Matroids. Hiro Ito, Shinichi Tanigawa and Yuichi Yoshida. ICALP 2012. [slides]
A graph \(G = (V, E)\) is called \((k, \ell)\)sparse if \(F ≤ kV(F) \ell\) for any \(F \subseteq E\) with \(F \neq \emptyset\). Here, \(V(F)\) denotes the set of vertices incident to \(F\). A graph \(G = (V,E)\) is called \((k,\ell)\)full if \(G\) contains a \((k,\ell)\)sparse subgraph with \(V\) vertices and \(kV  \ell\) edges. The family of edge sets of \((k,\ell)\)sparse subgraphs forms a family of independent sets of a matroid on \(E\), known as the sparsity matroid of \(G\). In this paper, we give a constanttime algorithm that approximates the rank of the sparsity matroid associated with a degreebounded undirected graph. This algorithm leads to a constanttime tester for \((k,\ell)\)fullness in the boundeddegree model, (i.e., we can decide with high probability whether the input graph satisfies a property or far from it). Depending on the values of \(k\) and \(\ell\), our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed in a unified manner.
Based on this result, we also propose a constanttime tester for \((k,\ell)\)edgeconnectedorientability in the boundeddegree model, where an undirected graph \(G\) is called \((k,\ell)\)edgeconnectedorientable if there exists an orientation \(\vec{G}\) of \(G\) with a vertex \(r\in V\) such that \(\vec{G}\) contains \(k\) arcdisjoint dipaths from r to each vertex \(v \in V\) and \(\ell\) arcdisjoint dipaths from each vertex \(v \in V\) to \(r\).
A tester is called a onesided error tester for \(P\) if it always accepts a graph satisfying \(P\). We show, for any \(k \geq 2\) and (proper) \(\ell \geq 0\), every onesided error tester for \((k,\ell)\)fullness and \((k,\ell)\)edgeconnectedorientability requires \(\Omega(n)\) queries.

Testing List HHomomorphisms. Yuichi Yoshida. CCC 2012. [slides]
Let \(H\) be an undirected graph. In the List \(H\)Homomorphism Problem, given an undirected graph \(G\) with a list constraint \(L(v) \subseteq V(H)\) for each variable \(v∈V(G)\), the objective is to find a list \(H\)homomorphism \(f:V(G)\to V(H)\), that is, \(f(v)\in L(v)\) for every \(v\in V(G)\) and \((f(u),f(v)) \in E(H)\) whenever \((u,v)\in E(G)\). We consider the following problem: given a map \(f:V(G) \to V(H)\) as an oracle access, the objective is to decide with high probability whether \(f\) is a list \(H\)homomorphism or \textit{far} from any list \(H\)homomorphisms. The efficiency of an algorithm is measured by the number of accesses to \(f\). In this paper, we classify graphs \(H\) with respect to the query complexity for testing list \(H\)homomorphisms and show the following trichotomy holds: (i) List \(H\)homomorphisms are testable with a constant number of queries if and only if \(H\) is a reflexive complete graph or an irreflexive complete bipartite graph. (ii) List \(H\)homomorphisms are testable with a sublinear number of queries if and only if \(H\) is a biarc graph. (iii) Testing list \(H\)homomorphisms requires a linear number of queries if \(H\) is not a biarc graph.

ConstantTime Approximation Algorithms for the Knapsack Problem. Hiro Ito, Susumu Kiyoshima and Yuichi Yoshida. TAMC 2012.
In this paper, we give a constanttime approximation algorithm for the knapsack problem. Using weighted sampling, with which we can sample items with probability proportional to their profits, our algorithm runs with query complexity \(O(\epsilon^{ 4} \log \epsilon^{1})\), and it approximates the optimal profit with probability at least \(2/3\) up to error at most an \(\epsilon\)fraction of the total profit. For the subset sum problem, which is a special case of the knapsack problem, we can improve the query complexity to \(O(\epsilon^{1} \log \epsilon^{1})\).

Algorithms and Complexity of Generalized River Crossing Problems. Stefan Langerman, Hiro Ito and Yuichi Yoshida. FUN 2012.
Three men, each with a sister, must cross a river using a boat which can carry only two people, so that a woman whose brother is not present is never left in the company of another man. This is a very famous problem appeared in Latin book “Problems to Sharpen the Young,” one of the earliest collections on recreational mathematics. This paper considers a generalization of such “RiverCrossing Problems.” It shows that the problem is NPhard if the boat size is three, and a large class of subproblems can be solved in polynomial time if the boat size is two. It’s also conjectured that determining whether a river crossing problem has a solution without bounding the number of transportations, can be solved in polynomial time even when the size of the boat is large.

An Almost Optimal Algorithm for Winkler’s Sorting Pairs in Bins. Hiro Ito, Junichi Teruyama and Yuichi Yoshida. Progress in Informatics 2012.
We investigate the following sorting problem: We are given n bins with two balls in each bin. Balls in the \(i\)th bin are numbered \(n + 1  i\). We can swap two balls from adjacent bins. How many number of swaps are needed in order to sort balls, i.e., move every ball to the bin with the same number. For this problem the best known solution requires almost \(\frac{4n^2}{5}\) swaps. In this paper, we show an algorithm which solves this problem using less than \(\frac{2n^2}{3}\) swaps. Since it is known that the lower bound of the number of swapsi s \(\lceil {2n \choose 2}/3 \rceil = \lceil \frac{2n^2}{3}  \frac{n}{3}\rceil\), our resultis is almost tight. Furthermore, we show that for \(n = 2^m + 1 (m \geq 0)\) the algorithm is optimal.

Testing the (s, t)Disconnectivity of Graphs and Digraphs. Yusuke Kobayashi and Yuichi Yoshida. Theoretical Computer Science 2012.
Property testing is concerned with constanttime algorithms for deciding whether a given object satisfies a predetermined property or is far from satisfying it. In this paper, we consider testing properties related to the connectivity of two vertices in sparse graphs.
We present onesided error testers for \((s,t)\)disconnectivity with query complexity \(2^{O(1/\epsilon}\) for digraphs and \(O(1/\epsilon^2)\) for graphs, where \(\epsilon\) is an error parameter. Furthermore, we show that these algorithms are the best possible in view of query complexity, i.e., we give matching lower bounds for twosided error testers for both cases.
We also give a constanttime algorithm for testing the \((s,t)\)disconnectivity of a directed boundeddegree hypergraph, which can be used to test the satisfiability of Horn SAT.

Property Testing on kVertex Connectivity of Graphs. Yuichi Yoshida and Hiro Ito. Algorithmica, 2012.
We present an algorithm for testing the \(k\)vertexconnectivity of graphs with the given maximum degree. The time complexity of the algorithm is independent of the number of vertices and edges of graphs. Fixed degree bound \(d\), a graph \(G\) with \(n\) vertices and a maximum degree at most \(d\) is called \(\epsilon\)far from \(k\)vertexconnectivity when at least \(\frac{\epsilon dn}{2}\) edges must be added to or removed from \(G\) to obtain a \(k\)vertexconnected graph with a maximum degree at most \(d\). The algorithm always accepts every graph that is \(k\)vertexconnected and rejects every graph that is \(\epsilon\)far from \(k\)vertexconnectivity with a probability of at least \(2/3\). The algorithm runs in \(O(d(\frac{c}{\epsilon d})^k \log\frac{1}{\epsilon d})\) time (\(c>1\) is a constant) for \((k−1)\)vertexconnected graphs, and in \(O(d(\frac{ck}{\epsilon d})^k \log\frac{k}{\epsilon d})\) time (\(c>1\) is a constant) for general graphs. It is the first constanttime \(k\)vertexconnectivity testing algorithm for general \(k \geq 4\).

Linear programming, width1 CSPs, and robust satisfaction. Gabor Kun, Ryan O’Donnell, Suguru Tamaki, Yuichi Yoshida and Yuan Zhou. ITCS 2012.
We say that an algorithm robustly decides a constraint satisfaction problem \(\Pi\) if it distinguishes atleast\((1\epsilon)\)satisfiable instances from lessthan\((1  r(\epsilon))\)satisfiable instances for some function \(r(\epsilon)\) with \(r(\epsilon) \to 0\) as \(\epsilon \to 0\). In this paper we show that the canonical linear programming relaxation robustly decides \(\Pi\) if and only if \(\Pi\) has "width 1" (in the sense of Feder and Vardi).

Studies on ConstantTime Algorithms for BoundedDegree Graphs and Constraint Satisfaction Problems. Yuichi Yoshida. Ph.D. thesis, Kyoto University, 2012.
We are concerned with constanttime algorithms, which aim for solving decision problems or optimization problems in time independent of sizes of inputs. To obtain such efficient algorithms, we relax the concept of decision and optimization as follows. An algorithm is said to test a property if, with high probability, it decides whether an input satisfies the property or we must modify an \(\epsilon\)fraction of the input to make it satisfy the property. An algorithm is said to \((\alpha, \epsilon n)\)approximate an optimization problem if, with high probability, it outputs an \((\alpha, \epsilon n)\)approximation to the optimal value. Here, a value \(x\) is called an \((\alpha, \epsilon n)\)approximation to \(x^*\) if \(\alpha x^*  \epsilon n \leq x \leq x^*\) where \(n\) is the size of an input. By relaxing this way, various problems become solvable in constant time.
Constanttime algorithms were first studied for algebraic properties of functions for applications to computational complexity. However, because of their theoretical richness and potential to practical applications, they have been studied for many different areas in the last decade. In this thesis, we concentrate on problems on boundeddegree graphs and constraint satisfaction problems (CSPs). In this setting, each vertex (or variable) is incident to a constant number of edges (or constraints). Boundeddegree instances are practically important since realworld problems often give rise to such instances, and they are also theoretically important since we become able to study various fundamental problems that are trivial in general instances.
We first give a constanttime \((1, \epsilon n)\)approximation algorithm for the maximum matching problem. Compared to previous works, our algorithm runs exponentially faster with respect to \(1/\epsilon\). Using the algorithm as a building block, we show various related problems are also testable in constant time. A graph \(G = (V , E )\) is called \((k,l)\)sparse if \(F \leq kV(F)l\) for any nonempty \(F \subseteq E\). Here, \(V(F)\) denotes the set of vertices incident to \(F\). We give a constanttime algorithm that tests a graph \(G = (V, E)\) has a \((k, l)\)sparse subgraph with \(V\) vertices and \(kV  − l\) edges. Depending on values of \(k\) and \(l\), our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed. Based on this result, we also give a constanttime testing algorithm for (k, l)edgeconnected orientability, where a graph \(G\) is called \((k, l)\)edgeconnectedorientable if there exists an orientation \(\vec{G}\) of \(G\) with a vertex \(r \in V\) such that \(\vec{G}\) contains \(k\) arcdisjoint dipaths from \(r\) to each vertex \(v \in V\) and \(l\) arcdisjoint dipaths from each vertex \(v \in V\) to \(r\).
Secondly, we consider constanttime algorithms for CSPs. Let \(\Lambda\) be a set of relations. Then, an instance of \(\mathrm{CSP}(\Lambda)\) is a pair of variables and constraints over the variables such that each constraint belongs to \(\Lambda\). In the problem, called \(\mathrm{MaxCSP}(\Lambda)\), given an instance of \(\mathrm{CSP}(\Lambda)\), we want to compute an assignment that maximizes the number of satisfied constraints. Here, we give the “best” constanttime approxi mation algorithm for every \(\mathrm{MaxCSP}(\Lambda)\). Specifically, for each \(\Lambda\), there exists some constant \(\alpha_\Lambda\) with the following properties. (1) for any \(\epsilon > 0\), there exists a constanttime \((\alpha_\Lambda  \epsilon, \epsilon n)\)approximation algorithm for \(\mathrm{MaxCSP}(\Lambda)\). (2) for any \(\epsilon > 0\), there exists \(\delta > 0\) such that no constanttime \((\alpha_\Lambda + \epsilon, \delta n)\)approximation algorithm for \(\mathrm{MaxCSP}(\Lambda)\) exists. Here, \(\alpha_\Lambda\) coincides with the integrality gap of a standard LP relaxation for \(\mathrm{MaxCSP}(\Lambda)\).
We consider testing satisfiability of CSP instances. Using the result above, we can derive the characterization of the constanttime testability of \(\mathrm{CSP}(\Lambda)\). Let \(\mathrm{opt}(I)\) and \(\mathrm{lp}(I)\) denote the optimal value and the LP value for an instance \(I\), respectively. Then, we have the followings. (i) if any instance of \(\mathrm{CSP}(\Lambda)\) with \(lp(I) \geq 1−\epsilon\) satisfies \(\mathrm{opt}(I) \geq 1 − r(\epsilon)\) where \(\lim_{\epsilon \to 0} r(\epsilon) = 0\), then \(CSP(\Lambda)\) is testable in constant time. (ii) if there is an instance \(I\) of \(\mathrm{CSP}(\Lambda)\) with \(\mathrm{lp}(I) = 1\) and \(\mathrm{opt}(I) < 1\), then \(\mathrm{CSP}(\Lambda)\) is not testable in constant time.
To achieve a combinatorial characterization of the testability of \(\mathrm{CSP}(\Lambda)\), we show a close connection between constanttime testability and the notion of width 1. We say \(\mathrm{CSP}(\Lambda)\) has width 1 if satisfiability of instances of \(\mathrm{CSP}(\Lambda)\) can be decided by a certain simple propagation algorithm. Then, we prove that any CSP of width 1 satisfies (i), which means that any such CSP is testable in constant time.
In a subsequent work (Kun et al. ITCS 2012), which includes the part of the present thesis, we have shown that any CSP of width more than 1 satisfies (ii). Thus, a CSP is constanttime testable if and only if it has width 1.

Improved ConstantTime Approximation Algorithms for Maximum Matchings and Other Optimization Problems. Yuichi Yoshida, Masaki Yamamoto and Hiro Ito. Siam Journal on Computing 2012.
We study constanttime approximation algorithms for boundeddegree graphs, which run in time independent of the number of vertices \(n\). We present an algorithm that decides whether a vertex is contained in a some fixed maximal independent set with expected query complexity \(O(d^2)\), where \(d\) is the degree bound. Using this algorithm, we show constanttime approximation algorithms with certain multiplicative error and additive error \(\epsilon n\) for many other problems, e.g., the maximum matching problem, the minimum vertex cover problem, and the minimum set cover problem, that run exponentially faster than existing algorithms with respect to \(d\) and \(\frac{1}{\epsilon}\). Our approximation algorithm for the maximum matching problem can be transformed to a twosided error tester for the property of having a perfect matching. On the contrary, we show that every onesided error tester for the property requires at least \(\Omega(n)\) queries.

On the Distance Between Nonisomorphic Groups. Gábor Ivanyos, François Le Gall and Yuichi Yoshida. European Journal of Combinatorics 2012. [arXiv]
A result of BenOr, Coppersmith, Luby and Rubinfeld on testing whether a map be two groups is close to a homomorphism implies a tight lower bound on the distance between the multiplication tables of two nonisomorphic groups.
2011

Algorithms for Finding a Maximum NonkLinked Graph. Yusuke Kobayashi and Yuichi Yoshida. ESA 2011.
A graph with at least \(2k\) vertices is said to be klinked if for any ordered \(k\)tuples \((s_1, \ldots,s_k)\) and \((t_1, \ldots, t_k)\) of \(2k\) distinct vertices, there exist pairwise vertexdisjoint paths \(P_1,\ldots, P_k\) such that \(P_i\) connects si and ti for \(i = 1, \ldots, k\). For a given graph \(G\), we consider the problem of finding a maximum induced subgraph of \(G\) that is not \(k\)linked. This problem is a common generalization of computing the vertexconnectivity and testing the \(k\)linkedness of \(G\), and it is closely related to the concept of \(H\)linkedness. In this paper, we give the first polynomialtime algorithm for the case of \(k = 2\), whereas a similar problem that finds a maximum induced subgraph without 2vertexdisjoint paths connecting fixed terminal pairs is NPhard. For the case of general \(k\), we give an \((8k − 2)\)additive approximation algorithm. We also investigate the computational complexities of the edgedisjoint case and the directed case.

Property Testing for Cyclic Groups and Beyond. Francois Le Gall and Yuichi Yoshida. COCOON 2011.
This paper studies the problem of testing if an input (Γ,¿), where Γ is a finite set of unknown size and ¿ is a binary operation over Γ given as an oracle, is close to a specified class of groups. Friedl et al. (Proc. of STOC, 2005) have constructed an efficient tester using poly(logΓ) queries for the class of abelian groups. We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: Ω(Γ1/6) queries are necessary to test if the input is close to a cyclic group, and Ω(Γ c ) queries for some constant c are necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed integer k¿1. We also show that knowledge of the size of the ground set Γ helps only for k=1, in which case we construct an efficient tester using poly(logΓ) queries; for any other value k¿2 the query complexity remains Ω(Γ c ). All our upper and lower bounds hold for both the edit distance and the Hamming distance. These are, to the best of our knowledge, the first nontrivial lower bounds for such grouptheoretic problems in the property testing model and, in particular, they imply the first exponential separations between the classical and quantum query complexities of testing closeness to classes of groups.

Lower Bounds on the Query Complexity for Testing Bounded Degree CSPs. Yuichi Yoshida. CCC 2011. [slides]
In this paper, we consider lower bounds on the query complexity for testing CSPs in the boundeddegree model. We mainly consider Boolean CSPs allowing literals. First, for any "symmetric" predicate P : {0, 1}k → {0,1} except EQU where k ≥ 3, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances (P1 (0)/2k  ϵ)far from satisfiability requires Ω(n1/2+δ) queries where n is the number of variables and δ >; 0 is a constant that depends on P and e. This breaks a natural lower bound Ω(n1/2), which is obtained by the birthday paradox. We also show that every onesided error tester requires Ω(n) queries for such P. These results are hereditary in the sense that the same results hold for any predicate Q such that P1 (1) ⊆ Q1(1). For EQU, we give a onesided error tester whose query complexity is O̅(n1/2). Also, for 2XOR (or, equivalently E2LIN2), we show an Ω(n1/2+δ) lower bound for distinguishing instances between eclose to and (1/2 ϵ)far from satisfiability. Next, for the general kCSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances (1  2k/2k  ϵ)far from satisfiability requires Ω(n) queries. The matching NPhardness is not known, even assuming the Unique Games Conjecture or the dto1 Conjecture. As a corollary, for Maximum Independent Set on graphs with n vertices and a degree bound d, we show that every approximation algorithm within a factor d/poly log d and an additive error of ϵn requires Ω(n) queries. Previously, only superconstant lower bounds were known.

Raghavendra (STOC 2008) gave an elegant and surprising result: if Khot's Unique Games Conjecture (STOC 2002) is true, then for every constraint satisfaction problem (CSP), the best approximation ratio is attained by a certain simple semidefinite programming and a rounding scheme for it.
In this paper, we show that similar results hold for constanttime approximation algorithms in the boundeddegree model. Specifically, we present the following: (i) For every CSP, we construct an oracle that serves an access, in constant time, to a nearly optimal solution to a basic LP relaxation of the CSP. (ii) Using the oracle, we give a constanttime rounding scheme that achieves an approximation ratio coincident with the integrality gap of the basic LP. (iii) Finally, we give a generic conversion from integrality gaps of basic LPs to hardness results. All of those results are unconditional. Therefore, for every boundeddegree CSP, we give the best constanttime approximation algorithm among all.
A CSP instance is called εfar from satisfiability if we must remove at least an εfraction of constraints to make it satisfiable. A CSP is called testable if there is a constanttime algorithm that distinguishes satisfiable instances from εfar instances with probability at least 2/3. Using the results above, we also derive, under a technical assumption, an equivalent condition under which a CSP is testable in the boundeddegree model.
2010

A Query Efficient NonAdaptive Long Code Test with Perfect Completeness. Suguru Tamaki and Yuichi Yoshida. RANDOM 2010.
Long Code testing is a fundamental problem in the area of property testing and hardness of approximation. In this paper, we study how small the soundness \(s\) of the Long Code test with perfect completeness can be by using nonadaptive \(q\) queries. We show that \(s = (2q + 3)/2^q\) is possible, where \(q\) is of the form math formula for \(2^{k}1\) any integer \(k > 2\).

Testing Outerplanarity of Bounded Degree Graphs. Yuichi Yoshida and Hiro Ito. RANDOM 2010.
We present an efficient algorithm for testing outerplanarity of graphs in the boundeddegree model. In this model, given a graph \(G\) with degree bound \(d\), we should distinguish with high probability the case that \(G\) is outerplanar from the case that modifying at least an \(\epsilon\)fraction of the edge set is necessary to make \(G\) outerplanar. Our algorithm runs in time polynomial in \(d\) and \(1/\epsilon\) only. To achieve the time complexity, we exploit the treelike structure inherent to an outerplanar graph using the microtree/macrotree decomposition of a tree. As a corollary, we show that the property of being a cactus is testable in time polynomial in \(d\) and \(1/\epsilon\).

Conjunctive Filter: Breaking the Entropy Barrier. Daisuke Okanohara and Yuichi Yoshida. ALENEX 2010.
We consider a problem for storing a map that associates a key with a set of values. To store \(n\) values from the universe of size \(m\), it requires log 2(m/n) bits of space, which can be approximated as \((1.44 + n) \log 2 m/n\) bits when n m. If we allow ε fraction of errors in outputs, we can store it with roughly n log21/ε bits, which matches the entropy bound. Bloom filter is a wellknown example for such data structures. Our objective is to break this entropy bound and construct more spaceefficient data structures. In this paper, we propose a novel data structure called a conjunctive filter, which supports conjunctive queries on k distinct keys for fixed k. Although a conjunctive filter cannot return the set of values itself associated with a queried key, it can perform conjunctive queries with O(1/√m) fraction of errors. Also, the consumed space is n/k log2m bits and it is significantly smaller than the entropy bound n/2 log2m when k ≥ 3. We will show that many problems can be solved by using a conjunctive filter such as fulltext search and database join queries. Also, we conducted experiments using a realworld data set, and show that a conjunctive filter answers conjunctive queries almost correctly using about 1/2 ∼ 1/4 space as the entropy bound.

QueryNumber Preserving Reductions and Linear Lower Bounds for Testing. Yuichi Yoshida and Hiro Ito. IEICE 2010.
In this paper, we study lower bounds on the query complexity of testing algorithms for various problems. Given an oracle that returns information of an input object, a testing algorithm distinguishes the case that the object has a given property \(P\) from the case that it has a large distance to having \(P\) with probability at least \(2/3\). The query complexity of an algorithm is measured by the number of accesses to the oracle. We introduce two reductions that preserve the query complexity. One is derived from the gappreserving local reduction and the other is from the Lreduction. By using the former reduction, we show linear lower bounds on the query complexity for testing basic NPcomplete properties, i.e., 3edgecolorability, directed Hamiltonian path/cycle, undirected Hamiltonian path/cycle, 3dimensional matching and NPcomplete generalized satisfiability problems. Also, using the second reduction, we show a linear lower bound on the query complexity of approximating the size of the maximum 3dimensional matching.

Testing kEdgeConnectivity of Digraphs. Yuichi Yoshida and Hiro Ito. Journal of System Science and Complexity 2010.
This paper presents an algorithm that tests whether a given degreebounded digraph is kedgeconnected or \(\epsilon\)far from \(k\)edgeconnectivity. This is the first testing algorithm for \(k\)edgeconnectivity of digraphs whose running time is independent of the number of vertices and edges. A digraph of \(n\) vertices with degree bound \(d\) is \(\epsilon\)far from \(k\)edgeconnectivity if at least \(\epsilon dn\) edges have to be added or deleted to make the digraph \(k\)edgeconnected, preserving the degree bound. Given a constant error parameter \(\epsilon\) and a degree bound \(d\), our algorithm always accepts all \(k\)edgeconnected digraphs and rejects all digraphs that is \(\epsilon\)far from \(k\)edgeconnectivity with probability at least \(2/3\). It runs in \(O\left(d\left(\frac{c}{εd}\right)^k \log\frac{1}{\epsilon d}\right)\) (\(c > 1\) is a constant) time when input digraphs are restricted to be \((k1)\)edge connected and runs in \(O\left(d\left(\frac{ck}{\epsilon d}\right)^k\log\frac{k}{\epsilon d}\right)\) (\(c > 1\) is a constant) time for general digraphs.
2009

An Improved ConstantTime Approximation Algorithm for Maximum Matchings. Yuichi Yoshida, Masaki Yamamoto and Hiro Ito. STOC 2009
This paper studies approximation algorithms for problems on degreebounded graphs. Let \(n\) and \(d\) be the number of vertices and the degree bound, respectively. This paper presents an algorithm to approximate the size of some maximal independent set with additive error \(\epsilon n\) whose running time is \(O(d^2)\). Using this algorithm, it also shows that there are approximation algorithms for many other problems, e.g., the maximum matching problem, the minimum vertex cover problem, and the minimum set cover problem, that run exponentially faster than existing algorithms with respect to d and 1/ε. Its approximation algorithm for the maximum matching problem can be transformed to a testing algorithm for the property of having a perfect matching with twosided error. On the contrary, it also shows that every onesided error tester for the property requires at least \(\Omega(n)\) queries.

Fast Calculation of Levenshtein Distance on the Cell Processor. Shun Sakuraba and Yuichi Yoshida. SACSIS 2009. (in Japanese)
Cell Challenge 2009の規定課題において、Cell Broadband Engineを用いて文字列の編集距 離を計算するプログラムを作成するという課題が出題された。本稿では我々が規定課題において考案した最適化技法について報告する。基本となるアルゴリズムはビット並列化法と呼ばれるものであり、実行委員会によるサンプルプログラムと比べて約 315 倍の高速化を達成した。
2008

Property Testing on kVertex Connectivity of Graphs. Yuichi Yoshida and Hiro Ito. ICALP 2008.
Wepresentanalgorithmfortestingthekvertexconnectivity of graphs with given maximum degree. The time complexity of the algorithm is independent of the number of vertices and edges of graphs. A graph \(G\) with \(n\) vertices and maximum degree at most \(d\) is called \(\epsilon\)far from \(k\)vertexconnectivity when at least \(\frac{\epsilon dn}{2}\) edges must be added to or removed from \(G\) to obtain a \(k\)vertexconnected graph with maximum degree at most \(d\). The algorithm always accepts every graph that is \(k\)vertexconnected and rejects every graph that is \(\epsilon\)far from \(k\)vertexconnectivity with a probability of at least \(2/3\). The algorithm runs in \(O\left(d \left(\frac{c}{\epsilon d}\right)^k \log \frac{1}{\epsilon d}\right)\) time (\(c > 1\) is a constant) for given \((k − 1)\)vertexconnected graphs, and \(O\left( d \left(\frac{ck}{\epsilon d}\right)^k \log \frac{k}{\epsilon d}\right)\) time (\(c > 1\) is a constant) for given general graphs. It is the first constanttime \(k\)vertexconnectivity testing algorithm for general \(k \geq 4\).