Publications | Yuichi Yoshida

Publications

2017

  • An Iterative Approach for the Global Estimation of Sentence Similarity. Tomoyuki Kajiwara, Danushka Bollegala, Yuichi Yoshida and Ken-ichi Kawarabayashi. PLOS ONE 2017.

    Measuring the similarity between two sentences is often difficult due to their small lexical overlap. Instead of focusing on the sets of features in two given sentences between which we must measure similarity, we propose a sentence similarity method that considers two types of constraints that must be satisfied by all pairs of sentences in a given corpus. Namely, (a) if two sentences share many features in common, then it is likely that the remaining features in each sentence are also related, and (b) if two sentences contain many related features, then those two sentences are themselves similar. The two constraints are utilized in an iterative bootstrapping procedure that simultaneously updates both word and sentence similarity scores. Experimental results on SemEval 2015 Task 2 dataset show that the proposed iterative approach for measuring sentence semantic similarity is significantly better than the non-iterative counterparts.

    Natural Language Processing
  • Streaming Algorithms for Maximizing Monotone Submodular Functions under a Knapsack Constraint. Chien-Chung Huang, Naonori Kakimura and Yuichi Yoshida. APPROX 2017.

    In this paper, we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially and at any point of time, the algorithm has access only to a small fraction of the data stored in primary memory. For this problem, we propose a \((0.363 - \epsilon)\)-approximation algorithm, requiring only a single pass through the data; moreover, we propose a \((0.4 - \epsilon)\)-approximation algorithm requiring a constant number of passes through the data. The required memory space of both algorithms depends only on the size of the knapsack capacity and ε.

    Submodular Functions
  • Landmark indexing for Evaluation of Label-Constrained Reachability Queries. Lucien Valstar, George Fletcher and Yuichi Yoshida. SIGMOD 2017. [slides] [code]

    Consider a directed edge-labeled graph, such as a social net- work or a citation network. A fundamental query on such data is to determine if there is a path in the graph from a given source vertex to a given target vertex, using only edges with labels in a restricted subset of the edge labels in the graph. Such label-constrained reachability (LCR) queries play an important role in graph analytics, for example, as a core fragment of the so-called regular path queries which are supported in practical graph query languages such as the W3C’s SPARQL 1.1, Neo4j’s Cypher, and Oracle’s PGQL. Current solutions for LCR evaluation, however, do not scale to large graphs which are increasingly common in a broad range of application domains. In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmark-based indexes for large graphs. We show through extensive experiments that our indexes are significantly smaller than state-of-the-art LCR indexing techniques, while supporting up to orders of magnitude faster query evaluation times. Our complete C++ codebase is available as open source for further research.

    Network Algorithms
  • Portfolio Optimization for Influence Spread. Naoto Ohsaka and Yuichi Yoshida. WWW 2017. [slides (by Ohsaka)]

    Motivated by viral marketing, stochastic diffusion processes that model influence spread on a network have been studied intensively. The primary interest in such models has been to find a seed set of a fixed size that maximizes the expected size of the cascade from it. Practically, however, it is not desirable to have the risk of ending with a small cascade, even if the expected size of the cascade is large. To address this issue, we adopt conditional value at risk (CVaR) as a risk measure, and propose an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR. Using real-world social networks, we demonstrate that the portfolio computed by our algorithm has a significantly better CVaR than seed sets computed by other baseline methods.

    Network Algorithms,Submodular Functions
  • Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective func-tions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a nontrivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable trade-off between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.

    Submodular Functions
  • Non-monotone DR-Submodular Function Maximization. Tasuku Soma and Yuichi Yoshida. AAAI 2017. [arXiv] [poster]

    We consider non-monotone DR-submodular function maximization, where DR-submodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing non-monotone DR-submodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a \(1/(2+\epsilon)\)-approximation algorithm with a running time of roughly \(O(n/\epsilon \log^2B)\), where \(n\) is the size of the ground set, B is the maximum value of a coordinate, and \(\epsilon > 0\) is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and real-world datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.

    Submodular Functions
  • Computing Least Cores of Supermodular Cooperative Games. Daisuke Hatano and Yuichi Yoshida. AAAI 2017

    One of the goals of a cooperative game is to compute a valuedivision to the players from which they have no incentive todeviate. This concept is formalized as the notion of the core.To obtain a value division that motivates players to cooperate to a greater extent or that is more robust under noise, the notions of the strong least core and the weak least core have been considered. In this paper, we characterize the strong and the weak least cores of supermodular cooperative games using the theory of minimizing crossing submodular functions. We then apply our characterizations to two representative supermodular cooperative games, namely, the induced subgraph game generalized to hypergraphs and the airport game. For these games, we derive explicit forms of the strong and weak least core values, and provide polynomial-time algorithms that compute value divisions in the strong and weak least cores.

    Submodular Functions
  • Random-Radius Ball Method for Estimating Closeness Centrality. Wataru Inariba, Takuya Akiba and Yuichi Yoshida. AAAI 2017.

    In the analysis of real-world complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many of distance-based centrality measures embrace some issues in treating disconnected networks, which are resolved by the recently emerged harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the random-radius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on real-world networks.

    Network Algorithms

2016

  • Minimizing Quadratic Functions in Constant Time. Kohei Hayashi and Yuichi Yoshida. NIPS 2016. [arXiv] [poster]

    A sampling-based optimization method for quadratic functions is proposed. Our method approximately solves the following \(n\)-dimensional quadratic minimization problem in constant time, which is independent of \(n\): \(z^*=\min_{\mathbf{v} \in \mathbb{R}^n}\langle \mathbf{v},A\mathbf{v}\rangle+n\langle \mathbf{v}\mathrm{diag}(\mathbf{d})\mathbf{v}\rangle+n\langle \mathbf{b},\mathbf{v}\rangle \), where \(A \in \mathbb{R}^{n \times n}\) a matrix and \(\mathbf{d},\mathbf{b} \in \mathbb{R}^n\) are vectors. Our theoretical analysis specifies the number of samples \(k(\delta,\epsilon)\) such that the approximated solution \(z\) satisfies \(|z-z^*|=O(\epsilon n^2)\) with probability \(1-\delta\). The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.

    Sublinear-Time Algorithms
  • Testing List H-Homomorphisms. Yuichi Yoshida. Computational Complexity 2016. [slides]

    In the List \(H\)-Homomorphism Problem, for a graph \(H\) that is a parameter of the problem, an instance consists of an undirected graph \(G\) with a list constraint \(L(v) \subseteq V(H)\) for each variable \(v \in V(G)\), and the objective is to determine whether there is a list \(H\)-homomorphism \(f:V(G) \to V(H)\), that is, \(f(v) \in L(v)\) for every \(v \in V(G)\) and \((f(u),f(v)) \in E(H)\) whenever \((u,v) \in E(G)\). We consider the problem of testing list \(H\)-homomorphisms in the following weighted setting: An instance consists of an undirected graph \(G\), list constraints \(L\), weights imposed on the vertices of \(G\), and a map \(f:V(G) \to V(H)\) given as an oracle access. The objective is to determine whether \(f\) is a list \(H\)-homomorphism or far from any list \(H\)-homomorphism. The farness is measured by the total weight of vertices \(v \in V(G)\) for which \(f(v)\) must be changed so as to make \(f\) a list \(H\)-homomorphism. In this paper, we classify graphs H with respect to the number of queries to \(f\) required to test the list \(H\)-homomorphisms. Specifically, we show that (i) list \(H\)-homomorphisms are testable with a constant number of queries if and only if \(H\) is a reflexive complete graph or an irreflexive complete bipartite graph and (ii) list \(H\)-homomorphisms are testable with a sublinear number of queries if and only if \(H\) is a bi-arc graph.

    Constraint Satisfaction Problems,Sublinear-Time Algorithms,Graphs
  • Cycle and Flow Trusses in Directed Networks. Taro Takaguchi and Yuichi Yoshda. Royal Society Open Science 2016. [arXiv] [code]

    When we represent real-world systems as networks, the directions of links often convey valuable information. Finding module structures that respect link directions is one of the most important tasks for analysing directed networks. Although many notions of a directed module have been proposed, no consensus has been reached. This lack of consensus results partly because there might exist distinct types of modules in a single directed network, whereas most previous studies focused on an independent criterion for modules. To address this issue, we propose a generic notion of the so-called truss structures in directed networks. Our definition of truss is able to extract two distinct types of trusses, named the cycle truss and the flow truss, from a unified framework. By applying the method for finding trusses to empirical networks obtained from a wide range of research fields, we find that most real networks contain both cycle and flow trusses. In addition, the abundance of (and the overlap between) the two types of trusses may be useful to characterize module structures in a wide variety of empirical networks. Our findings shed light on the importance of simultaneously considering different types of modules in directed networks.

    Network Algorithms
  • Testing Assignments to Constraint Satisfaction Problems. Hubie Chen, Matt Valeriote and Yuichi Yoshida. FOCS 2016. [arXiv] [slides]

    For a finite relational structure \(A\), let \(\mathrm{CSP}(A)\) denote the CSP instances whose constraint relations are taken from \(A\). The resulting family of problems \(\mathrm{CSP}(A)\) has been considered heavily in a variety of computational contexts. In this article, we consider this family from the perspective of property testing: given an instance of a CSP and query access to an assignment, one wants to decide whether the assignment satisfies the instance, or is far from so doing. While previous work on this scenario studied concrete templates or restricted classes of structures, this article presents comprehensive classification theorems. Our first contribution is a dichotomy theorem completely characterizing the structures \(A\) such that \(\mathrm{CSP}(A)\) is constant-query testable: (i) If \(A\) has a majority polymorphism and a Maltsev polymorphism, then \(\mathrm{CSP}(A)\) is constant-query testable with one-sided error. (ii) Else, testing \(\mathrm{CSP}(A)\) requires a super-constant number of queries. Let \(\exists\mathrm{CSP}(A)\) denote the extension of \(\mathrm{CSP}(A)\) to instances which may include existentially quantified variables. Our second contribution is to classify all structures \(A\) in terms of the number of queries needed to test assignments to instances of \(\exists\mathrm{CSP}(A)\), with one-sided error. More specifically, we show the following trichotomy (i) If \(A\) has a majority polymorphism and a Maltsev polymorphism, then \(\exists\mathrm{CSP}(A)\) is constant-query testable with one-sided error. (ii) Else, if A has a \((k + 1)\)-ary near-unanimity polymorphism for some \(k \geq 2\), and no Maltsev polymorphism then \(\exists\mathrm{CSP}(A)\) is not constant-query testable (even with two-sided error) but is sublinear-query testable with one-sided error. (iii) Else, testing \(\exists\mathrm{CSP}(A)\) with one-sided error requires a linear number of queries.

    Constraint Satisfaction Problems,Sublinear-Time Algorithms
  • Testing Properties of Functions on Finite Groups. Kenta Oono and Yuichi Yoshida. Random Structures & Algorithms 2016. [arXiv]

    We study testing properties of functions on finite groups. First we consider functions of the form math formula, where \(G\) is a finite group. We show that conjugate invariance, homomorphism, and the property of being proportional to an irreducible character is testable with a constant number of queries to \(f\), where a character is a crucial notion in representation theory. Our proof relies on representation theory and harmonic analysis on finite groups. Next we consider functions of the form math formula, where \(d\) is a fixed constant and math formula is the family of \(d\) by \(d\) matrices with each element in math formula. For a function math formula, we show that the unitary isomorphism to \(g\) is testable with a constant number of queries to \(f\), where we say that \(f\) and \(g\) are unitary isomorphic if there exists a unitary matrix \(U\) such that math formula for any math formula.

    Boolean Functions,Sublinear-Time Algorithms,Group Theory
  • Half-integrality, LP-branching and FPT Algorithms. Yoichi Iwata, Magnus Wahlström and Yuichi Yoshida. SIAM Journal on Computing 2016. [arXiv]

    A recent trend in parameterized algorithms is the application of polytope tools to fixed-parameter tractable (FPT) algorithms [e.g., Cygan et al., 2011; Narayanaswamy et al., 2012]. Although this approach has yielded significant speedups for a range of important problems, it requires the underlying polytope to have very restrictive properties, including half-integrality and Nemhauser--Trotter-style persistence properties. To date, these properties are essentially known to hold only for two classes of polytopes, covering the cases of Vertex Cover [Nemhauser and Trotter, 1975] and Node Multiway Cut [Garg et al., 2004]. Taking a slightly different approach, we view half-integrality as a discrete relaxation of a problem, e.g., a relaxation of the search space from \(\{0,1\}^V\) to \(\{0,1/2,1\}^V\) such that the new problem admits a polynomial-time exact solution. Using tools from constraint satisfaction problems [in particular Thapper and Živný, 2012] to study the existence of such relaxations, we are able to provide a much broader class of half-integral polytopes with the required properties. Our results unify and significantly extend the previously known cases, and yield a range of new and improved FPT algorithms, including an \(O^*(|\Sigma|^{2k})\)-time algorithm for node-deletion Unique Label Cover and an \(O^*(4^k)\)-time algorithm for Group Feedback Vertex Set where the group is given by oracle access. The latter result also implies the first single-exponential time FPT algorithm for Subset Feedback Vertex Set, answering an open question of Cygan et al., 2016]. Additionally, we propose a network-flow-based approach to solve several cases of the relaxation problem. This gives the first linear-time FPT algorithm to edge-deletion Unique Label Cover.

    Fixed Parameter Tractability,Graphs
  • Dynamic Influence Analysis in Evolving Networks. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and Ken-ichi Kawarabayashi. Proceedings of the VLDB Endowment 2016. [slides (by Ohsaka)] [code]

    We propose the first real-time fully-dynamic index data structure designed for influence analysis on evolving networks. With this aim, we carefully redesign the data structure of the state-of-the-art sketching method introduced by Borgs et al., and construct corresponding update algorithms. Using this index, we present algorithms for two kinds of queries, influence estimation and influence maximization, which are strongly motivated by practical applications, such as viral marketing. We provide a thorough theoretical analysis, which guarantees the non-degeneracy of the solution accuracy after an arbitrary number of updates. Furthermore, we introduce a reachability-tree-based technique and a skipping method, which greatly reduce the time consumption required for edge/vertex deletions and vertex additions, respectively, and counter-based random number generators, which improve the space efficiency. Experimental evaluations using real dynamic networks with tens of millions of edges demonstrate the efficiency, scalability, and accuracy of our proposed indexing scheme. Specifically, it can reflect a graph modification within a time of several orders of magnitude smaller than that required to reconstruct an index from scratch, estimate the influence spread of a vertex set accurately within a millisecond, and select highly influential vertices at least ten times faster than state-of-the-art static algorithms.

    Network Algorithms
  • Efficient Algorithms for Spanning Tree Centrality. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. IJCAI 2016.

    In a connected graph, spanning tree centralities of a vertex and an edge measure how crucial they are for the graph to be connected. In this paper, we propose efficient algorithms for estimating spanning tree centralities with theoretical guarantees on their accuracy. We experimentally demonstrate that our methods are orders of magnitude faster than previous methods. Then, we propose a novel centrality notion of a vertex, called aggregated spanning tree centrality, which also considers the number of connected components obtained by removing the vertex. We also give an efficient algorithm for estimating aggregated spanning tree cen- trality. Finally, we experimentally show that those spanning tree centralities are useful to identify vulnerable edges and vertices in infrastructure networks.

    Network Algorithms
  • The problem of maximizing non-negative monotone submodular functions under a certain constraint has been intensively studied in the last decade. In this paper, we address the problem for functions defined over the integer lattice. Suppose that a non-negative monotone submodular function \(f:\mathbb{Z}^n_+ \to \mathbb{R}_+\) is given via an evaluation oracle. Assume further that \(f\) satisfies the diminishing return property, which is not an immediate consequence of the submodularity when the domain is the integer lattice. Then, we show polynomial-time \((1-1/e-\epsilon)\)-approximation algorithm for cardinality constraints, polymatroid constraints, and knapsack constraints. For a cardinality constraint, we also show a \((1-1/e-\epsilon)\)-approximation algorithm with slightly worse time complexity that does not rely on the diminishing return property. Our algorithms for a polymatroid constraint and a knapsack constraint first extend the domain of the objective function to the Euclidean space and then run the continuous greedy algorithm. We give two different kinds of continuous extensions, one is for polymatroid constraints and the other is for knapsack constraints, which might be of independent interest.

    Submodular Functions
  • In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheeger-like inequality, which relates the conductance of a digraph and the smallest non-zero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of real-world networks and obtain encouraging results.

    Network Algorithms
  • We consider stable signal recovery in \(\ell_q\) quasi-norm for \(0 < q \leq 1\). In this problem, given a measurement vector \(y = Ax\) for some unknown signal vector \(x \in \mathbb{R}^n\) and a known matrix \(A \in \mathbb{R}^{m \times n}\), we want to recover \(z \in \mathbb{R}^n\) with \(\|x - z\|_q = O(\|x - x^*\|_q)\) from a measurement vector, where \(x^*\) is the s-sparse vector closest to \(x\) in \(\ell_q\) quasi-norm. Although a small value of \(q\) is favorable for measuring the distance to sparse vectors, previous methods for \(q < 1\) involve \(\ell_q\) quasi-norm minimization which is computationally intractable. In this paper, we overcome this issue by using the sum-of-squares method, and give the first polynomial-time stable recovery scheme for a large class of matrices \(A\) in \(\ell_q\) quasi-norm for any fixed constant \(0 < q \leq 1\).

    Approximation
  • Let \(\{f_i:\mathbb{F}_p^i\to \{0,1\}\}\) be a sequence of functions, where \(p\) is a fixed prime and \(\mathbb{F}_p\) is the finite field of order \(p\). The limit of the sequence can be syntactically defined using the notion of ultralimit. Inspired by the Gowers norm, we introduce a metric over limits of function sequences, and study properties of it. One application of this metric is that it provides a simpler characterization of affine-invariant parameters of functions that are constant-query estimable than the previous one obtained by Yoshida (STOC'14). Using this characterization, we show that the property of being a function of a constant number of low-degree polynomials and a constant number of factored polynomials (of arbitrary degrees) is constant-query testable if it is closed under blowing-up. Examples of this property include the property of having a constant spectral norm and degree-structural properties with rank conditions.

    Boolean Functions,Sublinear-Time Algorithms
  • Improved Approximation Algorithms for k-Submodular Function Maximization. Satoru Iwata, Shin-ichi Tanigawa and Yuichi Yoshida. SODA 2016. [arXiv]

    This paper presents a polynomial-time \(1/2\)-approximation algorithm for maximizing nonnegative k-submodular functions. This improves upon the previous \(\max\{1/3, 1/(1 + a)\}\)-approximation by Ward and Živný [18], where \(a = \max\{1,\sqrt{(k-1)/4}\}\). We also show that for monotone \(k\)-submodular functions there is a polynomial-time \(k/(2k – 1)\)-approximation algorithm while for any \(\epsilon > 0\) a \(((k + 1)/2k + \epsilon)\)-approximation algorithm for maximizing monotone \(k\)-submodular functions would require exponentially many queries. In particular, our hardness result implies that our algorithms are asymptotically tight. We also extend the approach to provide constant factor approximation algorithms for maximizing skewbisubmodular functions, which were recently introduced as generalizations of bisubmodular functions.

    Submodular Functions
  • Coverage Centralities for Temporal Networks. Taro Takaguchi, Yosuke Yano and Yuichi Yoshida. The European Physical Journal B 2016. [arXiv]

    Structure of real networked systems, such as social relationship, can be modeled as temporal networks in which each edge appears only at the prescribed time. Understanding the structure of temporal networks requires quantifying the importance of a temporal vertex, which is a pair of vertex index and time. In this paper, we define two centrality measures of a temporal vertex based on the fastest temporal paths which use the temporal vertex. The definition is free from parameters and robust against the change in time scale on which we focus. In addition, we can efficiently compute these centrality values for all temporal vertices. Using the two centrality measures, we reveal that distributions of these centrality values of real-world temporal networks are heterogeneous. For various datasets, we also demonstrate that a majority of the highly central temporal vertices are located within a narrow time window around a particular time. In other words, there is a bottleneck time at which most information sent in the temporal network passes through a small number of temporal vertices, which suggests an important role of these temporal vertices in spreading phenomena.

    Network Algorithms

2015

  • We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly \(O(n\log(nr)\log r)\), where \(n\) is the size of the ground set and \(r\) is the maximum value of a coordinate. The dependency on \(r\) is exponentially better than the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.

    Submodular Functions
  • A \(k\)-submodular function is a generalization of a submodular function, where the input consists of \(k\) disjoint subsets, instead of a single subset, of the domain. Many machine learning problems, including influence maximization with \(k\) kinds of topics and sensor placement with \(k\) kinds of sensors, can be naturally modeled as the problem of maximizing monotone \(k\)-submodular functions.In this paper, we give constant-factor approximation algorithms for maximizing monotone \(k\)-submodular functions subject to several size constraints.The running time of our algorithms are almost linear in the domain size.We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.

    Submodular Functions
  • Testing Outerplanarity of Bounded Degree Graphs. Hiro Ito and Yuichi Yoshida. Algorithmica 2015.

    We present an efficient algorithm for testing outerplanarity of graphs in the bounded-degree model. In this model, given a graph \(G\) with degree bound \(d\), we should distinguish with high probability the case that \(G\) is outerplanar from the case that modifying at least an \(\epsilon\)-fraction of the edge set is necessary to make \(G\) outerplanar. Our algorithm runs in time polynomial in \(d\) and \(1/\epsilon\) only. To achieve the time complexity, we exploit the tree-like structure inherent to an outerplanar graph using the microtree/macrotree decomposition of a tree. As a corollary, we show that the property of being a cactus is testable in time polynomial in \(d\) and \(1/\epsilon\).

    Sublinear-Time Algorithms
  • Fully Dynamic Betweenness Centrality Maintenance on Massive Networks. Takanori Hayashi, Takuya Akiba and Yuichi Yoshida. Proceedings of the VLDB Endowment 2015. [code]

    Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks in network analysis. Among several importance measures, betweenness centrality, in particular, plays key roles in many real applications. Considerable effort has been made for developing algorithms for static settings. However, real networks today are highly dynamic and are evolving rapidly, and scalable dynamic methods that can instantly reflect graph changes into centrality values are required. In this paper, we present the first fully dynamic method for managing betweenness centrality of all vertices in a large dynamic network. Its main data structure is the weighted hyperedge representation of shortest paths called hypergraph sketch. We carefully design dynamic update procedure with theoretical accuracy guarantee. To accelerate updates, we further propose two auxiliary data structures called two-ball index and special-purpose reachability index. Experimental results using real networks demonstrate its high scalability and efficiency. In particular, it can reflect a graph change in less than a millisecond on average for a large-scale web graph with 106M vertices and 3.7B edges, which is several orders of magnitude larger than the limits of previous dynamic methods.

    Network Algorithms
  • A Query Efficient Non-Adaptive Long Code Test with Perfect Completeness. Suguru Tamaki and Yuichi Yoshida. Random Structures & Algorithms 2015.

    Long Code testing is a fundamental problem in the area of property testing and hardness of approximation. In this paper, we study how small the soundness s of the Long Code test with perfect completeness can be by using non-adaptive \(q\) queries. We show that \(s = (2q + 3)/2^q\) is possible, where \(q\) is of the form \(2^k-1\) for any integer \(k > 2\).

    Boolean Functions,Constraint Satisfaction Problems,Sublinear-Time Algorithms,Approximation
  • On the Equivalence among Problems of Bounded Width. Yoichi Iwata and Yuichi Yoshida. ESA 2015. [arXiv]

    In this paper, we introduce a methodology, called decomposition-based reductions, for showing the equivalence among various problems of bounded-width. First, we show that the following are equivalent for any \(α > 0\):

    • SAT can be solved in \(O^*(2^{\alpha \mathrm{tw}})\) time,
    • 3-SAT can be solved in \(O^*(2^{\alpha \mathrm{tw}})\) time,
    • Max 2-SAT can be solved in \(O^*(2^{\alpha\mathrm{tw}})\) time,
    • Independent Set can be solved in \(O^*(2^{\alpha\mathrm{tw}})\) time, and
    • Independent Set can be solved in \(O^*(2^{\alpha\mathrm{cw}})\) time,

    where \(\mathrm{tw}\) and \(\mathrm{cw}\) are the tree-width and clique-width of the instance, respectively. Then, we introduce a new parameterized complexity class EPNL, which includes Set Cover and TSP, and show that SAT, 3-SAT, Max 2-SAT, and Independent Set parameterized by path-width are EPNL-complete. This implies that if one of these \textup{EPNL}-complete problems can be solved in \(O^*(c^k)\) time, then any problem in \textup{EPNL} can be solved in \(O^*(c^k)\) time.

    Exact Algorithms
  • Partially Symmetric Functions are Efficiently Isomorphism-Testable. Eric Blais, Amit Weinstein and Yuichi Yoshida. SIAM Journal on Computing 2015. [arXiv] [slides (by Weinstein)] [video (talk by Weinstein)]

    Given a Boolean function \(f\), the \(f\)-isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to \(f\) up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions \(f\) we can test \(f\)-isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functions---functions invariant to the reordering of all but a constant number of their variables---are efficiently isomorphism testable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphism-testable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearly optimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testing-by-implicit-learning approach to complete the proof that partially symmetric functions are efficiently isomorphism testable.

    Boolean Functions,Sublinear-Time Algorithms
  • Testing Supermodular-cut Condition. Shin-ichi Tanigawa and Yuichi Yoshida. Algorithmica 2015.

    For a function \(f:2^V \to \mathbb{Z}_+\) on a finite set \(V\) with \(f(\emptyset)=f(V)=0\), a digraph \(D=(V,A)\) is called \(f\)-connected if it satisfies the \(f\)-cut condition, that is, \(\delta_D(X) \geq f(X)\) for any \(X \subseteq V\), where \(\delta_D(X)\) is the number of arcs from \(X\) to \(V \setminus X\). We show that, for any crossing supermodular function \(f\), the \(f\)-connectivity can be tested with a constant number of queries in the general digraph model with average degree bound. As immediate corollaries, we obtain constant-time testers for \(k\)-edge-connectivity, rooted-\((k,l)\)-edge-connectivity, and the property of having \(k\) arc-disjoint arborescences. We also give a corresponding result for the undirected case.

    Sublinear-Time Algorithms,Submodular Functions
  • Generalized River Crossing Problems. Hiro Ito, Stefan Langerman and Yuichi Yoshida. Theory of Computing Systems 2015.

    Three men, each with a sister, must cross a river using a boat that can carry only two people in such a way that a sister is never left in the company of another man if her brother is not present. This very famous problem appeared in the Latin book “Problems to Sharpen the Young,” one of the earliest collections of recreational mathematics. This paper considers a generalization of such “river crossing problems” and provides a new formulation that can treat wide variations. The main result is that, if there is no upper bound on the number of transportations (river crossings), a large class of subproblems can be solved in polynomial time even when the passenger capacity of the boat is arbitrarily large. The authors speculated this fact at FUN 2012. On the other hand, this paper also demonstrates that, if an upper bound on the number of transportations is given, the problem is NP-hard even when the boat capacity is three, although a large class of subproblems can be solved in polynomial time if the boat capacity is two.

    Misc
  • Learning Word Representations from Relational Graphs. Danushka Bollegala, Takanori Maehara, Yuichi Yoshida and Ken-ichi Kawarabayashi. AAAI 2015. [arXiv]

    Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are inter-connected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic semantics between words.

    Natural Language Processing
  • Distributed Multiplicative Weights Methods for DCOP. Daisuke Hatano and Yuichi Yoshida. AAAI 2015.

    We introduce a new framework for solving distributed constraint optimization problems that extend the domain of each variable into a simplex.We propose two methods for searching the extended domain for good assignments.The first one relaxes the problem using linear programming, finds the optimum LP solution, and rounds it to an assignment.The second one plays a cost-minimization game, finds a certain kind of equilibrium, and rounds it to an assignment.Both methods are realized by performing the multiplicative weights method in a distributed manner.We experimentally demonstrate that our methods have good scalability,and in particular, the second method outperforms existing algorithms in terms of solution quality and efficiency.

    Constraint Satisfaction Problems
  • Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling. Takuya Akiba, Takanori Hayashi, Nozomi Nori, Yoichi Iwata and Yuichi Yoshida. AAAI 2015.

    We propose an indexing scheme for top-\(k\) shortest-path distance queries on graphs, which is useful in a wide range of important applications such as network-aware search and link prediction. While considerable effort has been made for efficiently answering standard (top-1) distance queries, none of previous methods can be directly extended for top-k distance queries. We propose a new framework for top-\(k\) distance queries based on 2-hop cover and then present an efficient indexing algorithm based on the simple but effective recent notion of pruned landmark labeling. Extensive experimental results on real social and web graphs show the scalability, efficiency and robustness of our method. Moreover, we demonstrate the usefulness of top-\(k\) distance queries through an application to link prediction.

    Network Algorithms

2014

  • Robust Approximation of Temporal CSP. Suguru Tamaki and Yuichi Yoshida. APPROX 2014.

    A temporal constraint language \(\Gamma\) is a set of relations with first-order definitions in \((\mathbb{Q};<)\). Let \(\mathrm{CSP}(\Gamma)\) denote the set of constraint satisfaction problem instances with relations from \(\Gamma\). \(\mathrm{CSP}(\Gamma)\) admits robust approximation if, for any \(\varepsilon \geq 0\), given a \((1-\varepsilon)\)-satisfiable instance of \(\mathrm{CSP}(\Gamma)\), we can compute an assignment that satisfies at least a \((1-f(\varepsilon))\)-fraction of constraints in polynomial time. Here, \(f(\varepsilon)\) is some function satisfying \(f(0)=0\) and \(\lim_{\varepsilon \to 0}f(\varepsilon)=0\).

    Firstly, we give a qualitative characterization of robust approximability: Assuming the Unique Games Conjecture, we give a necessary and sufficient condition on \(\Gamma\) under which \(\mathrm{CSP}(\Gamma)\) admits robust approximation. Secondly, we give a quantitative characterization of robust approximability: Assuming the Unique Games Conjecture, we precisely characterize how \(f(\varepsilon)\) depends on \(\varepsilon\) for each \(\Gamma\). We show that our robust approximation algorithms can be run in almost linear time.

    Approximation,Constraint Satisfaction Problems
  • Betweenness centrality measures the importance of a vertex by quantifying the number of times it acts as a midpoint of the shortest paths between other vertices. This measure is widely used in network analysis. In many applications, we wish to choose the \(k\) vertices with the maximum adaptive betweenness centrality, which is the betweenness centrality without considering the shortest paths that have been taken into account by already-chosen vertices.

    All previous methods are designed to compute the betweenness centrality in a fixed graph. Thus, to solve the above task, we have to run these methods \(k\) times. In this paper, we present a method that directly solves the task, with an almost linear runtime no matter how large the value of \(k\). Our method first constructs a hypergraph that encodes the betweenness centrality, and then computes the adaptive betweenness centrality by examining this graph. Our technique can be utilized to handle other centrality measures.

    We theoretically prove that our method is very accurate, and experimentally confirm that it is three orders of magnitude faster than previous methods. Relying on the scalability of our method, we experimentally demonstrate that strategies based on adaptive betweenness centrality are effective in important applications studied in the network science and database communities.

    Network Algorithms
  • Suppressing Epidemics on Networks by Exploiting Observer Nodes. Taro Takaguchi, Takehisa Hasegawa and Yuichi Yoshida. Physical Review E 2014. [arXiv]

    To control infection spreading on networks, we investigate the effect of observer nodes that recognize infection in a neighboring node and make the rest of the neighbor nodes immune. We numerically show that random placement of observer nodes works better on networks with clustering than on locally treelike networks, implying that our model is promising for realistic social networks. The efficiency of several heuristic schemes for observer placement is also examined for synthetic and empirical networks. In parallel with numerical simulations of epidemic dynamics, we also show that the effect of observer placement can be assessed by the size of the largest connected component of networks remaining after removing observer nodes and links between their neighboring nodes.

    Network Algorithms
  • Fast and Accurate Influence Maximization on Large Networks with Pruned Monte-Carlo Simulations. Naoto Ohsaka, Takuya Akiba, Yuichi Yoshida and Ken-Ichi Kawarabayashi. AAAI 2014. [slides (by Ohsaka)] [code]

    Influence maximization is a problem to find small sets of highly influential individuals in a social network to maximize the spread of influence under stochastic cascade models of propagation. Although the problem has been well-studied, it is still highly challenging to find solutions of high quality in large-scale networks of the day. While Monte-Carlo-simulation-based methods produce near-optimal solutions with a theoretical guarantee, they are prohibitively slow for large graphs. As a result, many heuristic methods without any theoretical guarantee have been developed, but all of them substantially compromise solution quality. To address this issue, we propose a new method for the influence maximization problem. Unlike other recent heuristic methods, the proposed method is a Monte-Carlo-simulation-based method, and thus it consistently produces solutions of high quality with the theoretical guarantee. On the other hand, unlike other previous Monte-Carlo-simulation-based methods, it runs as fast as other state-of-the-art methods, and can be applied to large networks of the day. Through our extensive experiments, we demonstrate the scalability and the solution quality of the proposed method.

    Network Algorithms
  • We consider the problem of testing if two input forests are isomorphic or are far from being so. An algorithm is called an \(\epsilon\)-tester for forest-isomorphism if given an oracle access to two forests \(G\) and \(H\) in the adjacency list model, with high probability, accepts if \(G\) and \(H\) are isomorphic and rejects if we must modify at least \(\epsilon n\) edges to make \(G\) isomorphic to \(H\). We show an \(\epsilon\)-tester for forest-isomorphism with a query complexity \(\mathrm{polylog}(n)\) and a lower bound of \(\Omega(\sqrt{\log n/\log \log n})\). Further, with the aid of the tester, we show that every graph property is testable in the adjacency list model with \(\mathrm{polylog}(n)\) queries if the input graph is a forest.

    Sublinear-Time Algorithms,Graphs
  • Generalized Skew Bisubmodularity: A Characterization and a Min-Max Theorem. Satoru Fujishige, Shin-ichi Tanigawa and Yuichi Yoshida. Discrete Optimization 2014.

    Huber, Krokhin, and Powell (2013) introduced a concept of skew bisubmodularity, as a generalization of bisubmodularity, in their complexity dichotomy theorem for valued constraint satisfaction problems over the three-value domain. In this paper we consider a natural generalization of the concept of skew bisubmodularity and show a connection between the generalized skew bisubmodularity and a convex extension over rectangles. We also analyze the dual polyhedra, called skew bisubmodular polyhedra, associated with generalized skew bisubmodular functions and derive a min-max theorem that characterizes the minimum value of a generalized skew bisubmodular function in terms of a minimum-norm point in the associated skew bisubmodular polyhedron.

    Submodular Functions
  • Let \(P\) be a property of function \(\mathbb{F}^n_p \to \{0, 1\}\) for a fixed prime \(p\). An algorithm is called a tester for \(P\) if, given a query access to the input function \(f\), with high probability, it accepts when \(f\) satisfies \(P\) and rejects when \(f\) is "far" from satisfying \(P\). In this paper, we give a characterization of affine-invariant properties that are (two-sided error) testable with a constant number of queries. The characterization is stated in terms of decomposition theorems, which roughly claim that any function can be decomposed into a structured part that is a function of a constant number of polynomials, and a pseudo-random part whose Gowers norm is small. We first give an algorithm that tests whether the structured part of the input function has a specific form. Then we show that an affine-invariant property is testable with a constant number of queries if and only if it can be reduced to the problem of testing whether the structured part of the input function is close to one of a constant number of candidates.

    Boolean Functions,Sublinear-Time Algorithms
  • We propose two dynamic indexing schemes for shortest-path and distance queries on large time-evolving graphs, which are useful in a wide range of important applications such as real-time network-aware search and network evolution analysis. To the best of our knowledge, these methods are the first practical exact indexing methods to efficiently process distance queries and dynamic graph updates.

    We first propose a dynamic indexing scheme for queries on the last snapshot. The scalability and efficiency of its offline indexing algorithm and query algorithm are competitive even with previous static methods. Meanwhile, the method is dynamic, that is, it can incrementally update indices as the graph changes over time. Then, we further design another dynamic indexing scheme that can also answer two kinds of historical queries with regard to not only the latest snapshot but also previous snapshots.

    Through extensive experiments on real and synthetic evolving networks, we show the scalability and efficiency of our methods. Specifically, they can construct indices from large graphs with millions of vertices, answer queries in microseconds, and update indices in milliseconds.

    Network Algorithms
  • We consider approximation schemes for the maximum constraint satis- faction problems and the maximum assignment problems. Though they are NP-Hard in general, if the instance is “dense” or “locally dense”, then they are known to have approximation schemes that run in polynomial time or quasi-polynomial time. In this paper, we give a unified method of showing these approximation schemes based on the Sherali-Adams linear programming relaxation hierarchy. We also use our linear programming-based framework to show new algorithmic results on the optimization version of the hypergraph isomorphism problem.

    Approximation,Constraint Satisfaction Problems
  • Parameterized Testability. Kazuo Iwama and Yuichi Yoshida. ITCS 2014. [slides]

    This paper studies property testing for NP optimization problems with parameter \(k\) under the general graph model with an augmentation of random edge sampling capability. It is shown that a variety of such problems, including \(k\)-Vertex Cover, \(k\)-Feedback Vertex Set, \(k\)-Multicut, \(k\)-path-freeness and \(k\)-Dominating Set, are constant-time testable if \(k\) is constant. It should be noted that the first four problems are fixed parameter tractable (FPT) and it turns out that algorithmic techniques for their FPT algorithms (branch-and-bound search, color coding, etc.) are also useful for our testers. \(k\)-Dominating Set is \(W[2]\)-hard, but we can still test the property in constant time since the definition of \(\epsilon\)-farness makes the problem trivial for non-sparse graphs that are the source of hardness for the original optimization problem. We also consider \(k\)-Odd Cycle Transversal, which is another well-known FPT problem, but we only give a sublinear-time tester when \(k\) is a constant.

    Sublinear-Time Algorithms,Fixed Parameter Tractability,Graphs
  • Linear Time FPT Algorithms via Network Flow. Yoichi Iwata, Keigo Oka and Yuichi Yoshida. SODA 2014. [arXiv] [video (talk by Iwata)]

    In the area of parameterized complexity, to cope with NP-Hard problems, we introduce a parameter \(k\) besides the input size \(n\), and we aim to design algorithms (called FPT algorithms) that run in \(O(f(k)n^d)\) time for some function \(f(k)\) and constant \(d\). Though FPT algorithms have been successfully designed for many problems, typically they are not sufficiently fast because of huge \(f(k)\) and \(d\).

    In this paper, we give FPT algorithms with small \(f(k)\) and d for many important problems including Odd Cycle Transversal and Almost 2-SAT. More specifically, we can choose \(f(k)\) as a single exponential \((4^k)\) and d as one, that is, linear in the input size. To the best of our knowledge, our algorithms achieve linear time complexity for the first time for these problems.

    To obtain our algorithms for these problems, we consider a large class of integer programs, called BIP2. Then we show that, in linear time, we can reduce BIP2 to Vertex Cover Above LP preserving the parameter \(k\), and we can compute an optimal LP solution for Vertex Cover Above LP using network flow.

    Then, we perform an exhaustive search by fixing half-integral values in the optimal LP solution for Vertex Cover Above LP. A bottleneck here is that we need to recompute an LP optimal solution after branching. To address this issue, we exploit network flow to update the optimal LP solution in linear time.

    Fixed Parameter Tractability,Graphs

2013

  • Property Testing for Cyclic Groups and Beyond. Francois Le Gall and Yuichi Yoshida. Journal of Combinatorial Optimization 2013.

    This paper studies the problem of testing if an input \((\Gamma,\circ)\), where \(\Gamma\) is a finite set of unknown size and \(\circ\) is a binary operation over \(\Gamma\) given as an oracle, is close to a specified class of groups. Friedl et al. (Proc. of STOC, 2005) have constructed an efficient tester using \(\mathrm{poly}(\log|\Gamma|)\) queries for the class of abelian groups. We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: \(\Omega(|\Gamma|^{1/6})\) queries are necessary to test if the input is close to a cyclic group, and \(\Omega(|\Gamma|^c)\) queries for some constant c are necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed integer k≥1. We also show that knowledge of the size of the ground set \(\Gamma\) helps only for \(k=1\), in which case we construct an efficient tester using \(\mathrm{poly}(\log|\Gamma|)\) queries; for any other value \(k\geq 2\) the query complexity remains \(\Omega(|\Gamma|^c)\). All our upper and lower bounds hold for both the edit distance and the Hamming distance. These are, to the best of our knowledge, the first nontrivial lower bounds for such group-theoretic problems in the property testing model and, in particular, they imply the first exponential separations between the classical and quantum query complexities of testing closeness to classes of groups.

    Sublinear-Time Algorithms,Group Theory
  • Semi-strong Coloring of Intersecting Hypergraphs. Eric Blais, Amit Weinstein and Yuichi Yoshida. Combinatorics Probability and Computing 2013.

    For any \(c ≥ 2\), a \(c\)-strong colouring of the hypergraph \(G\) is an assignment of colours to the vertices of \(G\) such that, for every edge e of \(G\), the vertices of e are coloured by at least \(\min{c,|e|}\) distinct colours. The hypergraph \(G\) is \(t\)-intersecting if every two edges of \(G\) have at least \(t\) vertices in common.

    A natural variant of a question of Erdős and Lovász is: For fixed \(c \geq 2\) and \(t \geq 1\), what is the minimum number of colours that is sufficient to \(c\)-strong colour any \(t\)-intersecting hypergraphs? The purpose of this note is to describe some open problems related to this question.

    Misc
  • Answering reachability queries on directed graphs is ubiquitous in many applications involved with graph-shaped data as one of the most fundamental and important operations. However, it is still highly challenging to efficiently process them on large-scale graphs. Transitive-closure-based methods consume prohibitively large index space, and online-search-based methods answer queries too slowly. Labeling-based methods attain both small index size and query time, but previous indexing algorithms are not scalable at all for processing large graphs of the day.

    In this paper, we propose new labeling-based methods for reachability queries, referred to as pruned landmark labeling and pruned path labeling. They follow the frameworks of 2-hop cover and 3-hop cover, but their indexing algorithms are based on the recent notion of pruned labeling and improve the indexing time by several orders of magnitude, resulting in applicability to large graphs with tens of millions of vertices and edges. Our experimental results show that they attain remarkable trade-offs between fast query time, small index size and scalability, which previous methods have never been able to achieve. Furthermore, we also discuss the ingredients of the efficiency of our methods by a novel theoretical analysis based on the graph minor theory.

    Network Algorithms
  • Capturing sets of closely related vertices from large networks is an essential task in many applications such as social network analysis, bioinformatics, and web link research. Decomposing a graph into \(k\)-core components is a standard and efficient method for this task, but obtained clusters might not be well-connected. The idea of using maximal \(k\)-edge-connected subgraphs was recently proposed to address this issue. Although we can obtain better clusters with this idea, the state-of-the-art method is not efficient enough to process large networks with millions of vertices.

    In this paper, we propose a new method to decompose a graph into maximal \(k\)-edge-connected components, based on random contraction of edges. Our method is simple to implement but improves performance drastically. We experimentally show that our method can successfully decompose large networks and it is thousands times faster than the previous method. Also, we theoretically explain why our method is efficient in practice. To see the importance of maximal \(k\)-edge-connected subgraphs, we also conduct experiments using real-world networks to show that many \(k\)-core components have small edge-connectivity and they can be decomposed into a lot of maximal \(k\)-edge-connected subgraphs.

    Network Algorithms
  • Mining for Analogous Tuples from an Entity-Relation Graph. Danushka Bollegala, Mitsuru Kushimoto, Yuichi Yoshida and Ken-ichi Kawarabayashi. IJCAI 2013.

    The ability to recognize analogies is an important factor that is closely related to human intelligence. Verbal analogies have been used for evaluating both examinees at university entrance exams as well as algorithms for measuring relational similarity. However, relational similarity measures proposed so far are confined to measuring the similarity between pairs of words. Unfortunately, such pairwise approaches ignore the rich relational structure that exists in real-world knowledge bases containing millions of entities and semantic relations. We propose a method to efficiently identify analogous entity tuples from a given entity-relation graph. First, we present an efficient approach for extracting potential analogous tuples from a given entity-relation graph. Second, to measure the structural similarity between two tuples, we propose two types of kernel functions: vertex-feature kernels, and edge-feature kernels. Moreover, we combine those kernels to construct composite kernels that simultaneously consider both vertex and edge features. Experimental results show that our proposed method accurately identifies analogous tuples and significantly outperforms a state-of-the-art pairwise relational similarity measure, extended to tuples.

    Natural Language Processing
  • Constant-Time Approximation Algorithms for the Optimum Branching Problem on Sparse Graphs. Mitsuru Kusumoto, Yuichi Yoshida and Hiro Ito. International Journal of Networking and Computing 2013.

    We propose a constant-time algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edge-weighted digraph \(G\) of average degree \(d\) whose weights are real values in \([0, 1]\) is given as an oracle access, and we are allowed to ask degrees and incoming edges for any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in \(G\) with an absolute error of at most εn with query complexity \(O(d/\epsilon^3)\), where \(n\) is the number of vertices. We also show a lower bound of \(\Omega(d/\epsilon^2)\). Additionally, our algorithm can be modified to run with query complexity \(O(1/\epsilon^4)\) for unweighted digraphs, i.e., it runs in time independent of the input size even for digraphs with \(d = \Omega(n)\) edges. In contrast, we show that it requires \(\Omega(n)\) queries to approximate the weight of the minimum (or maximum) spanning arborescence in a weighted digraph.

    Sublinear-Time Algorithms,Graphs
  • Testing Linear-Invariant Function Isomorphism. Karl Wimmer and Yuichi Yoshida. ICALP 2013.

    A function \(f:\mathbb{F}^n_2 \to \{-1,1\}\) is called linear-isomorphic to \(g\) if \(f = g \circ A\) for some non-singular matrix \(A\). In the \(g\)-isomorphism problem, we want a randomized algorithm that distinguishes whether an input function \(f\) is linear-isomorphic to \(g\) or far from being so.

    We show that the query complexity to test \(g\)-isomorphism is essentially determined by the spectral norm of \(g\). That is, if \(g\) is close to having spectral norm \(s\), then we can test g-isomorphism with \(\mathrm{poly}(s)\) queries, and if \(g\) is far from having spectral norm \(s\), then we cannot test \(g\)-isomorphism with \(o(\log s)\) queries. The upper bound is almost tight since there is indeed a function \(g\) close to having spectral norm s whereas testing \(g\)-isomorphism requires \(\Omega(s)\) queries. As far as we know, our result is the first characterization of this type for functions. Our upper bound is essentially the Kushilevitz-Mansour learning algorithm, modified for use in the implicit setting.

    Exploiting our upper bound, we show that any property is testable if it can be well-approximated by functions with small spectral norm. We also extend our algorithm to the setting where \(A\) is allowed to be singular.

    Boolean Functions,Sublinear-Time Algorithms
  • An Algebraic Characterization of Testable CSPs. Arnab Bhattacharyya and Yuichi Yoshida. ICALP 2013. [slides]

    Given an instance \(\mathcal{I}\) of a CSP, a tester for \(\mathcal{I}\) distinguishes assignments satisfying \(\mathcal{I}\) from those which are far from any assignment satisfying \(\mathcal{I}\). The efficiency of a tester is measured by its query complexity, the number of variable assignments queried by the algorithm. In this paper, we characterize the hardness of testing Boolean CSPs in terms of the algebra generated by the relations used to form constraints. In terms of computational complexity, we show that if a non-trivial Boolean CSP is sublinear-query testable (resp., not sublinear-query testable), then the CSP is in NL (resp., P-complete, ⊕L-complete or NL-complete) and that if a sublinear-query testable Boolean CSP is constant-query testable (resp., not constant-query testable), then counting the number of solutions of the CSP is in P (resp., #P-complete).

    Also, we conjecture that a CSP instance is testable in sublinear time if its Gaifman graph has bounded treewidth. We confirm the conjecture when a near-unanimity operation is a polymorphism of the CSP.

    Constraint Satisfaction Problems,Sublinear-Time Algorithms
  • We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.

    Network Algorithms
  • Testing a property \(P\) of graphs in the bounded-degree model deals with the following problem: given a graph \(G\) of bounded degree \(d\), we should distinguish (with probability \(2/3\), say) between the case that \(G\) satisfies \(P\) and the case that one should add/remove at least \(\epsilon dn\) edges of \(G\) to make it satisfy \(P\). In sharp contrast to property testing of dense graphs, which is relatively well understood, only few properties are known to be testable with a constant number of queries in the bounded-degree model. In particular, no global monotone (i.e,closed under edge deletions) property that expander graphs can satisfy has been shown to be testable in constant time so far.

    In this paper, we identify for the first time a natural family of global monotone property that expander graphs can satisfy and can be efficiently tested in the bounded degree model. Specifically, we show that, for any integer \(t \geq 1\), \(K_t\)-subdivision-freeness is testable with a constant number of queries in the bounded-degree model. This property was not previously known to be testable even with \(o(n)\) queries. Note that an expander graph with all degree less than \(t-1\) does not have a \(K_t\)-subdivision.

    The proof is based on a novel combination of some results that develop the framework of partitioning oracles, together with structural graph theory results that develop the seminal graph minor theory by Robertson and Seymour. As far as we aware, this is the first result that bridges property testing and structural graph theory. Although we know a rough structure for graphs without \(H\)-minors from the famous graph minor theory by Robertson and Seymour, there is no corresponding structure theorem for graphs without \(H\)-subdivisions so far, even \(K_5\)-subdivision-free graphs. Therefore, subdivisions and minors are very different in a graph structural sense.

    Sublinear-Time Algorithms,Graphs
  • We study the constraint satisfaction problem over the point algebra. In this problem, an instance consists of a set of variables and a set of binary constraints of forms \((x < y)\), \((x \leq y)\), \((x \neq y)\) or \((x = y)\). Then, the objective is to assign integers to variables so as to satisfy as many constraints as possible.This problem contains many important problems such as Correlation Clustering, Maximum Acyclic Subgraph, and Feedback Arc Set. We first give an exact algorithm that runs in \(O^*(3^{\frac{\log 5}{\log 6}n})\) time, which improves the previous best \(O^*(3^n)\) obtained by a standard dynamic programming. Our algorithm combines the dynamic programming with the split-and-list technique. The split-and-list technique involves matrix products and we make use of sparsity of matrices to speed up the computation. As for approximation, we give a \(0.4586\)-approximation algorithm when the objective is maximizing the number of satisfied constraints, and give an \(O(\log n \log \log n)\)-approximation algorithm when the objective is minimizing the number of unsatisfied constraints.

    Approximation,Constraint Satisfaction Problems,Exact Algorithms

2012

  • We propose a constant-time algorithm for approximating the weight of the maximum weight branching in the general graph model. A directed graph is called a branching if it is acyclic and each vertex has at most one incoming edge. An edge-weighted digraph \(G\) of average degree \(d\) whose weights are real values in \([0, 1]\) is given as an oracle access, and we are allowed to ask degrees and incoming edges for any vertex through the oracle. Then, with high probability, our algorithm estimates the weight of the maximum weight branching in \(G\) with an absolute error of at most εn with query complexity \(O(d/\epsilon^3)\), where \(n\) is the number of vertices. We also show a lower bound of \(\Omega(d/\epsilon^2)\). Additionally, our algorithm can be modified to run with query complexity \(O(1/\epsilon^4)\) for unweighted digraphs, i.e., it runs in time independent of the input size even for digraphs with \(d = \Omega(n)\) edges. In contrast, we show that it requires \(\Omega(n)\) queries to approximate the weight of the minimum (or maximum) spanning arborescence in a weighted digraph.

    Sublinear-Time Algorithms,Graphs
  • Given a Boolean function \(f\), the \(f\)-isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to \(f\) up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions \(f\) we can test \(f\)-isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functions -- functions invariant to the reordering of all but a constant number of their variables -- are efficiently isomorphism-testable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphism-testable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearly-optimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testing-by-implicit-learning approach to complete the proof that partially symmetric functions are efficiently isomorphism-testable.

    Boolean Functions,Sublinear-Time Algorithms
  • Given an undirected graph \(G = (V,E)\) and positive edge weights \(\{w_e\}_{e \in E}\), a linear arrangement is a permutation \(\pi: V \to [n]\). The value of the arrangement is \(\mathrm{val}(G,\pi):=\frac{1}{n}\sum_{e=\{u,v\}}w_e|\pi(u)-\pi(v)|\). In the minimum linear arrangement problem (MLA), the goal is to find a linear arrangement \(\pi^*\) that achieves \(\mathrm{val}(G,\pi^*) = \mathrm{MLA}(G): =  \min_{\pi}\mathrm{val}(G,\pi)\).

    In this paper, we show that for any \(\epsilon > 0\) and positive integer \(r\), there is an \(O(nr/\epsilon)\)-time randomized algorithm which, given a graph \(G\), returns a permutation \(\pi\) such that \[ \mathrm{val}(G,\pi) \leq \left(1 + \frac{2}{(1-\epsilon)\lambda_{r+1}(\mathcal{L})}\right)\mathrm{MLA}(G) + O\left(\frac{\log n}{\sqrt{n}}\sum_{e \in E}w_e\right) \] with high probability. Here \(\mathcal{L}\) is the normalized Laplacian of \(G\) and \(\lambda_r(\mathcal{L})\) is the \(r\)-th eigenvalue of \(\mathcal{L}\). Our algorithm gives a constant factor approximation for regular graphs that are weak expanders.

    Approximation,Graphs
  • Constant-Time Algorithms for Sparsity Matroids. Hiro Ito, Shin-ichi Tanigawa and Yuichi Yoshida. ICALP 2012. [slides]

    A graph \(G = (V, E)\) is called \((k, \ell)\)-sparse if \(|F| ≤ k|V(F)| -\ell\) for any \(F \subseteq E\) with \(F \neq \emptyset\). Here, \(V(F)\) denotes the set of vertices incident to \(F\). A graph \(G = (V,E)\) is called \((k,\ell)\)-full if \(G\) contains a \((k,\ell)\)-sparse subgraph with \(|V|\) vertices and \(k|V| - \ell\) edges. The family of edge sets of \((k,\ell)\)-sparse subgraphs forms a family of independent sets of a matroid on \(E\), known as the sparsity matroid of \(G\). In this paper, we give a constant-time algorithm that approximates the rank of the sparsity matroid associated with a degree-bounded undirected graph. This algorithm leads to a constant-time tester for \((k,\ell)\)-fullness in the bounded-degree model, (i.e., we can decide with high probability whether the input graph satisfies a property or far from it). Depending on the values of \(k\) and \(\ell\), our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed in a unified manner.

    Based on this result, we also propose a constant-time tester for \((k,\ell)\)-edge-connected-orientability in the bounded-degree model, where an undirected graph \(G\) is called \((k,\ell)\)-edge-connected-orientable if there exists an orientation \(\vec{G}\) of \(G\) with a vertex \(r\in V\) such that \(\vec{G}\) contains \(k\) arc-disjoint dipaths from r to each vertex \(v \in V\) and \(\ell\) arc-disjoint dipaths from each vertex \(v \in V\) to \(r\).

    A tester is called a one-sided error tester for \(P\) if it always accepts a graph satisfying \(P\). We show, for any \(k \geq 2\) and (proper) \(\ell \geq 0\), every one-sided error tester for \((k,\ell)\)-fullness and \((k,\ell)\)-edge-connected-orientability requires \(\Omega(n)\) queries.

    Sublinear-Time Algorithms,Submodular Functions,Graphs
  • Testing List H-Homomorphisms. Yuichi Yoshida. CCC 2012. [slides]

    Let \(H\) be an undirected graph. In the List \(H\)-Homomorphism Problem, given an undirected graph \(G\) with a list constraint \(L(v) \subseteq V(H)\) for each variable \(v∈V(G)\), the objective is to find a list \(H\)-homomorphism \(f:V(G)\to V(H)\), that is, \(f(v)\in L(v)\) for every \(v\in V(G)\) and \((f(u),f(v)) \in E(H)\) whenever \((u,v)\in E(G)\). We consider the following problem: given a map \(f:V(G) \to V(H)\) as an oracle access, the objective is to decide with high probability whether \(f\) is a list \(H\)-homomorphism or \textit{far} from any list \(H\)-homomorphisms. The efficiency of an algorithm is measured by the number of accesses to \(f\). In this paper, we classify graphs \(H\) with respect to the query complexity for testing list \(H\)-homomorphisms and show the following trichotomy holds: (i) List \(H\)-homomorphisms are testable with a constant number of queries if and only if \(H\) is a reflexive complete graph or an irreflexive complete bipartite graph. (ii) List \(H\)-homomorphisms are testable with a sublinear number of queries if and only if \(H\) is a bi-arc graph. (iii) Testing list \(H\)-homomorphisms requires a linear number of queries if \(H\) is not a bi-arc graph.

    Constraint Satisfaction Problems,Sublinear-Time Algorithms
  • Constant-Time Approximation Algorithms for the Knapsack Problem. Hiro Ito, Susumu Kiyoshima and Yuichi Yoshida. TAMC 2012.

    In this paper, we give a constant-time approximation algorithm for the knapsack problem. Using weighted sampling, with which we can sample items with probability proportional to their profits, our algorithm runs with query complexity \(O(\epsilon^{- 4} \log \epsilon^{-1})\), and it approximates the optimal profit with probability at least \(2/3\) up to error at most an \(\epsilon\)-fraction of the total profit. For the subset sum problem, which is a special case of the knapsack problem, we can improve the query complexity to \(O(\epsilon^{-1} \log \epsilon^{-1})\).

    Sublinear-Time Algorithms
  • Algorithms and Complexity of Generalized River Crossing Problems. Stefan Langerman, Hiro Ito and Yuichi Yoshida. FUN 2012.

    Three men, each with a sister, must cross a river using a boat which can carry only two people, so that a woman whose brother is not present is never left in the company of another man. This is a very famous problem appeared in Latin book “Problems to Sharpen the Young,” one of the earliest collections on recreational mathematics. This paper considers a generalization of such “River-Crossing Problems.” It shows that the problem is NP-hard if the boat size is three, and a large class of sub-problems can be solved in polynomial time if the boat size is two. It’s also conjectured that determining whether a river crossing problem has a solution without bounding the number of transportations, can be solved in polynomial time even when the size of the boat is large.

    Misc
  • An Almost Optimal Algorithm for Winkler’s Sorting Pairs in Bins. Hiro Ito, Junichi Teruyama and Yuichi Yoshida. Progress in Informatics 2012.

    We investigate the following sorting problem: We are given n bins with two balls in each bin. Balls in the \(i\)th bin are numbered \(n + 1 - i\). We can swap two balls from adjacent bins. How many number of swaps are needed in order to sort balls, i.e., move every ball to the bin with the same number. For this problem the best known solution requires almost \(\frac{4n^2}{5}\) swaps. In this paper, we show an algorithm which solves this problem using less than \(\frac{2n^2}{3}\) swaps. Since it is known that the lower bound of the number of swapsi s \(\lceil {2n \choose 2}/3 \rceil = \lceil \frac{2n^2}{3} - \frac{n}{3}\rceil\), our resultis is almost tight. Furthermore, we show that for \(n = 2^m + 1 (m \geq 0)\) the algorithm is optimal.

    Misc
  • Testing the (s, t)-Disconnectivity of Graphs and Digraphs. Yusuke Kobayashi and Yuichi Yoshida. Theoretical Computer Science 2012.

    Property testing is concerned with constant-time algorithms for deciding whether a given object satisfies a predetermined property or is far from satisfying it. In this paper, we consider testing properties related to the connectivity of two vertices in sparse graphs.

    We present one-sided error testers for \((s,t)\)-disconnectivity with query complexity \(2^{O(1/\epsilon}\) for digraphs and \(O(1/\epsilon^2)\) for graphs, where \(\epsilon\) is an error parameter. Furthermore, we show that these algorithms are the best possible in view of query complexity, i.e., we give matching lower bounds for two-sided error testers for both cases.

    We also give a constant-time algorithm for testing the \((s,t)\)-disconnectivity of a directed bounded-degree hypergraph, which can be used to test the satisfiability of Horn SAT.

    Sublinear-Time Algorithms,Graphs
  • Property Testing on k-Vertex Connectivity of Graphs. Yuichi Yoshida and Hiro Ito. Algorithmica, 2012.

    We present an algorithm for testing the \(k\)-vertex-connectivity of graphs with the given maximum degree. The time complexity of the algorithm is independent of the number of vertices and edges of graphs. Fixed degree bound \(d\), a graph \(G\) with \(n\) vertices and a maximum degree at most \(d\) is called \(\epsilon\)-far from \(k\)-vertex-connectivity when at least \(\frac{\epsilon dn}{2}\) edges must be added to or removed from \(G\) to obtain a \(k\)-vertex-connected graph with a maximum degree at most \(d\). The algorithm always accepts every graph that is \(k\)-vertex-connected and rejects every graph that is \(\epsilon\)-far from \(k\)-vertex-connectivity with a probability of at least \(2/3\). The algorithm runs in \(O(d(\frac{c}{\epsilon d})^k \log\frac{1}{\epsilon d})\) time (\(c>1\) is a constant) for \((k−1)\)-vertex-connected graphs, and in \(O(d(\frac{ck}{\epsilon d})^k \log\frac{k}{\epsilon d})\) time (\(c>1\) is a constant) for general graphs. It is the first constant-time \(k\)-vertex-connectivity testing algorithm for general \(k \geq 4\).

    Sublinear-Time Algorithms,Graphs
  • Linear programming, width-1 CSPs, and robust satisfaction. Gabor Kun, Ryan O’Donnell, Suguru Tamaki, Yuichi Yoshida and Yuan Zhou. ITCS 2012.

    We say that an algorithm robustly decides a constraint satisfaction problem \(\Pi\) if it distinguishes at-least-\((1-\epsilon)\)-satisfiable instances from less-than-\((1 - r(\epsilon))\)-satisfiable instances for some function \(r(\epsilon)\) with \(r(\epsilon) \to 0\) as \(\epsilon \to 0\). In this paper we show that the canonical linear programming relaxation robustly decides \(\Pi\) if and only if \(\Pi\) has "width 1" (in the sense of Feder and Vardi).

    Approximation,Constraint Satisfaction Problems
  • Studies on Constant-Time Algorithms for Bounded-Degree Graphs and Constraint Satisfaction Problems. Yuichi Yoshida. Ph.D. thesis, Kyoto University, 2012.

    We are concerned with constant-time algorithms, which aim for solving decision problems or optimization problems in time independent of sizes of inputs. To obtain such efficient algorithms, we relax the concept of decision and optimization as follows. An algorithm is said to test a property if, with high probability, it decides whether an input satisfies the property or we must modify an \(\epsilon\)-fraction of the input to make it satisfy the property. An algorithm is said to \((\alpha, \epsilon n)\)-approximate an optimization problem if, with high probability, it outputs an \((\alpha, \epsilon n)\)-approximation to the optimal value. Here, a value \(x\) is called an \((\alpha, \epsilon n)\)-approximation to \(x^*\) if \(\alpha x^* - \epsilon n \leq x \leq x^*\) where \(n\) is the size of an input. By relaxing this way, various problems become solvable in constant time.

    Constant-time algorithms were first studied for algebraic properties of functions for applications to computational complexity. However, because of their theoretical richness and potential to practical applications, they have been studied for many different areas in the last decade. In this thesis, we concentrate on problems on bounded-degree graphs and constraint satisfaction problems (CSPs). In this setting, each vertex (or variable) is incident to a constant number of edges (or constraints). Bounded-degree instances are practically important since real-world problems often give rise to such instances, and they are also theoretically important since we become able to study various fundamental problems that are trivial in general instances.

    We first give a constant-time \((1, \epsilon n)\)-approximation algorithm for the maximum matching problem. Compared to previous works, our algorithm runs exponentially faster with respect to \(1/\epsilon\). Using the algorithm as a building block, we show various related problems are also testable in constant time. A graph \(G = (V , E )\) is called \((k,l)\)-sparse if \(|F| \leq k|V(F)|-l\) for any non-empty \(F \subseteq E\). Here, \(V(F)\) denotes the set of vertices incident to \(F\). We give a constant-time algorithm that tests a graph \(G = (V, E)\) has a \((k, l)\)-sparse subgraph with \(|V|\) vertices and \(k|V | − l\) edges. Depending on values of \(k\) and \(l\), our algorithm can test various properties of graphs such as connectivity, rigidity, and how many spanning trees can be packed. Based on this result, we also give a constant-time testing algorithm for (k, l)-edge-connected- orientability, where a graph \(G\) is called \((k, l)\)-edge-connected-orientable if there exists an orientation \(\vec{G}\) of \(G\) with a vertex \(r \in V\) such that \(\vec{G}\) contains \(k\) arc-disjoint dipaths from \(r\) to each vertex \(v \in V\) and \(l\) arc-disjoint dipaths from each vertex \(v \in V\) to \(r\).

    Secondly, we consider constant-time algorithms for CSPs. Let \(\Lambda\) be a set of relations. Then, an instance of \(\mathrm{CSP}(\Lambda)\) is a pair of variables and constraints over the variables such that each constraint belongs to \(\Lambda\). In the problem, called \(\mathrm{MaxCSP}(\Lambda)\), given an instance of \(\mathrm{CSP}(\Lambda)\), we want to compute an assignment that maximizes the number of satisfied constraints. Here, we give the “best” constant-time approxi- mation algorithm for every \(\mathrm{MaxCSP}(\Lambda)\). Specifically, for each \(\Lambda\), there exists some constant \(\alpha_\Lambda\) with the following properties. (1) for any \(\epsilon > 0\), there exists a constant-time \((\alpha_\Lambda - \epsilon, \epsilon n)\)-approximation algorithm for \(\mathrm{MaxCSP}(\Lambda)\). (2) for any \(\epsilon > 0\), there exists \(\delta > 0\) such that no constant-time \((\alpha_\Lambda + \epsilon, \delta n)\)-approximation algorithm for \(\mathrm{MaxCSP}(\Lambda)\) exists. Here, \(\alpha_\Lambda\) coincides with the integrality gap of a standard LP relaxation for \(\mathrm{MaxCSP}(\Lambda)\).

    We consider testing satisfiability of CSP instances. Using the result above, we can derive the characterization of the constant-time testability of \(\mathrm{CSP}(\Lambda)\). Let \(\mathrm{opt}(I)\) and \(\mathrm{lp}(I)\) denote the optimal value and the LP value for an instance \(I\), respectively. Then, we have the followings. (i) if any instance of \(\mathrm{CSP}(\Lambda)\) with \(lp(I) \geq 1−\epsilon\) satisfies \(\mathrm{opt}(I) \geq 1 − r(\epsilon)\) where \(\lim_{\epsilon \to 0} r(\epsilon) = 0\), then \(CSP(\Lambda)\) is testable in constant time. (ii) if there is an instance \(I\) of \(\mathrm{CSP}(\Lambda)\) with \(\mathrm{lp}(I) = 1\) and \(\mathrm{opt}(I) < 1\), then \(\mathrm{CSP}(\Lambda)\) is not testable in constant time.

    To achieve a combinatorial characterization of the testability of \(\mathrm{CSP}(\Lambda)\), we show a close connection between constant-time testability and the notion of width 1. We say \(\mathrm{CSP}(\Lambda)\) has width 1 if satisfiability of instances of \(\mathrm{CSP}(\Lambda)\) can be decided by a certain simple propagation algorithm. Then, we prove that any CSP of width 1 satisfies (i), which means that any such CSP is testable in constant time.

    In a subsequent work (Kun et al. ITCS 2012), which includes the part of the present thesis, we have shown that any CSP of width more than 1 satisfies (ii). Thus, a CSP is constant-time testable if and only if it has width 1.

    Sublinear-Time Algorithms
  • Improved Constant-Time Approximation Algorithms for Maximum Matchings and Other Optimization Problems. Yuichi Yoshida, Masaki Yamamoto and Hiro Ito. Siam Journal on Computing 2012.

    We study constant-time approximation algorithms for bounded-degree graphs, which run in time independent of the number of vertices \(n\). We present an algorithm that decides whether a vertex is contained in a some fixed maximal independent set with expected query complexity \(O(d^2)\), where \(d\) is the degree bound. Using this algorithm, we show constant-time approximation algorithms with certain multiplicative error and additive error \(\epsilon n\) for many other problems, e.g., the maximum matching problem, the minimum vertex cover problem, and the minimum set cover problem, that run exponentially faster than existing algorithms with respect to \(d\) and \(\frac{1}{\epsilon}\). Our approximation algorithm for the maximum matching problem can be transformed to a two-sided error tester for the property of having a perfect matching. On the contrary, we show that every one-sided error tester for the property requires at least \(\Omega(n)\) queries.

    Sublinear-Time Algorithms
  • On the Distance Between Non-isomorphic Groups. Gábor Ivanyos, François Le Gall and Yuichi Yoshida. European Journal of Combinatorics 2012. [arXiv]

    A result of Ben-Or, Coppersmith, Luby and Rubinfeld on testing whether a map be two groups is close to a homomorphism implies a tight lower bound on the distance between the multiplication tables of two non-isomorphic groups.

    Group Theory

2011

  • Algorithms for Finding a Maximum Non-k-Linked Graph. Yusuke Kobayashi and Yuichi Yoshida. ESA 2011.

    A graph with at least \(2k\) vertices is said to be k-linked if for any ordered \(k\)-tuples \((s_1, \ldots,s_k)\) and \((t_1, \ldots, t_k)\) of \(2k\) distinct vertices, there exist pairwise vertex-disjoint paths \(P_1,\ldots, P_k\) such that \(P_i\) connects si and ti for \(i = 1, \ldots, k\). For a given graph \(G\), we consider the problem of finding a maximum induced subgraph of \(G\) that is not \(k\)-linked. This problem is a common generalization of computing the vertex-connectivity and testing the \(k\)-linkedness of \(G\), and it is closely related to the concept of \(H\)-linkedness. In this paper, we give the first polynomial-time algorithm for the case of \(k = 2\), whereas a similar problem that finds a maximum induced subgraph without 2-vertex-disjoint paths connecting fixed terminal pairs is NP-hard. For the case of general \(k\), we give an \((8k − 2)\)-additive approximation algorithm. We also investigate the computational complexities of the edge-disjoint case and the directed case.

    Graphs
  • Property Testing for Cyclic Groups and Beyond. Francois Le Gall and Yuichi Yoshida. COCOON 2011.

    This paper studies the problem of testing if an input (Γ,¿), where Γ is a finite set of unknown size and ¿ is a binary operation over Γ given as an oracle, is close to a specified class of groups. Friedl et al. (Proc. of STOC, 2005) have constructed an efficient tester using poly(log|Γ|) queries for the class of abelian groups. We focus in this paper on subclasses of abelian groups, and show that these problems are much harder: Ω(|Γ|1/6) queries are necessary to test if the input is close to a cyclic group, and Ω(|Γ| c ) queries for some constant c are necessary to test more generally if the input is close to an abelian group generated by k elements, for any fixed integer k¿1. We also show that knowledge of the size of the ground set Γ helps only for k=1, in which case we construct an efficient tester using poly(log|Γ|) queries; for any other value k¿2 the query complexity remains Ω(|Γ| c ). All our upper and lower bounds hold for both the edit distance and the Hamming distance. These are, to the best of our knowledge, the first nontrivial lower bounds for such group-theoretic problems in the property testing model and, in particular, they imply the first exponential separations between the classical and quantum query complexities of testing closeness to classes of groups.

    Sublinear-Time Algorithms,Group Theory
  • In this paper, we consider lower bounds on the query complexity for testing CSPs in the bounded-degree model. We mainly consider Boolean CSPs allowing literals. First, for any "symmetric" predicate P : {0, 1}k → {0,1} except EQU where k ≥ 3, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances (|P-1 (0)|/2k - ϵ)-far from satisfiability requires Ω(n1/2+δ) queries where n is the number of variables and δ >; 0 is a constant that depends on P and e. This breaks a natural lower bound Ω(n1/2), which is obtained by the birthday paradox. We also show that every one-sided error tester requires Ω(n) queries for such P. These results are hereditary in the sense that the same results hold for any predicate Q such that P-1 (1) ⊆ Q-1(1). For EQU, we give a one-sided error tester whose query complexity is O̅(n1/2). Also, for 2-XOR (or, equivalently E2LIN2), we show an Ω(n1/2+δ) lower bound for distinguishing instances between e-close to and (1/2 -ϵ)-far from satisfiability. Next, for the general k-CSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances (1 - 2k/2k - ϵ)-far from satisfiability requires Ω(n) queries. The matching NP-hardness is not known, even assuming the Unique Games Conjecture or the d-to-1 Conjecture. As a corollary, for Maximum Independent Set on graphs with n vertices and a degree bound d, we show that every approximation algorithm within a factor d/poly log d and an additive error of ϵn requires Ω(n) queries. Previously, only super-constant lower bounds were known.

    Sublinear-Time Algorithms,Constraint Satisfaction Problems
  • Raghavendra (STOC 2008) gave an elegant and surprising result: if Khot's Unique Games Conjecture (STOC 2002) is true, then for every constraint satisfaction problem (CSP), the best approximation ratio is attained by a certain simple semidefinite programming and a rounding scheme for it.

    In this paper, we show that similar results hold for constant-time approximation algorithms in the bounded-degree model. Specifically, we present the following: (i) For every CSP, we construct an oracle that serves an access, in constant time, to a nearly optimal solution to a basic LP relaxation of the CSP. (ii) Using the oracle, we give a constant-time rounding scheme that achieves an approximation ratio coincident with the integrality gap of the basic LP. (iii) Finally, we give a generic conversion from integrality gaps of basic LPs to hardness results. All of those results are unconditional. Therefore, for every bounded-degree CSP, we give the best constant-time approximation algorithm among all.

    A CSP instance is called ε-far from satisfiability if we must remove at least an ε-fraction of constraints to make it satisfiable. A CSP is called testable if there is a constant-time algorithm that distinguishes satisfiable instances from ε-far instances with probability at least 2/3. Using the results above, we also derive, under a technical assumption, an equivalent condition under which a CSP is testable in the bounded-degree model.

    Sublinear-Time Algorithms,Constraint Satisfaction Problems

2010

  • A Query Efficient Non-Adaptive Long Code Test with Perfect Completeness. Suguru Tamaki and Yuichi Yoshida. RANDOM 2010.

    Long Code testing is a fundamental problem in the area of property testing and hardness of approximation. In this paper, we study how small the soundness \(s\) of the Long Code test with perfect completeness can be by using non-adaptive \(q\) queries. We show that \(s = (2q + 3)/2^q\) is possible, where \(q\) is of the form math formula for \(2^{k}-1\) any integer \(k > 2\).

    Approximation,Sublinear-Time Algorithms,Boolean Functions
  • Testing Outerplanarity of Bounded Degree Graphs. Yuichi Yoshida and Hiro Ito. RANDOM 2010.

    We present an efficient algorithm for testing outerplanarity of graphs in the bounded-degree model. In this model, given a graph \(G\) with degree bound \(d\), we should distinguish with high probability the case that \(G\) is outerplanar from the case that modifying at least an \(\epsilon\)-fraction of the edge set is necessary to make \(G\) outerplanar. Our algorithm runs in time polynomial in \(d\) and \(1/\epsilon\) only. To achieve the time complexity, we exploit the tree-like structure inherent to an outerplanar graph using the microtree/macrotree decomposition of a tree. As a corollary, we show that the property of being a cactus is testable in time polynomial in \(d\) and \(1/\epsilon\).

    Sublinear-Time Algorithms,Graphs
  • Conjunctive Filter: Breaking the Entropy Barrier. Daisuke Okanohara and Yuichi Yoshida. ALENEX 2010.

    We consider a problem for storing a map that associates a key with a set of values. To store \(n\) values from the universe of size \(m\), it requires log 2(m/n) bits of space, which can be approximated as \((1.44 + n) \log 2 m/n\) bits when n m. If we allow ε fraction of errors in outputs, we can store it with roughly n log21/ε bits, which matches the entropy bound. Bloom filter is a wellknown example for such data structures. Our objective is to break this entropy bound and construct more space-efficient data structures. In this paper, we propose a novel data structure called a conjunctive filter, which supports conjunctive queries on k distinct keys for fixed k. Although a conjunctive filter cannot return the set of values itself associated with a queried key, it can perform conjunctive queries with O(1/√m) fraction of errors. Also, the consumed space is n/k log2m bits and it is significantly smaller than the entropy bound n/2 log2m when k ≥ 3. We will show that many problems can be solved by using a conjunctive filter such as full-text search and database join queries. Also, we conducted experiments using a real-world data set, and show that a conjunctive filter answers conjunctive queries almost correctly using about 1/2 ∼ 1/4 space as the entropy bound.

    Misc
  • In this paper, we study lower bounds on the query complexity of testing algorithms for various problems. Given an oracle that returns information of an input object, a testing algorithm distinguishes the case that the object has a given property \(P\) from the case that it has a large distance to having \(P\) with probability at least \(2/3\). The query complexity of an algorithm is measured by the number of accesses to the oracle. We introduce two reductions that preserve the query complexity. One is derived from the gap-preserving local reduction and the other is from the L-reduction. By using the former reduction, we show linear lower bounds on the query complexity for testing basic NP-complete properties, i.e., 3-edge-colorability, directed Hamiltonian path/cycle, undirected Hamiltonian path/cycle, 3-dimensional matching and NP-complete generalized satisfiability problems. Also, using the second reduction, we show a linear lower bound on the query complexity of approximating the size of the maximum 3-dimensional matching.

    Sublinear-Time Algorithms,Graphs
  • Testing k-Edge-Connectivity of Digraphs. Yuichi Yoshida and Hiro Ito. Journal of System Science and Complexity 2010.

    This paper presents an algorithm that tests whether a given degree-bounded digraph is k-edge-connected or \(\epsilon\)-far from \(k\)-edge-connectivity. This is the first testing algorithm for \(k\)-edgeconnectivity of digraphs whose running time is independent of the number of vertices and edges. A digraph of \(n\) vertices with degree bound \(d\) is \(\epsilon\)-far from \(k\)-edge-connectivity if at least \(\epsilon dn\) edges have to be added or deleted to make the digraph \(k\)-edge-connected, preserving the degree bound. Given a constant error parameter \(\epsilon\) and a degree bound \(d\), our algorithm always accepts all \(k\)-edge-connected digraphs and rejects all digraphs that is \(\epsilon\)-far from \(k\)-edge-connectivity with probability at least \(2/3\). It runs in \(O\left(d\left(\frac{c}{εd}\right)^k \log\frac{1}{\epsilon d}\right)\) (\(c > 1\) is a constant) time when input digraphs are restricted to be \((k-1)\)-edge connected and runs in \(O\left(d\left(\frac{ck}{\epsilon d}\right)^k\log\frac{k}{\epsilon d}\right)\) (\(c > 1\) is a constant) time for general digraphs.

    Sublinear-Time Algorithms,Graphs

2009

  • An Improved Constant-Time Approximation Algorithm for Maximum Matchings. Yuichi Yoshida, Masaki Yamamoto and Hiro Ito. STOC 2009

    This paper studies approximation algorithms for problems on degree-bounded graphs. Let \(n\) and \(d\) be the number of vertices and the degree bound, respectively. This paper presents an algorithm to approximate the size of some maximal independent set with additive error \(\epsilon n\) whose running time is \(O(d^2)\). Using this algorithm, it also shows that there are approximation algorithms for many other problems, e.g., the maximum matching problem, the minimum vertex cover problem, and the minimum set cover problem, that run exponentially faster than existing algorithms with respect to d and 1/ε. Its approximation algorithm for the maximum matching problem can be transformed to a testing algorithm for the property of having a perfect matching with two-sided error. On the contrary, it also shows that every one-sided error tester for the property requires at least \(\Omega(n)\) queries.

    Sublinear-Time Algorithms,Graphs
  • Fast Calculation of Levenshtein Distance on the Cell Processor. Shun Sakuraba and Yuichi Yoshida. SACSIS 2009. (in Japanese)

    Cell Challenge 2009の規定課題において、Cell Broadband Engineを用いて文字列の編集距 離を計算するプログラムを作成するという課題が出題された。本稿では我々が規定課題において考案した最適化技法について報告する。基本となるアルゴリズムはビット並列化法と呼ばれるものであり、実行委員会によるサンプルプログラムと比べて約 315 倍の高速化を達成した。

    Misc

2008

  • Property Testing on k-Vertex Connectivity of Graphs. Yuichi Yoshida and Hiro Ito. ICALP 2008.

    Wepresentanalgorithmfortestingthek-vertex-connectivity of graphs with given maximum degree. The time complexity of the algorithm is independent of the number of vertices and edges of graphs. A graph \(G\) with \(n\) vertices and maximum degree at most \(d\) is called \(\epsilon\)-far from \(k\)-vertex-connectivity when at least \(\frac{\epsilon dn}{2}\) edges must be added to or removed from \(G\) to obtain a \(k\)-vertex-connected graph with maximum degree at most \(d\). The algorithm always accepts every graph that is \(k\)-vertex-connected and rejects every graph that is \(\epsilon\)-far from \(k\)-vertex-connectivity with a probability of at least \(2/3\). The algorithm runs in \(O\left(d \left(\frac{c}{\epsilon d}\right)^k \log \frac{1}{\epsilon d}\right)\) time (\(c > 1\) is a constant) for given \((k − 1)\)-vertex-connected graphs, and \(O\left( d \left(\frac{ck}{\epsilon d}\right)^k \log \frac{k}{\epsilon d}\right)\) time (\(c > 1\) is a constant) for given general graphs. It is the first constant-time \(k\)-vertex-connectivity testing algorithm for general \(k \geq 4\).

    Sublinear-Time Algorithms,Graphs