tge readme

TGE: subTree/subGraph/connected components Enumeration algorithm

10/Jan/2005 Takeaki Uno
- connected component is a vertex subset such that its induced subgraph is connected

This program is available for only academic use, basically. Anyone can modify this program, but he/she has to write down the change of the modification on the top of the source code. Neither contact nor appointment to Takeaki Uno is needed. If one wants to re-distribute this code, do not forget to refer the newest code, and show the link to homepage of Takeaki Uno, to notify the news about codes for the users. For the commercial use, please make a contact to Takeaki Uno.

Usage
InputFileFormat
ransformingothergraphfileformat
Algorithm
Performance
Reference

Usage

The program is written in C code (gcc). It uses only the basic library, so you can compile it in any environment. To compile the program, first put all the source files in a directly, and just execute

   % make

Then, you can execute "tge". The format of the input parameter is,

tge command [options] input-filename [root-vertex] [output-filename]

"tge" is the execution file name of tge, and command is the letters composed of letters each of which specifies some function. Letters are some of [tTcCgGdDqnblL]. Input file name is the name of the file representing the input graph. The file format is described below. Note that the input file name cannot start from '-', otherwise it will be regarded as specifying option. [root-vertex] is the vertex which has to be specified when we enumerate rooted tree, etc. The output-filename is the name of the file to be output. If the output file name is "-", the solutions will be output to the standard output.

The functions of the commands are the followings. The command is composed of these letters.

   t: enumerate (directed) subtrees
   T: enumerate subtrees including [root vertex]
     for directed graph, enumerate directed trees rooted at [root vertex]
   c: enumerate connected components
   C: enumerate connected components including [root vertex]
   g: enumerate subgraphs
   G: enumerate subgraphs including [root vertex]
   d: enumerate directed graph
   D: enumerate directed graph of the reverse directions of all the arcs
   q: do not print #solutions of each size, to the standard output
   V: show progress of the computation

Between command and input-filename, we can give some options which can give some constraints.

   -l [num]:output trees/graphs with at least [num] vertices
   -u [num]:output trees/graphs with at most [num] vertices
   -S [num]:stop after outputting [num] solutions
   -, [char]: give the separator of the numbers in the output
     the numbers in the output file are separated by the given
     character [char].
   -Q [filename]: replace the output numbers
     according to the permutation table written in the file of
     [filename], replace the numbers in the output. The numbers in the
     file can be separated by any non-numeric character such as newline
     character.

   EXAMPLE of execution)
   % tge Tn g.grh 0
   % tge -l 2 -u 5 cqdL H20.grh 20 resultfile

Input File Format

Vertex set have to be 0,...,n. Multiple edges and self-loops are allowed. The k-th line of the file is the list of vertices adjacent to vertex (k-1), separated non-numerical characters, such as ",", " ", ":". For edge (i,j), do not write both "i" in (j+1)th row and "j" in (i+1)th row, otherwise, two edges (i,j) and (j,i) are generated. A line with no number would occur. the newline-code is not necessary at the end of file.

EXAMPLE)
vertex set = {0,1,2,3,4}
Edge set = {(0,1), (0,3), (0,4), (1,2), (1,4), (2,3), (3,4)} ([EOF] is the end-of-file)

========
1,3,4
2
3

2,3
[EOF]
==========

=== output file format ===

In the case of counting the solutions, each line of the output corresponds to the number of trees/subgraphs/components of each size. For example, if we enumerate the number of trees, and the output is as follows,

0
0
1
5
3

then, it means that

0 <= #trees of 0 edges
0 <= #trees of 1 edges
1 <= #trees of 2 edges
5 <= #trees of 3 edges
3 <= #trees of 4 edges.

In the case of enumerating trees/subgraphs, each line of the output shows the edges in each path/cycle, separated by space " ". By giving "-," option, we can change the separator. For example, if we enumerate trees and the output is as follows,

(0,1) (3,4), (1,4)
(0,2) (0,3)

it means that there is a tree of edges (0,1) (3,4), (1,4), and tree of edges (0,2) (0,3).

In the case of enumerating connected components, each line of the output shows the vertices of each component, separated by space " ". For example, if the output is as follows,

0 1 4 5 2
0 3 2

it means that there is a connected component of vertices 0,1,4,5,2, and that of vertices 0,3,2.

ransforming other graph file format

For the use of other formats for input graph files, we have several scripts. We explain the functions of the Perl scripts in the following.

- compgrh.pl [b] [separator] < input-file > output-file
Write to output-file the complement graph of the graph read from the input-file. If you specify "b" option, then the input-file is regarded as a bipartite graph.

Ex.)
% compgrh.pl b "," < test.grh > test2.grh

- transnum.pl table-file [separator] < input-file > output-file
Read file from standard input, and give a unique number to each name written by general strings (such as ABC, ttt), and transform every string name to a number, and output it to standard output. The mapping from string names to numbers is output to table-file. The default character for the separator of the string names is " "(space). It can be changed by giving a character for the option [separator]. For example, A,B is a string name, if the separator is space, but if we set the separator to ",", it is regarded as two names A and B. This is executed by "transnum.pl table-file "," < input-file...".

- untransnum.pl table-file < input-file > output-file
According to the table-file output by transnum.pl, un-transform numbers to string names. The output of the program is composed of numbers, thus it is used if we want to transform to the original string names. It reads file from standard output, and output to the standard output.

- transgrh.pl [Bb] [separator] < input-file > output-file
Transform a file in the format of that every line writes two end vertices of an edge, to the format of this program. For example, the file

0,2
0,1
1,4
3,4
1,3
[EOF]

, representing the graph with edge (0,2), edge (0,1) and..., is transformed to

1,2
3,4

4
[EOF]

If the name of vertices are written in general strings, transform them to numbers by transnum.pl before the execution.

When we give parameter D or d, the input graph is regarded as a directed graph. If we give d, the first number is the origin of a directed edge, and if we give D, the second number is the origin.

Bipartite graph can be transformed by giving options b or B. Suppose that we have a bipartite graph with vertex sets A and B both indices start from 0. If we transform it by the above way, vertex i in A and i in B are considered to be the same vertex. By giving b, the second number in each row is added by a constant so that no two vertices have the same index. In the case of B, the first number is added.

When we transform the above graph by ( transgrh.pl b "," < input > output ), we have a graph in the format

5,6
7,8

8
[EOF]

- sortgrhid
Transform a graph file format such that each row corresponds to a vertex. The first number of each row is the ID of the vertex, and the following numbers are the vertex ID's to which the vertex adjacent. The script sorts the row according to the ID's, and remove the ID from the file. For example, the file,

4,2,3
1,2
0,1,3,4
3
2,3
[EOF]

is transformed to

1,3,4
2
3

2,3
[EOF]

If the ID's are given in general strings, use transnum.pl before the execution.

- example of the usage: when transform the file test.grh which is edge list
format with general string vertex names with separator ",", and enumerate chordless cycles:

transnum.pl table "," < test.grh > tmp.grh transgrh.pl < tmp.grh > tmp2.grh
tge C tmp2.grh > out
untransnum.pl table < out > out2

Algorithm

The algorithm works for both directed/undirected case. Simply dealing the edges as directed edges (arcs), we can enumerate directed trees.

For a graph G and its subtree T, consider the problem of enumerating all subtrees including T. Let G-v be the graph obtained by removing v and all edges incident to v from G, and G-(v,u) be the graph obtained by removing edge (u,v) from G. For an edge (u,v) such that u is in T and v is not in T, The subtrees including T are classified into two groups; those including T and (u,v), and those not including (u,v). The subtrees in the first group can be enumerated by enumerating subtrees including T\cup (u,v), and the subtrees in the second group can be enumerated by enumerating subtrees including T in G-(u,v). Both can be done by recursively solving the same problem with a small graph, or a larger subtree to be included in, thus we have the following pseudo code.

   Cypath (G=(V,E), T)
   (u,v) := an edge such that u is in T and v is not in T
   If such (u,v) does not exist then return
   output T\cup (u,v); call Cypath (G, T\cup (u,v))
   call Cypath (G-(u,v), T)

We can enumerate subgraphs in the same way.

When we enumerate subtrees with at least size k, if the size of connected component including T in G-(u,v) is less than k-|T|, there exists no subtree not including (u,v) with size at least k. In this case we execute recursive call only for those including (u,v).

For the enumeration of connected components, we maintain the vertex set of a connected component instead of subtree T. In each iteration, we choose a vertex not in S and adjacent to at least one vertex in S, and generate two recursive calls for enumerating connected components including S and v, and enumerating connected components including S in G-v.

Performance

This algorithm is polynomial delay, taking at most O(D) time for each tree/subgraph/component, where D is the maximum degree in G (theoretical fastest is O(1) for each). Practical computation time is constant for each tree/subgraph/component, and finds up to 1,000,000 trees/subgraphs/components in a second, on average. If the minimum length is large, and the input graph is sparse and low connectivity, then sometimes the computation time increases.

Reference

R. C. Read and R. E. Tarjan, "Bounds on Backtrack Algorithms for Listing Cycles, Paths, and Spanning Trees", Networks 5, 237-252 (1975). (enumeration algorithm for spanning trees)

H. N. Kapoor and H. Ramesh, "Algorithms for Generating All Spanning Trees of Undirected, Directed and Weighted Graphs", Lecture Notes in Computer Science 519, Springer-Verlag, pp.461-472 (1992).
(enumeration algorithms for spanning trees and directed spanning trees)

Akiyoshi Shioura, Akihisa Tamura, Takeaki Uno, "An Optimal Algorithm for Scanning all Spanning Trees of Undirected Graphs," SIAM Journal on Computing, Vol.26, pp.678-692, 1997.
(enumeration algorithm for spanning trees)

Takeaki Uno, "An Algorithm for Enumerating All Directed Spanning Trees in a Directed Graph," Lecture Note in Computer Science 1178, Springer-Verlag, Algorithms and Computation, (Proceeding of ISAAC96), pp. 166-173, 1996. (enumeration algorithm for directed spanning trees)

Takeaki Uno, "A New Approach for Speeding Up Enumeration Algorithms," Lecture Note in Computer Science 1533, Springer-Verlag, Algorithms and Computation (Proceeding of ISAAC98), pp. 287-296, 1998. (enumeration algorithm for directed spanning trees)