cypath readme

CYPATH: st path, cycle, chordless st path, chordless cycle enumeration

3/Aug/2005 Takeaki Uno
homepage: http://research.nii.ac.jp/~uno/index.html

This program is available for only academic use, basically. Anyone can modify this program, but he/she has to write down the change of the modification on the top of the source code. Neither contact nor appointment to Takeaki Uno is needed. If one wants to re-distribute this code, do not forget to refer the newest code, and show the link to homepage of Takeaki Uno, to notify the news about the code for the users. For the commercial use, please make a contact to Takeaki Uno.

ProblemDefinition
Usage
InputFileFormat
ransformingothergraphfileformat
Performance
Algorithm
References

Problem Definition

For a give graph G=(V,E), a path is a sequence of vertices without duplications such that any kth vertex and k+1th vertex in the sequence is connected by an edge. The first/last vertices of the path are called starting/ending vertices. If the graph is directed, and there is an arc from kth vertex and k+1th vertex in the sequence for any k, the path is called directed path. If the starting vertex and the ending vertex are identical (same), the path/directed path is called cycle/directed cycle. An edge is a chord (shortcut) of a path/cycle if the edge connecting the
vertices in the path/cycle which are not consecutive in the sequence.
In the case of directed paths, an arc from kth vertex to lth vertex is
not a chord if k>l.
A (directed) path/cycle is called chordless if it has no chord.

The problem this program deals with is to enumerate all the
(chordless) (directed) paths/cycles of the given (directed) graph.

Usage

cypath command input-filename [starting vertex s] [ending vertex t] [max length] [output-file]

"cypath" is the execution file name of tge, and command is the letters composed of letters each of which specifies some function. Letters are some of [cpCPdDqnl]. Input file name is the name of the file representing the input graph. The file format is described below. Note that the input file name cannot start from '-', otherwise it will be regarded as specifying option. [root-vertex] is the vertex which has to be specified when we enumerate rooted tree, etc. The output-filename is the name of the file to be output. If the output file name is "-", the solutions will be output to the standard output.

The functions of the commands are the followings. The command is composed of these letters.

   c: enumerate cycles
   C: enumerate chordless cycles
   p: enumerate paths from [vertex s] to [vertex t]
   P: enumerate chordless paths from [vertex s] to [vertex t]
   d: inputs a directed graph, and enumerate directed paths/cycles
   D: inputs a directed graph, and reverse the direction of each arc
   q: do not output the number of solutions of each size to standard output
   V: show progress of the computation

Between command and input-filename, we can give some options which can give some constraints.

   -l [num]:output trees/graphs with at least [num] vertices
   -u [num]:output trees/graphs with at most [num] vertices
   -r [num]:output cycles including vertex [num]\n\
   -S [num]:stop after outputting [num] solutions
   -, [char]: give the separator of the numbers in the output
     the numbers in the output file are separated by the given
     character [char].
   -Q [filename]: replace the output numbers
     according to the permutation table written in the file of
     [filename], replace the numbers in the output. The numbers in the
     file can be separated by any non-numeric character such as newline
     character.

EXAMPLE of execution)
% cypath Pn testg.grh 0 9
% cypath Cqd -l 2 -u 5 testgg.grh resultfile

The enumerated path/cycles are output to the standard output
such that each line is the sequence of vertices of the path/cycle

Input File Format

Vertex set has to be {0,...,n}.
Multiple edges and self-loops are allowed
   (but self-loops never appear in paths/cycles).
The k-th line of the file is the list of vertices adjacent to vertex (k-1),
     separated non-numerical characters, such as ",", " ", ":".
For edge (i,j), do not write both "i" in (j+1)th row and "j" in (i+1)th row,
     otherwise, two edges (i,j) and (j,i) are generated.
A line with no number would occur.
the newline-code is not necessary at the end of file. In the case of directed graph, write on the kth line vertices adjacent to
   the vertex k-1 by "out-going" arcs.

EXAMPLE)
vertex set = {0,1,2,3,4}
edge set = {(0,1), (0,3), (0,4), (1,2), (2,4), (2,3), (3,4)}

([EOF] is the end-of-file)

========
1,3,4
2
3

2,3
[EOF]
==========

This file can be recognized as a directed graph with
arc set = {(0,1), (0,3), (0,4), (1,2), (2,3), (4,2), (4,3)}

=== output file format ===
In the case of counting the solutions, each line of the output corresponds to the number of paths/cycles of each length. For example, if we count the number of paths, and the output is as follows,

0
0
1
5
3

then, it means that

0 <= #paths of 0 vertices
0 <= #paths of 1 vertices
1 <= #paths of 2 vertices
5 <= #paths of 3 vertices
3 <= #paths of 4 vertices.

In the case of enumerating paths/cycles, each line of the output shows the vertex sequence of each path/cycle, separated by space " ". By giving "-," option we can change the separator. For example, if we enumerate paths and the output is as follows,

0 1 4 5 2
0 3 2

it means that there is a path of vertex sequence 0,1,4,5,2, and that of vertex sequence 0,3,2.

ransforming other graph file format

For the use of other formats for input graph files, we have several scripts. We explain the functions of the Perl scripts in the following.

- compgrh.pl [b] [separator] < input-file > output-file
Write to output-file the complement graph of the graph read from the input-file. If you specify "b" option, then the input-file is regarded as a bipartite graph.

Ex.)
% compgrh.pl b "," < testg.grh > testg2.grh

- transnum.pl table-file [separator] < input-file > output-file
Read file from standard input, and give a unique number to each name written by general strings (such as ABC, ttt), and transform every string name to a number, and output it to standard output. The mapping from string names to numbers is output to table-file. The default character for the separator of the string names is " "(space). It can be changed by giving a character for the option [separator]. For example, A,B is a string name, if the separator is space, but if we set the separator to ",", it is regarded as two names A and B. This is executed by "transnum.pl table-file "," < input-file...".

- untransnum.pl table-file < input-file > output-file
According to the table-file output by transnum.pl, un-transform numbers to string names. The output of the program is composed of numbers, thus it is used if we want to transform to the original string names. It reads file from standard output, and output to the standard output.

- transgrh.pl [Bb] [separator] < input-file > output-file
Transform a file in the format of that every line writes two end vertices of an edge, to the format of this program. For example, the file

0,2
0,1
1,4
3,4
1,3
[EOF]

, representing the graph with edge (0,2), edge (0,1) and..., is transformed to

1,2
3,4

4
[EOF]

If the name of vertices are written in general strings, transform them to numbers by transnum.pl before the execution.
When we give parameter D or d, the input graph is regarded as a directed
graph. If we give d, the first number is the origin of a directed edge, and if we give D, the second number is the origin.
Bipartite graph can be transformed by giving options b or B. Suppose that
we have a bipartite graph with vertex sets A and B both indices start from 0. If we transform it by the above way, vertex i in A and i in B are considered to be the same vertex. By giving b, the second number in each row is added by a constant so that no two vertices have the same index. In the case of B, the first number is added.
When we transform the above graph by ( transgrh.pl b "," < input > output ), we have a graph in the format

5,6
7,8

8
[EOF]

- sortgrhid
Transform a graph file format such that each row corresponds to a vertex. The first number of each row is the ID of the vertex, and the following numbers are the vertex ID's to which the vertex adjacent. The script sorts the row according to the ID's, and remove the ID from the file. For example, the file,

4,2,3
1,2
0,1,3,4
3
2,3
[EOF]

is transformed to

1,3,4
2
3

2,3
[EOF]

If the ID's are given in general strings, use transnum.pl before the execution.

- example of the usage: when transform the file testg.grh which is edge list

format with general string vertex names with separator ",", and enumerate
chordless cycles:

transnum.pl table "," < test.grh > tmp.grh transgrh.pl < tmp.grh > tmp2.grh
cypath C tmp2.grh > out
untransnum.pl table < out > out2

Performance

This algorithm is polynomial delay, taking at most O(|V||E|) time for each path/cycle, where |V| and |E| are the number of vertices, and the number of edges. (the current theoretical fastest one is O(|V|) for each) Practical computation time is constant for each path/cycle, and finds 100,000 paths/cycles in a second, on average. If the maximum length is small, the computation time increases.

Algorithm

=== Enumeration of Paths ===
For a graph G and its vertex v, let G-v be the graph obtained by removing
   v and all edges incident to v from G, and G-(v,u) be the graph obtained by
removing edge (u,v) from G. Paths from vertex s to vertex t are classified into two groups, those including an edge (s,h) and those not. The former paths are the union of s and paths connecting h and t in G-v, and the latter paths are paths from s to t in G-(s,h). Hence, recursively, we can enumerate the paths recursively. If there is no path from h to t (respectively, from s to t in G-(s,h)), then there is no need to
   enumerate recursively. This can be checked by breadth-first or depth-first
   search from starting at t.
We have the following pseudo code, which enumerate the path from s to t in G
   by calling EnumPath (G, s, t, a s-t path, \emptyset).

EnumPath (G=(V,E), s, t, P, I)
   If s = t then output I\cup {s} ; return
   h := the next vertex to s in P
   Breadth first search starting from t in G-(s,h)
   If a path Q from s to t exists then call EnumPath (G-(s,h), s, t, Q, I)
   call EnumPath (G-s, h, t, P, I\cup {s})

=== Enumeration of Cycles ===
Let (t,s) be an edge of G. Then, the cycles in G are classified into two groups: those including (t,s) in the sequence, and those not. A cycle including (t,s) is the union of (t,s) and a s-t path. A cycle not including (t,s) is a cycle in G-(t,s), where G-(t,s) is a graph obtained by removing (t,s) from G. Hence, we can enumerate cycles by enumerating s-t paths.

EnumCycle (G=(V,E))
     For each edge (t,s)
     Remove (t,s) from G
     Call EnumPath (G,s,t,\emptyset)
     End for

=== Enumeration of Directed Paths/Cycles ===
The basic idea is almost the same. Simply dealing the edges as directed edges
(arcs), we can enumerate directed paths/cycles.

=== Enumeration of Chordless Paths ===
Let G-(v+h) be the graph obtained by removing v and all the adjacent vertices to v except for h, and all edges incident to these vertices. A chordless path including an edge (s,h) satisfies that its subpath from h to t is a chordless path in G-(s+h). A chordless path not including (s,h) is a chordless path from s to t in G-h.

EnumChordlessPath (G=(V,E), s, t, P, I)
   If s = t then output I\cup {s} ; return
   h := the next vertex to s in P
   Breadth first search starting from t in G-h
   If a path Q from s to t exists then call EnumPath (G-h, s, t, Q, I)
   call EnumPath (G-(s+h), v, t, P, I\cup {s})

Chordless cycles can be enumerated in the same way as simple cycles.

=== Time Complexity ===
The time complexity of these algorithms are O(|V|(|E|+|V|)) for each path/cycle since one iteration takes up to O(|V|+|E|) time, and the depth of the recursion is O(|V|). However, by a simple modification, we can reduce the time complexity. By shrinking the second recursive call of EnumPath, we have the
following algorithm.

EnumPath (G=(V,E), s, t, P, I)
     Output P
     Let P = {v1=s,...,vk=t}
     Remove from G all vertices {v1,...,vk-1}, I := I\cup {v1,...,vk}
     For i=k-1 to 1
     Recover (un-delete) vertex vi in G; I:= I\setminus {vi}
     remove (vi,vi+1) from G
     Breadth first search starting from t in G
     If a path Q from vi to t exists then call EnumPath (G, vi, t, Q, I)
     End for

This algorithm performs breadth first search k-1 times, but we can re-use the result of the previous search. Suppose that we have marked the vertices which have paths to t by breadth first search in the loop of i=j. Then, in the loop of i=j-1, we do not have to search the marked vertices again. Thus, total computation time of the breadth first search in this loop is
   O(|V|+|E|), and we can reduce the time complexity.

References

R. C. Read and R. E. Tarjan, "Bounds on Backtrack Algorithms for Listing Cycles, Paths, and Spanning Trees", Networks 5, 237-252 (1975).
   (enumeration algorithms for s-t paths and cycles)

Takeaki UNO, "An Output Linear Time Algorithm for Enumerating Chordless Cycles", 92th SIGAL of Information Processing Society Japan, 47-53, 2003 (in JAPANESE)
   (enumeration algorithms for chordless s-t paths and chordless cycles)