KUBIC-NII Joint Seminar on Bioinformatics
organized by
Institute for Chemical Research, Kyoto University and
National Institute of Informatics

======================================================================
TITLE:
"A Bisection-Type Algorithm for Grammar-Based Compression of Ordered and Unordered Trees"

SPEAKER:
Tatsuya Akutsu
======================================================================
ABSTRACT:
Grammar-based compression is a data compression method in which one seeks a small grammar that generates a given string. In this talk, we consider grammar-based compression for tree-structured data. For that purpose, we define an elementary ordered tree grammar (EOTG) by extending the context-free grammar, and then present a polynomial-time algorithm that approximates the smallest EOTG within a factor of $O(n^{5/6})$, where $n$ is the size of an input rooted ordered tree. We also show that the grammar and algorithm can be modified for unordered trees of bounded degree. We discuss possible applications of the proposed approach to the analysis of biological data.
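To make the string case concrete, here is a minimal Python sketch of a straight-line grammar, in which each nonterminal has exactly one rule and the grammar generates exactly one string. The rule names (X1, X2, X3) and the size measure (total number of symbols on all right-hand sides) are illustrative assumptions; the EOTG for trees and the $O(n^{5/6})$ approximation algorithm are not reproduced here.

```python
from typing import Dict, List, Optional

Grammar = Dict[str, List[str]]  # nonterminal -> right-hand side (list of symbols)

def expand(symbol: str, rules: Grammar, memo: Optional[Dict[str, str]] = None) -> str:
    """Return the string generated by `symbol`; symbols without a rule are terminals."""
    if memo is None:
        memo = {}
    if symbol not in rules:
        return symbol
    if symbol not in memo:
        memo[symbol] = "".join(expand(s, rules, memo) for s in rules[symbol])
    return memo[symbol]

def grammar_size(rules: Grammar) -> int:
    """Size of the grammar: total number of symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())

# Hypothetical grammar for "abababab" (length 8) using three rules.
rules: Grammar = {
    "X1": ["a", "b"],
    "X2": ["X1", "X1"],
    "X3": ["X2", "X2"],
}

print(expand("X3", rules))   # abababab
print(grammar_size(rules))   # 6
```

Because each rule doubles the length of the string below it, $k$ such rules of total size $2k$ generate a string of length $2^k$, which is why a small grammar can be a very compact representation.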


======================================================================
TITLE:
"Understanding the Causes of Genetic Difference in Humans"

SPEAKER:
James Ray Wagner
======================================================================
ABSTRACT:
Gene regulation is guided by a complex interplay of epigenetic signals (such as DNA methylation and chromatin states) and sequence-specific transcription factors, which act on regulatory sequences in the vicinity (in cis) of a gene; these sequences and signals may differ between the two parental alleles. At the individual level, one parental allele may be silenced; at the population level, certain sequence variants may drive expression more efficiently. These differences can be identified as epigenetic and heritable variation in cis-regulation, respectively. While such differences can be detected for individual genes by various methods, or indirectly on a larger scale by expression profiling, until recently there were no approaches for detecting them specifically and comprehensively across human genes. In this presentation I will introduce recently developed methods for profiling gene regulation and their associated challenges, including datasets examining allelic imbalance in human cell lines.
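Allelic imbalance at a heterozygous site is commonly quantified by comparing the number of sequencing reads carrying each allele against the 50:50 ratio expected when both parental alleles are expressed equally. The Python sketch below applies an exact two-sided binomial test to such counts; it is a generic illustration, and the function name and inputs are assumptions rather than the speaker's pipeline.

```python
from math import comb

def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of ref vs alt read counts against a 50:50 ratio.

    The null distribution Binomial(n, 0.5) is symmetric, so the two-sided
    p-value is twice the probability of the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical read counts at a heterozygous site.
print(allelic_imbalance_pvalue(10, 0))  # 0.001953125 (strong imbalance)
print(allelic_imbalance_pvalue(5, 5))   # 1.0 (balanced)
```

A small p-value suggests that one allele is preferentially expressed, for example because the other is silenced; real analyses also have to account for sequencing and mapping biases, which this sketch ignores.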


======================================================================
TITLE:
"Measuring the Similarity of Protein Structures and Biological Networks using Compression Algorithms"

SPEAKER:
Morihiro Hayashida
======================================================================
ABSTRACT:
Developing algorithms for comparing various kinds of biological data is an important topic in bioinformatics and systems biology. Compression algorithms can be used to measure similarity because the similarity of two objects can be estimated from their Kolmogorov complexity, which the output size of a compressor approximates in practice. In this talk, we propose two compression algorithms, one for comparing protein structures and one for comparing biological networks. For protein structures, we use image compression algorithms, because the matrix of distances between C-alpha atoms can be treated as an image. For biological networks, we use graph-based compression algorithms. Finally, we show results for several proteins and metabolic networks.
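One standard way to turn compressed sizes into a similarity measure is the normalized compression distance (NCD), $\mathrm{NCD}(x,y) = \frac{C(xy) - \min(C(x), C(y))}{\max(C(x), C(y))}$, where $C$ is the length of a compressed string. The Python sketch below uses the general-purpose zlib compressor as a stand-in for $C$; it assumes byte-string inputs and does not reproduce the image-based or graph-based compressors described in the talk.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Approximate the information content of `data` by its zlib-compressed length."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical inputs: a repetitive sequence and an unrelated byte pattern.
x = b"ACDEFGHIKLMNPQRSTVWY" * 20
z = bytes((i * 131 + 7) % 251 for i in range(400))

print(ncd(x, x) < ncd(x, z))  # True
```

Concatenating an object with a copy of itself adds almost nothing to the compressed size, so the distance is small; an unrelated input shares no patterns the compressor can exploit, so the distance approaches 1.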


======================================================================
TITLE:
"Identifying Necessary Reactions in Metabolic Pathways by Minimal Model Generation"

AUTHORS:
Takehide Soh* and Katsumi Inoue
======================================================================
ABSTRACT:
In systems biology, identifying vital functions such as glycolysis in a given metabolic pathway is important for understanding living organisms. In this paper, we focus on the problem of finding minimal sub-pathways that produce target metabolites from source metabolites. We represent the laws of biochemical reactions as propositional formulas and use a minimal model generator based on a state-of-the-art SAT solver. An advantage of our method is that it can handle reversible reactions, which are represented as cycles. Moreover, recent advances in SAT technology enable us to obtain solutions for large pathways. We have applied our method to the whole Escherichia coli metabolic pathway and found 5 sets of reactions, including the conventional glycolysis sub-pathway described in the biological database EcoCyc.
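The notion of a minimal sub-pathway can be illustrated without a SAT solver. The Python sketch below assumes a simplified semantics in which a reaction fires once all of its substrates are available, and it enumerates reaction sets in order of increasing size, keeping those that produce every target and contain no smaller solution. This exhaustive search is only a stand-in for the SAT-based minimal model generation described above and is feasible only for toy networks; the reaction and metabolite names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (substrates, products)

def producible(reactions: Dict[str, Reaction], chosen: Set[str], sources: Set[str]) -> Set[str]:
    """Metabolites reachable from `sources` when only `chosen` reactions may fire."""
    available = set(sources)
    changed = True
    while changed:
        changed = False
        for name in chosen:
            substrates, products = reactions[name]
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def minimal_subpathways(reactions: Dict[str, Reaction], sources: Set[str],
                        targets: Set[str]) -> List[Set[str]]:
    """All subset-minimal reaction sets that produce every target (exhaustive search)."""
    names = sorted(reactions)
    found: List[Set[str]] = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            chosen = set(subset)
            if any(sol <= chosen for sol in found):
                continue  # contains a smaller solution, so it is not minimal
            if targets <= producible(reactions, chosen, sources):
                found.append(chosen)
    return found

# Toy network: r2 and r4 model the two directions of a reversible M <-> T step.
reactions: Dict[str, Reaction] = {
    "r1": ({"S"}, {"M"}),
    "r2": ({"M"}, {"T"}),
    "r3": ({"S"}, {"T"}),
    "r4": ({"T"}, {"M"}),
}
print([sorted(s) for s in minimal_subpathways(reactions, {"S"}, {"T"})])
# [['r3'], ['r1', 'r2']]
```

Since adding reactions can only enlarge the producible set, checking subsets in order of increasing size guarantees that every reported set is subset-minimal; the cycle formed by r2 and r4 causes no problems because a reaction fires only once its substrates are actually available.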


======================================================================
TITLE:
"RactIP: Fast and Accurate Prediction of RNA-RNA Interaction using Integer Programming"

SPEAKER:
Yuki Kato
======================================================================
ABSTRACT:
Considerable attention has been focused on predicting RNA-RNA interaction, since it is key to identifying possible targets of small noncoding RNAs that regulate gene expression post-transcriptionally. A number of computational studies have been devoted to predicting joint secondary structures or binding sites for a specific class of interactions. In general, there is a trade-off between the range of interaction types an algorithm covers and its efficiency, and thus efficient computational methods for predicting comprehensive types of interaction are still awaited. In this talk, we present RactIP, a fast and accurate method for predicting RNA-RNA interaction of general type based on integer programming (IP). RactIP integrates approximate information on an ensemble of equilibrium joint structures into the IP objective function using posterior internal and external base-pairing probabilities. Experimental results on real interaction data show that the prediction accuracy of RactIP is at least comparable to that of several state-of-the-art methods for RNA-RNA interaction prediction. Moreover, we demonstrate that RactIP runs much faster than competing methods for predicting joint secondary structures.
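As a rough illustration of how posterior base-pairing probabilities can enter an objective function, the Python sketch below selects intermolecular base pairs that maximize $\sum (p_{ij} - \theta)$, where $p_{ij}$ is an assumed posterior probability that base $i$ of one RNA pairs with base $j$ of the other and $\theta$ is a threshold, subject only to each base pairing at most once. The simplified objective, the threshold and the exhaustive search are assumptions for illustration, not RactIP's formulation; RactIP also uses internal (intramolecular) pairing probabilities, which are omitted here.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (base index in RNA 1, base index in RNA 2)

def best_interaction(probs: Dict[Pair, float], theta: float) -> Tuple[Set[Pair], float]:
    """Choose intermolecular pairs maximizing sum(p - theta), each base used at most once.

    Only pairs with p > theta can raise the score, so the others are discarded.
    Exhaustive search stands in for an IP solver and is feasible only for tiny inputs.
    """
    candidates = [pair for pair, p in probs.items() if p > theta]
    best: Set[Pair] = set()
    best_score = 0.0
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if len({i for i, _ in subset}) < k or len({j for _, j in subset}) < k:
                continue  # some base would pair with two partners
            score = sum(probs[pair] - theta for pair in subset)
            if score > best_score:
                best, best_score = set(subset), score
    return best, best_score

# Hypothetical posterior probabilities for a few intermolecular pairs.
probs = {(0, 3): 0.9, (1, 2): 0.8, (0, 2): 0.6, (2, 0): 0.2}
pairs, score = best_interaction(probs, theta=0.5)
print(sorted(pairs), round(score, 2))  # [(0, 3), (1, 2)] 0.7
```

The threshold controls the trade-off between sensitivity and precision: raising $\theta$ keeps only pairs the ensemble supports strongly, whereas an IP solver makes the same kind of objective tractable for realistic sequence lengths.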


======================================================================
TITLE:
"On Improving the Classification Accuracy of Machine Learning Methods on Gene Expression Data"

SPEAKER:
Matej Holec
======================================================================
ABSTRACT:
Analysis of gene expression in microarray data is an everyday problem for biologists. State-of-the-art approaches rely on gene set enrichment methods, which test simple statistical hypotheses such as differential expression between classes of samples. Due to the nature of the data, it is difficult to apply generic machine learning methods that can naturally incorporate background knowledge and automatically generate and test more sophisticated hypotheses. We propose to improve the predictive accuracy of machine learning algorithms by simplifying the data, exploiting similarities among samples within gene sets (e.g., metabolic pathways). Furthermore, we suggest possible explanations of the results using logical abduction and background knowledge.
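A simple way to reduce gene-level measurements to a smaller number of gene-set-level features is to summarize each sample's expression over the genes in each set. The Python sketch below uses the mean as the summary; the gene and set names are hypothetical, and the sample-similarity step described in the abstract is not reproduced, so this only illustrates the general idea of simplifying the data by gene sets.

```python
from statistics import mean
from typing import Dict, List

Profile = Dict[str, float]  # gene -> expression value

def gene_set_features(
    expression: Dict[str, Profile],
    gene_sets: Dict[str, List[str]],
) -> Dict[str, Dict[str, float]]:
    """Replace each sample's gene-level profile by one mean value per gene set."""
    return {
        sample: {name: mean(profile[g] for g in genes) for name, genes in gene_sets.items()}
        for sample, profile in expression.items()
    }

# Hypothetical data: two samples, three genes, two gene sets.
expression = {
    "s1": {"g1": 1.0, "g2": 3.0, "g3": 5.0},
    "s2": {"g1": 2.0, "g2": 2.0, "g3": 0.0},
}
gene_sets = {"P1": ["g1", "g2"], "P2": ["g3"]}
print(gene_set_features(expression, gene_sets))
# {'s1': {'P1': 2.0, 'P2': 5.0}, 's2': {'P1': 2.0, 'P2': 0.0}}
```

The resulting table has one column per gene set instead of one per gene, which both shrinks the feature space and makes any learned rule refer to pathways rather than to individual genes.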


======================================================================
TITLE:
"Integer Programming-based Method for Completing Signaling Pathways and its Application to Analysis of Colorectal Cancer"

SPEAKER:
Takeyuki Tamura
======================================================================
ABSTRACT:
Signaling pathways are often represented as networks in which each node corresponds to a protein and each edge corresponds to a relationship between nodes, such as activation, inhibition, or binding. However, signaling pathways in a cell may be affected by genetic and epigenetic alterations: some edges may be deleted and others newly added. Current knowledge of signaling pathways is available in public databases, but most pathway changes that accompany alterations of the cell state remain largely unknown. In this paper, we develop an integer programming-based method for inferring such changes from gene expression data. We test the ability of our method to reconstruct the colorectal cancer pathway in the KEGG database.
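The underlying inference problem, finding the fewest edge deletions and additions that make a network agree with discretized expression data, can be sketched on a toy scale. The Python code below assumes a hypothetical Boolean rule (a non-input node is active iff some activator is active and no inhibitor is active) and replaces the integer program with exhaustive search; the node names, the rule and the candidate-edge list are all assumptions, not the authors' model.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, int]  # (regulator, target, +1 activation / -1 inhibition)
State = Dict[str, bool]      # discretized expression: node -> active?

def predicted(node: str, edges: Set[Edge], state: State, inputs: Set[str]) -> bool:
    """Assumed rule: a non-input node is active iff at least one activator
    is active and no inhibitor is active."""
    if node in inputs:
        return state[node]
    activators = [u for (u, v, sign) in edges if v == node and sign > 0]
    inhibitors = [u for (u, v, sign) in edges if v == node and sign < 0]
    return any(state[u] for u in activators) and not any(state[u] for u in inhibitors)

def consistent(edges: Set[Edge], states: List[State], inputs: Set[str]) -> bool:
    """True if every node in every observed sample agrees with the rule."""
    return all(predicted(n, edges, s, inputs) == s[n] for s in states for n in s)

def min_edits(known: Iterable[Edge], candidates: Iterable[Edge],
              states: List[State], inputs: Set[str]) -> Optional[Set[Edge]]:
    """Fewest edge toggles (delete a known edge or add a candidate) that make
    the network consistent with all samples; exhaustive search, toy sizes only."""
    known = set(known)
    pool = sorted(known | set(candidates))
    for k in range(len(pool) + 1):
        for toggles in combinations(pool, k):
            if consistent(known ^ set(toggles), states, inputs):
                return set(toggles)
    return None

# Hypothetical example: the known edge A -> B cannot explain sample 1,
# where B is off although its activator A is on.
known = {("A", "B", +1)}
candidates = {("C", "B", -1)}
states = [
    {"A": True, "C": True, "B": False},
    {"A": True, "C": False, "B": True},
]
print(min_edits(known, candidates, states, inputs={"A", "C"}))  # {('C', 'B', -1)}
```

Here deleting A -> B alone would break the second sample, so the smallest repair is to add the candidate inhibition C -| B; an integer programming formulation expresses the same minimization with one binary variable per edge and scales far beyond brute force.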


======================================================================
TITLE:
"Reasoning about Signaling Networks by Meta-level Abduction"

SPEAKER:
Katsumi Inoue
======================================================================
ABSTRACT:
Meta-level abduction has been proposed to discover missing links and unknown nodes from incomplete network data in order to account for observations. In this work, we extend the applicability of meta-level abduction to networks containing both positive and negative causal effects. Such networks appear in many biological domains, where inhibitory effects are important in signaling and metabolic pathways. We show that meta-level abduction can consistently produce both positive and negative causal relations as well as invented nodes. As a case study, we apply meta-level abduction to a p53 signaling network, abducing causal rules that explain how a tumor suppressor works.
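The core idea of reasoning with positive and negative effects can be sketched in Python: the net effect of a causal path is the product of its edge signs, an observation is explained if some path has the observed sign, and abduction proposes missing edges that would create such a path. This is a toy procedural illustration, not the logic-programming formulation of meta-level abduction; it only hypothesizes single edges (no invented nodes), and the network, node names and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Network = Dict[str, List[Tuple[str, int]]]  # node -> [(successor, +1 or -1)]

def path_signs(edges: Network, source: str, target: str) -> Set[int]:
    """Net effect (product of edge signs) of every simple path source -> target."""
    signs: Set[int] = set()

    def dfs(node: str, sign: int, visited: Set[str]) -> None:
        if node == target:
            signs.add(sign)
            return
        for succ, s in edges.get(node, []):
            if succ not in visited:
                dfs(succ, sign * s, visited | {succ})

    dfs(source, +1, {source})
    return signs

def explains(edges: Network, source: str, target: str, observed_sign: int) -> bool:
    return observed_sign in path_signs(edges, source, target)

def abduce_one_edge(edges: Network, nodes: List[str], source: str, target: str,
                    observed_sign: int) -> List[Tuple[str, str, int]]:
    """Single signed edges whose addition would explain an unexplained observation."""
    if explains(edges, source, target, observed_sign):
        return []  # nothing to abduce
    hypotheses = []
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            for s in (+1, -1):
                extended = {n: list(succ) for n, succ in edges.items()}
                extended.setdefault(u, []).append((v, s))
                if explains(extended, source, target, observed_sign):
                    hypotheses.append((u, v, s))
    return hypotheses

# Toy network with the p53 -> apoptosis link deliberately missing.
network: Network = {
    "DNA_damage": [("p53", +1)],
    "p53": [("MDM2", +1)],
    "MDM2": [("p53", -1)],
}
nodes = ["DNA_damage", "p53", "MDM2", "apoptosis"]
print(explains(network, "DNA_damage", "apoptosis", +1))  # False
print(abduce_one_edge(network, nodes, "DNA_damage", "apoptosis", +1))
# [('DNA_damage', 'apoptosis', 1), ('p53', 'apoptosis', 1), ('MDM2', 'apoptosis', 1)]
```

Several hypotheses explain the same observation, so a practical system must rank them, for example by preferring minimal explanations or by checking them against further observations.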


Contact

Takehide Soh: soh at nii.ac.jp

Webmaster: Takehide Soh (E-mail soh at nii.ac.jp)