Some Research Topics for Intern Students

Zhenjiang Hu welcomes excellent students to do short-term intern research at NII on the following specific topics:

Systematic Parallel Programming using Hadoop

Hadoop is a software framework that supports data-intensive distributed applications, which enables applications to work with thousands of nodes and petabytes of data.

One key idea of Hadoop is MapReduce, the de facto standard for large scale data-intensive applications. It can be seen as a skeletal parallel programming model with two skeletons: map and reduce. By making use of the skeletal parallelism, users can program their applications with ready-made skeletons without considering low level details of parallelization, data distribution, load balancing and fault tolerance.

Despite the simplicity of MapReduce, it is still a challenge for a programmer to systematically solve his or her (nontrivial) problems. Consider the maximum prefix sum problem of a sequence. For instance, if the input sequence is

3, -1, 4, 1, -5, 9, 2, -6, 5

the maximum of the prefix sums should be 13 to which the underlined prefix corresponds. It is not obvious how to solve this problem efficiently with MapReduce (and we encourage the reader to pause to think how to solve this). Such problems widely exist in the real world, e.g, financial time series analysis.

We have proposed two systematic approach to solving problems with MapReduce. One idea is to ask the users to write two efficient sequential programs, one for solving the problem itself and the other for solving the inverse problem, from which an efficient MapReduce program can be automatically generated. The related work can be found in the following paper:

The other approach is a new framework to synthesize efficient MapReduce parallel program for a general class of problems which can be specified in the following generate-test-and-aggregate form:

aggregate . test . generate

Problems that match this specification can be solved by first generating possible solution candidates, then keeping those candidates that pass a test of a certain condition, and finally selecting a valid solution or making a summary of valid solutions with an aggregating computation. We have shown that this framework can be efficiently implemented on Hadoop as a Java library. Users of our library can concentrate on specifying naive generate-test-and-aggregate algorithms in Java, and obtain efficient MapReduce programs for free.

The purpose of this intern research is to evaluate and extend the above approaches through more practical examples.


Tree Computation with MapReduce

The purpose of this intern research follows similar motivation as the above on
systematic parallel programming on Hadoop. It would be interesting to see how computation on trees can be systematically mapped to MapReduce. As the first step, we would like to see how tree skeletons can be efficiently implemented with MapReduce. One related paper is:


Co-Evolution of Models and Codes using Bidirectional Transformation

Bidirectional transformations provide a novel mechanism for synchronizing and maintaining the consistency of information between input and output. They consist of a pair of well-behaved transformations: forward transformation is used to produce a target view from a source, while the backward transformation is used to reflect modification on the view to the source. This pair of forward and backward transformations should satisfy certain bidirectional properties. Bidirectional transformations are indeed pervasive and can be seen in many interesting applications, including the synchronization of replicated data in different formats, presentation-oriented structured document development, interactive user interface design, coupled software transformation, and the well-known view updating mechanism which has been intensively studied in the database community.

The purpose of this intern research is to investigate how to use the bidirectional graph (model) transformation system GRoundTram for co-evolution (synchronization) of models and codes in model-driven software development.


A Generic GUI for Supporting Bidirectional Model-driven Software Development

As discussed
above, we have develop a bidirectional graph transformation system GRoundTram with a simple GUI, which can deal with edge-labelled graphs in the dot format.

The purpose of this intern research is to develop a generic GUI (in Java) upon the current GRoundTram system, which can deal with various kinds of graphs such as UML, feature models, control flow graphs, etc. This needs us to express relationship between the edge-labelled graphs and other graphs. For instance, the relationship between the edge-labelled graphs and the UML graphs is discussed in the following paper:


Maintained by Zhenjiang Hu. Last modified on 2011.